[Ncep.hmon] Fix for Rocoto's temporarily "unavailable" jobs

Samuel Trahan - NOAA Affiliate samuel.trahan at noaa.gov
Thu Apr 25 20:38:48 UTC 2019


HWRF/HMON people,

Recently, scontrol has sporadically taken longer to run than Rocoto's
built-in limit of 30 seconds.  When that happens, jobs are left in an
"unavailable" state until scontrol speeds up.  I have a modified version
of Rocoto that raises the timeout to 80 seconds.  This fix is built on top
of the earlier one that detects jobs in the "OUT_OF_MEMORY" state.

Please let us know if this fixes the problems:

module use /mnt/lfs3/projects/hwrf-vd/soft/modulefiles/
For RC4:     module load rocoto/1.3.0-RC4-morestates-longtimeout
For RC3:     module load rocoto/1.3.0-RC3-morestates-longtimeout
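
For anyone curious about the failure mode, here is a rough shell sketch of
the general idea: a status query is run under a time limit, and if the limit
is exceeded the job is reported as "unavailable" rather than with a real
state.  This is not Rocoto's actual code.  The function name
query_with_timeout is hypothetical, and it assumes the coreutils `timeout`
command is installed.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of Rocoto: run a command under a time
# limit and print "unavailable" if the limit is exceeded.
# Usage: query_with_timeout SECONDS COMMAND [ARGS...]
query_with_timeout() {
    limit="$1"
    shift
    # coreutils `timeout` exits with status 124 when it has to kill
    # the command for running past the limit.
    output=$(timeout "$limit" "$@" 2>/dev/null) && status=0 || status=$?
    if [ "$status" -eq 124 ]; then
        echo "unavailable"
    else
        printf '%s\n' "$output"
    fi
    return "$status"
}
```

With a helper like this, something along the lines of
`query_with_timeout 80 scontrol show job "$jobid"` (where jobid is a
placeholder, and the scontrol arguments are only an illustration) would wait
up to 80 seconds before giving up, instead of 30.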