[Ncep.hmon] Fix for Rocoto's temporarily "unavailable" jobs
Samuel Trahan - NOAA Affiliate
samuel.trahan at noaa.gov
Thu Apr 25 20:38:48 UTC 2019
HWRF/HMON people,
Recently, scontrol has sporadically taken longer to run than Rocoto's
built-in limit of 30 seconds. That leaves jobs in an "unavailable" state
until scontrol speeds up. I have a modified version of Rocoto that has an
80-second timeout. This fix is applied on top of the one that detects jobs
in the "OUT_OF_MEMORY" state.
Please let us know if this fixes the problems:
  module use /mnt/lfs3/projects/hwrf-vd/soft/modulefiles/
For RC4:
  module load rocoto/1.3.0-RC4-morestates-longtimeout
For RC3:
  module load rocoto/1.3.0-RC3-morestates-longtimeout
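
If you want to check whether a command such as scontrol is exceeding the
30 or 80 second limits mentioned above, the rough sketch below may help.
It is not part of Rocoto. The function name check_within is made up for
illustration, and it assumes bash and the GNU coreutils "timeout" command
are available.

```shell
#!/bin/bash
# Hypothetical helper, not part of Rocoto: run a command under a time
# limit (in seconds) and report whether it finished in time.
# Assumes GNU coreutils "timeout", which exits with status 124 when the
# limit is reached.
check_within() {
    local limit="$1"
    shift
    if timeout "$limit" "$@" >/dev/null 2>&1; then
        echo "ok: finished within ${limit}s"
    else
        local status=$?
        if [ "$status" -eq 124 ]; then
            echo "slow: exceeded ${limit}s"
        else
            echo "error: exit status $status"
        fi
        return 1
    fi
}
```

For example, "check_within 30 scontrol show job" followed by
"check_within 80 scontrol show job" would show whether scontrol fits
within the old and new limits at that moment. The exact scontrol
arguments Rocoto uses may differ; "show job" is only an example.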