[Ncep.hmon] Fix for Rocoto's temporarily "unavailable" jobs
    Samuel Trahan - NOAA Affiliate 
    samuel.trahan at noaa.gov
       
    Thu Apr 25 20:38:48 UTC 2019
    
    
  
HWRF/HMON people,
Recently, scontrol has sporadically taken longer to run than Rocoto's
built-in limit of 30 seconds.  That leaves jobs in an "unavailable" state
until scontrol speeds up.  I have a modified version of Rocoto that has an
80-second timeout.  This fix is on top of the one that detects jobs in the
"OUT_OF_MEMORY" state.

Please let us know if this fixes the problems:

    module use /mnt/lfs3/projects/hwrf-vd/soft/modulefiles/

    For RC4:     module load rocoto/1.3.0-RC4-morestates-longtimeout
    For RC3:     module load rocoto/1.3.0-RC3-morestates-longtimeout
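
For anyone curious about the failure mode, here is a minimal Python sketch of how a fixed timeout on a status query can turn a slow response into an "unavailable" result. This is not Rocoto's code (Rocoto is written in Ruby). The function name query_job_state, the stand-in slow command, and the "error" return value are all made up for illustration, and the 0.5 and 30 second limits are scaled-down stand-ins, not the real 30 and 80 second values.

```python
import subprocess
import sys


def query_job_state(command, timeout_s):
    """Run a status command and return its output, or "unavailable" on timeout.

    Hypothetical helper for illustration only; not part of Rocoto.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The query did not finish in time, so the state is unknown for now.
        return "unavailable"
    if result.returncode != 0:
        return "error"
    return result.stdout.strip()


# Stand-in for a slow scheduler query: a child process that sleeps for
# 2 seconds and then prints a job state.
slow_query = [sys.executable, "-c", "import time; time.sleep(2); print('RUNNING')"]

with_short_limit = query_job_state(slow_query, timeout_s=0.5)
with_long_limit = query_job_state(slow_query, timeout_s=30)
print(with_short_limit, with_long_limit)
```

The same slow query gives a different answer depending only on the limit, which is why raising the timeout can clear the "unavailable" state without any change on the scheduler side.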