<div dir="ltr">Zhan,<div><br></div><div>I haven't updated the "qac" suite of tools for squeue yet. I was focused on getting Rocoto working because all workflows will break without that.</div><div><br></div><div>Sincerely,</div><div>Sam Trahan</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 26 Apr 2019 at 14:07, Zhan Zhang - NOAA Affiliate &lt;<a href="mailto:zhan.zhang@noaa.gov">zhan.zhang@noaa.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-size:small">Sam,</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Please ignore my first question; I made a mistake in my cron job. But "qac" is still very slow.</div><div class="gmail_default" style="font-size:small">Thanks.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">-Zhan<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 26, 2019 at 10:20 AM Zhan Zhang - NOAA Affiliate &lt;<a href="mailto:zhan.zhang@noaa.gov" target="_blank">zhan.zhang@noaa.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small">Sam,</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">I am testing Slurm on Jet and have encountered two problems:</div><div class="gmail_default" style="font-size:small">1. Each time (except the first) that I submit a Rocoto job, I get the following question:</div><div class="gmail_default" style="font-size:small">"Rocoto cycles: &lt;cycledef&gt;201809010000 201809010000 06:00:00&lt;/cycledef&gt;<br>ALERT! 
/mnt/lfs3/projects/hwrf-vd/Zhan.Zhang/trunk_slurm/rocoto/hwrf-trunk_slurm-06L-2018090100.xml: XML file exists. Overwrite (y/n)?"</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">I tried both "rocoto/1.3.0-RC4" and "rocoto/1.3.0-RC4-morestates-longtimeout"; they both behave the same.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">2. The "qac" command has responded very slowly (&gt;30 sec) since the system was switched to Slurm.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Thanks.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">-Zhan<br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 25, 2019 at 4:56 PM Samuel Trahan - NOAA Affiliate &lt;<a href="mailto:samuel.trahan@noaa.gov" target="_blank">samuel.trahan@noaa.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Raghu,<div><br></div><div>I just submitted a ticket, RDHPCS#2019042554000248.</div><div><br></div><div>Sincerely,</div><div>Sam Trahan</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 25 Apr 2019 at 16:52, &lt;<a href="mailto:raghu.reddy@noaa.gov" target="_blank">raghu.reddy@noaa.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="EN-US"><div class="gmail-m_-3756240650548881569gmail-m_4062545158730123463gmail-m_8176638089928599711gmail-m_-581372253350263157WordSection1"><p class="MsoNormal">Hi Sam,<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Thank you for this information! 
<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Can you please let me know the exact command used by Rocoto that is causing this timeout?<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Is it “scontrol show job …”?<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">It would be useful to create standalone tests (which you may already have).<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Thanks!<u></u><u></u></p><p class="MsoNormal">Raghu<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal"><b>From:</b> Samuel Trahan - NOAA Affiliate &lt;<a href="mailto:samuel.trahan@noaa.gov" target="_blank">samuel.trahan@noaa.gov</a>&gt; <br><b>Sent:</b> Thursday, April 25, 2019 4:39 PM<br><b>To:</b> NCEP.EMC.hwrf &lt;<a href="mailto:NCEP.hwrf@noaa.gov" target="_blank">NCEP.hwrf@noaa.gov</a>&gt;; _Ncep.hmon &lt;<a href="mailto:ncep.hmon@noaa.gov" target="_blank">ncep.hmon@noaa.gov</a>&gt;<br><b>Cc:</b> Ghassan Alaka - NOAA Affiliate &lt;<a href="mailto:ghassan.alaka@noaa.gov" target="_blank">ghassan.alaka@noaa.gov</a>&gt;; Guoqing Ge - NOAA Affiliate &lt;<a href="mailto:guoqing.ge@noaa.gov" target="_blank">guoqing.ge@noaa.gov</a>&gt;; Christopher Harrop &lt;<a href="mailto:Christopher.W.Harrop@noaa.gov" target="_blank">Christopher.W.Harrop@noaa.gov</a>&gt;; Raghu Reddy &lt;<a href="mailto:raghu.reddy@noaa.gov" target="_blank">raghu.reddy@noaa.gov</a>&gt;<br><b>Subject:</b> Fix for Rocoto's temporarily "unavailable" jobs<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><div><div><p class="MsoNormal">HWRF/HMON people,<u></u><u></u></p><div><p class="MsoNormal"><u></u> <u></u></p></div><div><p class="MsoNormal">Recently, scontrol has sporadically taken longer than Rocoto's built-in limit of 30 seconds to run. That leads to jobs being in an "unavailable" state until scontrol speeds up. I have a modified version of Rocoto that has an 80-second timeout. 
This fix is on top of the one that detects the "OUT_OF_MEMORY" state jobs.<u></u><u></u></p></div><div><p class="MsoNormal"><u></u> <u></u></p></div><div><p class="MsoNormal">Please let us know if this fixes the problems:<u></u><u></u></p></div><div><div><p class="MsoNormal"><u></u> <u></u></p></div><div><p class="MsoNormal">module use /mnt/lfs3/projects/hwrf-vd/soft/modulefiles/<u></u><u></u></p></div><div><p class="MsoNormal">For RC4: module load rocoto/1.3.0-RC4-morestates-longtimeout<u></u><u></u></p></div><div><p class="MsoNormal">For RC3: module load rocoto/1.3.0-RC3-morestates-longtimeout<u></u><u></u></p></div></div><div><p class="MsoNormal"><u></u> <u></u></p></div></div></div></div></div></blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>