<div dir="ltr"><div><div><div>Lucas,<br><br></div>I have seen that happen when one MPI rank prematurely exits when another rank was sending it information (global or local communication). A segfault or spurious "STOP" command would cause it. If you look through the gfs physics, there are a number of calls to "mpi_finalize" or "stop" when assumptions are violated or errors are detected.<br><br></div>Sincerely,<br></div>Sam Trahan<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 9, 2017 at 2:48 PM, Lucas Harris - NOAA Federal <span dir="ltr"><<a href="mailto:lucas.harris@noaa.gov" target="_blank">lucas.harris@noaa.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi, Moorthi. Sometimes I have seen a model crash due to numerical instability trigger the MPI error, without there actually being a problem with MPI.<span class="HOEnZb"><font color="#888888"><div><br></div><div>Lucas</div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 9, 2017 at 1:44 PM, Shrinivas Moorthi <span dir="ltr"><<a href="mailto:shrinivas.moorthi@noaa.gov" target="_blank">shrinivas.moorthi@noaa.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_8178264769986117140m_3906292584064122073moz-cite-prefix">Rusty,<br>
      Maybe the following will help. I was trying to test "k_split"
and "n_split".<br>
The run that crashed had "k_split=1" and "n_split=8".<br>
      If I set "n_split=0" (i.e. the default), the code sets it to 15 and
      the run works fine; in fact it is running right now.<br>
The dt_atmos for this run is 360s.<br>
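<div>For reference, a rough sketch of the time-step arithmetic behind these two settings. It assumes the usual FV3 relation, acoustic step = dt_atmos / (k_split * n_split); the function name is made up for illustration and is not part of the model code.</div>
<pre>
```python
def acoustic_dt(dt_atmos, k_split, n_split):
    """Seconds per acoustic substep, ASSUMING dt_atmos / (k_split * n_split)."""
    if not (k_split and n_split):
        # n_split=0 means "let the model choose", so there is nothing to compute here.
        raise ValueError("k_split and n_split must both be nonzero")
    return dt_atmos / (k_split * n_split)


# Values quoted in this thread: dt_atmos=360 s, k_split=1.
crashed = acoustic_dt(360.0, 1, 8)    # run that crashed
working = acoustic_dt(360.0, 1, 15)   # value the code picks when n_split=0
print(crashed, working)               # 45.0 24.0
```
</pre>
<div>Under that assumption the crashed run takes acoustic steps almost twice as long as the working one (45 s versus 24 s), which would fit a numerical instability rather than an MPI problem.</div>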
Thanks<span class="m_8178264769986117140HOEnZb"><font color="#888888"><br>
Moorthi</font></span><div><div class="m_8178264769986117140h5"><br>
On 05/09/2017 02:33 PM, Rusty Benson - NOAA Federal wrote:<br>
</div></div></div><div><div class="m_8178264769986117140h5">
<blockquote type="cite">
<div dir="ltr">Moorthi,
<div><br>
</div>
<div>We have not encountered this error on the gaea Cray system.</div>
<div><br>
</div>
        <div>It would be very helpful to have a traceback to see where
          this is occurring. There are two ways to get one. The first
          is to compile in "repro" mode, which is identical to "prod"
          mode except that it adds "-g -traceback" and removes the
          aggressive prefetching flag "-qopt-prefetch=3".</div>
<div><br>
</div>
<div>The other choice is to temporarily add "-g -traceback" to
FFLAGS_OPT in conf/configure.*</div>
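<div>A minimal sketch of that temporary edit, in case it helps. It assumes the configure file holds a makefile-style line of the form "FFLAGS_OPT = ..."; the helper name and the sample text are hypothetical, so check the real file before using anything like this.</div>
<pre>
```python
import re


def add_debug_flags(config_text, flags=("-g", "-traceback")):
    """Append any missing flags to FFLAGS_OPT assignment lines.

    ASSUMPTION: FFLAGS_OPT is set on a single makefile-style line.
    """
    out = []
    for line in config_text.splitlines():
        if re.match(r"\s*FFLAGS_OPT\s*[:+]?=", line):
            present = line.split()
            missing = [flag for flag in flags if flag not in present]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    # Keep the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if config_text.endswith("\n") else "")


# Hypothetical file contents, for illustration only.
sample = "FFLAGS_OPT = -O2 -fno-alias\nFFLAGS_DEBUG = -O0\n"
print(add_debug_flags(sample))
```
</pre>
<div>Running the helper twice leaves the text unchanged, so the flags are not duplicated. Remember to revert the change once the traceback has been collected.</div>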
<div><br>
</div>
        <div>Once you run this and provide a traceback, we will have a
          better chance of debugging and understanding the cause.</div>
<div><br>
</div>
        <div>I know you are running with different physical
          parameterizations, which may use a number of condensate
          species that the dycore logic is not yet generic
          enough to handle (2, 3, and 6 are fully supported). I am in
the process of working through the logic with the FV3 team to
ensure support for generic numbers of condensate species. The
areas needing logic enhancements have been identified and
mostly rectified. I am slowly learning the EMC procedures for
svn and the trac issue reporting system.</div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div class="m_8178264769986117140m_3906292584064122073gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div style="font-size:12.8px">Rusty</div>
<div style="font-size:12.8px">--</div>
<div style="font-size:12.8px">Rusty Benson, PhD</div>
<div style="font-size:12.8px">Modeling Systems
Group</div>
<div style="font-size:12.8px">NOAA Geophysical
Fluid Dynamics Lab</div>
<div style="font-size:12.8px">Princeton, NJ</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">On Tue, May 9, 2017 at 2:16 PM,
Shrinivas Moorthi <span dir="ltr"><<a href="mailto:shrinivas.moorthi@noaa.gov" target="_blank">shrinivas.moorthi@noaa.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919moz-cite-prefix">Not true
Jun, my successful runs have the same "aprun".<br>
              Hopefully the GFDL folks can explain why
              this can happen.<br>
Thanks<br>
              Moorthi<br>
On 05/09/2017 12:56 PM, Jun Wang - NOAA Affiliate wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Moorthi,<br>
<br>
</div>
                  I got this same error just yesterday. Please check the job
                  submit line: it should read "aprun... executable".
                  Without aprun (on the Cray) you will get that error. <br>
<br>
</div>
Jun<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 9, 2017 at 12:54
PM, Shrinivas Moorthi <span dir="ltr"><<a href="mailto:shrinivas.moorthi@noaa.gov" target="_blank">shrinivas.moorthi@noaa.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015moz-cite-prefix">Does
anyone know what might cause the following
error right off the bat?<br>
"FATAL from PE 92: mpp_sync_self: size_recv
does not match of data received<br>
<br>
Rank 145 [Tue May 9 16:38:33 2017]
[c7-0c2s6n0] Fatal error in PMPI_Wait: Message
truncated, error stack:<br>
PMPI_Wait(186)................<wbr>...:
MPI_Wait(request=0xbf3e7c4, status=0xb30ac68)
failed<br>
MPIR_Wait_impl(79)............<wbr>...:<br>
MPID_nem_gni_lmt_start_recv(15<wbr>80):
Message from rank 89 and tag 2 truncated;
33528 bytes received but buffer size is 5880"<br>
Thanks<br>
Moorthi
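<div>A small sketch for pulling the numbers out of a truncation message like the one above, so that failures from several runs can be compared. The regular expression is written against the wording quoted in this message only; the function name and the returned keys are illustrative assumptions.</div>
<pre>
```python
import re

# Pattern for the truncation line quoted in this thread (wording assumed fixed).
TRUNCATION = re.compile(
    r"Message from rank (\d+) and tag (\d+) truncated; "
    r"(\d+) bytes received but buffer size is (\d+)"
)


def parse_truncation(log_text):
    """Return sender rank, tag, and byte counts, or None if no match."""
    # Collapse line wraps so a message split across lines still matches.
    flat = " ".join(log_text.split())
    match = TRUNCATION.search(flat)
    if match is None:
        return None
    sender, tag, received, buffer_size = (int(g) for g in match.groups())
    return {
        "sender": sender,
        "tag": tag,
        "received": received,
        "buffer_size": buffer_size,
        "excess": received - buffer_size,
    }


message = """MPID_nem_gni_lmt_start_recv(1580):
Message from rank 89 and tag 2 truncated;
33528 bytes received but buffer size is 5880"""
print(parse_truncation(message))
```
</pre>
<div>For the error above this reports sender 89, tag 2, and 27648 more bytes than the receive buffer holds, which would be consistent with the two ranks disagreeing about the size of an exchange.</div>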
<div>
<div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919h5"><br>
On 05/09/2017 11:48 AM, Dusan Jovic wrote:<br>
</div>
</div>
</div>
<div>
<div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919h5">
<blockquote type="cite">
<div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015moz-cite-prefix">Jun,<br>
<br>
                          Here's what I get when I try to check out
that branch:<br>
<br>
slogin2:/gpfs/hps/emc/meso/sav<wbr>e/Dusan.Jovic/FV3>
<b>svn co <a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release" target="_blank">https://svnemc.ncep.noaa.gov/p<wbr>rojects/nems/apps/NEMSfv3gfs/b<wbr>ranches/fv3.v0release</a></b><br>
<p>A fv3.v0release/parm<br>
A fv3.v0release/parm/model_confi<wbr>gure.IN<br>
</p>
<p>. . . .</p>
<p>A fv3.v0release/compsets/fv3.inp<wbr>ut<br>
A fv3.v0release/standaloneFV3.ap<wbr>pBuilder<br>
U fv3.v0release<br>
<br>
Fetching external item into '<b>fv3.v0release/NEMS</b>':<br>
A fv3.v0release/NEMS/exe<br>
A fv3.v0release/NEMS/exe/mkDepen<a href="http://ds.pl" target="_blank"><wbr>ds.pl</a><br>
</p>
<p> . . . . <br>
</p>
<p>A fv3.v0release/NEMS/NEMSAppBuil<wbr>der<br>
A fv3.v0release/NEMS/OldCompsetR<wbr>un<br>
U fv3.v0release/NEMS<br>
<br>
Fetching external item into '<b>fv3.v0release/NEMS/tests/prod<wbr>uti</b>l':<br>
A fv3.v0release/NEMS/tests/produ<wbr>til/ush<br>
A fv3.v0release/NEMS/tests/produ<wbr>til/ush/testgen.py<br>
. . . .</p>
<p>A fv3.v0release/NEMS/tests/produ<wbr>til/ush/forcetest.py<br>
A fv3.v0release/NEMS/tests/produ<wbr>til/ush/testme.py<br>
Checked out external at revision
90581.<br>
<br>
Checked out revision 90943.<br>
<font color="#ff0000">svn: warning:
W200000: Error handling externals
definition for '<b>fv3.v0release/FV3</b>':<br>
svn: warning: W175013: Unable to
connect to a repository at URL '<b><a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/fv3/trunk" target="_blank">https://svnemc.ncep.noaa.gov/<wbr>projects/fv3/trunk</a></b>'<br>
</font>Checked out revision 92536.</p>
<p>Again, is this somehow related to
last week's problems with the svn
server?</p>
<p>Dusan<br>
</p>
<br>
On 05/09/2017 11:22 AM, Jun Wang - NOAA
Affiliate wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Dusan,<br>
<br>
</div>
                                Thanks for looking at the release
                                readme file! The readme file was
                                prepared for the targeted release,
                                a final nemsfv3gfs tag that
                                does not exist yet.
                                Please check out the current
                                branch instead:<br>
<br>
<b><a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015gmail-m_6373680597087819793moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release/release/v0/readme.txt" target="_blank">https://svnemc.ncep.noaa.gov/p<wbr>rojects/nems/apps/NEMSfv3gfs/b<wbr>ranches/fv3.v0release</a><br>
<br>
</b></div>
                              Then follow the readme file to
                              see whether you can run the code on theia
                              and wcoss, and please let me know if you
                              find any mistakes or errors. Thank you
                              very much for being the first person to
                              beta test!<br>
<br>
<br>
</div>
Jun<b><br>
</b></div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May
9, 2017 at 11:10 AM, Dusan Jovic <span dir="ltr"><<a href="mailto:dusan.jovic@noaa.gov" target="_blank">dusan.jovic@noaa.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-cite-prefix">Is
this readme.txt file the
latest version available?<br>
<br>
<b><a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release/release/v0/readme.txt" target="_blank">https://svnemc.ncep.noaa.gov/p<wbr>rojects/nems/apps/NEMSfv3gfs/b<wbr>ranches/fv3.v0release/release/<wbr>v0/readme.txt</a></b><b><br>
</b><br>
                                    If it is, then the URL of the
v0 repository is either wrong
or I am not allowed to check
it out (is this related to
recent issues with our svn
server?)<br>
<br>
<pre>1. check out release version at:
<a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0" target="_blank">https://svnemc.ncep.noaa.gov/p<wbr>rojects/nems/apps/NEMSfv3gfs/t<wbr>ags/fv3_release.v0</a>,
%svn co <a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0" target="_blank">https://svnemc.ncep.noaa.gov/p<wbr>rojects/nems/apps/NEMSfv3gfs/t<wbr>ags/fv3_release.v0</a>
%cd fv3_release.v0
svn co <a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0" target="_blank">https://svnemc.ncep.noaa.gov/p<wbr>rojects/nems/apps/NEMSfv3gfs/t<wbr>ags/fv3_release.v0</a>
svn: E170000: URL '<a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0" target="_blank">https://svnemc.ncep.noaa.gov/<wbr>projects/nems/apps/NEMSfv3gfs/<wbr>tags/fv3_release.v0</a>' doesn't exist
</pre><span class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015HOEnZb"><font color="#888888">
Dusan</font></span><div><div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015h5">
On 05/03/2017 03:45 PM, FV3GFS Trac Ticket wrote:
</div></div></div><div><div class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015h5">
<blockquote type="cite">
<pre>#15: FV3 May 15 Release
------------------------------<wbr>+------------------------
Reporter: samuel.trahan@… | Owner: somebody
Type: task | Status: new
Priority: critical | Milestone: milestone1
Component: component1 | Version:
Resolution: | Keywords: release
------------------------------<wbr>+------------------------
Comment (by jun.wang@…):
Following Vijay's suggestion, a nemsfv3 branch is created at:
<b><a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-freetext" href="https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release" target="_blank">https://svnemc.ncep.noaa.gov/p<wbr>rojects/nems/apps/NEMSfv3gfs/b<wbr>ranches/fv3.v0release</a></b>
 This branch will contain the required scripts and instructions to run the test
 cases Fanglin set up.
 A new directory release/v0 has been added under fv3.v0release. The following files
 were copied from Fanglin's fv3gfs branch FV3GFS_V0_RELEASE:
 exec  exp  modulefiles  parm  readme.txt  scripts  sorc  ush
 where:
 exec: holds the fregrid executables
 exp: contains build.sh and the run scripts runjob_cray.sh and runjob_theia.sh,
 which start the experiment
 modulefiles: contains fregrid model files
 readme.txt: instructions for running the release cases
 scripts: exglobal_fcst_nemsfv3gfs.sh, the forecast script that sets up the
 namelist, etc.
 sorc: contains the fregrid source code
 ush: utility script to run remapping
 A new script, runjob_theia.sh, was created to run the test cases on theia.
--
Ticket URL: <a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-rfc2396E" href="https://svnemc.ncep.noaa.gov/trac/fv3gfs/ticket/15#comment:2" target="_blank"><https://svnemc.ncep.noaa.gov/<wbr>trac/fv3gfs/ticket/15#comment:<wbr>2></a>
fv3gfs <a class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015m_6373680597087819793moz-txt-link-rfc2396E" href="https://svnemc.ncep.noaa.gov/trac/fv3gfs" target="_blank"><https://svnemc.ncep.noaa.gov/<wbr>trac/fv3gfs></a>
NGGPS FV3GFS Development
</pre><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
</font></span></blockquote><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
<p>
</p>
</font></span></div></div></div><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
</font></span></blockquote></div><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
</font></span></div><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
</font></span></blockquote><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888"><p>
</p>
</font></span></blockquote><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
<p>
</p></font></span></div></div><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888"><span class="m_8178264769986117140m_3906292584064122073m_3734193775566884919HOEnZb"><font color="#888888"><pre class="m_8178264769986117140m_3906292584064122073m_3734193775566884919m_-7061746746540679015moz-signature" cols="72">--
Dr. Shrinivas Moorthi
Research Meteorologist
Global Climate and Weather Modeling Branch
Environmental Modeling Center / National Centers for Environmental Prediction
5830 University Research Court - (W/NP23), College Park MD 20740 USA
Tel:<a href="tel:%28301%29%20683-3718" value="+13016833718" target="_blank">(301)683-3718</a></pre></font></span></font></span></div></blockquote></div><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
</font></span></div><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
</font></span></blockquote><span class="m_8178264769986117140m_3906292584064122073HOEnZb"><font color="#888888">
<p>
              </p></font></span></div></blockquote></div>
</div>
</blockquote>
<p>
      </p></div></div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>