[Ncep.list.emc.fv3gfs_tickets] FV3GFS Ticket #15: FV3 May 15 Release

Samuel Trahan - NOAA Affiliate samuel.trahan at noaa.gov
Tue May 9 18:47:27 UTC 2017


Moorthi,

Your error is different from Jun's.  I've seen it in code that uses
point-to-point communication (mpi_send, mpi_recv) where the send and
receive buffers have mismatched sizes.
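
To size up a mismatch like this, it can help to pull the source rank, tag,
and byte counts out of the truncation message.  A minimal sketch in Python,
assuming the message wording quoted further down in this thread (other MPI
implementations may phrase it differently); parse_truncation and
TRUNCATION_RE are illustrative names, not part of FMS or MPI:

```python
import re

# Pattern for the truncation line quoted in this thread; the wording is an
# assumption tied to that one log and may differ on other MPI libraries.
TRUNCATION_RE = re.compile(
    r"Message from rank (?P<source>\d+) and tag (?P<tag>\d+)\s+truncated; "
    r"(?P<sent>\d+) bytes received but buffer size is (?P<buffer>\d+)"
)


def parse_truncation(log_text):
    """Return source rank, tag, and byte counts from a truncation message, or None."""
    # Collapse line breaks so a wrapped log line still matches.
    match = TRUNCATION_RE.search(" ".join(log_text.split()))
    if match is None:
        return None
    info = {key: int(value) for key, value in match.groupdict().items()}
    # Number of bytes that did not fit in the receive buffer.
    info["excess"] = info["sent"] - info["buffer"]
    return info


message = (
    "MPID_nem_gni_lmt_start_recv(1580): Message from rank 89 and tag 2\n"
    "truncated; 33528 bytes received but buffer size is 5880"
)
result = parse_truncation(message)
print(result)
```

Knowing which rank and tag sent the oversized message narrows down which
exchange posted the too-small receive.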

Sincerely,
Sam Trahan
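
For the run described in the quoted message below (k_split=1, n_split=8,
dt_atmos=360s), the two settings can be compared by the acoustic time step
they imply.  A rough sketch, assuming the acoustic step is
dt_atmos / (k_split * n_split) and that the working run also used k_split=1
(the message does not say); acoustic_time_step is a hypothetical helper, not
an FV3 routine:

```python
def acoustic_time_step(dt_atmos, k_split, n_split):
    """Acoustic time step in seconds, assuming dt_atmos / (k_split * n_split)."""
    if k_split < 1 or n_split < 1:
        # n_split=0 means "let the model choose", so it has no fixed step here.
        raise ValueError("k_split and n_split must both be at least 1")
    return dt_atmos / (k_split * n_split)


# Values reported in the thread: the crashed run and the run where the
# code chose n_split=15 on its own (k_split=1 assumed for the latter).
crashed = acoustic_time_step(360.0, k_split=1, n_split=8)
working = acoustic_time_step(360.0, k_split=1, n_split=15)
print(f"crashed run: {crashed} s, working run: {working} s")
```

Under these assumptions the crashed run takes a noticeably longer acoustic
step than the working one, which may or may not be related to the failure.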

On Tue, May 9, 2017 at 2:44 PM, Shrinivas Moorthi <
shrinivas.moorthi at noaa.gov> wrote:

> Rusty,
>    Maybe the following will help.  I was trying to test "k_split" and
> "n_split".
> The run that crashed had "k_split=1" and "n_split=8".
> If I set "n_split=0" (i.e. default), it works fine (code sets it to 15);
> in fact it is running right now.
> The dt_atmos for this run is 360s.
> Thanks
> Moorthi
>
> On 05/09/2017 02:33 PM, Rusty Benson - NOAA Federal wrote:
>
> Moorthi,
>
> We have not encountered this error on the gaea Cray system.
>
> It would be very helpful to have a traceback to see where this is
> occurring.  There are two ways to do this.  The first is to compile in
> "repro" mode, which is identical to "prod" mode except for the addition
> of "-g -traceback" and the removal of aggressive prefetching
> "-qopt-prefetch=3".
>
> The other choice is to temporarily add "-g -traceback" to FFLAGS_OPT in
> conf/configure.*
>
> Once you run this and provide a traceback, there is more of a chance to
> debug and understand the cause.
>
> I know you are running with different physical parameterizations which may
> have a number of condensate species for which the logic in the dycore is
> not yet generic enough to handle (2, 3, and 6 are fully supported).  I am
> in the process of working through the logic with the FV3 team to ensure
> support for generic numbers of condensate species.  The areas needing logic
> enhancements have been identified and mostly rectified.  I am slowly
> learning the EMC procedures for svn and the trac issue reporting system.
>
> Rusty
> --
> Rusty Benson, PhD
> Modeling Systems Group
> NOAA Geophysical Fluid Dynamics Lab
> Princeton, NJ
>
> On Tue, May 9, 2017 at 2:16 PM, Shrinivas Moorthi <
> shrinivas.moorthi at noaa.gov> wrote:
>
>> Not true, Jun; my successful runs use the same "aprun".
>> Hopefully the GFDL folks can explain why this happens.
>> Thanks
>> Moorthi
>> On 05/09/2017 12:56 PM, Jun Wang - NOAA Affiliate wrote:
>>
>> Moorthi,
>>
>> I just got this error yesterday. Please check the job submit line: you
>> should have "aprun... executable". Without aprun (Cray) you will get that
>> error.
>>
>> Jun
>>
>> On Tue, May 9, 2017 at 12:54 PM, Shrinivas Moorthi <
>> shrinivas.moorthi at noaa.gov> wrote:
>>
>>> Does anyone know what might cause the following error right off the bat?
>>> "FATAL from PE    92: mpp_sync_self: size_recv does not match of data
>>> received
>>>
>>> Rank 145 [Tue May  9 16:38:33 2017] [c7-0c2s6n0] Fatal error in
>>> PMPI_Wait: Message truncated, error stack:
>>> PMPI_Wait(186)...................: MPI_Wait(request=0xbf3e7c4,
>>> status=0xb30ac68) failed
>>> MPIR_Wait_impl(79)...............:
>>> MPID_nem_gni_lmt_start_recv(1580): Message from rank 89 and tag 2
>>> truncated; 33528 bytes received but buffer size is 5880"
>>> Thanks
>>> Moorthi
>>>
>>> On 05/09/2017 11:48 AM, Dusan Jovic wrote:
>>>
>>> Jun,
>>>
>>> Here's what I get when I try to checkout that branch:
>>>
>>> slogin2:/gpfs/hps/emc/meso/save/Dusan.Jovic/FV3> svn co
>>> https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release
>>>
>>> A    fv3.v0release/parm
>>> A    fv3.v0release/parm/model_configure.IN
>>>
>>> . . . .
>>>
>>> A    fv3.v0release/compsets/fv3.input
>>> A    fv3.v0release/standaloneFV3.appBuilder
>>>  U   fv3.v0release
>>>
>>> Fetching external item into 'fv3.v0release/NEMS':
>>> A    fv3.v0release/NEMS/exe
>>> A    fv3.v0release/NEMS/exe/mkDepends.pl
>>>
>>>  . . . .
>>>
>>> A    fv3.v0release/NEMS/NEMSAppBuilder
>>> A    fv3.v0release/NEMS/OldCompsetRun
>>>  U   fv3.v0release/NEMS
>>>
>>> Fetching external item into 'fv3.v0release/NEMS/tests/produtil':
>>> A    fv3.v0release/NEMS/tests/produtil/ush
>>> A    fv3.v0release/NEMS/tests/produtil/ush/testgen.py
>>> . . . .
>>>
>>> A    fv3.v0release/NEMS/tests/produtil/ush/forcetest.py
>>> A    fv3.v0release/NEMS/tests/produtil/ush/testme.py
>>> Checked out external at revision 90581.
>>>
>>> Checked out revision 90943.
>>> svn: warning: W200000: Error handling externals definition for 'fv3.v0release/FV3':
>>> svn: warning: W175013: Unable to connect to a repository at URL 'https://svnemc.ncep.noaa.gov/projects/fv3/trunk'
>>> Checked out revision 92536.
>>>
>>> Again, is this somehow related to last week's problems with the svn
>>> server?
>>>
>>> Dusan
>>>
>>> On 05/09/2017 11:22 AM, Jun Wang - NOAA Affiliate wrote:
>>>
>>> Dusan,
>>>
>>> Thanks for looking at the release readme file! The readme file was
>>> prepared for the targeted release, which will be a final nemsfv3gfs tag;
>>> that tag does not exist yet. Please check out the current branch
>>> instead:
>>>
>>> https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release
>>>
>>> Then follow the readme file to see if you can run the code on theia
>>> and wcoss. Please let me know if you find any mistakes or errors. Thank
>>> you very much for being the first person to beta test!
>>>
>>>
>>> Jun
>>>
>>> On Tue, May 9, 2017 at 11:10 AM, Dusan Jovic <dusan.jovic at noaa.gov>
>>> wrote:
>>>
>>>> Is this readme.txt file the latest version available?
>>>>
>>>> https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release/release/v0/readme.txt
>>>>
>>>> If it is then the URL of the v0 repository is either wrong or I am not
>>>> allowed to check it out (is this related to recent issues with our svn
>>>> server?)
>>>>
>>>> 1.  check out release version at:
>>>>       https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0,
>>>>
>>>>     %svn co https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0
>>>>     %cd fv3_release.v0
>>>>
>>>>
>>>> svn co https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0
>>>> svn: E170000: URL 'https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/tags/fv3_release.v0' doesn't exist
>>>>
>>>>
>>>> Dusan
>>>> On 05/03/2017 03:45 PM, FV3GFS Trac Ticket wrote:
>>>>
>>>> #15: FV3 May 15 Release
>>>> ------------------------------+------------------------
>>>>   Reporter:  samuel.trahan@…  |      Owner:  somebody
>>>>       Type:  task             |     Status:  new
>>>>   Priority:  critical         |  Milestone:  milestone1
>>>>  Component:  component1       |    Version:
>>>> Resolution:                   |   Keywords:  release
>>>> ------------------------------+------------------------
>>>>
>>>> Comment (by jun.wang@…):
>>>>
>>>>  Following Vijay's suggestion, a nemsfv3 branch is created at:
>>>>
>>>>  https://svnemc.ncep.noaa.gov/projects/nems/apps/NEMSfv3gfs/branches/fv3.v0release
>>>>
>>>>  This branch will contain the required scripts and instructions to run the
>>>>  test cases Fanglin set up.
>>>>
>>>>  A new directory release/v0 is added under fv3.v0release, and the following
>>>>  files are copied from Fanglin's fv3gfs branch, FV3GFS_V0_RELEASE:
>>>>
>>>>  exec  exp  modulefiles  parm  readme.txt  scripts  sorc  ush
>>>>
>>>>  where:
>>>>  exec: holds fregrid executables
>>>>  exp: contains build.sh and the run scripts runjob_cray.sh and
>>>>  runjob_theia.sh, which start the experiment
>>>>  modulefiles: contains fregrid model files
>>>>  readme.txt: instructions on running release cases
>>>>  scripts: exglobal_fcst_nemsfv3gfs.sh, the forecast script that sets up
>>>>  the namelist, etc.
>>>>  sorc: contains fregrid source code
>>>>  ush: utility script to run remapping
>>>>
>>>>  A new script, runjob_theia.sh, is created and runs the test cases on theia.
>>>>
>>>> --
>>>> Ticket URL: <https://svnemc.ncep.noaa.gov/trac/fv3gfs/ticket/15#comment:2>
>>>> fv3gfs <https://svnemc.ncep.noaa.gov/trac/fv3gfs>
>>>> NGGPS FV3GFS Development
>>>>
>>>> --
>>> Dr. Shrinivas Moorthi
>>> Research Meteorologist
>>> Global Climate and Weather Modeling Branch
>>> Environmental Modeling Center / National Centers for Environmental Prediction
>>> 5830 University Research Court - (W/NP23), College Park MD 20740 USA
>>> Tel:(301)683-3718
>>>
>
>