[Ncep.list.fv3-announce] Upcoming NEMS Commit

Samuel Trahan - NOAA Affiliate samuel.trahan at noaa.gov
Fri Feb 8 17:23:59 UTC 2019


Hi all,

A NEMS master commit is coming soon; it is a purely technical one.
The NCEPLIBS-pyprodutil master and the supported apps' masters will be
updated as well.  The relevant branch is called "slurm" in NEMS and
NEMSfv3gfs, and "slurm-v2" in NCEPLIBS-pyprodutil.  The commit includes:

1. SLURM support for the NEMSfv3gfs app's NEMSCompsetRun on uJet and Theia
(see notes below).  Results match the Moab/Torque baselines.

2. Bug fix from Dusan Jovic that eliminates error messages when cleaning
FMS and removes one temporary file created during the cleaning process.

3. Major bug fix to the multi-app test system to allow multiple multi-app
tests to run at the same time.  This bug was causing the nightly test
website to incorrectly report some people's branch-specific tests as the
nightly test results.  The change adds a "test id" that is passed around;
the nightly test uses "ngt".


SLURM porting details:


1. From now on, when running NEMSfv3gfs NEMSCompsetRun on Theia, you will
have to specify whether you want a Moab/Torque or SLURM test.
NEMSCompsetRun will complain if you don't.

To run with Moab/Torque: NEMSCompsetRun --platform theia.intel ...
To run with SLURM: NEMSCompsetRun --platform theia.slurm.intel ...

Once Moab/Torque are gone, the theia.slurm.intel platform will be removed,
and theia.intel will use SLURM.

2. On Jet, only uJet has SLURM.  We're expecting parts of xJet to be
SLURMified soon, at which point we can add that target.

3. On Theia, SLURM is misconfigured to think there are only 12 cores per
node instead of 24 when task geometries are requested.  I've compensated
by telling the nightly tests that there are only 12 cores per node, which
doubles the number of nodes we use.  To avoid pounding the machine TOO
hard, the Theia SLURM "nightly" tests will only run once a week.  This can
be changed once the admins fix the SLURM misconfiguration.

4. For now, we're putting the GAEA SLURM port on hold.  This is because
GAEA's SLURM configuration may be undergoing a major change in the near
future.  Presently it has a very non-standard configuration which would
require extra effort to support.  The new configuration may require very
different extra effort, and we don't want to do that twice.
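
To illustrate the node arithmetic behind item 3, here is a minimal sketch
in Python.  It is not part of NEMS or NEMSCompsetRun: the function name and
the 96-task job size are hypothetical, chosen only for illustration, and it
assumes one MPI task per core.  Only the 12 and 24 cores-per-node figures
come from the note above.

```python
def nodes_needed(tasks: int, cores_per_node: int) -> int:
    """Return the number of whole nodes needed for `tasks` ranks,
    assuming one rank per core (hypothetical helper, not a NEMS function)."""
    if tasks <= 0 or cores_per_node <= 0:
        raise ValueError("tasks and cores_per_node must be positive")
    # Integer ceiling division: round up to a whole number of nodes.
    return (tasks + cores_per_node - 1) // cores_per_node


ACTUAL_CORES_PER_NODE = 24    # what a Theia node really has
REPORTED_CORES_PER_NODE = 12  # what the tests are told, per the workaround

tasks = 96  # illustrative job size, not taken from the test suite
actual = nodes_needed(tasks, ACTUAL_CORES_PER_NODE)
workaround = nodes_needed(tasks, REPORTED_CORES_PER_NODE)
print(f"{tasks} tasks: {actual} nodes at 24 cores/node, "
      f"{workaround} nodes at 12 cores/node")
```

The doubling is exact when the task count is a multiple of 24.  For other
sizes it is only approximate: under the same assumptions, a 30-task job
would need 2 nodes at 24 cores per node and 3 nodes at 12.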

Sincerely,
Sam Trahan