[Ncep.list.fv3-announce] Update: global-workflow/FV3GFS on Hera

Kate Friedman - NOAA Federal Kate.Friedman at noaa.gov
Mon Oct 21 19:45:53 UTC 2019


All,

*FV3GFS via global-workflow is now ready on Hera for free-forecast mode.*
Cycled mode is essentially ready as well; however, I am waiting for a bug fix
to come back to the GSI master before giving the go-ahead for running
cycled experiments on Hera. Whether you plan to run free-forecast or cycled
experiments on Hera, please read this email in its entirety. Thanks!

*How do I get set up on Hera?*

If you have not attended a Hera training session, please see the following
quick-start guide:

https://docs.google.com/presentation/d/1f2eEMNSyTa1PpXxtDiJgKFdxIf6qLAP3N3x9dVWEwRA/edit?usp=sharing

*What do I use to run the FV3GFS on Hera?*

Use the global-workflow "port2hera" branch to run on Hera:

> git clone gerrit:global-workflow
> cd global-workflow
> git checkout port2hera
> cd sorc
> sh checkout.sh
> sh build_all.sh
> sh link_fv3gfs.sh emc hera


To stay up to date on final changes for Hera, please become a watcher on
this global-workflow issue:

https://vlab.ncep.noaa.gov/redmine/issues/67188

*What resolutions can I run on Hera?*

The following resolutions have been tested on Hera (all L64): C48, C96,
C192, C384, C768

I have not tested 127 layers (L127) but I have been told that it works.
Please use the appropriate component branches and speak with other GFSv16
developers who have run with L127 if you wish to run L127.

*What resolutions should I run on Hera?*

*As with Theia, we urge users not to run high resolution (C768+) unless
necessary.* The recommended highest resolution is C384. Hera is only a little
larger than Theia and there is no scrubbing, so resource concerns
unfortunately remain. *Please be very cautious of your space utilization on
Hera!*
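
The resolution guidance above can be sketched as a small shell check. This is only an illustration: check_hera_res is a hypothetical helper, not part of global-workflow, and its three labels are invented here to mirror the tested list (C48 through C768) and the recommended ceiling (C384) quoted in this email.

```shell
#!/bin/sh
# Hypothetical helper (not part of global-workflow): classify a resolution
# against the tested list and the recommended maximum given in this email.
check_hera_res() {
    case "$1" in
        C48|C96|C192|C384) echo "ok" ;;          # tested, at or below C384
        C768) echo "tested-but-discouraged" ;;   # tested, but avoid unless necessary
        *) echo "untested" ;;                    # anything not listed in the email
    esac
}
```

For example, "check_hera_res C768" prints "tested-but-discouraged", which could be used to prompt for confirmation before an experiment is set up.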

*What versions of FV3GFS components does port2hera run?*

The port2hera branch currently runs the latest masters/develops/tags of the
various FV3GFS system components. Some components currently point to a
master/develop branch but will soon move to tags:

   - NEMSfv3gfs - "gfs.v16_PhysicsUpdate" tag
   - ProdGSI - master branch (tag coming soon)
   - UFS_UTILS - develop branch (tag coming soon)
   - EMC_post - develop branch (tag coming soon)
   - EMC_gfs_wafs - "gfs_wafs.v5.0.11" tag
   - EMC_verif-global (METplus) - "verif_global_v1.2.2" tag

Pieces outside of global-workflow (installed under glopara account):

   - obsproc
      - OT-obsproc_prep.v5.2.0-20190614 tag
      - OT-obsproc_global.v3.2.1-20190613 tag
      - tracker/genesis - ens_tracker.v1.1.15.1
   - verification
      - VSDB
      - Fit2Obs

*How do I set up an experiment on Hera?*

Set up experiments just as you would on the other platforms, using the setup
scripts found under the ush/rocoto folder. Follow the same setup instructions
that you're used to; just remember to use "hera" where needed during the
process. See the setup section of the global-workflow wiki for instructions:

https://vlab.ncep.noaa.gov/redmine/projects/global-workflow/wiki/Wiki#section-15

*Is the global dump archive (GDA) on Hera?*

Yes, a GDA has been established here on Hera:

DMPDIR = /scratch1/NCEPDEV/global/glopara/dump


*The Hera GDA currently holds observation dump files for 2017090100 to
present.* It currently shares the main global space on Hera, so I'm limiting
it to the past two years. I am still working to get a dedicated space on
Hera for the GDA so that it stops using our global space allocation. Please be
careful about your non-stmp global space usage until I can free up those
resources.
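
Before choosing experiment dates, it may help to confirm that a cycle falls inside the window the Hera GDA covers. The following is a minimal sketch, assuming cycles are written as ten-digit YYYYMMDDHH strings; in_hera_gda and GDA_FIRST_CDATE are hypothetical names, and the function only checks the lower bound quoted above (it does not validate calendar dates or check that files actually exist).

```shell
#!/bin/sh
# Earliest cycle held in the Hera GDA, as quoted in this email.
GDA_FIRST_CDATE=2017090100

# Hypothetical helper: returns 0 if the cycle is on or after the first
# archived cycle, 1 if it is earlier, 2 if the argument is malformed.
in_hera_gda() {
    cdate="$1"
    case "$cdate" in
        ""|*[!0-9]*) return 2 ;;          # empty or contains a non-digit
    esac
    [ "${#cdate}" -eq 10 ] || return 2    # must be exactly YYYYMMDDHH
    [ "$cdate" -ge "$GDA_FIRST_CDATE" ]
}
```

A call such as "in_hera_gda 2017083118" returns non-zero, signaling that dump files for that cycle would have to come from somewhere other than the Hera GDA.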

The Hera GDA now follows a new directory structure that matches the
production environment. The global-workflow system has the new structure
built in, so no change is needed on the user's part. The new format is:

${DMPDIR}/${CDUMP}${DUMP_SUFFIX}.${PDY}/${CYC}/${CDUMP}.t${CYC}z.$FILE

...where DUMP_SUFFIX is empty for production data and a value of either nr,
p, x, or y for experimental or pre-production dump data.
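
To make the layout concrete, here is a small shell sketch that assembles a dump file path from its parts. gda_file_path is a hypothetical helper written for this illustration, and the "gdas" and "prepbufr" values in the usage note are placeholders, not a statement about which files are present in the archive.

```shell
#!/bin/sh
# Hypothetical helper: build a GDA file path following the format above.
# Args: DMPDIR CDUMP DUMP_SUFFIX PDY CYC FILE
# DUMP_SUFFIX may be passed as an empty string for production data.
gda_file_path() {
    printf '%s/%s%s.%s/%s/%s.t%sz.%s\n' "$1" "$2" "$3" "$4" "$5" "$2" "$5" "$6"
}
```

With these placeholder values, "gda_file_path /scratch1/NCEPDEV/global/glopara/dump gdas "" 20191021 00 prepbufr" prints /scratch1/NCEPDEV/global/glopara/dump/gdas.20191021/00/gdas.t00z.prepbufr. Passing x as the third argument changes only the directory component, to gdasx.20191021.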

Additionally, I have been working over the past months to establish a new
WCOSS GDA on the Dells that matches this new directory structure. When the
port2hera branch is merged back to the global-workflow develop branch, the
new WCOSS-Dell GDA will become the primary GDA on WCOSS. The current phase
1 GDA will remain until phases 1 and 2 are retired early next year. All of
this should be seamless to you.

*When will the Hera changes come back to the global-workflow develop
branch?*

Mid-November. After discussions with the FV3GFS implementation team last
week, it was decided to hold the Hera changes out of the global-workflow
develop branch until the GFSv15.2 implementation has been completed and the
changes for it can come back to the develop branch. This merge order
preserves the ability to reproduce operations with a tag of the develop
branch. The changes incoming for Hera will bring the global-workflow
develop branch past v15.2 and closer to v16; therefore, they have to come
back after the v15.2 changes.

Thus the plan is to have you all use the global-workflow port2hera branch
until it can be merged back to the develop branch next month. The final
changes for GFSv15.2 should make their way to the develop branch once the
implementation of v15.2 concludes in early November (currently scheduled
for November 5th). The next few weeks will allow time for the remaining
component tags to be created and some compute resource optimization to
occur, as well as final I-dotting and T-crossing.

*Troubleshooting on Hera*

We are still breaking Hera in, so do not be surprised if you run into a
machine hiccup at some point. If a job that previously ran fine fails,
first check its log to see if you recognize the cause of the failure, then
try rewinding it at least once to see if it was a machine issue before
reaching out to me for troubleshooting help. If you suspect you're dealing
with a machine issue greater than a hiccup, please contact the Hera helpdesk
and provide details so they can troubleshoot (rdhpcs.hera.help at noaa.gov).
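
As a starting point for that first look at a failed job's log, the sketch below pulls out lines that often point to the cause. scan_job_log is a hypothetical helper, and the search patterns are assumptions chosen for illustration; your jobs may report failures with different wording.

```shell
#!/bin/sh
# Hypothetical helper: show the last 20 log lines that match common
# failure keywords. The keyword list is an assumption, adjust as needed.
scan_job_log() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read log: $log" >&2; return 2; }
    # -n: prefix line numbers, -i: ignore case, -E: extended regex
    grep -n -i -E 'error|fatal|abort|segmentation fault' "$log" | tail -n 20
}
```

If nothing recognizable turns up, rewinding the job once (for example with Rocoto's rocotorewind command) is a reasonable next step before asking for help.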

As always, I'm here to help troubleshoot your FV3GFS runs on supported
platforms. Please report any issues with the port2hera branch to me and
become a watcher on the global-workflow Hera port issue to keep up to date
on final updates and possible bug fixes:

https://vlab.ncep.noaa.gov/redmine/issues/67188

*Running on WCOSS*

*Do not use the port2hera branch to run on WCOSS.* Please continue to use
the develop branch (or your own branches) for your experiments on WCOSS.
The port2hera branch is currently undergoing pre-commit testing on WCOSS to
ensure the Hera changes did not break functionality there.

*Verification*

The VSDB package is available and has been tested on Hera. It is turned on
by default in config.vrfy.

The METplus package is available and has been tested on Hera; however,
there are significant timing issues once you get more than a few days into
a run, so it is turned off by default in config.vrfy and is not yet
recommended for use. Work to speed up METplus on both Hera and WCOSS is
ongoing and should be done in the coming months (if not sooner). A separate
announcement will be made when it is ready for use.

*Final notes*

I will send another email once cycling is ready on Hera, which should be
this week barring any issues with the incoming GSI bug fix. Stay tuned!

Thank you to my beta-testers who ran the system on Hera and helped uncover
issues!

Thank you all for your patience! The FV3GFS is a huge and complex system
that takes time to test fully on a new machine, so thank you very much
to all of the code managers and EIB support staff who helped overcome some
porting issues and get us here!

Kate Friedman
NOAA/NWS/NCEP/EMC Engineering and Implementation Branch