[Ncep.list.fv3-announce] Update: global-workflow/FV3GFS on Hera - cycling is ready!

Kate Friedman - NOAA Federal Kate.Friedman at noaa.gov
Fri Oct 25 17:15:24 UTC 2019


The global-workflow port2hera branch is now ready for cycled experiments on
Hera. Please see the prior email (below) for details about running on Hera.

*Management has asked that users refrain from running high resolution
experiments on Hera due to resource availability. Therefore, please do not
run C768 or higher on Hera unless your supervisor asks you to.* As with
Theia, the recommended resolutions are C384 and lower. I have gotten nice
throughput when running C192C96.

There will be some final changes committed to the port2hera branch as it
goes through pre-commit testing on all supported platforms. Any bugs
discovered during this process will be addressed and noted in the
global-workflow Redmine issue for this port. If you plan to run the FV3GFS
on Hera I urge you to become a watcher on the issue to keep up-to-date on
final changes that may impact your run. You can do that on the issue page
by clicking the "Watch" text at the top right, next to the star.


Please let me know if you have any issues or questions with running the
global-workflow port2hera branch on Hera. Additional announcements will be
sent as the port2hera branch is merged back to the develop branch next
month. Look for announcements about GFSv15.2 before that.

Thanks everyone! Have good weekends! :)

Kate Friedman
NOAA/NWS/NCEP/EMC Engineering and Implementation Branch

On Mon, Oct 21, 2019 at 3:45 PM Kate Friedman - NOAA Federal <
Kate.Friedman at noaa.gov> wrote:

> All,
> *FV3GFS via global-workflow is now ready on Hera for free-forecast mode.*
> Cycled mode is essentially ready as well; however, I am waiting for a bug fix
> to come back to the GSI master before giving the go-ahead for running
> cycled experiments on Hera. Regardless of whether you are looking to run
> either free-forecast or cycled experiments on Hera please read the entirety
> of this email. Thanks!
> *How do I get set up on Hera?*
> If you have not attended a Hera training session please see the following
> quick-start guide:
> https://docs.google.com/presentation/d/1f2eEMNSyTa1PpXxtDiJgKFdxIf6qLAP3N3x9dVWEwRA/edit?usp=sharing
> *What do I use to run the FV3GFS on Hera?*
> Use the global-workflow "port2hera" branch to run on Hera:
> > git clone gerrit:global-workflow
> > cd global-workflow
> > *git checkout port2hera*
> > cd sorc
> > sh checkout.sh
> > sh build_all.sh
> > sh link_fv3gfs.sh emc *hera*
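The steps above can be wrapped so that a failure at any step stops the sequence instead of continuing into the build. This is only a minimal sketch: `build_port2hera` and the `RUN` variable are hypothetical names introduced here for illustration and are not part of global-workflow. Setting `RUN=echo` prints the commands without running them.

```shell
#!/bin/sh
# Sketch only: build_port2hera and RUN are hypothetical, not part of global-workflow.
# The commands themselves are the ones listed in the email.
# The function body is a subshell, so the cd calls do not change the caller's directory.
build_port2hera() (
    run=${RUN:-}    # set RUN=echo to print each command instead of executing it
    $run git clone gerrit:global-workflow &&
    $run cd global-workflow &&
    $run git checkout port2hera &&
    $run cd sorc &&
    $run sh checkout.sh &&
    $run sh build_all.sh &&
    $run sh link_fv3gfs.sh emc hera
)

# Print the commands that would be run, without running them.
RUN=echo build_port2hera
```

Calling `build_port2hera` with `RUN` unset runs the real commands and returns non-zero at the first step that fails.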
> To stay up-to-date on final changes for Hera please become a watcher on
> this global-workflow issue:
> https://vlab.ncep.noaa.gov/redmine/issues/67188
> *What resolutions can I run on Hera?*
> The following resolutions have been tested on Hera (all L64): C48, C96,
> C192, C384, C768
> I have not tested 127 layers (L127) but I have been told that it works.
> Please use the appropriate component branches and speak with other GFSv16
> developers who have run with L127 if you wish to run L127.
> *What resolutions should I run on Hera?*
> *As with Theia we urge users to not run high resolution (C768+) unless
> necessary.* Recommended highest resolution is C384. Hera is only a little
> larger than Theia and there is no scrubbing so resource concerns still
> exist sadly. *Please be very cautious of your space utilization on Hera!*
> *What versions of FV3GFS components does port2hera run?*
> The port2hera branch currently runs the latest masters/develops/tags of
> the various FV3GFS system components. Some components are currently the
> master/develop branch but will soon be tags:
>    - NEMSfv3gfs - "gfs.v16_PhysicsUpdate" tag
>    - ProdGSI - master branch (tag coming soon)
>    - UFS_UTILS - develop branch (tag coming soon)
>    - EMC_post - develop branch (tag coming soon)
>    - EMC_gfs_wafs - "gfs_wafs.v5.0.11" tag
>    - EMC_verif-global (METplus) - "verif_global_v1.2.2" tag
> Pieces outside of global-workflow (installed under glopara account):
>    - obsproc
>       - OT-obsproc_prep.v5.2.0-20190614 tag
>       - OT-obsproc_global.v3.2.1-20190613 tag
>       - tracker/genesis - ens_tracker.v1.1.15.1
>    - verification
>       - VSDB
>       - Fit2Obs
> *How do I set up an experiment on Hera?*
> Just like you would on the other platforms with the setup scripts found
> under the ush/rocoto folder. Follow the same setup instructions that you're
> used to, just remember to use "hera" where needed during the process. See
> the setup section of the global-workflow wiki for instructions:
> https://vlab.ncep.noaa.gov/redmine/projects/global-workflow/wiki/Wiki#section-15
> *Is the global dump archive (GDA) on Hera?*
> Yes, a GDA has been established here on Hera:
> DMPDIR = /scratch1/NCEPDEV/global/glopara/dump
> *The Hera GDA currently holds observation dump files for 2017090100 to
> present.* It currently shares the main global space on Hera so I'm
> limiting it to the past two years. I am still working to get a dedicated
> space on Hera for the GDA so it stops using our global space allocation.
> Please be careful about your non-stmp global space usage until I can free
> up those resources.
> The Hera GDA now follows a new directory structure that matches the
> production environment. The global-workflow system has the new structure
> embedded within so no change is needed on the user's part. The new format is:
> ${DMPDIR}/${CDUMP}${DUMP_SUFFIX}.${PDY}/${CYC}/${CDUMP}.t${CYC}z.$FILE
> ...where DUMP_SUFFIX is empty for production data and a value of either
> nr, p, x, or y for experimental or pre-production dump data.
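To make the directory layout concrete, the following sketch assembles a dump file path from its parts. `gda_path` is a hypothetical helper written for this example, not something shipped with global-workflow, and the values `gdas`, `20191025`, `00`, and `prepbufr` are illustrative assumptions only. The default `DMPDIR` is the Hera location given above.

```shell
#!/bin/sh
# Sketch only: gda_path is a hypothetical helper, not part of global-workflow.
# usage: gda_path CDUMP PDY CYC FILE [DUMP_SUFFIX]
gda_path() {
    dmpdir=${DMPDIR:-/scratch1/NCEPDEV/global/glopara/dump}
    cdump=$1 pdy=$2 cyc=$3 file=$4 suffix=${5:-}
    # DUMP_SUFFIX is empty for production data, or one of nr, p, x, y otherwise.
    case "$suffix" in
        ""|nr|p|x|y) ;;
        *) echo "unknown DUMP_SUFFIX: $suffix" >&2; return 1 ;;
    esac
    printf '%s/%s%s.%s/%s/%s.t%sz.%s\n' \
        "$dmpdir" "$cdump" "$suffix" "$pdy" "$cyc" "$cdump" "$cyc" "$file"
}

# Illustrative values; the file name here is an assumption, not taken from the email.
gda_path gdas 20191025 00 prepbufr
```

Note that the suffix is appended to the dump name in the directory component only; the file name itself keeps the plain `${CDUMP}` prefix.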
> Additionally, I have been working over the past months to establish a new
> WCOSS GDA on the Dells that matches this new directory structure. When the
> port2hera branch is merged back to the global-workflow develop branch the
> new WCOSS-Dell GDA will become the primary GDA on WCOSS. The current phase
> 1 GDA will remain until phase 1 and 2 are retired early next year. All of
> this should be seamless to you.
> *When will the Hera changes come back to the global-workflow develop
> branch?*
> Mid-November. After discussions with the FV3GFS implementation team last
> week it was decided to hold the Hera changes out of the global-workflow
> develop branch until the GFSv15.2 implementation has been completed and the
> changes for it can come back to the develop branch. This order of merges is
> to maintain the ability to reproduce operations with a tag of the develop
> branch. The changes incoming for Hera will bring the global-workflow
> develop branch past v15.2 and closer to v16, therefore they have to come
> back after the v15.2 changes.
> Thus the plan is to have you all use the global-workflow port2hera branch
> until it can be merged back to the develop branch next month. The final
> changes for GFSv15.2 should make their way to the develop branch once the
> implementation of v15.2 concludes in early November (currently scheduled
> for November 5th). The next few weeks will allow time for the remaining
> component tags to be created and some compute resource optimization to
> occur, as well as final I-dotting and T-crossing.
> *Troubleshooting on Hera*
> We are still breaking Hera in so do not be surprised if you run into a
> machine hiccup at some point. If you have a job fail that previously ran
> fine please try to rewind it at least once to see if it was a machine issue
> before reaching out to me for troubleshooting help. Definitely take a peek
> at the log for the failed job before doing anything to see if you recognize
> the cause for the failure. If you suspect you're dealing with a machine
> issue greater than a hiccup please contact the Hera helpdesk and provide
> details so they can troubleshoot (rdhpcs.hera.help at noaa.gov).
> As always I'm here to help troubleshoot your FV3GFS runs on supported
> platforms. Please report any issues with the port2hera branch to me and
> become a watcher on the global-workflow Hera port issue to keep up-to-date
> on final updates and possible bug fixes:
> https://vlab.ncep.noaa.gov/redmine/issues/67188
> *Running on WCOSS*
> *Do not use the port2hera branch to run on WCOSS.* Please continue to use
> the develop branch (or your own branches) for your experiments on WCOSS.
> The port2hera branch is undergoing pre-commit testing on WCOSS currently to
> ensure the Hera changes did not break functionality there.
> *Verification*
> The VSDB package is available and has been tested on Hera. It is turned on
> by default in config.vrfy.
> The METplus package is available and has been tested on Hera...however,
> there are significant timing issues as you get further than a few days into
> a run so it is turned off by default in config.vrfy and is not yet
> recommended for use. Work to speed up METplus on both Hera and WCOSS is
> ongoing and should be done in the coming months (if not sooner). A separate
> announcement will be made when it is ready for use.
> *Final notes*
> I will be sending another email once cycling is ready on Hera, which should be
> this week barring any issues with the incoming GSI bug fix. Stay tuned!
> Thank you to my beta-testers who ran the system on Hera and helped uncover
> issues!
> Thank you all for your patience! The FV3GFS is a huge and complex system
> that takes time to get tested fully on a new machine so thank you very much
> to all of the code managers and EIB support staff who helped overcome some
> porting issues and get us here!
> Kate Friedman
> NOAA/NWS/NCEP/EMC Engineering and Implementation Branch