[Ncep.list.fv3-announce] Update: global-workflow/FV3GFS on Hera - cycling is ready!

Kate Friedman - NOAA Federal Kate.Friedman at noaa.gov
Wed Oct 30 15:00:00 UTC 2019


The port2hera branch is now protected. Let me know if you encounter any
issues after this change.

Kate Friedman
NOAA/NWS/NCEP/EMC Engineering and Implementation Branch

On Wed, Oct 30, 2019 at 10:37 AM Kate Friedman - NOAA Federal <
Kate.Friedman at noaa.gov> wrote:

> All,
> A note of caution...if you're using the port2hera branch on Hera and make
> changes for your experiments *please make sure not to commit anything
> back to it*. Normally we would all be running with the develop branch (or
> tag), which is protected against accidental commits, but for Hera right now
> we're all using a non-protected branch so don't push anything back to it.
> I'm working with the VLab admins to add protections to the port2hera branch
> so in the meantime, be extra careful.
> You're welcome to make your own development branches off of the port2hera
> branch. If you do I recommend becoming a watcher on the Hera port issue so
> you can know if changes are committed to it and do a sync merge of
> port2hera into your branch. That issue is:
> https://vlab.ncep.noaa.gov/redmine/issues/67188
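The branch-and-sync routine described above might look like the following. This is a minimal sketch, not an official procedure: it assumes the remote is named "origin" (the default after a clone), and the function names and the branch name you pass in are hypothetical. The --no-track option keeps the new branch from treating port2hera as its upstream, in keeping with the caution about not pushing to port2hera.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of global-workflow.
# Assumes the remote is called "origin".

# Create a personal development branch from the remote port2hera branch.
# $1 = name of your new branch (hypothetical, e.g. my_hera_exp)
start_dev_branch() {
    git fetch origin &&
    git checkout -b "$1" --no-track origin/port2hera
}

# Merge later port2hera commits into your development branch (a sync merge).
# $1 = name of your development branch
sync_with_port2hera() {
    git fetch origin &&
    git checkout "$1" &&
    git merge origin/port2hera
}
```

A typical use would be to call start_dev_branch once, then sync_with_port2hera with the same branch name whenever the Redmine issue reports a new commit.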
> Thanks all!
> Kate Friedman
> NOAA/NWS/NCEP/EMC Engineering and Implementation Branch
> On Fri, Oct 25, 2019 at 1:15 PM Kate Friedman - NOAA Federal <
> Kate.Friedman at noaa.gov> wrote:
>> All,
>> The global-workflow port2hera branch is now ready for cycled experiments
>> on Hera. Please see the prior email (below) for details about running on
>> Hera.
>> *Management has asked that users refrain from running high resolution
>> experiments on Hera due to resource availability. Therefore, please do not
>> run C768 or higher on Hera unless your supervisor asks you to.* As with
>> Theia the recommended resolutions are C384 and lower. I have gotten nice
>> throughput when running C192C96.
>> There will be some final changes committed to the port2hera branch as it
>> goes through pre-commit testing on all supported platforms. Any bugs
>> discovered during this process will be addressed and noted in the
>> global-workflow Redmine issue for this port. If you plan to run the FV3GFS
>> on Hera I urge you to become a watcher on the issue to keep up-to-date on
>> final changes that may impact your run. You can do that on the issue page
>> by clicking the "Watch" text on the top right with the star next to it:
>> https://vlab.ncep.noaa.gov/redmine/issues/67188
>> Please let me know if you have any issues or questions with running the
>> global-workflow port2hera branch on Hera. Additional announcements will be
>> sent as the port2hera branch is merged back to the develop branch next
>> month. Look for announcements about GFSv15.2 before that.
>> Thanks everyone! Have good weekends! :)
>> Kate Friedman
>> NOAA/NWS/NCEP/EMC Engineering and Implementation Branch
>> On Mon, Oct 21, 2019 at 3:45 PM Kate Friedman - NOAA Federal <
>> Kate.Friedman at noaa.gov> wrote:
>>> All,
>>> *FV3GFS via global-workflow is now ready on Hera for free-forecast mode.*
>>> Cycled mode is essentially ready as well; however, I am waiting for a bug fix
>>> to come back to the GSI master before giving the go-ahead for running
>>> cycled experiments on Hera. Regardless of whether you are looking to run
>>> either free-forecast or cycled experiments on Hera please read the entirety
>>> of this email. Thanks!
>>> *How do I get set up on Hera?*
>>> If you have not attended a Hera training session please see the
>>> following quick-start guide:
>>> https://docs.google.com/presentation/d/1f2eEMNSyTa1PpXxtDiJgKFdxIf6qLAP3N3x9dVWEwRA/edit?usp=sharing
>>> *What do I use to run the FV3GFS on Hera?*
>>> Use the global-workflow "port2hera" branch to run on Hera:
>>> > git clone gerrit:global-workflow
>>> > cd global-workflow
>>> > git checkout port2hera
>>> > cd sorc
>>> > sh checkout.sh
>>> > sh build_all.sh
>>> > sh link_fv3gfs.sh emc hera
>>> To stay up-to-date on final changes for Hera please become a watcher on
>>> this global-workflow issue:
>>> https://vlab.ncep.noaa.gov/redmine/issues/67188
>>> *What resolutions can I run on Hera?*
>>> The following resolutions have been tested on Hera (all L64): C48, C96,
>>> C192, C384, C768
>>> I have not tested 127 layers (L127) but I have been told that it works.
>>> Please use the appropriate component branches and speak with other GFSv16
>>> developers who have run with L127 if you wish to run L127.
>>> *What resolutions should I run on Hera?*
>>> *As with Theia we urge users to not run high resolution (C768+) unless
>>> necessary.* Recommended highest resolution is C384. Hera is only a
>>> little larger than Theia and there is no scrubbing so resource concerns
>>> still exist sadly. *Please be very cautious of your space utilization
>>> on Hera!*
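One way to keep the resolution guidance in view is a small check at the top of a personal launch script. The following is only a sketch: check_resolution is a hypothetical helper, not part of global-workflow, and it assumes resolutions are written as "C" followed by a number (C96, C384, C768).

```shell
#!/bin/sh
# Sketch: warn before launching a resolution above the recommended C384.
# check_resolution is a hypothetical helper, not a global-workflow function.
# Returns 0 for C384 and lower, 1 for C768 and higher, 2 for bad input.
check_resolution() {
    n=${1#C}                          # strip the leading "C"
    case $n in
        ''|*[!0-9]*)
            echo "unrecognized resolution: $1" >&2
            return 2 ;;
    esac
    if [ "$n" -ge 768 ]; then
        echo "WARNING: $1 is high resolution; run only if necessary" >&2
        return 1
    fi
    return 0
}
```

For example, `check_resolution C768 || exit 1` would stop a script before it sets up a high-resolution experiment.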
>>> *What versions of FV3GFS components does port2hera run?*
>>> The port2hera branch currently runs the latest masters/develops/tags of
>>> the various FV3GFS system components. Some components are currently the
>>> master/develop branch but will soon be tags:
>>>    - NEMSfv3gfs - "gfs.v16_PhysicsUpdate" tag
>>>    - ProdGSI - master branch (tag coming soon)
>>>    - UFS_UTILS - develop branch (tag coming soon)
>>>    - EMC_post - develop branch (tag coming soon)
>>>    - EMC_gfs_wafs - "gfs_wafs.v5.0.11" tag
>>>    - EMC_verif-global (METplus) - "verif_global_v1.2.2" tag
>>> Pieces outside of global-workflow (installed under glopara account):
>>>    - obsproc
>>>       - OT-obsproc_prep.v5.2.0-20190614 tag
>>>       - OT-obsproc_global.v3.2.1-20190613 tag
>>>       - tracker/genesis - ens_tracker.v1.1.15.1
>>>    - verification
>>>       - VSDB
>>>       - Fit2Obs
>>> *How do I set up an experiment on Hera?*
>>> Just like you would on the other platforms with the setup scripts found
>>> under the ush/rocoto folder. Follow the same setup instructions that you're
>>> used to, just remember to use "hera" where needed during the process. See
>>> the setup section of the global-workflow wiki for instructions:
>>> https://vlab.ncep.noaa.gov/redmine/projects/global-workflow/wiki/Wiki#section-15
>>> *Is the global dump archive (GDA) on Hera?*
>>> Yes, a GDA has been established here on Hera:
>>> DMPDIR = /scratch1/NCEPDEV/global/glopara/dump
>>> *The Hera GDA currently holds observation dump files for 2017090100 to
>>> present.* It currently shares the main global space on Hera so I'm
>>> limiting it to the past two years. I am still working to get a dedicated
>>> space on Hera for the GDA so it stops using our global space allocation.
>>> Please be careful about your non-stmp global space usage until I can free
>>> up those resources.
>>> The Hera GDA now follows a new directory structure that matches the
>>> production environment. The global-workflow system has the new structure
>>> embedded within so no change is needed on the user's part. The new format is:
>>> ${DMPDIR}/${CDUMP}${DUMP_SUFFIX}.${PDY}/${CYC}/${CDUMP}.t${CYC}z.$FILE
>>> ...where DUMP_SUFFIX is empty for production data and a value of either
>>> nr, p, x, or y for experimental or pre-production dump data.
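To see how the pattern quoted above expands, here is a minimal sketch that composes a path from its parts. The helper name gda_path is hypothetical, and the date, cycle and file name in the example call are placeholders chosen for illustration; only the DMPDIR value comes from this email.

```shell
#!/bin/sh
# Sketch: compose a GDA file path following the directory pattern above.
# gda_path is a hypothetical helper, not a global-workflow function.
# Args: DMPDIR CDUMP DUMP_SUFFIX PDY CYC FILE
gda_path() {
    dmpdir=$1 cdump=$2 suffix=$3 pdy=$4 cyc=$5 file=$6
    printf '%s/%s%s.%s/%s/%s.t%sz.%s\n' \
        "$dmpdir" "$cdump" "$suffix" "$pdy" "$cyc" "$cdump" "$cyc" "$file"
}

# Illustrative call: empty DUMP_SUFFIX (production data), placeholder file name.
gda_path /scratch1/NCEPDEV/global/glopara/dump gdas "" 20191025 00 example_file
```

Passing a suffix such as x in the third argument changes only the dated directory name (for example gdasx.20191025); the file name prefix stays ${CDUMP}.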
>>> Additionally, I have been working over the past months to establish a
>>> new WCOSS GDA on the Dells that matches this new directory structure. When
>>> the port2hera branch is merged back to the global-workflow develop branch
>>> the new WCOSS-Dell GDA will become the primary GDA on WCOSS. The current
>>> phase 1 GDA will remain until phase 1 and 2 are retired early next year.
>>> All of this should be seamless to you.
>>> *When will the Hera changes come back to the global-workflow develop
>>> branch?*
>>> Mid-November. After discussions with the FV3GFS implementation team last
>>> week it was decided to hold the Hera changes out of the global-workflow
>>> develop branch until the GFSv15.2 implementation has been completed and the
>>> changes for it can come back to the develop branch. This order of merges is
>>> to maintain the ability to reproduce operations with a tag of the develop
>>> branch. The changes incoming for Hera will bring the global-workflow
>>> develop branch past v15.2 and closer to v16, therefore they have to come
>>> back after the v15.2 changes.
>>> Thus the plan is to have you all use the global-workflow port2hera
>>> branch until it can be merged back to the develop branch next month. The
>>> final changes for GFSv15.2 should make their way to the develop branch once
>>> the implementation of v15.2 concludes in early November (currently
>>> scheduled for November 5th). The next few weeks will allow time for the
>>> remaining component tags to be created and some compute resource
>>> optimization to occur, as well as final I-dotting and T-crossing.
>>> *Troubleshooting on Hera*
>>> We are still breaking Hera in so do not be surprised if you run into a
>>> machine hiccup at some point. If you have a job fail that previously ran
>>> fine please try to rewind it at least once to see if it was a machine issue
>>> before reaching out to me for troubleshooting help. Definitely take a peek
>>> at the log for the failed job before doing anything to see if you recognize
>>> the cause for the failure. If you suspect you're dealing with a machine
>>> issue greater than a hiccup please contact the Hera helpdesk and provide
>>> details so they can troubleshoot (rdhpcs.hera.help at noaa.gov).
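For the rewind step, the sketch below assumes the experiment is driven by Rocoto and prints the two commands rather than running them, so they can be reviewed after checking the failed job's log. The helper name rewind_cmds is hypothetical, and the XML file, database file, cycle and task name in the example call are placeholders for your own experiment's values.

```shell
#!/bin/sh
# Sketch: print the Rocoto commands to rewind one failed task and resubmit it.
# rewind_cmds is a hypothetical helper; it prints commands, it does not run them.
# Args: workflow XML, Rocoto database file, cycle (YYYYMMDDHHMM), task name
rewind_cmds() {
    xml=$1 db=$2 cycle=$3 task=$4
    echo "rocotorewind -w $xml -d $db -c $cycle -t $task"
    echo "rocotorun -w $xml -d $db"
}

# Placeholder values; substitute the files and task from your own experiment.
rewind_cmds myexp.xml myexp.db 201910250000 gdasfcst
```

If the same task fails again after one rewind, that suggests something other than a machine hiccup.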
>>> As always I'm here to help troubleshoot your FV3GFS runs on supported
>>> platforms. Please report any issues with the port2hera branch to me and
>>> become a watcher on the global-workflow Hera port issue to keep up-to-date
>>> on final updates and possible bug fixes:
>>> https://vlab.ncep.noaa.gov/redmine/issues/67188
>>> *Running on WCOSS*
>>> *Do not use the port2hera branch to run on WCOSS.* Please continue to
>>> use the develop branch (or your own branches) for your experiments on
>>> WCOSS. The port2hera branch is undergoing pre-commit testing on WCOSS
>>> currently to ensure the Hera changes did not break functionality there.
>>> *Verification*
>>> The VSDB package is available and has been tested on Hera. It is turned
>>> on by default in config.vrfy.
>>> The METplus package is available and has been tested on Hera...however,
>>> there are significant timing issues as you get further than a few days into
>>> a run so it is turned off by default in config.vrfy and is not yet
>>> recommended for use. Work to speed up METplus on both Hera and WCOSS is
>>> ongoing and should be done in the coming months (if not sooner). A separate
>>> announcement will be made when it is ready for use.
>>> *Final notes*
>>> I will be sending another email once cycling is ready on Hera, which should be
>>> this week barring any issues with the incoming GSI bug fix. Stay tuned!
>>> Thank you to my beta-testers who ran the system on Hera and helped
>>> uncover issues!
>>> Thank you all for your patience! The FV3GFS is a huge and complex system
>>> that takes time to get tested fully on a new machine so thank you very much
>>> to all of the code managers and EIB support staff who helped overcome some
>>> porting issues and get us here!
>>> Kate Friedman
>>> NOAA/NWS/NCEP/EMC Engineering and Implementation Branch