[Ncep.hmon] Running Complex Jobs Under Slurm

Samuel Trahan - NOAA Affiliate samuel.trahan at noaa.gov
Thu Apr 18 15:26:46 UTC 2019


Zhan,

I'm going to start another email chain with fewer people to discuss this.
Most of the people on this email chain don't develop pyprodutil.  It is
sufficient for them to know that a solution exists and is being
resurrected.

Sincerely,
Sam Trahan

On Thu, 18 Apr 2019 at 11:24, Zhan Zhang - NOAA Affiliate <
zhan.zhang at noaa.gov> wrote:

> Sam,
>
> Could you point me to the version of produtil that supports Slurm
> with --multi-prog?
> Thanks.
>
> -Zhan
>
> On Thu, Apr 18, 2019 at 10:59 AM Samuel Trahan - NOAA Affiliate <
> samuel.trahan at noaa.gov> wrote:
>
>> Hi all,
>>
>> A few months ago, the pyprodutil package supported the --multi-prog
>> option that the admins are now requesting we use (look inside their two
>> Perl scripts).  We deliberately removed --multi-prog support from
>> pyprodutil and replaced it with pack groups because the admins requested
>> it.  Now that the admins have found that pack groups don't work, they're
>> asking us to use --multi-prog again.  It may be easiest to resurrect the
>> old Slurm support code.
>>
>> Sincerely,
>> Sam Trahan
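
For readers who have not used the option discussed above: srun --multi-prog
reads a configuration file in which each line names a set of task ranks and
the command those ranks should run. The following is only a minimal sketch of
generating such a file. It is not the pyprodutil implementation, the function
name multi_prog_config is made up for illustration, and the executable names
and the file name cmdfile are hypothetical.

```python
from typing import List, Tuple


def multi_prog_config(groups: List[Tuple[int, str]]) -> str:
    """Return the text of an srun --multi-prog configuration file.

    groups: (task_count, command_line) pairs, listed in rank order.
    This helper is illustrative only; it is not part of pyprodutil.
    """
    lines = []
    first_rank = 0
    for count, command in groups:
        if count < 1:
            raise ValueError("each group needs at least one task")
        last_rank = first_rank + count - 1
        # A single rank is written as "N"; a run of ranks as "N-M".
        ranks = str(first_rank) if count == 1 else f"{first_rank}-{last_rank}"
        lines.append(f"{ranks} {command}")
        first_rank = last_rank + 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical executables: 1 coupler rank, 4 ocean ranks, 8 atmosphere ranks.
    config = multi_prog_config(
        [(1, "./coupler.x"), (4, "./ocean.x"), (8, "./atmos.x")]
    )
    print(config, end="")
```

Saved to a file such as cmdfile, the result would be launched with something
like "srun --ntasks=13 --multi-prog cmdfile", where the task count must match
the number of ranks listed in the file.
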
>>
>> On Wed, 17 Apr 2019 at 19:56, Avichal Mehra - NOAA Federal <
>> avichal.mehra at noaa.gov> wrote:
>>
>>> FYI.
>>>
>>> ---------- Forwarded message ---------
>>> From: Leslie Hart - NOAA Federal <leslie.b.hart at noaa.gov>
>>> Date: Wed, Apr 17, 2019 at 7:23 PM
>>> Subject: Running Complex Jobs Under Slurm
>>> To: _OAR RDHPCS theia-notify <rdhpcs.theia.notify at noaa.gov>
>>>
>>>
>>> Dear Users,
>>>
>>> We consider a complex job to be any job that requires more than one
>>> executable within a single MPI execution, a job that requires a varying
>>> number of tasks per node, or a combination of both. During the Quick Start
>>> Training, we suggested using Heterogeneous Job submission to create a
>>> complex job request. After working with this method for a while, we believe
>>> this is the wrong approach. We now have another recommended method for
>>> working with complex jobs under Slurm.
>>>
>>> We have created two scripts in the /contrib area of Jet and Theia. (We
>>> believe these methods will also function well on Gaea, but have only done
>>> limited testing there.) One is called arbitrary.pl, which allows for
>>> varying numbers of tasks per node. The other is layout.pl, which allows
>>> for multiple executables within a single MPI execution. To access them,
>>> run "module load contrib sutils".
>>>
>>> We have updated the April training slides to have examples of various
>>> situations that a user may encounter. The updated material starts around
>>> slide 34. The new slide deck is available at
>>> https://docs.google.com/presentation/d/1OhGP1j7Irx61iqDq0jagTWCMRXIGgnUiwNrGfk9NMmM/edit?usp=sharing.
>>> (These slides are still under development but are pretty close to final.)
>>>
>>> We will have a short (approximately 30 minute) training session on
>>> Wednesday, April 24th at 11AM EDT that just discusses these updates.
>>> Details regarding location and webinar information will follow in the next
>>> few days.  In early- to mid-May we will repeat the entire quick start
>>> training session.
>>>
>>> Thanks,
>>> Leslie Hart & Raghu Reddy (and many others)
>>>
>>>
>>>
>>> --
>>>     Dr. Avichal Mehra
>>> Avichal.Mehra at noaa.gov
>>>     Lead Physical Scientist                      NOAA/NWS/NCEP/EMC
>>>     5830 University Research Court           Room 2104
>>>     College Park                                      Ph.   301-683-3746
>>>     MD 20740                                          Fax: 301-683-3703
>>>
>>