[Ncep.list.nems.announce] Proposed NEMS commit for Performance Improvements to RRTM-G

John Michalakes john.michalakes at noaa.gov
Mon Jul 21 17:39:36 UTC 2014


Hello Developers,
 
Following up with some additional information on the proposed RRTM-G
performance changes:
 
1.       I've generated a series of difference plots (attached) for output
for the GFS_32 case, comparing 03h, 12h, 24h, and 48h output from the
top-of-repository code with the output from the code with the proposed
updates to RRTMG and with the -fno-alias changes (see #3 below). Differences
are small compared with the magnitudes of the fields, but there is also some
structure to the differences that is visible along the day-night boundary.
2.       Contrary to the information in my original email, on WCOSS the GFS
code is also faster with the RRTM changes (both NMMB and GFS now use the
same underlying RRTM LW and SW routines).  The earlier results were gathered
on Zeus with older "Westmere" processors.  On WCOSS, with Xeon "Sandy Bridge"
processors and AVX vector instructions, the RRTM mods improve performance for
the GFS_16_32 regtest by 8.6%, from 30 sec to 27.4 sec.
3.       I tested the effect of compiling NEMS with the -fno-alias flag.
This improves both the NMMB and GFS times, even without the RRTMG changes
that I'm proposing.  
a.       For the GFS_16_32 regression test (32 threads on 16 cores on
WCOSS):
           i.   Top of Trunk compiled without the -fno-alias flag:   34.2s
          ii.   Top of Trunk compiled with the -fno-alias flag:      30.0s
         iii.   Top of Trunk with RRTM changes and -fno-alias:       27.4s
b.      The -fno-alias flag improves performance because it allows the Intel
compiler (ifort 12.1.5 on Tide) to recognize and vectorize 260 loops across
36 source files that it previously could not vectorize, since it could not
assume the absence of address aliasing.  The -fno-alias option asserts that
no aliasing occurs, which lets the compiler vectorize those loops.  The list
of source files with loops that -fno-alias allows to vectorize is appended
below.
c.       I would also like to update the configure.nems files to add this
-fno-alias option.
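 
To illustrate why possible aliasing blocks vectorization, here is a minimal
sketch in Python. It is not NEMS code; the function names and the small buffer
are made up for illustration. When the destination and source of a loop
overlap in memory, an element-by-element loop and a vector-style evaluation
that loads all inputs before storing any result give different answers. The
compiler therefore has to keep the scalar form unless it is told, as
-fno-alias does, that no such overlap exists.

```python
from array import array


def scalar_loop(dst, src):
    """Element-by-element update: each store is visible to later loads."""
    for i in range(len(dst)):
        dst[i] = src[i] + 1.0


def vector_style(dst, src):
    """Vector-style update: load every input first, then store all results."""
    loaded = [x + 1.0 for x in src]
    for i, value in enumerate(loaded):
        dst[i] = value


def run(update):
    """Apply `update` to two overlapping views of one (hypothetical) buffer."""
    buf = array("d", [0.0] * 5)
    view = memoryview(buf)
    # dst and src alias each other: dst[i] is the same memory as src[i + 1].
    update(view[1:], view[:4])
    return list(buf)


aliased_scalar = run(scalar_loop)
aliased_vector = run(vector_style)
print("scalar loop :", aliased_scalar)
print("vector style:", aliased_vector)
```

With overlapping views the two results differ, which is the case the compiler
must guard against. With non-overlapping arrays both functions produce the
same values, and that is the situation -fno-alias asserts for the whole code.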
 
Please review and let me know if you have any concerns about this proposed
commit.  
 
Thanks,
 
John 
 
--- List of source files with loops that vectorize with -fno-alias ---
 
module_BL_MYJPBL.f90
module_MP_ETANEW.f90
module_MP_FER_HIRES.f90
module_MP_GFS.f90
module_RA_RRTM.f90
module_SF_JSFC.f90
idea_o2_o3.f
sfcsub.f
radlw_main_nmmb.f
radsw_main_nmmb.f
module_CONTROL.f90
module_DIGITAL_FILTER_NMM.f90
module_FLTBNDS.f90
module_DYNAMICS_ROUTINES.f90
module_TURBULENCE.f90
module_MICROPHYSICS.f90
module_RADIATION.f90
module_NESTING.f90
module_SOLVER_GRID_COMP.f90
module_WRITE_ROUTINES.f90
module_DOMAIN_GRID_COMP.f90
module_PARENT_CHILD_CPL_COMP.f90
module_NMM_GRID_COMP.f90
module_WRITE_ROUTINES_GFS.f90
do_dynamics_mod.f90
sigio_module.f90
grid_collect.f90
slgscan_all_redgg.f90
do_dynamics_slg_loop.f90
do_dynamics_one_loop.f90
twriteg_rst.f90
grid_to_spect_inp.f90
spect_to_grid_inp.f90
gloopb.f90
wrtout_physics.f90
ENS_Sto_Per_Scheme_Step2.f90
 
 
 
From: John Michalakes [mailto:john.michalakes at noaa.gov] 
Sent: Monday, July 14, 2014 5:15 PM
To: 'ncep.list.nems.announce at lstsrv.ncep.noaa.gov'
Subject: Proposed NEMS commit for Performance Improvements to RRTM-G
 
Hi all,
 
I would like to start the process for obtaining approval to commit
performance related changes to the RRTMG in the NEMS trunk.
 
Performance improvement:
 
Performance improvement on Zeus for the NMM_CNTRL workload:
 
                Original (revision 42943):    93.5s      (0.794s)
 
                With optimized RRTMG:         89.2s      (0.739s)
 
The first time in each row is the time to run the atmospheric component from
beginning to end (that is, the time spent in the ATM_RUN subroutine).  In
parentheses is the time per radiation call, averaged over the 1125 calls (25
calls on each of 45 MPI tasks).
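 
As a quick check of the arithmetic behind these Zeus numbers, the following
minimal sketch computes the relative improvements from the rounded timings
quoted above. The helper name percent_faster is made up for illustration, and
the percentages are derived here rather than taken from separate measurements.

```python
def percent_faster(old_seconds, new_seconds):
    """Relative reduction in run time, as a percentage of the old time."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


# Timings quoted in the message (Zeus, NMM_CNTRL workload).
atm_run_gain = percent_faster(93.5, 89.2)      # whole ATM_RUN time
per_call_gain = percent_faster(0.794, 0.739)   # average per radiation call

# 25 radiation calls on each of 45 MPI tasks.
total_calls = 25 * 45

print(f"ATM_RUN improvement:  {atm_run_gain:.1f}%")
print(f"Per-call improvement: {per_call_gain:.1f}%")
print(f"Radiation calls averaged over: {total_calls}")
```

From the rounded inputs this works out to roughly 4.6% for the ATM_RUN time
and 6.9% per radiation call.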
 
Performance improvement is configuration and workload dependent and will be
the subject of ongoing work. With regard to the GFS cases, the RRTM changes
degrade performance a little bit, but assuming output is okay, I'm
requesting approval to commit the changes into the trunk now and then work
more on improving performance afterwards.  (I've been spending a lot of time
and energy keeping up with changes to the trunk).
 
 
Verification:  
 
The attached pdf file is a small sampling of "old-new" grads difference
plots at 03, 12, and 48 hours.  There are differences, but they are generally
"snow-like" (more or less random), especially at the 03hr time period.
 
Since the new output does not agree bit-for-bit, I modified the regression
test script to report the difference and continue on through the next tests.
All 13 tests run to completion:
 
GFS_16_1_dfi
GFS_16_32
GFS_32
GFS_GOCART_NEMSIO
NMM_CNTRL
NMM_DECOMP
NMM_nest_rest
NMM_REG_gfsP
NMM_REG_NEMSIO
NMM_REG_RST
NMM_REST_NIO
NMM_THREAD
WAM_gh_l150
 
but at this point the only output I've looked at is the NMM_CNTRL test for
NMM.
 
I would like to also do difference plots for the GFS cases but I'm not sure
how to do that since the output isn't grads (as far as I can tell).  I'm
looking for suggestions, please.
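 
For the old-versus-new comparison described in this section, one simple
numeric complement to difference plots is the largest absolute difference
relative to the magnitude of the field. The sketch below is only an
illustration of that idea: diff_summary is a hypothetical helper, and the
sample values are made up, not actual model output.

```python
def diff_summary(old, new):
    """Return (max |old - new|, that value relative to max |old|)."""
    if len(old) != len(new):
        raise ValueError("fields must have the same number of points")
    max_abs_diff = max(abs(o - n) for o, n in zip(old, new))
    max_abs_field = max(abs(o) for o in old)
    # Guard against an all-zero reference field.
    relative = max_abs_diff / max_abs_field if max_abs_field else float("inf")
    return max_abs_diff, relative


# Made-up sample values standing in for one field at a few grid points.
old_field = [280.0, 285.5, 290.25, 275.0]
new_field = [280.0, 285.25, 290.5, 275.0]

max_diff, rel_diff = diff_summary(old_field, new_field)
print(f"max abs difference: {max_diff}")
print(f"relative to field magnitude: {rel_diff:.2e}")
```

A small relative value supports the statement that differences are small
compared with the field, but it says nothing about spatial structure, so it
does not replace looking at the plots.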
 
 
Code location and summary of changes:  
 
The code is in my home directory on Zeus:
 
/home/John.Michalakes/jm_proposal_20140715 
 
and is relative to Revision: 42438  (thus, a little bit behind the top of
the trunk right now, which is at Revision: 42943, but that's just Weiyu's
new changes to the regression scripts).  
 
A summary of the modified files is here:
 
M       src/atmos/nmm/module_DIAGNOSE.F90
M       src/atmos/nmm/module_RADIATION.F90
M       src/atmos/nmm/module_SOLVER_GRID_COMP.F90
M       src/atmos/phys/grrad.f
M       src/atmos/phys/grrad_nmmb.f
M       src/atmos/phys/machine.f
M       src/atmos/phys/makefile
M       src/atmos/phys/module_RA_RRTM.F90
M       src/atmos/phys/radiation_aerosols_nmmb.f
M       src/atmos/phys/radlw_main.f
M       src/atmos/phys/radsw_main.f
M       src/conf/configure.nems.Gaea.intel
M       src/conf/configure.nems.Gaea.pgi
M       src/conf/configure.nems.Jet.ifort
M       src/conf/configure.nems.Linux.g95
M       src/conf/configure.nems.Linux.gnu
M       src/conf/configure.nems.Linux.intel
M       src/conf/configure.nems.Linux.pgi
M       src/conf/configure.nems.Unicos.intel
M       src/conf/configure.nems.Wcoss.intel
M       src/conf/configure.nems.Wcoss.intel_ESMF_520rp1
M       src/conf/configure.nems.Wcoss.intel_ESMF_520rp2
M       src/conf/configure.nems.Wcoss.intel_ESMF_630r
M       src/conf/configure.nems.Yellowstone.intel
M       src/conf/configure.nems.Zeus.intel
M       src/esmf_version
M       src/makefile
M       src/module_NEMS_GRID_COMP.F90
 
Performing status on external item at 'src/atmos/gsm':
M       src/atmos/gsm/phys/gloopr.f
 
Requested action by group:
 
I would appreciate it if those interested could please review and get back
to me with questions, suggestions and concerns within the next week.   Then,
based on the resulting discussions, I would hope to have the modifications
committed to the trunk and passing the reg-tests (with a new set of
reference data sets) by the end of this month (July).
 
Thanks
 
John 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RRTMG plots GFS.pdf
Type: application/pdf
Size: 496418 bytes
Desc: not available
Url : https://lstsrv.ncep.noaa.gov/pipermail/ncep.list.nems.announce/attachments/20140721/08fc9c95/attachment-0001.pdf 

