[Ncep.list.nems.announce] Proposed NEMS commit for Performance Improvements to RRTM-G
john.michalakes at noaa.gov
Mon Jul 21 17:39:36 UTC 2014
Following up with some additional information on the proposed RRTM-G
1. I've generated a series of difference plots (attached) for output
for the GFS_32 case, comparing 03h, 12h, 24h, and 48h output from the
top-of-repository code with the output from the code with the proposed
updates to RRTMG and with the -fno-alias changes (see #3 below). Differences
are small compared with the magnitudes of the fields, but there is also some
structure to the differences that is visible along the day-night boundary.
2. Contrary to the information in my original email, on WCOSS the GFS
code is also faster with the RRTM changes (both NMMB and GFS now use the
same underlying RRTM LW and SW routines). The earlier results were gathered
on Zeus with older "Westmere" processors. On WCOSS, with Xeon "Sandy Bridge"
processors and AVX vector instructions, the RRTM mods improve performance for
the GFS_16_32 regtest by 8.6%, from 30 sec to 27.4 sec.
3. I tested the effect of compiling NEMS with the -fno-alias flag.
This improves both the NMMB and GFS times, even without the RRTMG changes
that I'm proposing.
a. For the GFS_16_32 regression test (32 threads on 16 cores on
i. Top of Trunk compiled without the -fno-alias flag: 34.2s
ii. Top of Trunk compiled with the -fno-alias flag: 30.0s
iii. Top of Trunk with RRTM changes and -fno-alias: 27.4s
b. -fno-alias improves performance because it allows the Intel compiler
(ifort 12.1.5 on Tide) to recognize and vectorize 260 loops across 36 source
files that it could not vectorize before, since it could not assume the
absence of address aliasing. The -fno-alias option asserts that there is no
such aliasing and lets the compiler vectorize. The list of source files with
loops that -fno-alias allows to vectorize is appended below.
c. I would like to also update the configure.nems files to add this
Please review and let me know if you have any concerns about this proposed
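To make the aliasing point in 3b concrete, here is a small sketch in Python (not NEMS code; the function names and the chunk width of 4 are made up for illustration). It models a loop of the form buf[dst+i] = buf[src+i] + 1, once element by element and once in "vector" chunks that load several inputs before storing any result. When the source and destination ranges overlap, the two orders give different answers, so a compiler that cannot rule out overlap has to keep the scalar order. When the ranges are disjoint, both orders agree, which is the situation -fno-alias asserts.

```python
def scalar_shift_add(buf, dst, src, n):
    """Run out[dst+i] = out[src+i] + 1 one element at a time (scalar order)."""
    out = list(buf)
    for i in range(n):
        out[dst + i] = out[src + i] + 1
    return out


def vector_shift_add(buf, dst, src, n, width=4):
    """Same loop in chunks: load up to `width` inputs first, then store them all."""
    out = list(buf)
    for start in range(0, n, width):
        stop = min(start + width, n)
        loaded = [out[src + i] for i in range(start, stop)]  # "vector load"
        for k, value in enumerate(loaded):                   # "vector add + store"
            out[dst + start + k] = value + 1
    return out


# Overlapping ranges (dst = src + 1): each store feeds the next load.
data = [0] * 9
aliased_scalar = scalar_shift_add(data, dst=1, src=0, n=8)
aliased_vector = vector_shift_add(data, dst=1, src=0, n=8)

# Disjoint ranges: the order of loads and stores does not matter.
disjoint = [0, 0, 0, 0, 9, 9, 9, 9]
disjoint_scalar = scalar_shift_add(disjoint, dst=4, src=0, n=4)
disjoint_vector = vector_shift_add(disjoint, dst=4, src=0, n=4)

print(aliased_scalar == aliased_vector)    # False: chunked order changes the result
print(disjoint_scalar == disjoint_vector)  # True: safe to vectorize
```

The flip side is that the flag is only safe if the code really has no such overlap; the sketch shows what would silently go wrong otherwise.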
--- List of source files with loops that vectorize with -fno-alias ---
From: John Michalakes [mailto:john.michalakes at noaa.gov]
Sent: Monday, July 14, 2014 5:15 PM
To: 'ncep.list.nems.announce at lstsrv.ncep.noaa.gov'
Subject: Proposed NEMS commit for Performance Improvements to RRTM-G
I would like to start the process for obtaining approval to commit
performance related changes to the RRTMG in the NEMS trunk.
Performance improvement on Zeus for the NMM_CNTRL workload:
Original (revision 42943) : 93.5s (0.794s)
With optimized RRTMG: 89.2s (0.739s)
The first time is the time to run the atmospheric component from beginning
to end (that is, the time spent in the ATM_RUN subroutine). In parentheses
is the time per radiation call averaged over the 1125 calls (25 calls on 45
Performance improvement is configuration and workload dependent and will be
the subject of ongoing work. With regard to the GFS cases, the RRTM changes
degrade performance a little bit, but assuming output is okay, I'm
requesting approval to commit the changes into the trunk now and then work
more on improving performance afterwards. (I've been spending a lot of time
and energy keeping up with changes to the trunk).
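For reference, the NMM_CNTRL timings quoted above can be turned into relative improvements with a few lines of Python. This is only arithmetic on the numbers in this message; the helper name percent_faster is made up for the sketch.

```python
def percent_faster(before_s, after_s):
    """Percentage reduction in run time going from before_s to after_s."""
    return 100.0 * (before_s - after_s) / before_s


# Timings quoted above for the NMM_CNTRL workload on Zeus.
atm_run = percent_faster(93.5, 89.2)      # total time in ATM_RUN
per_call = percent_faster(0.794, 0.739)   # average time per radiation call

print(f"ATM_RUN: {atm_run:.1f}% faster")              # ATM_RUN: 4.6% faster
print(f"per radiation call: {per_call:.1f}% faster")  # per radiation call: 6.9% faster
```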
The attached pdf file is a small sampling of "old-new" grads difference
plots at 03, 12, and 48 hours. There are differences, but they are generally
"snow-like" (more or less random), especially at the 03hr time period.
Since the new output does not agree bit-for-bit, I modified the regression
test script to report the difference and continue on through the next tests.
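The "report and continue" behavior can be sketched roughly as follows. This is not the actual NEMS regression script, just an illustration in Python: the function run_regression and the byte strings standing in for output files are hypothetical, and only the test names come from this thread.

```python
def run_regression(tests, reference):
    """Compare each test's output with its reference; report mismatches but keep going.

    `tests` maps a test name to its output bytes and `reference` maps the same
    names to the baseline bytes. Both are stand-ins for real output files.
    """
    report = {}
    for name, output in tests.items():
        expected = reference[name]
        if output == expected:
            report[name] = "bit-for-bit identical"
        else:
            # Count differing bytes instead of aborting the whole suite.
            ndiff = sum(a != b for a, b in zip(output, expected))
            ndiff += abs(len(output) - len(expected))
            report[name] = f"differs in {ndiff} byte(s)"
    return report


# Made-up data: one test differs from its baseline in a single byte.
reference = {"NMM_CNTRL": b"\x01\x02\x03\x04", "GFS_32": b"\x0a\x0b\x0c\x0d"}
tests = {"NMM_CNTRL": b"\x01\x02\x03\x05", "GFS_32": b"\x0a\x0b\x0c\x0d"}

report = run_regression(tests, reference)
for name, status in report.items():
    print(f"{name}: {status}")
```

The point of the design is that a mismatch is recorded per test rather than treated as fatal, so every test still gets a chance to run to completion.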
All 13 tests run to completion:
but at this point the only output I've looked at is the NMM_CNTRL test for
I would like to also do difference plots for the GFS cases but I'm not sure
how to do that since the output isn't grads (as far as I can tell). I'm
looking for suggestions, please.
Code location and summary of changes:
The code is in my home directory on Zeus:
and is relative to Revision: 42438 (thus, a little bit behind the top of
the trunk right now, which is at Revision: 42943, but that's just Weiyu's
new changes to the regression scripts).
A summary of the modified files is here:
Performing status on external item at 'src/atmos/gsm':
Requested action by group:
I would appreciate it if those interested could please review and get back
to me with questions, suggestions and concerns within the next week. Then,
based on the resulting discussions, I would hope to have the modifications
committed to the trunk and passing the reg-tests (with a new set of
reference data sets) by the end of this month (July).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RRTMG plots GFS.pdf
Size: 496418 bytes
Desc: not available
Url : https://lstsrv.ncep.noaa.gov/pipermail/ncep.list.nems.announce/attachments/20140721/08fc9c95/attachment-0001.pdf