[Ncep.list.emc.fv3gfs_tickets] FV3GFS Ticket #12: Changes for nests
FV3GFS Trac Ticket
ncep.list.emc.fv3gfs_tickets at noaa.gov
Wed Apr 5 12:11:54 UTC 2017
#12: Changes for nests
--------------------------+----------------------------
Reporter: tom.black@… | Owner: george.gayno@…
Type: enhancement | Status: accepted
Priority: major | Milestone:
Component: component1 | Version:
Resolution: | Keywords: nest
--------------------------+----------------------------
Comment (by george.gayno@…):
The orography code is compiled with "-O0" on
[https://svnemc.ncep.noaa.gov/trac/fv3gfs/browser/branches/black/fv3gfs/global_shared.v15.0.0/sorc/orog.fd/makefile.sh_cray?rev=90792 Cray]
and with "-O3" on
[https://svnemc.ncep.noaa.gov/trac/fv3gfs/browser/branches/black/fv3gfs/global_shared.v15.0.0/sorc/orog.fd/makefile.sh_theia?rev=90792 Theia].
I asked Jim about this difference and here is his response:
{{{
We left out the -O3 compilation option for the Cray by mistake. It does run the
code quite a bit faster. The answers do change but very little. Tom has checked
the orography and it looks fine. I was going to suggest updating the trunk in
a couple of weeks to add the -O3 option on the Cray and also to put in files
necessary to build and run the codes on phase1/2.
}}}
Jim also explained what changes he made to the code and provided some
timing results:
{{{
I thought you might like to know the changes I made to the orography code.
Compiling the original code and the optimized code with -O0 (how it is set
in the current trunk) yields identical answers. I profiled the code and also
wrote a simple timing routine to find where the code spent its time. There
was some very expensive duplicate code:

    angle = spherical_angle(pnt0, pnt2, pnt1)
    anglesum = anglesum + spherical_angle(pnt0, pnt2, pnt1)

The function spherical_angle is very expensive, so I replaced the above code
with:

    angle = spherical_angle(pnt0, pnt2, pnt1)
    anglesum = anglesum + angle

Then, I threaded loops in 3 routines: MAKEMT2, MAKEPC2, MAKEOA2.
Finally, I changed -O0 to -O3. This was the only time the answers changed, and
the change is very small. The speed-up is very nice.
Here are timings for the orography code for a C96, uniform case:
tile  orig,-O0  opt,-O0  opt,-O0,6 threads  opt,-O3,6 threads
1          386      311                 93                 43
2          381      307                 97                 48
3         1160      927                316                173
4          390      310                106                 43
5          391      311                110                 45
6         1159      917                306                173
Here are results from C768, uniform. For the regional work, we will need to
generate a 7th tile and experiment with nest boundaries to get the nest
situated where we want it. I wrote a special driver wrapper script that runs
the orography for tiles 1-4 simultaneously and then tiles 5-7 simultaneously.
This greatly reduces our wall time. The following results illustrate this for
the C768 uniform case.
tile  orig,-O0  opt,-O3,6 threads
1          874                 77
2          873                 73
3         2146                249
4          895                 76
5          892                 77
6         1786                236
So the original code using the default -O0 with no threading ran in 7466
seconds (with the 7th tile, add another 800 seconds or so). The optimized code
compiled at -O3 with threading (run with 6 threads) using the wrapper script
finished in 485 seconds. Thus a speed-up of about 15x.
}}}
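The 7466-second and 485-second totals quoted above can be reproduced from the C768 table. The following is a minimal Python sketch, not part of the orography code or the wrapper script. It assumes the original run processes the six tiles one after another, and that the wrapper runs two batches back to back, each taking as long as its slowest tile. The names `orig_O0`, `opt_O3_6threads`, `serial_time`, and `batched_time` are illustrative only, and tile 7 is left out because no timing is given for it.

```python
# Per-tile times in seconds for the C768 uniform case, copied from the table above.
orig_O0 = {1: 874, 2: 873, 3: 2146, 4: 895, 5: 892, 6: 1786}
opt_O3_6threads = {1: 77, 2: 73, 3: 249, 4: 76, 5: 77, 6: 236}

# Assumption: tiles 1-4 run together, then tiles 5-7 run together.
# Tile 7 is omitted here because the table has no timing for it.
batches = [(1, 2, 3, 4), (5, 6)]


def serial_time(times):
    """Total time when the tiles run one after another."""
    return sum(times.values())


def batched_time(times, batches):
    """Total time when each batch runs in parallel and the batches run in sequence.

    Each batch is assumed to take as long as its slowest tile.
    """
    return sum(max(times[tile] for tile in batch) for batch in batches)


serial_orig = serial_time(orig_O0)
batched_opt = batched_time(opt_O3_6threads, batches)
speedup = serial_orig / batched_opt

print(serial_orig, batched_opt, round(speedup, 1))  # prints: 7466 485 15.4
```

Under these assumptions the two slow tiles (3 and 6) set the wall time of their batches, 249 + 236 = 485 seconds, which matches the figure Jim reports.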
--
Ticket URL: <https://svnemc.ncep.noaa.gov/trac/fv3gfs/ticket/12#comment:8>
fv3gfs <https://svnemc.ncep.noaa.gov/trac/fv3gfs>
NGGPS FV3GFS Development