[Ncep.list.nws-ncep-management] UPDATE #6 (Re: Problems at Boulder - IDP applications failed over to College Park)

NOAA SDM sdm at noaa.gov
Sat Aug 12 19:28:23 UTC 2017


6th update: NCO tech support has brought nearly all Boulder systems back
online. They are working with database administrators to resolve a
lingering technical issue with the Boulder MADIS database systems.

On 12 August 2017 at 17:26, NOAA SDM <sdm at noaa.gov> wrote:

> 5th update: NCO tech support has made further progress in bringing Boulder
> systems back online, including dataflow systems that are now reported as
> "healthy." Recovery efforts continue toward declaring Boulder a viable
> backup.
>
> On 12 August 2017 at 16:14, NOAA SDM <sdm at noaa.gov> wrote:
>
>> 4th update: NCO tech support is continuing recovery efforts in Boulder,
>> which include applications checking. In addition, database support
>> personnel have joined to assess the health of the Boulder database systems.
>>
>> On 12 August 2017 at 15:08, NOAA SDM <sdm at noaa.gov> wrote:
>>
>>> 3rd update: NCO tech support has begun applications checking to
>>> determine whether Boulder is healthy enough to be declared a viable
>>> backup.
>>>
>>> On 12 August 2017 at 14:09, NOAA SDM <sdm at noaa.gov> wrote:
>>>
>>>> 2nd update of the day: NCO tech support has brought the Boulder IDP
>>>> IRIS and MADIS databases online, but only in single-user mode. They are
>>>> working to resolve the single-user mode issue.
>>>>
>>>> On 12 August 2017 at 13:09, NOAA SDM <sdm at noaa.gov> wrote:
>>>>
>>>>> 1st update of the day: NCO tech support is currently on track to restore
>>>>> enough Boulder systems for Boulder to be declared a viable backup.
>>>>>
>>>>> On 12 August 2017 at 04:06, NOAA SDM <sdm at noaa.gov> wrote:
>>>>>
>>>>>> UPDATE:
>>>>>>
>>>>>> NSB's Mark Reeser provided the following update:
>>>>>>
>>>>>> Recovered from the Boulder outage at approximately 6:30 pm MDT (0030Z).
>>>>>> Root cause is still unknown. We will be collecting data and sending it
>>>>>> to Cisco for root cause analysis.
>>>>>> Initial hypothesis is that the (relatively old) code we're running on
>>>>>> the Data Center core switches (BLDRCR1 and BLDRCR2) was impacted by a
>>>>>> code bug, which was injected into the system earlier in the day during
>>>>>> an approved accelerated change.
>>>>>> Once the change was reverted, the system normalized, but Cisco still
>>>>>> cannot connect the dots on how our change, which was done correctly,
>>>>>> would have had this impact. More to come as we continue with the root
>>>>>> cause analysis.
>>>>>>
>>>>>> Boulder IDP apps will remain offline until deemed stable. Hopefully
>>>>>> all apps will be brought back tomorrow.
>>>>>>
>>>>>> Another unforeseen impact:
>>>>>>
>>>>>> The 00Z NAM ran without upper-air data for the 00Z 08/12/17 cycle
>>>>>> because the network outage impacted WCOSS (TIDE Primary) ingest. The
>>>>>> 00Z GFS had near-normal data counts. Also, the 00Z LAMP job had to be
>>>>>> scrubbed due to a missing required MRMS file, which was also attributed
>>>>>> to the Boulder network outage.
>>>>>> SDM
>>>>>>
>>>>>> Randy
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: NOAA SDM <sdm at noaa.gov>
>>>>>> Date: 11 August 2017 at 23:42
>>>>>> Subject: Fwd: Problems at Boulder - IDP applications failed over to
>>>>>> College Park
>>>>>> To: "_NCEP.List.NWS-NCEP-management"
>>>>>> <ncep.list.nws-ncep-management@noaa.gov>
>>>>>>
>>>>>>
>>>>>> Problem at Boulder continues.
>>>>>>
>>>>>> *Boulder currently is not a viable backup.* All IDP applications
>>>>>> have been moved to the College Park facility with the exception of GIS,
>>>>>> which is pending Akamai processing that should be complete by 2300Z.
>>>>>> Unfortunately, GIS at the College Park facility is older; this will
>>>>>> result in the loss of the NWS_Forecasts_Guidance_Warnings / National
>>>>>> Digital Guidance Database image services.
>>>>>>
>>>>>>
>>>>>> The following impacts were noted by the IDP and DataFlow groups:
>>>>>>
>>>>>>
>>>>>> TGFTP
>>>>>>
>>>>>>    - Some data stopped updating on TGFTP at 17:57z; customers moved to
>>>>>>      CP at 18:58z.
>>>>>>
>>>>>>
>>>>>> NOMADS
>>>>>>
>>>>>>    - Outage started at 18:12z; customers moved to CP at 18:54z.
>>>>>>
>>>>>>
>>>>>> IDP data to the TG
>>>>>>
>>>>>>    - MADIS data stopped updating to the SBN between 18:11z and 20:22z
>>>>>>      (2 hr 11 min).
>>>>>>
>>>>>>
>>>>>> RADAR2
>>>>>>
>>>>>>    - No impact; customers were moved before the RADAR2 systems in BD
>>>>>>      were impacted.
>>>>>>
>>>>>>
>>>>>> RADAR3
>>>>>>
>>>>>>    - 22-min outage in BD between 18:44z and 19:06z; users failed over to
>>>>>>      CP starting at 19:16z.
>>>>>>
>>>>>>
>>>>>> Data pushes from local centers to Boulder for IDP apps there
>>>>>>
>>>>>>    -
>>>>>>
>>>>>>    1hr 20min stoppage in data flow, data backfilled when flow
>>>>>>    restored (stopped 18:12z, started up again at 19:32z)
>>>>>>
>>>>>>
>>>>>>
>>>>>> *MADIS* Customers down for two hours (between 1840Z and 2040Z)
>>>>>>
>>>>>> *NLETS* Customers down for over an hour and a half (between 1810Z
>>>>>> and 1955Z)
>>>>>>
>>>>>> *IRIS/iNWS* down 40 minutes (between 1818Z and 1900Z)
>>>>>>
>>>>>> *EDIS/FTPMail* down 40 minutes (between 1820Z and 1900Z)
>>>>>>
>>>>>>
>>>>>>
>>>>>> SDM
>>>>>> Randy
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: NOAA SDM <sdm at noaa.gov>
>>>>>> Date: 11 August 2017 at 20:40
>>>>>> Subject: Fwd: Problems at Boulder - IDP applications failed over to
>>>>>> College Park
>>>>>> To: "_NCEP.List.NWS-NCEP-management"
>>>>>> <ncep.list.nws-ncep-management@noaa.gov>
>>>>>>
>>>>>>
>>>>>> Problems at Boulder continue. NCO will be failing over the remaining
>>>>>> applications (NLETS/EMWIN, MADIS, and GIS) to College Park while support
>>>>>> investigates the issue at Boulder.
>>>>>>
>>>>>> SDM
>>>>>>
>>>>>> Randy
>>>>>> ---------- Forwarded message ----------
>>>>>> From: NOAA SDM <sdm at noaa.gov>
>>>>>> Date: 11 August 2017 at 19:03
>>>>>> Subject: Problems at Boulder - IDP applications failed over to
>>>>>> College Park
>>>>>> To: "_NCEP.List.NWS-NCEP-management"
>>>>>> <ncep.list.nws-ncep-management@noaa.gov>
>>>>>>
>>>>>>
>>>>>> All,
>>>>>>
>>>>>> Shortly after 1800Z (2 pm ET), various IDP applications hosted in
>>>>>> Boulder became nonfunctional. NCO technical support is failing these
>>>>>> applications over to College Park. Stay tuned for more details as
>>>>>> we receive new information.
>>>>>>
>>>>>> --
>>>>>> Senior Duty Meteorologist
>>>>>> NOAA/NWS/NCEP/NCO/OMB
>>>>>>



-- 
Senior Duty Meteorologist
NOAA/NWS/NCEP/NCO/OMB