[Ncep.list.nws-ncep-management] Luna WCOSS storage issue

NOAA SDM sdm at noaa.gov
Sat Jan 28 05:22:41 UTC 2017


Current update from Cray SAs:

This verify pass is taking longer than previous passes.
We're currently at 70% complete after 6.5 hours.

Remaining Steps:
Complete verifies (3 hours remaining)
DDN scan to identify bad blocks (1 hour of work remaining after verifies
complete)
mmfsck on the HPS file system to resolve any file system inconsistencies
(5 - 8 hours)
Baseline Test (1 hour)
Cross mount of HPS on Tide (1 hour)

Total time remaining (11 - 14 hours)

SDM

Randy

On 28 January 2017 at 00:42, NOAA SDM <sdm at noaa.gov> wrote:

> FYI...
>
> System administrators continue to work on Luna. The current estimate is
> as follows:
>
>
> The final verifies of the raid sets have completed.
>
> Next steps:
> DDN scan to identify bad blocks (3 hours)
> mmfsck on the HPS file system to resolve any file system inconsistencies
> (5 - 8 hours)
> Baseline Test (1 hour)
> Cross mount of HPS on Tide (1 hour)
>
> Total time remaining (10 - 13 hours)
>
> So Luna will be returned early tomorrow morning at the earliest.
>
> We will continue to provide updates if that timeline changes, as well as
> when the system is restored.
>
> SDM Grant Newby
>
>
>
> On 27 January 2017 at 21:43, NOAA SDM <sdm at noaa.gov> wrote:
>
>> FYI...
>>
>> System administrators continue to work on Luna. The current estimate is
>> another 12-15 hours of work, so Luna will be returned early tomorrow
>> morning at the earliest.
>>
>> We will continue to provide updates if that timeline changes as well as
>> when the system is restored.
>>
>> SDM Grant Newby
>>
>>
>>
>> On 27 January 2017 at 15:48, NOAA SDM <sdm at noaa.gov> wrote:
>>
>>> FYI -
>>>
>>> Cray continues work to verify, check, and repair the file system on
>>> LUNA.  IBM is now estimating a probable restoration time of early this
>>> evening.
>>>
>>> We will continue to provide updates if that timeline changes and once
>>> the system is restored.
>>>
>>> Rob Handel
>>>
>>> On 27 January 2017 at 13:49, NOAA SDM <sdm at noaa.gov> wrote:
>>>
>>>> Update as of 8:33am ET -
>>>>
>>>> Cray continues to run verifies on LUNA, which are now 80 percent
>>>> complete; this step should be finished around 10am ET. They will then
>>>> start checking and repairing the file system, clearing any bad blocks
>>>> they find. No ETR is available at this time.
>>>>
>>>> We expect another update after 10am.
>>>>
>>>> Rob Handel
>>>>
>>>> On 27 January 2017 at 10:57, NOAA SDM <sdm at noaa.gov> wrote:
>>>>
>>>>>
>>>>> Cray's Rich Bach reported the following (update, 1/27/17 1056Z):
>>>>>
>>>>>
>>>>> The verifies are still running on some of the raid sets at this point.
>>>>>
>>>>>  A number of them had to rerun due to additional errors that were
>>>>> found during the previous verify.
>>>>>
>>>>>
>>>>>
>>>>> I have been monitoring the progress and capturing any additional
>>>>> information requested by DDN.
>>>>>
>>>>> I will update you again when the current verifies complete or by 0830
>>>>> EST this morning.
>>>>>
>>>>>
>>>>>
>>>>> SDM
>>>>>  Randy
>>>>>
>>>>>
>>>>> On 27 January 2017 at 05:46, NOAA SDM <sdm at noaa.gov> wrote:
>>>>>
>>>>>> Latest Update from Rich Bach concerning LUNA.
>>>>>>
>>>>>>
>>>>>> The verifies are still running on the raid sets on DDN rack 2 on
>>>>>> LUNA.
>>>>>>
>>>>>> The current estimate is that they will finish between 0400 and 0600
>>>>>> in the morning.
>>>>>>
>>>>>> I will continue to monitor them through the night in case they
>>>>>> finish before the estimated time.
>>>>>>
>>>>>> I will send the next update when the verifies complete, or by 0600,
>>>>>> whichever happens first.
>>>>>>
>>>>>>
>>>>>> SDM
>>>>>>
>>>>>> Randy
>>>>>>
>>>>>> On 27 January 2017 at 04:05, NOAA SDM <sdm at noaa.gov> wrote:
>>>>>>
>>>>>>> Latest update from Cray SA Rich Bach:
>>>>>>>
>>>>>>>
>>>>>>> We are continuing to work directly with the DDN support team.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The verifies are still running. We have completed 8 of the 48, and
>>>>>>> based on the timing so far, we estimate it will be another 3 to 5 hours
>>>>>>> before these complete.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Once they complete, we will need to run mmfsck on the HPS file system
>>>>>>> to resolve any file system inconsistencies. We will then do file
>>>>>>> system cleanup to clear any bad blocks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I will send the next update at 0100 EST.
>>>>>>>
>>>>>>>
>>>>>>> SDM
>>>>>>>
>>>>>>> Randy
>>>>>>>
>>>>>>> On 27 January 2017 at 02:09, NOAA SDM <sdm at noaa.gov> wrote:
>>>>>>>
>>>>>>>> IBM and Cray personnel continue to work on this issue. The latest
>>>>>>>> update from Cray is as follows:
>>>>>>>>
>>>>>>>> Cray SAs have noted that, due to the nature of the issue, the raid
>>>>>>>> controllers on rack 2 have gone into a force verify state. This means
>>>>>>>> an active scan/verify of the data stored on the volumes is in progress,
>>>>>>>> and it needs to finish before work can continue. The verifies are
>>>>>>>> currently at about 35 percent. Once they complete, the SAs can proceed
>>>>>>>> with their work to resolve this issue.
>>>>>>>>
>>>>>>>> We will provide another update once more information becomes
>>>>>>>> available.
>>>>>>>>
>>>>>>>> SDM Grant Newby
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26 January 2017 at 23:13, NOAA SDM <sdm at noaa.gov> wrote:
>>>>>>>>
>>>>>>>>> IBM and Cray personnel continue to work on this issue. The
>>>>>>>>> latest update from Cray is as follows:
>>>>>>>>>
>>>>>>>>> We have supplied some additional data and the escalation is at the
>>>>>>>>> highest level.
>>>>>>>>>
>>>>>>>>>  We are waiting for a response with next steps within the hour.
>>>>>>>>>
>>>>>>>>> As noted previously,  LUNA has been unmounted from TIDE in the
>>>>>>>>> meantime, allowing for that portion of the Reston WCOSS to still be
>>>>>>>>> available.  However, we do not currently have a fully viable backup WCOSS.
>>>>>>>>>
>>>>>>>>> Operations continue to run normally on the primary WCOSS in
>>>>>>>>> Orlando (GYRE and SURGE).
>>>>>>>>>
>>>>>>>>> We will continue to provide updates as we get more information.
>>>>>>>>>
>>>>>>>>> SDM Grant Newby
>>>>>>>>>
>>>>>>>>> On 26 January 2017 at 18:42, NOAA SDM <sdm at noaa.gov> wrote:
>>>>>>>>>
>>>>>>>>>> FYI -
>>>>>>>>>>
>>>>>>>>>> Around 12:30pm ET, a file system problem developed on LUNA; IBM
>>>>>>>>>> and Cray personnel are currently investigating. LUNA has been unmounted
>>>>>>>>>> from TIDE in the meantime, allowing for that portion of the Reston WCOSS to
>>>>>>>>>> still be available.  However, we do not currently have a fully viable
>>>>>>>>>> backup WCOSS.
>>>>>>>>>>
>>>>>>>>>> Operations continue to run normally on the primary WCOSS in
>>>>>>>>>> Orlando (GYRE and SURGE).
>>>>>>>>>>
>>>>>>>>>> We will continue to provide updates as we get more information.
>>>>>>>>>>
>>>>>>>>>> Rob Handel
>>>>>>>>>> --
>>>>>>>>>> Senior Duty Meteorologist
>>>>>>>>>> NOAA/NWS/NCEP/NCO/OMB
>>>>>>>>>>



-- 
Senior Duty Meteorologist
NOAA/NWS/NCEP/NCO/OMB