SU498: [Impact: Critical] HDD (ST4000NM003A, ST8000NM001A) firmware to prevent data unavailability

Last Updated:
5/25/2022, 11:21:50 PM

Summary

[Impact: Critical = Cluster data outage]

NetApp® has identified a specific configuration in which the drive models listed in the table below run at 6Gb/s instead of the expected 12Gb/s, or potentially are not detected at all, which can lead to data disruption or unavailability. NetApp has released a drive firmware fix that can be applied non-disruptively to mitigate the issue. The updated firmware is available from the E/EF-Series Drive Firmware Download page on the NetApp Support site.

Update the affected drives, identified by the part numbers and drive identifiers below, to at least the firmware version listed:

Part Number  Drive Identifier  Capacity  New Firmware
E-X4105A     ST4000NM003A      4TB       MS02
E-X4103A     ST4000NM003A      4TB       MS02
E-X4127A     ST8000NM001A      8TB       NE02
E-X4128A     ST8000NM001A      8TB       NE02
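
The table above can be expressed as a small lookup, which may help when checking an inventory of drives. This is a minimal sketch rather than a NetApp tool: the function name `needs_update` is hypothetical, and it assumes that firmware versions sharing a two-letter prefix are ordered by their numeric suffix (for example, NE01 precedes NE02).

```python
# Minimum firmware per drive identifier, copied from the table above.
MIN_FIRMWARE = {
    "ST4000NM003A": "MS02",
    "ST8000NM001A": "NE02",
}


def needs_update(product_id: str, firmware: str) -> bool:
    """Return True if an affected drive runs firmware older than the minimum.

    Assumption (not from the bulletin): versions with the same two-letter
    prefix compare by their numeric suffix. Anything that does not fit that
    pattern is flagged so it can be reviewed manually.
    """
    minimum = MIN_FIRMWARE.get(product_id)
    if minimum is None:
        return False  # drive model is not covered by this bulletin
    if firmware[:2] != minimum[:2]:
        return True  # unexpected firmware family; flag for manual review
    try:
        return int(firmware[2:]) < int(minimum[2:])
    except ValueError:
        return True  # non-numeric suffix; flag for manual review
```

For instance, `needs_update("ST8000NM001A", "NE01")` evaluates to `True`, while the same drive on `NE02` evaluates to `False`.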

Issue Description

If an IOM12B drive shelf module (Part Number: E-X5720B) is installed in a shelf containing any of these drives on firmware earlier than MS02/NE02 (per the Summary above), the drives are at risk of running at 6Gb/s instead of the expected 12Gb/s, or of not being detected at all by the IOM12B module.

Note: This can also be considered a proactive notification. You might see this same issue when an IOM12B module is used for a future replacement or reconfiguration of the shelf.

Symptom

A storage array profile similar to the following example indicates that this issue is occurring:

       TRAY, DRAWER, SLOT  STATUS   CAPACITY      MEDIA TYPE       INTERFACE TYPE  CURRENT DATA RATE  PRODUCT ID        FIRMWARE VERSION  CAPABILITIES
      0,    2,      1     Optimal  7.325,351 GB  Hard Disk Drive  SAS             6 Gbps             ST8000NM001A      NE01              DA
      0,    2,      2     Optimal  7.325,351 GB  Hard Disk Drive  SAS             6 Gbps             ST8000NM001A      NE01              DA
      0,    2,      3     Optimal  7.325,351 GB  Hard Disk Drive  SAS             6 Gbps             ST8000NM001A      NE01              DA
      0,    2,      4     Optimal  7.325,351 GB  Hard Disk Drive  SAS             6 Gbps             ST8000NM001A      NE01              DA
      0,    2,      5     Optimal  7.325,351 GB  Hard Disk Drive  SAS             6 Gbps             ST8000NM001A      NE01              DA
      0,    2,      6     Optimal  7.325,351 GB  Hard Disk Drive  SAS             6 Gbps             ST8000NM001A      NE01              DA
      0,    2,      7     Optimal  7.325,351 GB  Hard Disk Drive  SAS             6 Gbps             ST8000NM001A      NE01              DA
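
A saved copy of the profile text can be scanned for this symptom. The following is a rough sketch, assuming the drive rows follow the column layout of the example above; the regular expression and the function name `slow_affected_drives` are illustrative, not part of any NetApp utility, and other SANtricity versions may format the profile differently.

```python
import re

# Drive identifiers named in this bulletin.
AFFECTED = {"ST4000NM003A", "ST8000NM001A"}

# Matches one drive row as laid out in the example profile above:
# tray, drawer, slot, status, ..., "<n> Gbps", product ID, firmware version.
ROW = re.compile(
    r"^\s*(?P<tray>\d+),\s*(?P<drawer>\d+),\s*(?P<slot>\d+)\s+"
    r"(?P<status>\S+)\s+.*?"
    r"(?P<rate>\d+)\s*Gbps\s+(?P<product_id>\S+)\s+(?P<firmware>\S+)"
)


def slow_affected_drives(profile_text):
    """Return (tray, drawer, slot, firmware) for affected drives below 12 Gbps."""
    hits = []
    for line in profile_text.splitlines():
        m = ROW.match(line)
        if m and m["product_id"] in AFFECTED and int(m["rate"]) < 12:
            hits.append(
                (int(m["tray"]), int(m["drawer"]), int(m["slot"]), m["firmware"])
            )
    return hits
```

Header lines and any row that does not match the assumed layout are skipped, so an empty result does not by itself prove that the array is unaffected.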

Alternatively, if a drive cannot complete its initialization process, it may be reported as by-passed, as shown in the MEL (Major Event Log) example below:

Date/Time: Apr 21, 2022 2:43:41 PM
Sequence number: 41293
Event type: 2823
Event category: Failure
Priority: Critical
Event needs attention: true
Event send alert: true
Event visibility: true
Description: Drive by-passed
Event specific codes: 0/0/0
Component type: Drive
Component location: Shelf 2, Drawer 3, Bay 11
Logged by: A
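
An exported MEL entry in this "Key: value" form can be checked for the by-pass event. This is a minimal sketch under the assumption that each field sits on its own line as in the example above; `parse_mel_entry` and `is_drive_bypass` are hypothetical helper names.

```python
def parse_mel_entry(text):
    """Parse one MEL entry of 'Key: value' lines into a dictionary."""
    entry = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so time values such as 2:43:41 survive.
        key, sep, value = line.partition(": ")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def is_drive_bypass(entry):
    """True if the entry describes a drive being by-passed."""
    return (
        entry.get("Component type") == "Drive"
        and entry.get("Description") == "Drive by-passed"
    )
```

When `is_drive_bypass` returns `True`, the `Component location` field of the parsed entry identifies the shelf, drawer, and bay of the drive to inspect.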

If multiple drives fail, the RAID fault tolerance may be exceeded, in which case the Volume Group goes Offline (or fails) and its data is not accessible. A single drive failure results in a degraded volume group.

Note: Whenever more drives have failed than the RAID level tolerates, immediate engagement with technical support is strongly recommended.
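
The relationship between failed drives and volume group state described above can be sketched as follows. This is an illustration only: the tolerance is passed in by the caller (for example, 1 for RAID 5 or 2 for RAID 6), and the function name and lowercase state labels are hypothetical rather than SANtricity terminology.

```python
def volume_group_state(failed_drives: int, tolerance: int) -> str:
    """Classify a volume group by comparing failed drives to RAID tolerance.

    tolerance is the number of drive failures the RAID level can absorb.
    """
    if failed_drives < 0 or tolerance < 0:
        raise ValueError("counts must be non-negative")
    if failed_drives == 0:
        return "optimal"
    if failed_drives <= tolerance:
        return "degraded"  # data still accessible, redundancy reduced
    return "failed"  # tolerance exceeded; data not accessible
```

With a tolerance of 2, one or two failed drives yield `"degraded"` and a third yields `"failed"`, which is the point at which technical support should be engaged.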

Solution

Update the drive firmware to the versions listed in the Summary above.

Online Firmware Update:

  • For E-Series systems on CFW (Controller Firmware) 8.20 and later, ODFU (Online Drive Firmware Update) is supported; ensure that ODFU is enabled and that the latest firmware is installed. Refer to the Additional Information section below.
  • Caveat: ODFU is a non-disruptive but manual process.

StorageGRID appliances (SG6060s):

Additional Information

See Bug #1468081

In accordance with the Support Services terms, always update NetApp products with the latest version of firmware and software to provide the best reliability, availability, and serviceability.

Hot spare drives: To best maintain the continuous presence of hot spare drives available in the system, adhere to Hot Spares Best Practices and follow the standard drive replacement process if a drive fails.

Active IQ System Risk Detection: For customers who have enabled AutoSupport on their storage systems, the Active IQ Portal provides detailed System Risk reports at the customer, site, and system levels. The reports show systems that have specific risks, along with severity levels and mitigation action plans. Drives that are not running the latest firmware are an example of such a risk. Not upgrading to the most current drive firmware could leave the storage appliance vulnerable to undesirable behavior.

Important: The purpose of this communication is for NetApp to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution lists are NetApp confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this e-mail distribution list.

***************************************************
*** NETAPP CONFIDENTIAL – FOR LIMITED USE ONLY ***
***************************************************