SU545: [Impact: Critical] HDD firmware (NE02) for ST8000NM018B to prevent data loss or disruption

Last Updated:
12/5/2023, 2:01:55 AM

Summary

[Impact: Critical = Cluster data outage with potential for data loss.]

An issue has been identified with the initial NE01 firmware for E-X4127B and E-X4128B drives. Under certain read/write and write-intensive workloads, a defect in the initial firmware could cause multiple drives in a volume to fail. NetApp has released updated drive firmware, NE02, that mitigates the issue. The firmware was released on October 11, 2023, and is available from the E-Series Disk Firmware page on the NetApp Support site.

Update the affected drives, identified by the part numbers and drive identifiers below, to at least the firmware version shown:

Part Number   Drive Identifier   Capacity   Minimum Firmware
E-X4127B      ST8000NM018B       8 TB       NE02
E-X4128B      ST8000NM018B       8 TB       NE02
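
As an illustration of the check implied by the table above, the following is a minimal sketch that flags drives needing the update. The `Drive` record and its field names are hypothetical and do not correspond to any NetApp tool or API. The sketch also assumes that firmware revisions share the "NE" prefix and are ordered by their numeric suffix, which this bulletin does not state.

```python
from dataclasses import dataclass

# Values taken from the table above.
AFFECTED_DRIVE_ID = "ST8000NM018B"
MINIMUM_FIRMWARE = "NE02"


@dataclass
class Drive:
    """Hypothetical inventory record; field names are illustrative only."""
    location: str
    drive_id: str
    firmware: str


def firmware_number(firmware: str) -> int:
    """Return the numeric suffix of a firmware string such as 'NE02'.

    Assumption: revisions share the 'NE' prefix and order by their digits.
    """
    if not firmware.startswith("NE") or not firmware[2:].isdigit():
        raise ValueError(f"unexpected firmware string: {firmware!r}")
    return int(firmware[2:])


def needs_update(drive: Drive) -> bool:
    """True if the drive is an affected model running firmware below the minimum."""
    if drive.drive_id != AFFECTED_DRIVE_ID:
        return False
    return firmware_number(drive.firmware) < firmware_number(MINIMUM_FIRMWARE)


def drives_to_update(inventory):
    """Return the drives in the inventory that still need the firmware update."""
    return [drive for drive in inventory if needs_update(drive)]
```

Given an inventory gathered by whatever means is available, `drives_to_update(inventory)` returns only the ST8000NM018B drives still below NE02. Drives of other models are skipped without parsing their firmware strings.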

Issue Description

The problem occurs when certain mixed, non-sequential read/write I/O workloads are running and Low Priority Commands (LPCs) are sent to the drive. LPCs are commands that manage the drive itself. Under these conditions, asserts start to appear on the drive, rendering it unavailable.

Symptom

The specific signatures of the problem, including the 2/4/f2 sense error, are shown below:

  • An ST8000NM018B drive fails due to a write failure. A 226c drive failure event is logged by E-Series software.

B:9/4/23, 12:20:45 AM (00:20:45) 4890 226c Drive failure - Shelf 0, Drawer 5, Bay 9 - Cause: 3 = Write failure; Drive WWN: 5000c500f0XXXXXX; SN: WRQ1LKEQ00XXXXXXXXXX <--CRITICAL

  • Immediately after the failure, a "Drive returned CHECK CONDITION" event (100a) is logged, indicating that the drive encountered an assert storm (Sense Key / ASC / ASCQ: 2/4/f2).

B:9/4/23, 12:20:48 AM (00:20:48) 4893 100a Drive returned CHECK CONDITION - Shelf 0, Drawer 5, Bay 9

----> Sense 2/4/f2 = Not Ready - Logical unit not ready, assert storm threshold being exceeded - CDB: 0x1b = Start/Stop Unit - LBA: 0x17f000040f8ca58
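
The two signatures above can be searched for in saved event-log text. The following is a rough sketch only. It assumes the log lines follow the layout of the excerpts quoted in this section, and the regular expressions and the function name `find_assert_storm_drives` are illustrative rather than part of any NetApp tooling.

```python
import re

# Matches an event line shaped like the excerpts above:
# "... (00:20:45) 4890 226c Drive failure - Shelf 0, Drawer 5, Bay 9 ..."
EVENT_RE = re.compile(
    r"\)\s+\d+\s+(?P<code>[0-9a-f]{4})\s+.*?-\s+"
    r"(?P<location>Shelf \d+, Drawer \d+, Bay \d+)"
)
# Matches the sense detail line that follows a 100a event.
SENSE_RE = re.compile(r"Sense\s+2/4/f2\b", re.IGNORECASE)


def find_assert_storm_drives(log_text: str) -> list:
    """Return drive locations showing a 226c write failure followed by a
    100a event whose sense data is 2/4/f2."""
    failed = set()    # locations that logged a 226c write failure
    pending = None    # location of the most recent qualifying 100a event
    matches = []
    for line in log_text.splitlines():
        event = EVENT_RE.search(line)
        if event:
            pending = None
            code, location = event["code"], event["location"]
            if code == "226c" and "Write failure" in line:
                failed.add(location)
            elif code == "100a" and location in failed:
                pending = location
        elif pending and SENSE_RE.search(line):
            if pending not in matches:
                matches.append(pending)
            pending = None
    return matches
```

A location is reported only when both events appear for the same shelf, drawer, and bay, in that order. A real log may use formats not covered by these patterns, so treat an empty result as inconclusive.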

Solution

Upgrade the affected drives to the fixed firmware listed in the Summary above. On systems where drives have already failed, the drives can be reconstructed manually.

  • To recover a failed drive, the drive must be reconstructed. Hardware replacement is possible but not necessary, because this is a firmware issue.
    • Instructions to Reconstruct Drive Manually.
    • Before reconstructing the drive, reseat it.
      • Alternatively, the drive can be power cycled remotely.
  • In the event of multiple drive failures accompanied by volume failures, engage NetApp Technical Support for further assistance.

Additional Information

See BUG 1588953

E-Series ST8000NM018B drive failure with an assert storm

In accordance with the Support Services terms, always update NetApp products with the latest version of firmware and software to provide the best reliability, availability, and serviceability:

Hot spare drives: To best maintain the continuous presence of hot spare drives available in the system, adhere to Hot Spares Best Practices and follow the standard drive replacement process if a drive fails.

Active IQ System Risk Detection: For customers who have enabled AutoSupport on their storage systems, the Active IQ Portal provides detailed System Risk reports at the customer, site, and system levels. The reports show systems that have specific risks, as well as severity levels and mitigation action plans. Drives that are not running the latest firmware are an example of such a risk. Not upgrading to the most current drive firmware could leave the storage appliance vulnerable to undesirable behavior.

Important: The purpose of this communication is for NetApp to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution lists are NetApp confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this e-mail distribution list.

***************************************************
*** NETAPP CONFIDENTIAL – FOR LIMITED USE ONLY ***
***************************************************