SU488: [Impact: Critical] SSD (PX05*) firmware to prevent data loss / unavailability

Last Updated: 4/27/2022, 2:57:48 PM

Summary

[Impact: Critical = Data loss or cluster data outage]

NetApp® has identified that the drive models listed in the table below fail at a higher rate than other drives shipped by NetApp. To mitigate the issue, NetApp has released a drive firmware fix that can be applied non-disruptively. The updated firmware is available from the E/EF-Series Drive Firmware Download page on the NetApp Support site.

Update the affected drives, identified by part number and drive identifier string, to at least the minimum firmware version listed:

Part Number   Drive Identifier   Capacity   Minimum FW
E-X4041C      PX05SVB080         800GB      MS05
E-X4043C      PX05SVB080         800GB      MS05
E-X4058A-R6   PX05SVQ080         800GB      MS05
E-X4059B      PX05SVB160         1.6TB      MS05
E-X4061B      PX05SVQ080         800GB      MS05
E-X4062B      PX05SVB160         1.6TB      MS05
E-X4083A      PX05SVQ160B        1.6TB      MS04
E-X4084A      PX05SVQ160B        1.6TB      MS04
E-X4085A      PX05SVB080         800GB      MS05
E-X4086A      PX05SVB080         800GB      MS05
E-X4087A      PX05SVB080         800GB      MS05
E-X4092A      PX05SVB160         1.6TB      MS05
E-X4093A      PX05SVB160         1.6TB      MS05
E-X4093A      PX05SVQ160B        1.6TB      MS04
E-X4096A      PX05SVQ160B        1.6TB      MS04
E-X4104A      PX05SVQ160B        1.6TB      MS04
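As a rough illustration of how the table above could be applied to a drive inventory, the following Python sketch flags affected drives whose firmware is below the listed minimum. It is keyed by drive identifier rather than part number, because E-X4093A appears with two identifiers that have different minimums. The function names, the sample inventory, and the assumption that firmware revisions share a letter prefix and compare by their numeric suffix (MS04 < MS05) are hypothetical; this is not a NetApp tool.

```python
import re

# Minimum firmware per drive identifier, transcribed from the table above.
# Keyed by identifier because part number E-X4093A maps to two identifiers.
MIN_FIRMWARE = {
    "PX05SVB080": "MS05",
    "PX05SVQ080": "MS05",
    "PX05SVB160": "MS05",
    "PX05SVQ160B": "MS04",
}


def parse_fw(version):
    """Split a firmware string such as 'MS05' into ('MS', 5).

    Assumption (hypothetical): revisions are a letter prefix plus a decimal number.
    """
    m = re.fullmatch(r"([A-Z]+)(\d+)", version.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized firmware string: {version!r}")
    return m.group(1), int(m.group(2))


def needs_update(identifier, current_fw):
    """Return True if the drive is an affected model below the minimum firmware."""
    minimum = MIN_FIRMWARE.get(identifier.strip().upper())
    if minimum is None:
        return False  # not one of the affected drive models
    cur_prefix, cur_num = parse_fw(current_fw)
    min_prefix, min_num = parse_fw(minimum)
    if cur_prefix != min_prefix:
        return True  # unexpected firmware family: flag it for manual review
    return cur_num < min_num


# Hypothetical inventory, e.g. transcribed from the drive list in SANtricity.
inventory = [
    ("PX05SVB080", "MS03"),
    ("PX05SVQ160B", "MS04"),
    ("PX05SVB160", "MS05"),
]
for ident, fw in inventory:
    print(ident, fw, "UPDATE" if needs_update(ident, fw) else "ok")
# PX05SVB080 MS03 UPDATE
# PX05SVQ160B MS04 ok
# PX05SVB160 MS05 ok
```

Treating an unexpected prefix as "needs update" errs on the side of flagging a drive for a human to check rather than silently passing it.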

Issue Description

This firmware release resolves the following high-impact issues:

  1. Excessive NAND programming/erasure can lead to media errors and drive failure.
  2. A needless increase in the erase counts of some logical blocks during refresh operations can cause unnecessary drive failures due to medium errors.
  3. A misreported 03/11/FF error can cause an unnecessary drive failure.

Symptom

Major Event Log (MEL) output similar to the following may indicate one or more of these issues:

Unrecoverable errors with potential for volume failure:

Date/Time: 11/22/20 7:31:26 AM
Sequence number: 31364
Event type: 1016
Event category: Error
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Drive returned CHECK CONDITION
Event specific codes: 3/11/FF
Component type: Drive
Component location: Tray 3, Drawer 3, Slot 9
Logged by: Controller in slot B
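If MEL entries are available as text in the "Key: Value" layout shown above, a short script could flag CHECK CONDITION events (event type 1016) carrying the 3/11/FF codes and report the affected drive location. This is a minimal sketch under stated assumptions: it handles one entry per string, treats the codes as slash-separated hexadecimal fields so that 3/11/FF and 03/11/FF compare equal, and the function names are hypothetical. The actual SANtricity export format may differ.

```python
# Hypothetical helpers for scanning MEL text; not a NetApp-provided API.


def parse_mel_entry(text):
    """Parse one MEL entry made of 'Key: Value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 7:31:26 stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def sense_codes(value):
    """Turn '3/11/FF' into (0x3, 0x11, 0xFF); return None if malformed."""
    try:
        return tuple(int(part, 16) for part in value.split("/"))
    except ValueError:
        return None


def is_suspect_event(fields):
    """True for a CHECK CONDITION event (type 1016) with codes 3/11/FF."""
    return (fields.get("Event type") == "1016"
            and sense_codes(fields.get("Event specific codes", "")) == (0x3, 0x11, 0xFF))


SAMPLE = """Date/Time: 11/22/20 7:31:26 AM
Sequence number: 31364
Event type: 1016
Event category: Error
Description: Drive returned CHECK CONDITION
Event specific codes: 3/11/FF
Component type: Drive
Component location: Tray 3, Drawer 3, Slot 9
Logged by: Controller in slot B"""

entry = parse_mel_entry(SAMPLE)
if is_suspect_event(entry):
    print("Suspect drive:", entry["Component location"])
```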

If multiple drives fail, the fault tolerance of the RAID level may be exceeded. In that case the volume group goes offline (fails) and its data becomes inaccessible.

A single drive failure results in a degraded volume group.

Note: If more drives have failed than the RAID level can tolerate, engaging technical support immediately is strongly recommended.
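The relationship between failed drives and volume group state can be sketched as a simple threshold check. The sketch assumes the standard fault tolerance of RAID 0 (no redundancy), RAID 5 (one drive), and RAID 6 (two drives); RAID 1/10 and Dynamic Disk Pools are left out because their tolerance depends on which drives fail, and the state names are illustrative rather than SANtricity's exact labels.

```python
# Failed drives each RAID level can tolerate before the volume group fails.
# RAID 1/10 and DDP are omitted: their tolerance depends on layout.
RAID_TOLERANCE = {"RAID0": 0, "RAID5": 1, "RAID6": 2}


def volume_group_state(raid_level, failed_drives):
    """Classify a volume group as optimal, degraded, or failed (illustrative)."""
    if failed_drives < 0:
        raise ValueError("failed_drives must be non-negative")
    tolerance = RAID_TOLERANCE[raid_level]
    if failed_drives == 0:
        return "optimal"
    if failed_drives <= tolerance:
        return "degraded"  # still online; replace failed drives promptly
    return "failed"  # offline, data inaccessible: engage technical support


for level, failed in [("RAID5", 1), ("RAID5", 2), ("RAID6", 2), ("RAID0", 1)]:
    print(level, failed, volume_group_state(level, failed))
```

This also shows why a single failure degrades only redundant RAID levels: with RAID 0 the tolerance is zero, so one failed drive already takes the volume group offline.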

Solution

Update the drive firmware to the minimum versions listed in the Summary.

For StorageGRID appliances (SG6060), follow the "Upgrading SANtricity OS Software on the storage controllers using maintenance mode" procedure documented in the StorageGRID 11.4 SG6000 maintenance guide. The only difference is that, while the appliance is in maintenance mode, you perform a drive firmware upgrade instead of a SANtricity OS upgrade.

Additional Information

See Bug #1411698

In accordance with the Support Services terms, always update NetApp products with the latest version of firmware and software to provide the best reliability, availability, and serviceability:

Hot spare drives: To keep hot spare drives available in the system at all times, follow the Hot Spares Best Practices and use the standard drive replacement process if a drive fails.

Active IQ System Risk Detection:

For customers who have enabled AutoSupport on their storage systems, the Active IQ Portal provides detailed System Risk reports at the customer, site, and system levels. The reports show systems that have specific risks, along with severity levels and mitigation action plans. Drives that are not running the latest firmware are one example of such a risk. Not upgrading to the most current drive firmware could leave the storage appliance vulnerable to undesirable behavior.

Important: The purpose of this communication is for NetApp to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution lists are NetApp confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this e-mail distribution list.

***************************************************
*** NETAPP CONFIDENTIAL – FOR LIMITED USE ONLY ***
***************************************************