SU544: [Impact: Critical] HUH7210* and HUH7212* HDD firmware to prevent potential data loss or disruption / unavailability


Last Updated: 1/10/2024, 11:54:18 PM

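Disclaimer:
Lenovo NetApp Technology Limited ("Lenovo NetApp") makes no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided on this page, or with respect to any results that may be obtained by using this information or following the recommendations provided herein. The information on this page is distributed as is, and the use of this information or the implementation of any recommendations or techniques herein is the customer's responsibility and depends on the customer's ability to evaluate this information and integrate it into the customer's operational environment. This page and the information it contains may be used only in connection with the NetApp products discussed on this page. In no event shall Lenovo NetApp be liable for any special, indirect, or consequential damages arising out of or in connection with the use or performance of the information provided on this page, or for any damages resulting from loss of use, data, or profits, whether in an action of contract, negligence, or other tortious action.

For the latest information, refer to the support bulletins on the NetApp official website.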

Summary

[Impact: Critical = Data loss and/or cluster data outage.]

NetApp® has previously identified that the drive models listed in the table below have the potential to fail at a higher rate than other drives shipped by NetApp. Due to the nature of this issue, NetApp strongly recommends performing this upgrade as soon as possible to avoid excessive idle time wear of the HDD that can lead to increasingly disruptive scenarios.

This bulletin is a follow-up to SU478 to emphasize the criticality of specific drives that have not been upgraded. While the core guidance has not changed, this message is being upgraded to Critical due to the increasing age of systems with long-running drives.

Part Number	Drive Identifier	Capacity	Firmware
E-X4073A	HUH721008AL5204	8TB	NE03
E-X4074A	HUH721008AL5204	8TB	NE03
E-X4127A	HUH721008AL5204	8TB	NE03
E-X4128A	HUH721008AL5204	8TB	NE03
E-X4107A	HUH721010AL5204	10TB	NE03
E-X4110A	HUH721010AL5204	10TB	NE03
E-X4111A	HUH721010AL5204	10TB	NE03
E-X4115A	HUH721010AL5204	10TB	NE03
E-X4118A	HUH721010AL5205	10TB	NE03
E-X4121A	HUH721010AL5205	10TB	NE03
E-X4124A	HUH721010AL5205	10TB	NE03
E-X4130A	HUH721010AL5205	10TB	NE03
E-X4131A	HUH721212AL5204	12TB	NE02
E-X4132A	HUH721212AL5204	12TB	NE02
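As an illustration only, the table can be expressed as a lookup that flags drives still running firmware older than the listed version. This is a minimal sketch, not a NetApp tool: the helper names `needs_upgrade` and `_split` are hypothetical, and the comparison assumes firmware strings consist of a letter prefix followed by a number (as in `NE02` and `NE03`) and that a lower number under the same prefix means an older release.

```python
import re

# Listed firmware per drive identifier, copied from the table above.
LISTED_FIRMWARE = {
    "HUH721008AL5204": "NE03",
    "HUH721010AL5204": "NE03",
    "HUH721010AL5205": "NE03",
    "HUH721212AL5204": "NE02",
}


def _split(firmware):
    """Split a firmware string such as 'NE03' into ('NE', 3)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", firmware.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized firmware string: {firmware!r}")
    return match.group(1), int(match.group(2))


def needs_upgrade(drive_id, firmware):
    """Return True if the drive is listed and runs firmware older than the table's."""
    listed = LISTED_FIRMWARE.get(drive_id.strip().upper())
    if listed is None:
        return False  # drive model is not covered by this bulletin
    prefix, revision = _split(firmware)
    listed_prefix, listed_revision = _split(listed)
    if prefix != listed_prefix:
        # ASSUMPTION: only revisions with the same letter prefix are comparable.
        raise ValueError(f"cannot compare {firmware!r} with {listed!r}")
    return revision < listed_revision
```

For example, `needs_upgrade("HUH721008AL5204", "NE02")` flags the drive, while the same model on `NE03` is not flagged. The authoritative check remains the firmware level reported by the storage system itself.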

Issue Description

Drives running firmware versions earlier than those listed in the table above are at risk of a higher-than-expected failure rate due to excessive background media scanning. If multiple drives fail simultaneously, this can lead to data loss, disruption, or unavailability:

In a multiple drive failure scenario, RAID limits may be exceeded, in which case a Volume Group would go Offline (or fail), and data in cache could be lost.
In a single drive failure scenario, a drive will likely be failed for degraded performance. This would result in a degraded volume group.
Note: In any event where more drives are impacted than RAID tolerance, immediate engagement with technical support is strongly recommended.
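The two scenarios above come down to comparing the number of failed drives in a volume group with what its RAID level tolerates. The sketch below illustrates that comparison under stated assumptions: the tolerance values are general RAID properties (the guaranteed minimum for each level), not figures taken from this bulletin, and the function name and state labels are hypothetical. Dynamic Disk Pools are not modeled.

```python
# Guaranteed number of simultaneous drive failures each RAID level survives.
# ASSUMPTION: general RAID properties, not values stated in this bulletin.
RAID_TOLERANCE = {"RAID0": 0, "RAID1": 1, "RAID5": 1, "RAID6": 2}


def volume_group_state(raid_level, failed_drives):
    """Classify a volume group by failed-drive count versus RAID tolerance."""
    tolerance = RAID_TOLERANCE[raid_level]
    if failed_drives == 0:
        return "optimal"
    if failed_drives <= tolerance:
        return "degraded"  # single-failure scenario: still serving data
    return "failed"        # RAID limits exceeded: engage technical support
```

Under these assumptions, a RAID 6 volume group with one failed drive is degraded, while three simultaneous failures exceed its tolerance.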

Symptom

Multiple drives reporting timeouts is a potential indicator of the problem. Repeated events like the major event log entry below can help detect it, but confirming that drive failures are caused by this specific issue requires engagement with NetApp Technical Support.

Date/Time: 4/1/23, 7:18:11 AM
Sequence number: 5012
Event type: 100D
Event category: Error
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Timeout on drive side of controller
Event specific codes: 0/0/0
Component type: Drive
Component location: Shelf 1, Bay 8
Logged by: Controller in bay B
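To look for the pattern described above (several drives reporting timeouts), exported event text can be parsed and the timeout events counted per drive location. The following is a minimal sketch under stated assumptions: the field labels and event type `100D` are taken from the sample event above, while the function names and the `min_drives` threshold are hypothetical. It can suggest that a closer look is warranted but cannot confirm this specific issue, which still requires NetApp Technical Support.

```python
import re
from collections import Counter

# Field labels as they appear in the sample event, in order.
FIELDS = [
    "Date/Time", "Sequence number", "Event type", "Event category",
    "Priority", "Event needs attention", "Event send alert",
    "Event visibility", "Description", "Event specific codes",
    "Component type", "Component location", "Logged by",
]
_LABEL = re.compile("(" + "|".join(map(re.escape, FIELDS)) + r"):\s*")


def parse_event(text):
    """Split one event entry (single-line or one field per line) into a dict."""
    matches = list(_LABEL.finditer(text))
    event = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        event[match.group(1)] = text[match.end():end].strip()
    return event


def timeout_counts(events):
    """Count type-100D events on drives, keyed by component location."""
    counts = Counter()
    for event in events:
        if event.get("Event type") == "100D" and event.get("Component type") == "Drive":
            counts[event.get("Component location", "unknown")] += 1
    return counts


def multiple_drives_timing_out(events, min_drives=2):
    """True if at least min_drives distinct drives logged a timeout.

    ASSUMPTION: min_drives=2 is an illustrative threshold, not NetApp guidance.
    """
    return len(timeout_counts(events)) >= min_drives


SAMPLE = (
    "Date/Time: 4/1/23, 7:18:11 AM Sequence number: 5012 Event type: 100D "
    "Event category: Error Priority: Informational Event needs attention: false "
    "Event send alert: false Event visibility: true "
    "Description: Timeout on drive side of controller Event specific codes: 0/0/0 "
    "Component type: Drive Component location: Shelf 1, Bay 8 "
    "Logged by: Controller in bay B"
)
```

Calling `parse_event(SAMPLE)` yields a dictionary keyed by the field labels. Passing a list of parsed events to `multiple_drives_timing_out` reports whether timeouts are spread across more than one drive.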

Additional Information

See Bugs 1372240 and 1407507.

In accordance with the Support Services terms, always update NetApp products with the latest version of firmware and software to provide the best reliability, availability, and serviceability:

Download drive firmware from the E-Series Disk Firmware page.
Upgrade instructions: Upgrading drive firmware.
Upgrade instructions for StorageGRID appliances using maintenance mode: Upgrading SANtricity OS Software on the storage controllers using maintenance mode.
For more information: How to obtain the latest drive firmware for E/EF-Series.

Hot spare drives: To best maintain the continuous presence of hot spare drives available in the system, adhere to Hot Spares Best Practices and follow the standard drive replacement process if a drive fails.

Active IQ System Risk Detection:
For customers who have enabled AutoSupport on their storage systems, the Active IQ Portal provides detailed System Risk reports at the customer, site, and system levels. The reports show systems that have specific risks, along with severity levels and mitigation action plans. Drives that are not running the latest firmware are an example of such a risk. Not upgrading to the most current drive firmware could leave the storage appliance vulnerable to undesirable behavior.

Important: The purpose of this communication is for NetApp to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution lists are NetApp confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this e-mail distribution list.

Services Partners Additional Notes

Media wear cannot be definitively identified by drive failure rates alone. NetApp Engineering needs drive logs from offline drives and/or drives returned via RCA to specifically confirm media wear failures.

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.