SU536: [Impact Critical]: System disruption occurs on FAS systems with HDDs due to medium errors or recovered errors when running 9.12.1 versions prior to 9.12.1P4
Last Updated: 7/7/2023, 12:38:24 PM
Summary
[Impact Critical: Possible cluster data outage]
- A software defect in ONTAP 9.12.1 can cause a system disruption on FAS systems with HDDs. The disruption occurs when data is read from a fragmented file system and, during the read, one or more disks in the aggregate being read return media errors or recovered errors.
- The issue is fixed in ONTAP 9.12.1P4.
- Customers with FAS systems with HDDs running versions of ONTAP 9.12.1 earlier than 9.12.1P4 are strongly advised to upgrade those systems to ONTAP 9.12.1P4 in order to avoid a possible system disruption as a result of experiencing disk media errors.
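The version rule in the summary above can be sketched as a small check for use when auditing an inventory of systems. This is only an illustrative sketch: the helper name `is_affected_release` is hypothetical, and the version-string shape it accepts (`9.12.1` or `9.12.1P<n>`) is an assumption. It looks only at the release string and does not check whether a system is a FAS platform with HDDs.

```python
import re

# Assumed version-string shape: "9.12.1" or "9.12.1P3". Other forms
# (for example release candidates) are rejected rather than guessed at.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:P(\d+))?$")


def is_affected_release(version: str) -> bool:
    """Return True for ONTAP 9.12.1 releases earlier than 9.12.1P4."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, maintenance, patch = match.groups()
    if (int(major), int(minor), int(maintenance)) != (9, 12, 1):
        return False  # this bulletin only covers the 9.12.1 release family
    # A release with no P suffix is treated as patch level 0.
    patch_level = int(patch) if patch is not None else 0
    return patch_level < 4
```

With these assumptions, `is_affected_release("9.12.1P3")` returns `True` and `is_affected_release("9.12.1P4")` returns `False`. Comparing the patch level as an integer avoids the string-comparison mistake of ranking `P10` below `P4`.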
Issue Description
In a fragmented file system, discontiguous read I/O blocks are padded with dummy blocks so that a read does not have to be split into multiple I/Os, which improves performance. Because of a defect in ONTAP 9.12.1, if a disk involved in an aggregate data read from a fragmented file system returns medium errors or recovered errors on the dummy blocks, the ONTAP RAID layer does not repair the error, and the same read I/Os are retried in a loop. As a result, a system disruption occurs.
Note: As of the time of writing, this issue has only been reported on FAS systems with HDDs.
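To illustrate the padding described above, the following toy model fills the gaps between requested blocks so that one contiguous read can be issued. It is a conceptual sketch only and is not ONTAP code: the function name `pad_read`, the use of plain integer block numbers, and the choice to pad every gap regardless of size are all assumptions made for illustration.

```python
def pad_read(blocks):
    """Model a padded read as a list of (block_number, is_dummy) pairs.

    Blocks the caller asked for are marked is_dummy=False. Blocks added
    only to make the range contiguous are marked is_dummy=True.
    """
    wanted = set(blocks)
    if not wanted:
        return []
    first, last = min(wanted), max(wanted)
    return [(block, block not in wanted) for block in range(first, last + 1)]


# Blocks 10, 11 and 14 are requested; 12 and 13 are padded in as dummies.
print(pad_read([10, 11, 14]))
# [(10, False), (11, False), (12, True), (13, True), (14, False)]
```

In terms of this model, the defect concerns errors reported on the blocks marked `True`: the data in them is not needed by the reader, but the disk still has to read them as part of the single I/O.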
Symptom
The storage appliance will experience a node panic with a panic string similar to the following:
WAFL hung for <aggregate name>. in SK process wafl_exempt<nn> on release 9.12.1
Prior to the panic, media errors for one or more drives associated with the aggregate will be reported by the storage appliance.
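When reviewing collected log text, a pattern match on the panic string can help locate occurrences of this symptom. The sketch below is built only from the example string in this bulletin, which is described as "similar to" the real message, so the exact wording in a given log may differ. The names `PANIC_RE` and `find_panics` and the sample log line are hypothetical.

```python
import re

# Pattern derived from the example panic string in this bulletin.
# Assumes the aggregate name contains no whitespace.
PANIC_RE = re.compile(
    r"WAFL hung for (?P<aggregate>\S+)\. in SK process "
    r"(?P<process>wafl_exempt\d+) on release (?P<release>\S+)"
)


def find_panics(lines):
    """Return one dict per line that contains the panic string."""
    hits = []
    for line in lines:
        match = PANIC_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# Illustrative input only; this is not a real log excerpt.
sample = [
    "PANIC: WAFL hung for aggr1. in SK process wafl_exempt05 on release 9.12.1P2",
    "some unrelated message",
]
print(find_panics(sample))
```

Each hit carries the aggregate name, which can then be compared against the drives that reported media errors before the panic.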
Workaround
None. However, replacing the drive(s) reporting errors will prevent further issues caused by continued errors on those drives.
Solution
Upgrade to ONTAP 9.12.1P4 (or later as available).
Additional Information
BUG ID 1524092
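联想凌拓科技有限公司 ("Lenovo NetApp") makes no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided on this page, or regarding any results that may be obtained by using this information or following the recommendations provided herein. The information on this page is distributed as is, and the use of this information or the implementation of any recommendations or techniques herein is the customer's responsibility and depends on the customer's ability to evaluate the information and integrate it into the customer's operational environment. This page and the information it contains may be used only in connection with the NetApp products discussed on this page. In no event shall Lenovo NetApp be liable for any special, indirect, or consequential damages, or for any damages resulting from loss of use, data, or profits, whether in an action of contract, negligence, or other tort, arising out of or in connection with the use or performance of the information provided on this page.
For the latest information, refer to the support bulletins on the official NetApp website.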