SU539: [Impact Critical] Deswizzler scan might cause system controller disruption when issuing cloud reads to a FabricPool capacity tier.
Last Updated: 8/29/2023, 12:57:38 AM
Summary
[Impact Critical: Possible cluster data outage]
A bug in specific versions of ONTAP can cause a controller disruption and a possible cluster data outage. The issue occurs when the cluster is configured to use FabricPool and certain workflows trigger a file system scanner called the deswizzler while the S3 data store hosting the FabricPool capacity tier is unavailable.
Issue Description
This issue can occur on an ONTAP cluster when all of the following conditions are met:
- Cluster is running one of the following ONTAP versions:
- ONTAP 9.8x: 9.8P17, 9.8P18, 9.8P19
- ONTAP 9.9.1x: 9.9.1P14, 9.9.1P15, 9.9.1P16
- ONTAP 9.10.1x: 9.10.1P11, 9.10.1P12, 9.10.1P13
- Cluster is configured to utilize NetApp's FabricPool technology.
- A WAFL background scanner (known as the deswizzler scanner) runs against the FabricPool volume because one of the following workflows is used with that volume:
- SnapMirror DP (BRE)
- Vol Move
- Clone Split
- The FabricPool volume has data tiered to a capacity tier that is located on a remotely hosted S3 data store.
- The S3 data store hosting the capacity tier is unavailable while the deswizzler scanner is attempting to run against the FabricPool volume.
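The first two conditions above are static and can be checked ahead of time. The following is a minimal sketch of such a check, not a NetApp tool: the function name `is_exposed` and its `uses_fabricpool` argument are hypothetical, and the release strings are copied from the list above. It does not detect the runtime conditions (a deswizzler scan in progress or an unavailable S3 data store).

```python
# Affected releases, copied from the list in this bulletin.
AFFECTED_RELEASES = {
    "9.8P17", "9.8P18", "9.8P19",
    "9.9.1P14", "9.9.1P15", "9.9.1P16",
    "9.10.1P11", "9.10.1P12", "9.10.1P13",
}


def is_exposed(release, uses_fabricpool):
    """Return True if the release is listed as affected and FabricPool is in use.

    Hypothetical helper: both inputs must be supplied by the caller,
    for example from an inventory spreadsheet.
    """
    return bool(uses_fabricpool) and release.strip() in AFFECTED_RELEASES


if __name__ == "__main__":
    for release in ("9.8P18", "9.8P20", "9.10.1P11"):
        print(release, is_exposed(release, uses_fabricpool=True))
```

A release that is not in the set, or a cluster that does not use FabricPool, returns False.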
Symptom
If encountered, this issue will result in a controller disruption with a panic string similar to the following being reported:
PANIC: Cannot run exception without flag set! message type=WAFL_SCAN_DESWIZZLE flags=1 in SK process wafl_exempt<nn> on release 9.x.yPx (C)
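To look for this symptom in collected log text, a simple pattern match on the fixed part of the panic string may be enough. The sketch below assumes the log has already been exported as plain text lines. The helper name `find_deswizzle_panics` is hypothetical, and the pattern deliberately ignores the variable parts (process number and release).

```python
import re

# Matches the fixed portion of the panic string quoted in this bulletin.
PANIC_PATTERN = re.compile(
    r"PANIC: Cannot run exception without flag set! "
    r"message type=WAFL_SCAN_DESWIZZLE"
)


def find_deswizzle_panics(log_lines):
    """Return the lines that contain the deswizzler panic signature."""
    return [line for line in log_lines if PANIC_PATTERN.search(line)]


if __name__ == "__main__":
    # Illustrative sample lines; the second one is made up for the example.
    sample = [
        "PANIC: Cannot run exception without flag set! "
        "message type=WAFL_SCAN_DESWIZZLE flags=1 in SK process wafl_exempt<nn>",
        "unrelated log line",
    ]
    for hit in find_deswizzle_panics(sample):
        print(hit)
```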
Workaround
Disable the WAFL deswizzler scanner
Cluster::> node run -node [node name] "options wafl.deswizzle.enable off"
Note that disabling the WAFL deswizzler scanner may result in some performance impact.
After upgrading to a release with the fix for Bug ID 1314405, the WAFL deswizzler scanner should be re-enabled.
Cluster::> node run -node [node name] "options wafl.deswizzle.enable on"
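The command is issued per node, so on a multi-node cluster it may help to generate the command lines from a node list rather than typing them by hand. The sketch below only builds the command text shown above; it does not connect to a cluster. The function name `deswizzle_command` and the node names are hypothetical.

```python
def deswizzle_command(node_name, enable):
    """Build the clustershell command from this bulletin for one node."""
    state = "on" if enable else "off"
    return f'node run -node {node_name} "options wafl.deswizzle.enable {state}"'


if __name__ == "__main__":
    # Hypothetical node names; replace with the real node names of the cluster.
    for node in ("cluster1-01", "cluster1-02"):
        print(deswizzle_command(node, enable=False))
```

Passing `enable=True` produces the re-enable command to use after the upgrade.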
Solution
Upgrade to a release of ONTAP where Bug ID 1314405 is fixed:
- For 9.8x, 9.8P20 will be the first release with the fix for Bug ID 1314405
- For 9.9.1x, 9.9.1P17 will be the first release with the fix for Bug ID 1314405
- For 9.10.1x, 9.10.1P14 is the first release with the fix for Bug ID 1314405
- In addition, the fix for Bug ID 1314405 is included in ONTAP 9.11.1x and all subsequent releases.
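The upgrade targets listed above can be expressed as a small lookup keyed by release train. This is a sketch under the assumption that release strings follow the `<train>P<patch>` form used in this bulletin; the function name `first_fixed_release` is hypothetical.

```python
# First release with the fix for Bug ID 1314405, per train, from this bulletin.
FIRST_FIXED_RELEASE = {
    "9.8": "9.8P20",
    "9.9.1": "9.9.1P17",
    "9.10.1": "9.10.1P14",
}


def first_fixed_release(release):
    """Return the first fixed release in the same train as `release`.

    Returns None when the train is not one of the three listed above.
    """
    train = release.strip().split("P", 1)[0]
    return FIRST_FIXED_RELEASE.get(train)


if __name__ == "__main__":
    for release in ("9.8P18", "9.9.1P15", "9.10.1P12"):
        print(release, "->", first_fixed_release(release))
```

A None result only means the train is not in the table. For ONTAP 9.11.1x and later, the bulletin states that the fix is already included.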
Additional Information
Public Report
- Bug ID 1314405
The following document contains more information:
FabricPool Best Practices Technical Report
Active IQ System Risk Detection:
For customers who have enabled AutoSupport™ on their storage systems, the Active IQ Portal provides detailed System Risk reports at the customer, site, and system levels. The reports show systems that have specific risks, along with severity levels and mitigation action plans.
Important: The purpose of this communication is for NetApp to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution lists are NetApp confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this e-mail distribution list.
***************************************************
*** NETAPP CONFIDENTIAL – FOR LIMITED USE ONLY ***
***************************************************
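Lenovo NetApp Technology Co., Ltd. ("Lenovo NetApp") makes no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided on this page, or regarding any results that may be obtained by using this information or following the recommendations provided here. The information on this page is distributed as is. Using this information or implementing any recommendations or techniques described here is the customer's responsibility and depends on the customer's ability to evaluate the information and integrate it into the customer's operational environment. This page and the information it contains may be used only in connection with the NetApp products discussed on this page. In no event shall Lenovo NetApp be liable for any special, indirect, or consequential damages arising from or related to the use or performance of the information provided on this page, or for any damages resulting from loss of use, data, or profits, whether in an action of contract, negligence, or other tortious action.
For the latest information, refer to the support bulletins on the official NetApp website.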