SU543: [Impact High] Shelf firmware repeatedly upgrades, system displays wrong disk names, or SAS shelf environmental status updates delayed after upgrade to ONTAP 9.12.1P4 or 9.13.1
Last Updated: 2024/3/30 03:17:23
Summary
[Impact High]
- A bug introduced in ONTAP 9.12.1P4 and 9.13.1 causes stale information about SAS storage devices to be reported to ONTAP.
- This can manifest itself in a number of ways, including SAS shelf firmware repeatedly being upgraded, replaced drives reporting incorrect disk information, and delayed SAS shelf environmental information updates reported by ONTAP.
- This issue only impacts FAS, AFF, or ASA storage systems with SAS storage devices that are running ONTAP 9.12.1P4 or ONTAP 9.13.1.
- Customers with systems exposed to this issue are advised to upgrade those systems to a release where this issue is fixed (see the Solution section).
This issue is tracked in bug ID 1557006.
Issue Description
Stale SAS environmental shelf data might be processed and displayed by ONTAP versions 9.12.1P4 or 9.13.1. As a result, invalid EMS events might be triggered or shelf maintenance requests might not be properly handled. This can manifest as a number of issues:
- Issue 1: Shelf firmware updates occur every 30 minutes.
- Issue 2: A system displays the disk name as <node_name>:<disk_path> after a disk replacement is performed.
- Issue 3: SAS environmental shelf information updates are delayed in ONTAP 9.12.1P4 and 9.13.1.
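Issue 1 can be recognized in EMS output as a shelf that is reported downrev again after a firmware download to it has already succeeded. The following is a minimal Python sketch of that check, assuming EMS lines in the bracketed `[node: process: event:severity]: message` form used in this bulletin. The function name `find_repeat_downrev` and both regular expressions are illustrative assumptions, not part of any NetApp tool; the sample lines are taken from the Issue 1 example in this bulletin.

```python
import re

# Matches EMS lines such as:
# [node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf0 has downrev firmware.
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)"
)
# Shelf identifiers of the form <adapter>.shelf<N>, for example 0a.shelf0.
SHELF_ID = re.compile(r"\b(\w+\.shelf\d+)\b")


def find_repeat_downrev(lines):
    """Return shelves reported downrev again after a successful firmware download."""
    updated = set()  # shelves with an sfu.downloadSuccess event seen so far
    repeats = []     # shelves flagged downrev after that success, in order seen
    for line in lines:
        m = EMS_LINE.search(line)
        if not m:
            continue
        shelf = SHELF_ID.search(m.group("msg"))
        if not shelf:
            continue
        name = shelf.group(1)
        if m.group("event") == "sfu.downloadSuccess":
            updated.add(name)
        elif m.group("event") == "sfu.firmwareDownrev.shelf" and name in updated:
            if name not in repeats:
                repeats.append(name)
    return repeats


sample = [
    "[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf0 has downrev firmware.",
    "[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf1 has downrev firmware.",
    "[node-02: dsa_sfu: sfu.downloadSuccess:info]: [storage download shelf]: Firmware file IOM12A.0310.SFW downloaded on 0a.shelf1.",
    "[node-02: dsa_sfu: sfu.downloadSuccess:info]: [storage download shelf]: Firmware file IOM12E.0250.SFW downloaded on 0a.shelf0.",
    "[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf0 has downrev firmware.",
    "[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf1 has downrev firmware.",
]
print(find_repeat_downrev(sample))  # ['0a.shelf0', '0a.shelf1']
```

A downrev report before the first successful download is expected and is not flagged; only a downrev report that follows a success for the same shelf counts as a repeat.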
Symptom
- Issue 1 example:
[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf0 has downrev firmware.
[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf1 has downrev firmware.
[node-02: dsa_sfu: sfu.firmwareDownrev:error]: Disk shelf firmware needs to be updated on 2 disk shelves.
[node-02: dsa_sfu: sfu.downloadStarted:info]: Update of disk shelf firmware started on 2 shelves.
[node-02: dsa_worker1: sfu.ctrllerElmntsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf0.
[node-02: dsa_worker1: sfu.ctrllerElmntsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0b.shelf1.
[node-02: dsa_worker1: sfu.downloadingController:info]: [storage download shelf]: Downloading IOM12E.0250.SFW on disk shelf controller module A on 0b.shelf0.
[node-02: dsa_worker1: sfu.downloadingController:info]: [storage download shelf]: Downloading IOM12A.0310.SFW on disk shelf controller module A on 0b.shelf1.
[node-02: dsa_sfu: sfu.rebootRequest:info]: Issuing a request to reboot disk shelf 0b.shelf0 module A.
[node-02: dsa_sfu: sfu.rebootRequest:info]: Issuing a request to reboot disk shelf 0b.shelf1 module A.
[node-02: dsa_sfu: sfu.adapterSuspendIO.ndu:info]: Suspending SMP to SAS adapter 0b for 35 seconds while shelf firmware is updated.
[node-02: dsa_sfu: sfu.downloadingController:info]: [storage download shelf]: Downloading IOM12E.0250.SFW on disk shelf controller module B on 0a.shelf0.
[node-02: dsa_sfu: sfu.downloadingController:info]: [storage download shelf]: Downloading IOM12A.0310.SFW on disk shelf controller module B on 0a.shelf1.
[node-02: dsa_sfu: sfu.rebootRequest:info]: Issuing a request to reboot disk shelf 0a.shelf0 module B.
[node-02: dsa_sfu: sfu.rebootRequest:info]: Issuing a request to reboot disk shelf 0a.shelf1 module B.
[node-02: dsa_sfu: sfu.adapterSuspendIO.ndu:info]: Suspending SMP to SAS adapter 0a for 35 seconds while shelf firmware is updated.
[node-02: dsa_sfu: sfu.downloadSuccess:info]: [storage download shelf]: Firmware file IOM12A.0310.SFW downloaded on 0a.shelf1.
[node-02: dsa_sfu: sfu.downloadSuccess:info]: [storage download shelf]: Firmware file IOM12E.0250.SFW downloaded on 0a.shelf0.
[node-02: dsa_sfu: sfu.downloadSummary:info]: Shelf firmware updated on 2 shelves.
[node-02: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Thu Jan 1 00:00:00 1970 ( 0+00:00:00.501); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 3-Internal software reset'}
[node-02: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0b.00.99.0, log: (...) 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 3-Internal software reset
[node-02: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': (...) 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 3-Internal software reset'}
[node-02: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0b.02.99.2, log: (...) 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 3-Internal software reset
[node-02: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0b.03.99.3, log: (...) 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 3-Internal software reset
[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf0 has downrev firmware.
[node-02: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0a.shelf1 has downrev firmware.
- Issue 2 example:
node02::> disk show -spare -owner node01
Info: This cluster has partitioned disks. To get a complete list of spare disk capacity use "storage aggregate show-spare-disks".
Original Owner: node01
Checksum Compatibility: block
                                                             Usable   Physical
Disk            HA Shelf Bay Chan Pool  Type Class       RPM Size     Size     Owner
--------------- -- ----- --- ---- ----- ---- ----------- --- -------- -------- ------
node01:9a.00.17 9a 0     17  A    Pool0 SSD  solid-state -   13.97TB  13.97TB  node01
Device   HA SHELF BAY CHAN Disk Vital Product Information
-------- -- ----- --- ---- ------------------------------
9a.00.17 9a ?     ?   SA:A 21G0A00LT1JH
The following may also be reported:
Incorrect drive label on shelf 0 bay 17 drive node01:9a.00.17
- Issue 3 example:
The shelf log reports the failure immediately (time reported as GMT; equates to 14:07 local time):
Fri Jun 23 05:07:07 2023 (0+02:16:17.705); 030B005B; M0; ENC_MGT; power_manager; 04; PCM 2 faults indicate loss of power (913W)
Fri Jun 23 05:07:07 2023 (0+02:16:17.705); 030B005D; M0; ENC_MGT; power_manager; 04; PCM 2 faults indicate loss of local fan power
However, the EMS log reports the same failure about 53 minutes later (local time reported):
[?] Fri Jun 23 15:00:07 +0900 [ds03n1: dsa_worker3: ses.status.psWarning:error]: DS212-12 (S/N XXXXX245000199) shelf 1 on channel 0b power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
[?] Fri Jun 23 15:00:34 +0900 [ds03n1: dsa_worker3: ses.status.psError:alert]: DS212-12 (S/N XXXXX2245000199) shelf 1 on channel 0b power error for Power supply 2: critical status; power supply error. This module is on the rear of the shelf at the bottom right.
[?] Fri Jun 23 15:00:34 +0900 [ds03n1: dsa_worker3: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED
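The delay in the Issue 3 example can be worked out by putting both timestamps on the same clock. The Python sketch below is illustrative only: the helper name `reporting_delay` is an assumption, and because the EMS line carries a UTC offset but no year, the year is taken from the shelf log entry.

```python
from datetime import datetime, timedelta, timezone


def reporting_delay(shelf_ts: str, ems_ts: str, year: int) -> timedelta:
    """Return how long after the shelf log entry the EMS event was logged."""
    # Shelf log timestamps include the year and are reported in GMT.
    shelf_time = datetime.strptime(shelf_ts, "%a %b %d %H:%M:%S %Y").replace(
        tzinfo=timezone.utc
    )
    # EMS timestamps include a UTC offset but no year, so the caller supplies it.
    ems_time = datetime.strptime(f"{ems_ts} {year}", "%a %b %d %H:%M:%S %z %Y")
    return ems_time - shelf_time


# Timestamps copied from the shelf log and the first EMS line in the example.
delay = reporting_delay("Fri Jun 23 05:07:07 2023", "Fri Jun 23 15:00:07 +0900", 2023)
print(delay)  # 0:53:00
```

The shelf recorded the power fault at 05:07:07 GMT (14:07:07 at +0900), while the first EMS event appeared at 15:00:07 +0900, 53 minutes later.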
Workaround
- There is no effective workaround.
- A takeover and giveback of the affected nodes can temporarily mitigate these issues when they occur. However, this does not resolve the underlying problem; subsequent storage issues would still be masked by this bug. To resolve the underlying problem, upgrade to a release where this bug is fixed.
Additional Information
Public Report
- Bug ID 1557006
The following KB articles contain more information:
- ONTAP displays disk name incorrectly after disk replacement
- The shelf firmware is repeatedly updating after upgrade to ONTAP 9.12.1P4 or 9.13.1
Active IQ System Risk Detection:
For customers who have enabled AutoSupport™ on their storage systems, the Active IQ Portal provides detailed System Risk reports at the customer, site, and system levels. The reports show systems that have specific risks, along with severity levels and mitigation action plans.
Important: The purpose of this communication is for NetApp to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution lists are NetApp confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this e-mail distribution list.
***************************************************
*** NETAPP CONFIDENTIAL – FOR LIMITED USE ONLY ***
***************************************************
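Lenovo NetApp Technology Limited ("Lenovo NetApp") makes no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided on this page, or regarding any results that may be obtained by using this information or following the recommendations provided herein. The information on this page is distributed as is. Using this information or implementing any recommendations or techniques described herein is the customer's responsibility and depends on the customer's ability to evaluate the information and integrate it into the customer's operational environment. This page and the information it contains may be used only in connection with the NetApp products discussed on this page. In no event shall Lenovo NetApp be liable for any special, indirect, or consequential damages arising from or related to the use or performance of the information provided on this page, or for any damages resulting from loss of use, data, or profits (whether or not in the performance of a contract), negligence, or other tortious action.
For the latest information, refer to the support bulletins on the official NetApp website.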