
Wednesday, July 28, 2021

Flash disk replacement in Exadata

Steps to Replace a Flash Disk in Oracle Exadata

Oracle Exadata is a powerful platform combining compute and storage nodes to deliver high-performance database services. One critical component of Exadata storage cells is the flash disk, which plays a key role in caching and accelerating I/O operations. A failure in flash modules (FMODs) can severely impact performance, making timely replacement essential.

This guide outlines the step-by-step process to identify and replace a failed flash disk in an Exadata cell node.


1. Identify the Faulty Cell Node

Log in to a compute node and run the following command to check the status of physical disks across all cell nodes:

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'cellcli -e list physicaldisk'

Look for entries indicating poor performance or warning status.
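For illustration, a failing flash module typically stands out in the dcli output like this (the cell hostname exacel01 is hypothetical; healthy disks report a status of normal):

exacel01: FLASH_1_0       15557M04E3N     warning - poor performance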


2. Verify Disk Type on the Affected Cell Node

Once the faulty cell node is identified, log in to it and run:

#cellcli -e list physicaldisk

This will help determine whether the issue is with a normal disk or a flash disk. A typical output for a failed flash disk might look like:

FLASH_1_0 15557M04E3N warning - poor performance

3. Inspect Flash Cache Details

To get more information about the degraded flash disk:

#cellcli -e list flashcache detail

Check for degraded cell disks, effective cache size, and disk status.
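As a rough sketch of what to look for (attribute values below are illustrative, not from a real system), a degraded flash cache lists the affected cell disk under degradedCelldisks and shows a reduced effectiveCacheSize:

         name:                   exacel01_FLASHCACHE
         cellDisk:               FD_00_exacel01,FD_01_exacel01,FD_02_exacel01,FD_03_exacel01
         degradedCelldisks:      FD_00_exacel01
         effectiveCacheSize:     1117.3125G
         size:                   1489.75G
         status:                 warning - degraded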


4. Inactivate Grid Disks

Before shutting down the cell node, all grid disks must be made inactive. First confirm that ASM can tolerate taking the disks offline; asmdeactivationoutcome should read Yes for every grid disk:
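CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome

Once confirmed, inactivate the grid disks: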

CellCLI> alter griddisk all inactive


5. Confirm Grid Disk Status

Verify that all grid disks are offline; each grid disk should report an asmmodestatus of OFFLINE (or UNUSED if it is not part of an ASM disk group):

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
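The output should resemble the following for every grid disk (grid disk names here are illustrative):

DATA_CD_00_exacel01     OFFLINE     Yes
DATA_CD_01_exacel01     OFFLINE     Yes
RECO_CD_00_exacel01     OFFLINE     Yes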


6. Shut Down the Cell Node

Bring down the cell node safely:  

#init 0


7. Replace the Flash Disk

Hand over the cell node to hardware support for flash disk replacement.


8. Verify Disk Status Post-Replacement

Once the cell node is powered back on, log in and check the disk status: 

#cellcli -e list physicaldisk

Ensure all disks, including flash disks, show a status of normal.
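An illustrative excerpt of a healthy listing (the replacement module will report a new serial; the names and serials below are hypothetical):

FLASH_1_0        22128M07B1X     normal
FLASH_1_1        15557M04E3P     normal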

9. Check Flash Cache Health

Inspect the flash cache again: 

#cellcli -e list flashcache detail


10. Reactivate Grid Disks

Bring the grid disks back online: 

CellCLI> alter griddisk all active


11. Final Verification

Confirm that all grid disks come back online. Disks may show an asmmodestatus of SYNCING while ASM resynchronizes them; repeat the check until every disk reports ONLINE:

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
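The final state should resemble the following (grid disk names are illustrative):

DATA_CD_00_exacel01     ONLINE      Yes
RECO_CD_00_exacel01     ONLINE      Yes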


Conclusion

Replacing a failed flash disk in Exadata requires careful coordination and precise execution to avoid data loss and restore optimal performance. Following these steps ensures a smooth and safe replacement process.


Author: Kiran Jadhav
Principal Consultant | Exadata Admin

Thursday, February 18, 2021

How to enable disk locator on ZFS disk


When a disk fails on the ZFS Storage Appliance, we can turn the disk locator ON so that the failed disk can be identified easily at replacement time.

Log in to the ZFS appliance CLI and run the commands below:

==================================================

ZFSC1:> maintenance hardware

ZFSC1:maintenance hardware> list

             NAME         STATE     MANUFACTURER  MODEL                     SERIAL        RPM    TYPE

chassis-003  1645HEN05Y   faulted   Oracle        Oracle Storage DE2-24C    1645HEN05Y    7200   hdd


Here the other chassis (chassis-000, chassis-001, chassis-002, etc.) report a state of 'ok', while chassis-003 reports 'faulted', so one of the disks in chassis-003 has likely failed.

ZFSC1:maintenance hardware> select chassis-003

ZFSC1:maintenance chassis-003> list

                          disk

                           fan

                           psu

                          slot

ZFSC1:maintenance chassis-003> select disk

ZFSC1:maintenance chassis-003 disk> show

Disks:

          LABEL   STATE     MANUFACTURER  MODEL             SERIAL                        RPM    TYPE

disk-000  HDD 0   ok        HGST          H7390A250SUN8.0T  000555PJG4LV        VLJJG4LV  7200   data

disk-001  HDD 1   ok        HGST          H7390A250SUN8.0T  000555PJXALV        VLJJXALV  7200   data

disk-002  HDD 2   ok        HGST          H7390A250SUN8.0T  000555PGGRPV        VLJGGRPV  7200   data

disk-003  HDD 3   faulted   HGST          H7390A250SUN8.0T  000555PGHD0V        VLJGHD0V  7200   data


ZFSC1:maintenance chassis-003 disk> select disk-003

ZFSC1:maintenance chassis-003 disk-003> ls

Properties:

                         label = HDD 3

                       present = true

                       faulted = true

                  manufacturer = HGST

                         model = H7390A250SUN8.0T

                        serial = 000555PGHD0V        VLJGHD0V

                      revision = P9E2

                          size = 7.15T

                          type = data

                           use = data

                           rpm = 7200

                        device = c0t5000CCA2608B17DCd0

                     pathcount = 2

                     interface = SAS

                        locate = false

                       offline = false


ZFSC1:maintenance chassis-003 disk-003> set locate=true

                        locate = true (uncommitted)

ZFSC1:maintenance chassis-003 disk-003> commit

ZFSC1:maintenance chassis-003 disk-003> ls

Properties:

                         label = HDD 3

                       present = true

                       faulted = true

                  manufacturer = HGST

                        locate = true

                       offline = false

ZFSC1:maintenance chassis-003 disk-003>
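Once the failed disk has been replaced, the locator LED can be turned off again with the same set/commit sequence:

ZFSC1:maintenance chassis-003 disk-003> set locate=false

                        locate = false (uncommitted)

ZFSC1:maintenance chassis-003 disk-003> commit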


Regards,

Kiran Jadhav