Wednesday, July 28, 2021

Flash disk replacement in Exadata

Steps to Replace a Flash Disk in Oracle Exadata

Oracle Exadata is a powerful platform combining compute and storage nodes to deliver high-performance database services. One critical component of Exadata storage cells is the flash disk, which plays a key role in caching and accelerating I/O operations. A failure in flash modules (FMODs) can severely impact performance, making timely replacement essential.

This guide outlines the step-by-step process to identify and replace a failed flash disk in an Exadata cell node.


1. Identify the Faulty Cell Node

Log in to a compute node and run the following command to check the status of physical disks across all cell nodes:

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'cellcli -e list physicaldisk'

Look for entries indicating poor performance or warning status.
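With many cells the combined listing gets long, so it can help to filter it. The sketch below keeps only the lines whose status is something other than normal. The function name show_bad_disks is made up for illustration, and it assumes each line looks like "cellname: diskname serial status", since dcli prefixes every line with the cell name.

```shell
# Hypothetical helper: print only the disks whose status is not exactly "normal".
# Assumes dcli-prefixed lines: "<cell>: <disk name> <serial> <status...>"
show_bad_disks() {
    awk 'NF >= 4 && !(NF == 4 && $4 == "normal")'
}

# Usage on a compute node (needs passwordless ssh to the cells):
# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root \
#     'cellcli -e list physicaldisk' | show_bad_disks
```

An empty result would suggest that every cell reports all disks as normal.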


2. Verify Disk Type on the Affected Cell Node

Once the faulty cell node is identified, log in to it and run:

#cellcli -e list physicaldisk

This will help determine whether the issue is with a hard disk or a flash disk. Flash disk names start with FLASH_. A typical output for a failed flash disk might look like:

FLASH_1_0 15557M04E3N warning - poor performance

3. Inspect Flash Cache Details

To get more information about the degraded flash disk:

#cellcli -e list flashcache detail

Check for degraded cell disks, effective cache size, and disk status.
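The detail listing prints many attributes. As a rough sketch, the three that matter for this step can be pulled out with awk. The function name flashcache_summary is hypothetical, and the code assumes the usual "attribute: value" layout of the detail output.

```shell
# Hypothetical helper: keep only the attributes this step asks about.
# Assumes "attribute: value" lines as printed by "list flashcache detail".
flashcache_summary() {
    awk -F':' '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == "status" || key == "degradedCelldisks" || key == "effectiveCacheSize" {
            # Take everything after the first colon and trim the whitespace.
            val = substr($0, index($0, ":") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print key "=" val
        }'
}

# Usage on the cell node:
# cellcli -e list flashcache detail | flashcache_summary
```

A non-empty degradedCelldisks value, or an effectiveCacheSize smaller than the configured size, would point at the degraded flash disk.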


4. Inactivate Grid Disks

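Before shutting down the cell node, make all grid disks inactive. First confirm that asmdeactivationoutcome is Yes for every grid disk (the list griddisk command in step 5 shows it); if it is not, taking the disks offline could force an ASM disk group to dismount. Then, from the CellCLI prompt on the cell node, run: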

CellCLI> alter griddisk all inactive


5. Confirm Grid Disk Status

Verify that all grid disks are offline. The asmmodestatus column should read OFFLINE (or UNUSED for grid disks that ASM does not use):

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
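This check can be scripted so that the shutdown only goes ahead when nothing is still online. The sketch below is an illustration only: the function name all_griddisks_offline is made up, and it assumes the three-column output of the command above, with the asmmodestatus value in the second column.

```shell
# Hypothetical helper: succeed only if every grid disk is OFFLINE or UNUSED.
# Assumes lines of the form "<name> <asmmodestatus> <asmdeactivationoutcome>".
all_griddisks_offline() {
    awk 'NF >= 2 && $2 != "OFFLINE" && $2 != "UNUSED" {
             print "not offline yet: " $1
             bad = 1
         }
         END { exit bad + 0 }'
}

# Usage on the cell node:
# cellcli -e "list griddisk attributes name, asmmodestatus, asmdeactivationoutcome" \
#     | all_griddisks_offline && echo "safe to shut down"
```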


6. Shut Down the Cell Node

Bring down the cell node safely:  

#init 0


7. Replace the Flash Disk

Hand over the cell node to hardware support for flash disk replacement.


8. Verify Disk Status Post-Replacement

Once the cell node is powered back on, log in and check the disk status: 

#cellcli -e list physicaldisk

Ensure all disks, including flash disks, show a status of normal.

9. Check Flash Cache Health

Inspect the flash cache again: 

#cellcli -e list flashcache detail


10. Reactivate Grid Disks

Bring the grid disks back online: 

CellCLI> alter griddisk all active


11. Final Verification

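Confirm that all grid disks are online. The asmmodestatus column may show SYNCING for a while after reactivation; wait until every grid disk reports ONLINE before working on another cell: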

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
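Because the resynchronization can take some time, the final check is often repeated by hand. A minimal polling sketch follows. The function name wait_for_online is hypothetical; it takes the listing command as arguments and assumes the second column holds the asmmodestatus value.

```shell
# Hypothetical helper: poll a listing command until every grid disk is ONLINE.
# $1 = seconds between polls; remaining arguments = command that prints
# "<name> <asmmodestatus> <asmdeactivationoutcome>" lines.
wait_for_online() {
    interval=$1
    shift
    # awk exits 0 (keep waiting) if any disk is neither ONLINE nor UNUSED,
    # or if the command printed nothing at all.
    while "$@" | awk 'NF >= 2 && $2 != "ONLINE" && $2 != "UNUSED" { pending = 1 }
                      END { exit !(pending || NR == 0) }'; do
        echo "grid disks not online yet, checking again in ${interval}s"
        sleep "$interval"
    done
    echo "all grid disks are ONLINE"
}

# Usage on the cell node, polling once a minute:
# wait_for_online 60 cellcli -e "list griddisk attributes name, asmmodestatus, asmdeactivationoutcome"
```

Treating empty output as "not ready" is a deliberate choice here, so that a failing listing command is not mistaken for a healthy cell.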


Conclusion

Replacing a failed flash disk in Exadata requires careful coordination and precise execution to avoid data loss and restore optimal performance. Following these steps ensures a smooth and safe replacement process.


Author: Kiran Jadhav
Principal Consultant | Exadata Admin

Sunday, July 18, 2021

How to run sundiag on multiple cell nodes - Exadata or SSC


What is sundiag:

sundiag is the Oracle Exadata Database Machine diagnostics collection tool. It gathers diagnostic information that helps the support analyst diagnose problems such as failed hardware, for example a failed disk.

An Exadata box or a Solaris SuperCluster (SSC) may have multiple storage cell nodes attached.

With 10-12 storage cell nodes, logging in to each cell and collecting sundiag one by one is a time-consuming task. With the single command below we can run sundiag on all cell nodes at once (passwordless ssh from the compute node to the cell nodes is required).

1. On Solaris SuperCluster:

#dcli -g /opt/oracle.supercluster/bin/cell_group -l root /opt/oracle.SupportTools/sundiag.sh

where # cat /opt/oracle.supercluster/bin/cell_group  --> lists the cell nodes attached to the SSC machine


2. On Exadata servers:

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root /opt/oracle.SupportTools/sundiag.sh

where # cat /opt/oracle.SupportTools/onecommand/cell_group  --> lists the cell nodes attached to the Exadata machine
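Since the two commands differ only in the location of the cell_group file, a small wrapper can pick whichever file exists. This is only a sketch: the function name find_cell_group is made up for illustration, and the two paths are the ones given above.

```shell
# Hypothetical helper: print the first cell group file that exists.
# Pass candidate paths as arguments; fails if none is present.
find_cell_group() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "no cell_group file found" >&2
    return 1
}

# Usage on a compute node (passwordless ssh to the cells is assumed):
# group=$(find_cell_group /opt/oracle.supercluster/bin/cell_group \
#                         /opt/oracle.SupportTools/onecommand/cell_group) &&
#     dcli -g "$group" -l root /opt/oracle.SupportTools/sundiag.sh
```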


Thank you

- Kiiran B Jaadhav