
Wednesday, July 28, 2021

Flash disk replacement in Exadata

Steps to Replace a Flash Disk in Oracle Exadata

Oracle Exadata is a powerful platform combining compute and storage nodes to deliver high-performance database services. One critical component of Exadata storage cells is the flash disk, which plays a key role in caching and accelerating I/O operations. A failure in flash modules (FMODs) can severely impact performance, making timely replacement essential.

This guide outlines the step-by-step process to identify and replace a failed flash disk in an Exadata cell node.


1. Identify the Faulty Cell Node

Log in to a compute node and run the following command to check the status of physical disks across all cell nodes:

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'cellcli -e list physicaldisk'

Look for entries indicating poor performance or warning status.
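For illustration, a failing flash module typically stands out in the dcli output like this (the cell hostname exacel01 is hypothetical; healthy disks report a status of normal):

exacel01: FLASH_1_0       15557M04E3N     warning - poor performance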


2. Verify Disk Type on the Affected Cell Node

Once the faulty cell node is identified, log in to it and run:

#cellcli -e list physicaldisk

This will help determine whether the issue is with a normal disk or a flash disk. A typical output for a failed flash disk might look like:

FLASH_1_0 15557M04E3N warning - poor performance

3. Inspect Flash Cache Details

To get more information about the degraded flash disk:

#cellcli -e list flashcache detail

Check for degraded cell disks, effective cache size, and disk status.
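As a rough sketch of what to look for (attribute values below are illustrative, not from a real system), a degraded flash cache lists the affected cell disk under degradedCelldisks and shows a reduced effectiveCacheSize:

         name:                   exacel01_FLASHCACHE
         cellDisk:               FD_00_exacel01,FD_01_exacel01,FD_02_exacel01,FD_03_exacel01
         degradedCelldisks:      FD_00_exacel01
         effectiveCacheSize:     1117.3125G
         size:                   1489.75G
         status:                 warning - degraded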


4. Inactivate Grid Disks

Before shutting down the cell node, all grid disks must be made inactive. First confirm that ASM can tolerate taking the disks offline; asmdeactivationoutcome should read Yes for every grid disk:
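CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome

Once confirmed, inactivate the grid disks: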

CellCLI> alter griddisk all inactive


5. Confirm Grid Disk Status

Verify that all grid disks are offline; each grid disk should report an asmmodestatus of OFFLINE (or UNUSED if it is not part of an ASM disk group):

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
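The output should resemble the following for every grid disk (grid disk names here are illustrative):

DATA_CD_00_exacel01     OFFLINE     Yes
DATA_CD_01_exacel01     OFFLINE     Yes
RECO_CD_00_exacel01     OFFLINE     Yes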


6. Shut Down the Cell Node

Bring down the cell node safely:  

#init 0


7. Replace the Flash Disk

Hand over the cell node to hardware support for flash disk replacement.


8. Verify Disk Status Post-Replacement

Once the cell node is powered back on, log in and check the disk status: 

#cellcli -e list physicaldisk

Ensure all disks, including flash disks, show a status of normal.
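An illustrative excerpt of a healthy listing (the replacement module will report a new serial; the names and serials below are hypothetical):

FLASH_1_0        22128M07B1X     normal
FLASH_1_1        15557M04E3P     normal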

9. Check Flash Cache Health

Inspect the flash cache again: 

#cellcli -e list flashcache detail


10. Reactivate Grid Disks

Bring the grid disks back online: 

CellCLI> alter griddisk all active


11. Final Verification

Confirm that all grid disks come back online. Disks may show an asmmodestatus of SYNCING while ASM resynchronizes them; repeat the check until every disk reports ONLINE:

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
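The final state should resemble the following (grid disk names are illustrative):

DATA_CD_00_exacel01     ONLINE      Yes
RECO_CD_00_exacel01     ONLINE      Yes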


Conclusion

Replacing a failed flash disk in Exadata requires careful coordination and precise execution to avoid data loss and restore optimal performance. Following these steps ensures a smooth and safe replacement process.


Author: Kiran Jadhav
Principal Consultant | Exadata Admin

Thursday, February 18, 2021

How to enable disk locator on ZFS disk


When a disk fails on the ZFS Storage Appliance, we can turn the disk locator ON so that the failed disk can be identified easily at replacement time.

Log in to the ZFS appliance CLI and run the commands below:

==================================================

ZFSC1:> maintenance hardware

ZFSC1:maintenance hardware> list

             NAME         STATE     MANUFACTURER  MODEL                     SERIAL        RPM    TYPE

chassis-003  1645HEN05Y   faulted   Oracle        Oracle Storage DE2-24C    1645HEN05Y    7200   hdd


Here the other chassis (chassis-000, chassis-001, chassis-002, etc.) report a state of 'ok', while chassis-003 reports 'faulted', so one of the disks in chassis-003 has likely failed.

ZFSC1:maintenance hardware> select chassis-003

ZFSC1:maintenance chassis-003> list

                          disk

                           fan

                           psu

                          slot

ZFSC1:maintenance chassis-003> select disk

ZFSC1:maintenance chassis-003 disk> show

Disks:

          LABEL   STATE     MANUFACTURER  MODEL             SERIAL                        RPM    TYPE

disk-000  HDD 0   ok        HGST          H7390A250SUN8.0T  000555PJG4LV        VLJJG4LV  7200   data

disk-001  HDD 1   ok        HGST          H7390A250SUN8.0T  000555PJXALV        VLJJXALV  7200   data

disk-002  HDD 2   ok        HGST          H7390A250SUN8.0T  000555PGGRPV        VLJGGRPV  7200   data

disk-003  HDD 3   faulted   HGST          H7390A250SUN8.0T  000555PGHD0V        VLJGHD0V  7200   data


ZFSC1:maintenance chassis-003 disk> select disk-003

ZFSC1:maintenance chassis-003 disk-003> ls

Properties:

                         label = HDD 3

                       present = true

                       faulted = true

                  manufacturer = HGST

                         model = H7390A250SUN8.0T

                        serial = 000555PGHD0V        VLJGHD0V

                      revision = P9E2

                          size = 7.15T

                          type = data

                           use = data

                           rpm = 7200

                        device = c0t5000CCA2608B17DCd0

                     pathcount = 2

                     interface = SAS

                        locate = false

                       offline = false


ZFSC1:maintenance chassis-003 disk-003> set locate=true

                        locate = true (uncommitted)

ZFSC1:maintenance chassis-003 disk-003> commit

ZFSC1:maintenance chassis-003 disk-003> ls

Properties:

                         label = HDD 3

                       present = true

                       faulted = true

                  manufacturer = HGST

                        locate = true

                       offline = false

ZFSC1:maintenance chassis-003 disk-003>
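Once the failed disk has been replaced, the locator LED can be turned off again with the same set/commit sequence:

ZFSC1:maintenance chassis-003 disk-003> set locate=false

                        locate = false (uncommitted)

ZFSC1:maintenance chassis-003 disk-003> commit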


Regards,

Kiran Jadhav