
Wednesday, July 20, 2022

Exadata Patching Steps from Beginning to End

This post covers the Exadata patching steps from start to end.

* Exadata patching reference: MOS Doc ID 888828.1

* Patching sequence: follow the sequence below while doing Exadata patching:

  1. DB/Grid patching

  2. Cell node patching

  3. Compute node patching

  4. RoCE/InfiniBand switch patching

Patching steps:

1. Open Doc ID 888828.1 in MOS (My Oracle Support) to find the latest (N) or N-1 patch information, and identify the exact patch you want to download.

2. Copy the patch files to the server (there will be approximately 10 files).
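
MOS shows a digest for each patch zip on its download page, so it is worth verifying the copies before unzipping. A minimal check, assuming you saved the published SHA-256 digests to a file named checksums.txt (a hypothetical name; the format is "<digest>  <filename>", one pair per line):

#sha256sum -c checksums.txt   ---> any file reported as FAILED should be copied again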

3. After copying the patches to the server, verify the files:

[root@testJan_2022]# ls -lrth

total 30G

-rw-r--r-- 1 root root 3.1G Apr 22 17:05 p33567288_210000_Linux-x86-64_1of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 17:18 p33567288_210000_Linux-x86-64_3of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 17:27 p33567288_210000_Linux-x86-64_9of10.zip

-rw-r--r-- 1 root root 1.7G Apr 22 17:33 p33567288_210000_Linux-x86-64_10of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 18:18 p33567288_210000_Linux-x86-64_2of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 18:20 p33567288_210000_Linux-x86-64_7of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 18:24 p33567288_210000_Linux-x86-64_5of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 18:58 p33567288_210000_Linux-x86-64_8of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 19:00 p33567288_210000_Linux-x86-64_6of10.zip

-rw-r--r-- 1 root root 3.1G Apr 22 19:06 p33567288_210000_Linux-x86-64_4of10.zip


4. Unzip all the patch files:

#unzip p33567288_210000_Linux-x86-64_1of10.zip

.

#unzip p33567288_210000_Linux-x86-64_10of10.zip

or 

#unzip '*.zip'


Unzipping the files creates split tar files, something like below:

-rw-r--r-- 1 root root 3.1G Jan 21 15:12 33567288.tar.splitaa

-rw-r--r-- 1 root root 3.1G Jan 21 15:12 33567288.tar.splitab

-rw-r--r-- 1 root root 3.1G Jan 21 15:13 33567288.tar.splitac

-rw-r--r-- 1 root root 3.1G Jan 21 15:13 33567288.tar.splitad


5. Now concatenate and untar the split files to build the common patch repository:

#cat *.tar.* | tar -xvf -


6. Now unzip the patch files under the directories below to get the dbnodeupdate.sh and patchmgr scripts (dbnodeupdate.sh comes from the DBNodeUpdate directory; patchmgr comes from the FabricSwitch and ExadataStorageServer_InfiniBandSwitch directories):

#cd /QFSDP/Jan_2022/33567288/Infrastructure/SoftwareMaintenanceTools/DBNodeUpdate/21.211221

#cd /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/FabricSwitch

#cd /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/ExadataStorageServer_InfiniBandSwitch


7. Once the DB team confirms that they are done with the DB/Grid patching, we can start cell node patching.

8. Before doing the actual patching, raise a prechecks SR in MOS and upload the necessary logs: sosreport and sundiag from the compute nodes and cell nodes, exachk from one of the compute nodes, and the prechecks logs from the compute nodes and cell nodes.

8.a Below are the cell node prechecks commands:

#cd /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/ExadataStorageServer_InfiniBandSwitch/patch_21.2.8.0.0.220114.1

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -reset_force

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -cleanup

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -patch_check_prereq -rolling -ignore_alerts
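
It also helps to record each cell's current image version and service status before patching, as a baseline to compare against afterwards. A quick check from a compute node, assuming passwordless SSH is configured for dcli (as it normally is on Exadata):

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'imageinfo -ver'

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'cellcli -e list cell attributes name,cellsrvStatus,msStatus,rsStatus'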

Note: If any error is found in the prechecks, get it rectified with the help of the backend team.

One possible error is documented here:

https://kiranbjadhav.blogspot.com/2022/05/exadata-cell-node-patching-errorusb.html

8.b If no error is found in the prechecks, we can proceed with the actual patching commands.

Actual patching command :

#cd /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/ExadataStorageServer_InfiniBandSwitch/patch_21.2.8.0.0.220114.1

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -reset_force

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -cleanup

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -patch -ignore_alerts   

All cells get patched at the same time in non-rolling mode (the DBs need to be down here).

If you are doing cell node patching without DB downtime, use rolling mode:

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -patch -rolling -ignore_alerts   

All cells will get patched one by one in rolling mode.
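
During a rolling patch, patchmgr deactivates a cell's grid disks, patches and reboots that cell, and moves to the next one only after its grid disks resynchronize. From a separate session on a compute node you can watch the resync status (the cell that is currently rebooting will simply not respond):

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus" | grep -v ONLINE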


If you want to do manual cell node patching, taking one cell at a time:

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group_1 -patch -rolling -ignore_alerts   ---> one cell at a time, assuming cell_group_1 has only one cell entry

#./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group_2 -patch -rolling -ignore_alerts   ---> and so on for the remaining cells
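
The cell_group_N files are just one-line text files, each holding a single cell hostname. One illustrative way to split the full cell_group file (the sed commands below are an assumption about how you might create them; any editor works too):

#cd /opt/oracle.SupportTools/onecommand

#sed -n '1p' cell_group > cell_group_1

#sed -n '2p' cell_group > cell_group_2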


9. Once cell node patching is completed successfully, we can start with compute node patching.

9.a Compute node patching prechecks

#cd /QFSDP/Jan_2022/33567288/Infrastructure/SoftwareMaintenanceTools/DBNodeUpdate/21.211221

#./dbnodeupdate.sh -u -l /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/ExadataDatabaseServer_OL7/p33665705_212000_Linux-x86-64.zip -v 

One possible error is documented here:

https://kiranbjadhav.blogspot.com/2022/05/exadata-compute-node-patching-prechecks.html
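
As with the cell nodes, it is worth capturing the compute node's current image version and patch history before the upgrade; imageinfo and imagehistory ship on Exadata database servers:

#imageinfo -ver

#imagehistory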


9.b Actual patching command: 

Prerequisites (a minimal sketch of these steps follows the list):

. One compute node at a time

. DB and CRS must be down on the particular compute node

. NFS mount points should be unmounted
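
A minimal sketch of preparing one node, assuming the Grid Infrastructure home is /u01/app/19.0.0/grid and the NFS mount point is /backup_nfs (both paths are hypothetical; substitute your own):

#/u01/app/19.0.0/grid/bin/crsctl stop crs

#umount /backup_nfs

#mount | grep -i nfs   ---> should return no NFS mounts on this node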

#./dbnodeupdate.sh -u -l /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/ExadataDatabaseServer_OL7/p33665705_212000_Linux-x86-64.zip


9.c After a successful compute node upgrade, run the command below to finish the post steps:

#./dbnodeupdate.sh -c 


10. After successful compute node patching, we can do the RoCE switch or InfiniBand switch patching.

10.a RoCE switch patching prechecks:

#cd /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/FabricSwitch/patch_switch_21.2.8.0.0.220114.1

#./patchmgr --roceswitches /roceswitches.lst --upgrade --roceswitch-precheck --log_dir /scratchpad/
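
patchmgr expects roceswitches.lst to be a plain-text file listing the RoCE switch admin hostnames, one per line. An illustrative example (the hostnames are hypothetical):

#cat /roceswitches.lst
testrocesw-01
testrocesw-02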


10.b Actual patching command:

#./patchmgr --roceswitches /roceswitches.lst --upgrade --log_dir /scratchpad/


10.c If there are InfiniBand switches instead of RoCE switches, run the prechecks:

#cd /QFSDP/Jan_2022/33567288/Infrastructure/21.2.8.0.0/ExadataStorageServer_InfiniBandSwitch/patch_21.2.8.0.0.220114.1

#./patchmgr -ibswitches /opt/oracle.SupportTools/onecommand/ibs_group -upgrade -ibswitch_precheck


10.d Actual patching command:

#./patchmgr -ibswitches /opt/oracle.SupportTools/onecommand/ibs_group -upgrade
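
After the upgrade, patchmgr prints a per-switch success summary. You can also log in to a switch and check the running firmware with the version command (testibsw-01 is a hypothetical switch name; the version command is assumed to be the one available on the standard Sun Datacenter IB switches):

#ssh root@testibsw-01 version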



Regards,

Kiran Jadhav


Wednesday, July 28, 2021

Flash disk replacement in Exadata

Steps to Replace a Flash Disk in Oracle Exadata

Oracle Exadata is a powerful platform combining compute and storage nodes to deliver high-performance database services. One critical component of Exadata storage cells is the flash disk, which plays a key role in caching and accelerating I/O operations. A failure in flash modules (FMODs) can severely impact performance, making timely replacement essential.

This guide outlines the step-by-step process to identify and replace a failed flash disk in an Exadata cell node.


1. Identify the Faulty Cell Node

Log in to a compute node and run the following command to check the status of physical disks across all cell nodes:

#dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'cellcli -e list physicaldisk'

Look for entries indicating poor performance or warning status.


2. Verify Disk Type on the Affected Cell Node

Once the faulty cell node is identified, log in to it and run:

#cellcli -e list physicaldisk

This will help determine whether the issue is with a normal disk or a flash disk. A typical output for a failed flash disk might look like:

FLASH_1_0 15557M04E3N warning - poor performance

3. Inspect Flash Cache Details

To get more information about the degraded flash disk:

#cellcli -e list flashcache detail

Check for degraded cell disks, effective cache size, and disk status.
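
An abbreviated, illustrative detail listing for a cell with one degraded flash cell disk (all names and sizes below are hypothetical):

name:                 testcel01_FLASHCACHE
degradedCelldisks:    FD_01_testcel01
effectiveCacheSize:   2.18T
size:                 2.91T
status:               warning - degraded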


4. Inactivate Grid Disks

Before shutting down the cell node, make all grid disks inactive: 

CellCLI> alter griddisk all inactive


5. Confirm Grid Disk Status

Verify that all grid disks are offline: 

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
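
Proceed to the shutdown only when every grid disk shows asmmodestatus=OFFLINE and asmdeactivationoutcome=Yes, roughly like this (disk names are illustrative):

DATA_CD_00_testcel01    OFFLINE    Yes
DATA_CD_01_testcel01    OFFLINE    Yes
RECO_CD_00_testcel01    OFFLINE    Yes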


6. Shut Down the Cell Node

Bring down the cell node safely:  

#init 0


7. Replace the Flash Disk

Hand over the cell node to hardware support for flash disk replacement.


8. Verify Disk Status Post-Replacement

Once the cell node is powered back on, log in and check the disk status: 

#cellcli -e list physicaldisk

Ensure all disks, including flash disks, show a status of normal.

9. Check Flash Cache Health

Inspect the flash cache again: 

#cellcli -e list flashcache detail


10. Reactivate Grid Disks

Bring the grid disks back online: 

CellCLI> alter griddisk all active


11. Final Verification

Confirm that all grid disks are online: 

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
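
Right after reactivation, disks typically show asmmodestatus=SYNCING while ASM resynchronizes; wait until every disk reports ONLINE before treating the replacement as complete. Illustrative output mid-resync (names hypothetical):

DATA_CD_00_testcel01    SYNCING    Yes
DATA_CD_01_testcel01    ONLINE     Yes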


Conclusion

Replacing a failed flash disk in Exadata requires careful coordination and precise execution to avoid data loss and restore optimal performance. Following these steps ensures a smooth and safe replacement process.


Author: Kiran Jadhav
Principal Consultant | Exadata Admin