
cell-status.sh: An overview of your Exadata cell and grid disks

You may already know rac-status.sh, which gives you an overview of your RAC/GI resources at a glance; here is now cell-status.sh, which gives you the status of the cell disks and the grid disks of an Exadata !
It turns out to be very useful on a daily basis, so let's check how it looks straight away !

A sample output:



Let's describe this first screenshot:
  • On top of the output, and as usual on all my Exadata scripts, the Exadata model is shown
  • Then the first table shows the cell disks, with the first column listing the analyzed cells
  • Then 2 sets of columns, one for the FlashDisks and one for the HardDisks, each containing 3 columns:
    • Nb : number of disks
    • Normal : number of cell disks with the status "Normal"; it appears in red if the number of "Normal" disks is less than the number of disks -- you will then quickly see if some disks are not back online after a patch for example
    • Errors: Number of errors on the cell disks
  • A second table shows the status of the grid disks, with the first column listing the analyzed cells
  • 1 set of columns per diskgroup, each containing 3 columns:
    • Nb : number of disks
    • Online: number of "Online" grid disks; a red xx is shown if some disks are Offline -- you will then quickly see if you have any issue with your configuration
    • Errors: Number of errors on the grid disks
  • A legend under the tables, which is self-explanatory (the cellcli queries these numbers come from are sketched just below)
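
Under the hood, these numbers come from cellcli on each cell. A minimal sketch of the kind of queries involved (run on a cell, or through ssh from a database node; the exact attribute lists the script uses may differ slightly):

# Cell disks: status and error counter per disk -- what feeds the first table
cellcli -e list celldisk attributes name,status,size,errorcount,disktype
# Grid disks: status, ASM mode status and error counter per disk -- what feeds the second table
cellcli -e list griddisk attributes name,status,asmmodestatus,errorcount,asmdeactivationoutcome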

An Extreme Flash configuration

In an Extreme Flash configuration, you will have only 1 set of columns in the cell disks table, as it contains only Flash Disks.

An asmDeactivationOutcome issue


As having the asmDeactivationOutcome parameter set to anything other than Yes is really something you don't want, the script shows it with a red background, as you can see on the above screenshot. You can then quickly spot any problem related to this parameter, investigate it and fix it ASAP.
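
If you want to quickly double check this outside of the script, a minimal sketch using the standard dcli and cellcli tooling (the cell_group file path and the user are illustrative) could be:

# List any grid disk whose asmDeactivationOutcome is not Yes, across all the cells
dcli -g ~/cell_group -l root "cellcli -e list griddisk attributes name,asmDeactivationOutcome where asmDeactivationOutcome != 'Yes'"

No line returned means you are good; any grid disk listed here deserves an investigation.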

List of failed disks (-v option)

The above outputs are cool, but you may also want to know which disks have issues; this is the purpose of the -v option, which adds, after the tables, details of the failed cell disks and grid disks like the ones below:
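
An illustrative invocation (the hostname is just an example):
[root@exadb01]# ./cell-status.sh -v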


Users

To fit the needs of every configuration you may have, I made the way of executing this script and of connecting to the cells flexible, knowing that the user who executes the script must have passwordless SSH connectivity to the cells; here is how it works:
  • If cell-status.sh is executed as root, then root is used to connect to the cells (it is defined on top of the script by USER="root")
  • If cell-status.sh is executed as a non root user, then cellmonitor is used to connect to the cells (it is defined on top of the script by NONROOTUSER="cellmonitor")
  • You can change this behavior by forcing the use of a specific user with the -u option (see the examples below)
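
A few illustrative invocations to make this concrete (hostnames are examples, not part of the script):
[root@exadb01]# ./cell-status.sh                    # executed as root => connects to the cells as root
[oracle@exadb01]$ ./cell-status.sh                  # executed as a non root user => connects as cellmonitor
[oracle@exadb01]$ ./cell-status.sh -u cellmonitor   # force a specific user with the -u option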

List of cells

The list of cells to report about can also be customized as below:
  • If cell-status.sh is executed as root, it uses ibhosts to build the list of cells to connect to
  • If cell-status.sh is executed as a non root user, it uses the databasemachine.xml file to build the list of cells to connect to
  • You can also specify a specific list of cells to analyze using the -c option and a cell_group file (see the example below)
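
A cell_group file is simply a text file listing the cells to connect to, one per line (the same kind of file you would use with dcli); an illustrative example with made-up cell names:
[root@exadb01]# cat ~/cell_group
exacel01
exacel02
exacel03
[root@exadb01]# ./cell-status.sh -c ~/cell_group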

Option -h for help

Feel free to check the help using the -h option:
[root@exadb01]# ./cell-status.sh -h

When patching

This script is very useful when patching Exadata, and it is now fully integrated in my Exadata patching procedure.
I check the status of the cells during the pre-requisites phase to be sure I am going to patch a healthy system, as well as right before patching the cells:
[root@exadb01]# ./cell-status.sh | tee -a cell_status_before_patching
I then re-check this status after the cells have been patched:
[root@exadb01]# ./cell-status.sh | tee -a cell_status_after_patching
And a simple diff shows any issue that could have happened during the cells patching, like disks that are not properly back online:
[root@exadb01]# diff cell_status_before_patching cell_status_after_patching
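
If you script your patching procedure, here is a minimal sketch of how these three steps could be tied together (file names are illustrative, not part of the script):

# Snapshot the cell status before patching the cells
./cell-status.sh | tee cell_status_before_patching
# ... patch the cells ...
# Snapshot the cell status after patching and compare with the "before" one
./cell-status.sh | tee cell_status_after_patching
if ! diff cell_status_before_patching cell_status_after_patching; then
  echo "WARNING: cell status differs after patching, investigate before moving forward"
fi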

The code

You can download the code from my github repo.

Enjoy !

13 comments:

  1. As usual, you always rock with new scripts for Exadata.
    Looks like the error shown by your script is false, or maybe I am missing something.
    When I checked the flashdisk status, I could not find any error.


    Cell Disks | HardDisk | FlashDisk |
    | Nb | Normal | Errors | Nb | Normal | Errors |
    ---------------------------------------------------------------------------
    Exalabcel1 | 12 | 12 | 0 | 16 | 16 | 0 |
    Exalabcel2 | 12 | 12 | 0 | 16 | 16 | 0 |
    Exalabcel3 | 12 | 12 | 0 | 16 | 16 | 0 |
    Exalabcel4 | 12 | 12 | 0 | 4 | 4 | 0 |
    Exalabcel5 | 12 | 12 | 0 | 4 | 4 | 130 |
    Exalabcel6 | 12 | 12 | 0 | 4 | 4 | 0 |
    ---------------------------------------------------------------------------

    Failed Cell Disks details
    Cell | Name | Status | Size | Nb_Error | Disktype |
    ---------------------------------------------------------------------------------------------
    Exalabcel5 | FD_03_Exalabcel5 | normal | 1.455474853515625T | 130 | FlashDisk |
    ---------------------------------------------------------------------------------------------

    1. Thanks !

      What do you mean by you found no error ? 130 errors are reported on the FD_03_Exalabcel5 device; it is the number of errors reported by "cellcli -e list celldisk attributes name,status,size,errorcount,disktype". Can you check with this command on Exalabcel5 ?

    2. You are right, there are some errors in that particular flash disk. Let me check how to get rid of those.
      Thank You.

    3. You should do a sundiag of the cell and send to support for analysis. Some errors are ignorable, they will let you know.

      I am working on trying to find a way to add this in the cell-status.sh script -- far easier said than done -- maybe one day :)

    4. Yes, I opened an SR yesterday with Support and they confirmed that the error count is > 0 even though the status is normal.
      "There is no impact on the functionality and from Hardware perspective there is no clear indication of a possible failure now or in the near future.
      However, we recommend to keep the error counter under monitoring for the next couple of days and if a rapid increase of errors is observed, please let us know."
      So, I have requested them to create an SR with the EEST group to check from a software point of view.

    5. Good to hear; it would be good to find a way to reset this error counter to 0 so that everything red would mean something newly wrong and not old errors.

      In the meantime, you can monitor this number easily with cell-status !

  2. This comment has been removed by the author.

  3. Hi,

    I am getting this error while executing on X8M. Help to resolve.


    ibwarn: [358611] mad_rpc_open_port: client_register for mgmt 1 failed
    src/ibnetdisc.c:784; can't open MAD port ((null):0)
    Error: No cells specified.

    Cluster is a X8M-2 Eighth Rack HC 14TB

    Cell Disks |
    |
    ---------------------
    ---------------------


    1. Please use:

      ./cell-status.sh -c ~/cell_group

      You have to specify the cell_group file as there is no way to have this list dynamically with X8M and the new ROCE switches

  4. This comment has been removed by the author.

  5. Hi, can this script be adapted to ExaC@C Gen2? Thanks and regards.

    1. I am afraid not as you cannot access the cells with ExaCC.

    2. And what if we were able to access the cells with ExaCC Gen2? :)

