Exadata: Infiniband Switches ILOM -- stop, start, restart, status

Database nodes and storage servers in an Exadata each have a dedicated ILOM that runs independently of the OS. Each ILOM is a PCI card adapter (number 12 in the picture below, which comes from this Oracle documentation; that documentation also describes all the other numbered components).







The Infiniband switches are a different story: they have a software ILOM (not a hardware ILOM), which you access using the spsh command (you'll find an example in this post).

So when this ILOM hangs (which can in turn hang the patching of your IB switches, for example), you can restart it easily. First of all, check its status:
[root@exa01sw-ib3 ~]# service ilom status
ILOM stack is partly started with 16 processes.
ILOM daemons that failed to start are :  stdiscoverer  <== something wrong here
However, ILOM stack subsystem is locked..... OK!
[root@exa01sw-ib3 ~]#
And then restart the ILOM:
[root@exa01sw-ib3 ~]# service ilom restart
Stopping ILOM stack
Stopping Servicetags listener: stlistener.
. . .
Running ntpdate...
29 Oct 00:10:25 ntpdate[5508]: step time server 10.55.111.125 offset 0.001666 sec
Starting Servicetags listener: stlistener.
Starting platform_logger
[root@exa01sw-ib3 ~]# service ilom status
ILOM stack is running.
[root@exa01sw-ib3 ~]#
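The check-then-restart sequence above could be wrapped in a small script. This is only a sketch: the helper names ilom_needs_restart and restart_ilom_if_needed are hypothetical, and it assumes that a healthy switch prints exactly the "ILOM stack is running." line shown above, which may differ on other firmware versions.

```shell
#!/bin/sh
# Sketch: restart the switch ILOM only when "service ilom status" does not
# report a fully running stack. Helper names are hypothetical.

# Reads status text on stdin; returns 0 (true) when the healthy line
# "ILOM stack is running." is absent, i.e. a restart looks necessary.
ilom_needs_restart() {
    if grep -q '^ILOM stack is running\.'; then
        return 1
    fi
    return 0
}

# Meant to be run as root on the IB switch itself.
restart_ilom_if_needed() {
    if service ilom status 2>&1 | ilom_needs_restart; then
        echo "ILOM is not fully running, restarting it"
        service ilom restart
    else
        echo "ILOM stack is running, nothing to do"
    fi
}
```

Calling restart_ilom_if_needed on the switch would then leave a healthy ILOM alone and restart one that is only partly started.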
You can also stop and then start the ILOM instead of restarting it; let's do that on another switch:
[root@exa01sw-ib2 ~]# service ilom stop
Stopping ILOM stack
. . .
Stopping capidirect daemon: capidirectd  Done
Created dump file: /coredump/sp_trace/reboot/dump.gz
[root@exa01sw-ib2 ~]# service ilom start
Creating home directories
Updating FW version
. . .
Running ntpdate...
29 Oct 00:06:45 ntpdate[8892]: step time server 10.55.111.125 offset -0.000279 sec
Starting Servicetags listener: stlistener.
Starting platform_logger
[root@exa01sw-ib2 ~]#
That's all folks, another issue easily fixed!
