
Exadata: restart SSH on a storage cell with no SSH access

A few days ago, I was preparing an Exadata upgrade to 18c. I started by checking the version of each component and got an error saying that storage cell 2 was unreachable:
[root@exadatadb01]# ./exa-versions.sh

                Cluster is a X5-2 Elastic Rack HC 8TB

           -- Database Servers

   exadatadb01            exadatadb02          exadatadb03      exadatadb04
--------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
--------------------------------------------------------------------------------


Unable to connect to cells: ['exadatacel02']
           -- Cells

   exadatacel01          exadatacel03        exadatacel04        exadatacel05        exadatacel06
----------------------------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
----------------------------------------------------------------------------------------------------


         -- Infiniband Switches

    exadatasw-ib2      exadatasw-ib3
----------------------------------------
       2.2.9-3             2.2.9-3
----------------------------------------

[root@exadatadb01]#

I then tried to ping the cell (which worked) and to log in over SSH (which did not):
[root@exadatadb01]# ping exadatacel02
PING exadatacel02 (10.11.12.13) 56(84) bytes of data.
64 bytes from exadatacel02 (10.11.12.13): icmp_seq=1 ttl=64 time=0.169 ms
64 bytes from exadatacel02 (10.11.12.13): icmp_seq=2 ttl=64 time=0.153 ms
64 bytes from exadatacel02 (10.11.12.13): icmp_seq=3 ttl=64 time=0.152 ms
^C
--- exadatacel02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2939ms
rtt min/avg/max/mdev = 0.152/0.158/0.169/0.007 ms
[root@exadatadb01]#  ssh exadatacel02
ssh: connect to host exadatacel02 port 22: Connection refused
[root@exadatadb01]#
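The same "answers ping but refuses SSH" diagnosis can be scripted. The snippet below is only a sketch, not part of the original procedure: the function names `port_open` and `diagnose_ssh` are made up for illustration, and it assumes bash (for `/dev/tcp`), coreutils `timeout` and the Linux `ping` options `-c` and `-W`.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Assumption: bash is installed, since /dev/tcp is a bash feature.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print a one-line diagnosis for a host (the port defaults to 22).
diagnose_ssh() {
    host=$1
    port=${2:-22}
    if port_open "$host" "$port"; then
        echo "$host: port $port is accepting connections"
    elif ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "$host: answers ping but port $port is closed (sshd likely down)"
    else
        echo "$host: no ping reply and port $port unreachable"
    fi
}

# Example (hostname taken from the session above):
#   diagnose_ssh exadatacel02
```

For the cell in this post, the second branch is the one that would apply: the host replies to ping while port 22 refuses connections.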

It looked like the cell was still up and running (there was no other alert) but SSH was dead. I could have rebooted the cell from the ILOM, but that would have been overkill as the cell seemed to work fine.
Anyhow, I needed to connect to the cell to check what was going on.

But how to connect to it with no SSH?

Let's start by connecting to the cell ILOM (this is the closest we can get to the cell, as the cell itself is not reachable over SSH):
[root@exadatadb01]# ssh exadatacel02-ilom
Password:
Oracle(R) Integrated Lights Out Manager
Version 4.0.2.26.a r123797
Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: exadatacel02-ilom
->

But now, how to get to the cell itself?

Let's start the serial console to get even closer to the cell:
->start /sp/console
Are you sure you want to start /SP/console (y/n)?y
Serial console started.  To stop, type ESC (

And here is the trick: pressing "CTRL + D" (ENTER works as well) will give you the cell login prompt!
-> ^D
exadatacel02 login: root
Password:
[root@exadatacel02 ~]#

I could then check that SSH was indeed dead:
[root@exadatacel02 ~]# service sshd status
openssh-daemon dead but pid file exists
[root@exadatacel02 ~]#

I also checked that all the grid disks were online, and so working properly:
[root@exadatacel02 ~]# cellcli
CellCLI: Release 12.2.1.1.7 - Production on Sun Dec 09 17:17:36 CST 2018
Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.
CellCLI> list griddisk attributes name,asmmodestatus
         DATA_CD_00_exadatacel02      ONLINE
         DATA_CD_01_exadatacel02      ONLINE
         DATA_CD_02_exadatacel02      ONLINE
         DATA_CD_03_exadatacel02      ONLINE
         DATA_CD_04_exadatacel02      ONLINE
         DATA_CD_05_exadatacel02      ONLINE
         DATA_CD_06_exadatacel02      ONLINE
         DATA_CD_07_exadatacel02      ONLINE
         DATA_CD_08_exadatacel02      ONLINE
         DATA_CD_09_exadatacel02      ONLINE
         DATA_CD_10_exadatacel02      ONLINE
         DATA_CD_11_exadatacel02      ONLINE
         DBFS_DG_CD_02_exadatacel02   ONLINE
         DBFS_DG_CD_03_exadatacel02   ONLINE
         DBFS_DG_CD_04_exadatacel02   ONLINE
         DBFS_DG_CD_05_exadatacel02   ONLINE
         DBFS_DG_CD_06_exadatacel02   ONLINE
         DBFS_DG_CD_07_exadatacel02   ONLINE
         DBFS_DG_CD_08_exadatacel02   ONLINE
         DBFS_DG_CD_09_exadatacel02   ONLINE
         DBFS_DG_CD_10_exadatacel02   ONLINE
         DBFS_DG_CD_11_exadatacel02   ONLINE
         RECO_CD_00_exadatacel02      ONLINE
         RECO_CD_01_exadatacel02      ONLINE
         RECO_CD_02_exadatacel02      ONLINE
         RECO_CD_03_exadatacel02      ONLINE
         RECO_CD_04_exadatacel02      ONLINE
         RECO_CD_05_exadatacel02      ONLINE
         RECO_CD_06_exadatacel02      ONLINE
         RECO_CD_07_exadatacel02      ONLINE
         RECO_CD_08_exadatacel02      ONLINE
         RECO_CD_09_exadatacel02      ONLINE
         RECO_CD_10_exadatacel02      ONLINE
         RECO_CD_11_exadatacel02      ONLINE
CellCLI>
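With 34 grid disks, reading the list by eye is error-prone. As a rough sketch (the function name `not_online_griddisks` is hypothetical, not a CellCLI feature), the same output can be filtered with awk so that only the disks that are not ONLINE are printed:

```shell
# Read "name status" lines on stdin and print the name of every grid disk
# whose status is not ONLINE. Exit status is 1 if at least one was found.
not_online_griddisks() {
    awk 'NF >= 2 && $2 != "ONLINE" { print $1; bad = 1 } END { exit bad }'
}

# Example, run on the cell itself:
#   cellcli -e "list griddisk attributes name,asmmodestatus" | not_online_griddisks
```

No output and an exit status of 0 mean that every listed grid disk is ONLINE, which is what the listing above shows.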

I then restarted SSH:
[root@exadatacel02 ~]# service sshd status
openssh-daemon dead but pid file exists
[root@exadatacel02 ~]# service sshd start
Starting sshd: [  OK  ]
[root@exadatacel02 ~]# service sshd status
openssh-daemon (pid  16249) is running...
[root@exadatacel02 ~]#
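The status / start / status sequence above can be wrapped in a small helper. This is only a sketch under the assumption of a SysV-style `service` command, as on the cell shown here. The function name `ensure_sshd_running` is made up for illustration.

```shell
# Start sshd only if the init script reports that it is not running,
# then check the status again so the exit code reflects the final state.
ensure_sshd_running() {
    if service sshd status >/dev/null 2>&1; then
        echo "sshd already running"
        return 0
    fi
    echo "sshd not running, starting it"
    service sshd start && service sshd status
}

# Example, run as root from the serial console:
#   ensure_sshd_running
```

The helper relies on `service sshd status` returning a non-zero exit code when the daemon is dead, which is the usual init script convention.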

Then, re-running my exa-versions script, I could verify that everything was in good shape before starting to upgrade this Exadata to 18c!
[root@exadatadb01]# ./exa-versions.sh

                Cluster is a X5-2 Elastic Rack HC 8TB

           -- Database Servers

      exadatadb01        exadatadb02          exadatadb03         exadatadb04
--------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
--------------------------------------------------------------------------------


           -- Cells

      exadatacel01      exadatacel02         exadatacel03        exadatacel04         exadatacel05      exadatacel06
------------------------------------------------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
------------------------------------------------------------------------------------------------------------------------


         -- Infiniband Switches

    exadatasw-ib2      exadatasw-ib3
----------------------------------------
       2.2.9-3             2.2.9-3
----------------------------------------


[root@exadatadb01]#


Enjoy !
