
Exadata : restart SSH on a storage cell with no SSH access

A few days ago, I was preparing an Exadata upgrade to 18c. I started by checking the version of each component and got an error saying that storage cell 2 was unreachable:
[root@exadatadb01]# ./exa-versions.sh

                Cluster is a X5-2 Elastic Rack HC 8TB

           -- Database Servers

   exadatadb01            exadatadb02          exadatadb03      exadatadb04
--------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
--------------------------------------------------------------------------------


Unable to connect to cells: ['exadatacel02']
           -- Cells

   exadatacel01          exadatacel03        exadatacel04        exadatacel05        exadatacel06
----------------------------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
----------------------------------------------------------------------------------------------------


         -- Infiniband Switches

    exadatasw-ib2      exadatasw-ib3
----------------------------------------
       2.2.9-3             2.2.9-3
----------------------------------------

[root@exadatadb01]#

I then tried to ping the cell (successfully) and to log in (unsuccessfully):
[root@exadatadb01]# ping exadatacel02
PING exadatacel02 (10.11.12.13) 56(84) bytes of data.
64 bytes from exadatacel02 (10.11.12.13): icmp_seq=1 ttl=64 time=0.169 ms
64 bytes from exadatacel02 (10.11.12.13): icmp_seq=2 ttl=64 time=0.153 ms
64 bytes from exadatacel02 (10.11.12.13): icmp_seq=3 ttl=64 time=0.152 ms
^C
--- exadatacel02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2939ms
rtt min/avg/max/mdev = 0.152/0.158/0.169/0.007 ms
[root@exadatadb01]#  ssh exadatacel02
ssh: connect to host exadatacel02 port 22: Connection refused
[root@exadatadb01]#
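
The two manual checks above (ping answers, port 22 refuses) can be combined into a small helper. This is only a sketch: the function names `port_open` and `host_state` are hypothetical, and it assumes bash, coreutils `timeout` and the Linux `ping` are available on the database node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP port accepts connections.
# Relies on bash's /dev/tcp pseudo-device, so no extra tool is needed.
port_open() {
    local host=$1 port=$2
    # timeout(1) caps the connection attempt at 3 seconds
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Hypothetical helper: classify a host as reachable over SSH,
# answering ping only, or not answering at all.
host_state() {
    local host=$1
    if port_open "$host" 22; then
        echo "ssh-ok"
    elif ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "ping-only"
    else
        echo "down"
    fi
}
```

With the cell in the state shown above, `host_state exadatacel02` would be expected to print `ping-only`, which is the hint that the server is alive and only the SSH daemon needs attention.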

It looked like the cell was still up and running (there was no other alert) but SSH was dead. I could have rebooted the cell from the ILOM, but that would have been an overreaction as the cell seemed to be working fine.
Anyhow, I needed to connect to the cell to check what was going on.

But how do you connect to a cell with no SSH?

Let's start by connecting to the cell's ILOM (this is the closest we can get to the cell, as the cell itself is not reachable over SSH):
[root@exadatadb01]# ssh exadatacel02-ilom
Password:
Oracle(R) Integrated Lights Out Manager
Version 4.0.2.26.a r123797
Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: exadatacel02-ilom
->

But now, how do we get to the cell itself?

Let's start the console to get even closer to the cell itself:
->start /sp/console
Are you sure you want to start /SP/console (y/n)?y
Serial console started.  To stop, type ESC (

And here is the trick: pressing "CTRL + D" (or ENTER) gives you the cell's login prompt!
-> ^D
exadatacel02 login: root
Password:
[root@exadatacel02 ~]#

I could then check that SSH was indeed dead:
[root@exadatacel02 ~]# service sshd status
openssh-daemon dead but pid file exists
[root@exadatacel02 ~]#

I also checked that all the grid disks were online and therefore working properly:
[root@exadatacel02 ~]# cellcli
CellCLI: Release 12.2.1.1.7 - Production on Sun Dec 09 17:17:36 CST 2018
Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.
CellCLI> list griddisk attributes name,asmmodestatus
         DATA_CD_00_exadatacel02      ONLINE
         DATA_CD_01_exadatacel02      ONLINE
         DATA_CD_02_exadatacel02      ONLINE
         DATA_CD_03_exadatacel02      ONLINE
         DATA_CD_04_exadatacel02      ONLINE
         DATA_CD_05_exadatacel02      ONLINE
         DATA_CD_06_exadatacel02      ONLINE
         DATA_CD_07_exadatacel02      ONLINE
         DATA_CD_08_exadatacel02      ONLINE
         DATA_CD_09_exadatacel02      ONLINE
         DATA_CD_10_exadatacel02      ONLINE
         DATA_CD_11_exadatacel02      ONLINE
         DBFS_DG_CD_02_exadatacel02   ONLINE
         DBFS_DG_CD_03_exadatacel02   ONLINE
         DBFS_DG_CD_04_exadatacel02   ONLINE
         DBFS_DG_CD_05_exadatacel02   ONLINE
         DBFS_DG_CD_06_exadatacel02   ONLINE
         DBFS_DG_CD_07_exadatacel02   ONLINE
         DBFS_DG_CD_08_exadatacel02   ONLINE
         DBFS_DG_CD_09_exadatacel02   ONLINE
         DBFS_DG_CD_10_exadatacel02   ONLINE
         DBFS_DG_CD_11_exadatacel02   ONLINE
         RECO_CD_00_exadatacel02      ONLINE
         RECO_CD_01_exadatacel02      ONLINE
         RECO_CD_02_exadatacel02      ONLINE
         RECO_CD_03_exadatacel02      ONLINE
         RECO_CD_04_exadatacel02      ONLINE
         RECO_CD_05_exadatacel02      ONLINE
         RECO_CD_06_exadatacel02      ONLINE
         RECO_CD_07_exadatacel02      ONLINE
         RECO_CD_08_exadatacel02      ONLINE
         RECO_CD_09_exadatacel02      ONLINE
         RECO_CD_10_exadatacel02      ONLINE
         RECO_CD_11_exadatacel02      ONLINE
CellCLI>
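
Reading 34 lines by eye works, but the same check can be scripted. The sketch below assumes the two-column output shown above (name, then asmmodestatus); the function name `not_online` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "name asmmodestatus" lines on stdin, print
# every grid disk that is not ONLINE, and exit with status 1 if any
# such disk was found (0 otherwise).
not_online() {
    awk 'NF >= 2 && $2 != "ONLINE" { print; bad = 1 } END { exit bad + 0 }'
}
```

On the cell it could be fed with `cellcli -e "list griddisk attributes name,asmmodestatus" | not_online`. Empty output and a zero exit status would mean every grid disk is ONLINE.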

And then I restarted SSH:
[root@exadatacel02 ~]# service sshd status
openssh-daemon dead but pid file exists
[root@exadatacel02 ~]# service sshd start
Starting sshd: [  OK  ]
[root@exadatacel02 ~]# service sshd status
openssh-daemon (pid  16249) is running...
[root@exadatacel02 ~]#
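
The status / start / status sequence above can be wrapped in one function. This is a sketch only: `ensure_service` is a hypothetical name, and it assumes a SysV-style `service` command whose `status` action returns a non-zero exit code when the daemon is dead, as is conventional for init scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a service only if its status check fails.
ensure_service() {
    local name=$1
    if service "$name" status >/dev/null 2>&1; then
        echo "$name already running"
    else
        # Status was non-zero (e.g. "dead but pid file exists"): start it
        service "$name" start
    fi
}
```

Run from the serial console as root, `ensure_service sshd` would start the daemon in the situation described here and do nothing on a healthy cell.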

Then, re-running my exa-versions script, I could verify that everything was in good shape before starting the upgrade of this Exadata to 18c!
[root@exadatadb01]# ./exa-versions.sh

                Cluster is a X5-2 Elastic Rack HC 8TB

           -- Database Servers

      exadatadb01        exadatadb02          exadatadb03         exadatadb04
--------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
--------------------------------------------------------------------------------


           -- Cells

      exadatacel01      exadatacel02         exadatacel03        exadatacel04         exadatacel05      exadatacel06
------------------------------------------------------------------------------------------------------------------------
  12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
------------------------------------------------------------------------------------------------------------------------


         -- Infiniband Switches

    exadatasw-ib2      exadatasw-ib3
----------------------------------------
       2.2.9-3             2.2.9-3
----------------------------------------


[root@exadatadb01]#


Enjoy !
