
Exadata: re-image a Cell Storage Server to 19c (Introduction)

Sometimes things go wrong: a storage cell crashes, the / filesystem can no longer be mounted, fsck does not work, and restoring from a superblock backup does not work either. In that situation you would normally use the cell rescue procedure to boot from the USB key and restore the system, but the cell rescue process may not be effective in such a case (and indeed it did not work for us), so the last resort is to re-image the storage cell.

1/ Introduction

This happened with a cell that had recently been upgraded to 19.2.2.0.0.190513.2. Here is a summary of the situation:
  • The cell was no longer reachable through SSH, only through the ILOM
  • Here is what was left on the cell:
    -sh-4.1# df -h
    Filesystem Size    Used    Available Use% Mounted on 
    devtmpfs   47.1G   256.0K   47.1G     0% /dev 
    tmpfs      47.1G    0       47.1G     0% /dev/shm 
    /dev/sr0  124.8M   124.8M    0      100% /mnt/iso 
    -sh-4.1#
    
  • Everything else in the Exadata was working fine; our DATA diskgroup has HIGH redundancy, so we were in good shape even with this cell missing
  • The cell-status.sh script showed the output below, which looked good except that cel02 was missing:
    [root@exadata01_db01]# ./cell-status.sh
    
                    Cluster is a X5-2 Elastic Rack HC 8TB
    
         Cell Disks     |         FlashDisk        |         HardDisk         |
                        |   Nb   | Normal | Errors |   Nb   | Normal | Errors |
    ---------------------------------------------------------------------------
       exadata01_cel01  |    4   |    4   |    0   |   12   |   12   |    0   |
       exadata01_cel03  |    4   |    4   |    0   |   12   |   12   |    0   |
       exadata01_cel04  |    4   |    4   |    0   |   12   |   12   |   14   |
       exadata01_cel05  |    4   |    4   |    0   |   12   |   12   |    0   |
    ---------------------------------------------------------------------------
    
         Grid Disks     |           DATA           |          DBFS_DG         |           RECO           |
                        |   Nb   | Online | Errors |   Nb   | Online | Errors |   Nb   | Online | Errors |
    ------------------------------------------------------------------------------------------------------
       exadata01_cel01  |   12   |   12   |    0   |   10   |   10   |    0   |   12   |   12   |    0   |
       exadata01_cel03  |   12   |   12   |    0   |   10   |   10   |    0   |   12   |   12   |    0   |
       exadata01_cel04  |   12   |   12   |   14   |   10   |   10   |    0   |   12   |   12   |    0   |
       exadata01_cel05  |   12   |   12   |    0   |   10   |   10   |    0   |   12   |   12   |    0   |
    ------------------------------------------------------------------------------------------------------
     --  : Unused disks | xx  : Not ONLINE disks   |     : asmDeactivationOutcome is NOT yes
    
    [root@exadata01_db01]#
    
  • The cell versions were fine as well, except that cel02 was no longer reachable:
    [root@exadata01_db01]# ./exa-versions.sh -c
    
                    Cluster is a X5-2 Elastic Rack HC 8TB
    
               -- Cells
    
       exadata01_cel01     exadata01_cel03     exadata01_cel04     exadata01_cel05
    --------------------------------------------------------------------------------
     19.2.2.0.0.190513.2 19.2.2.0.0.190513.2 19.2.2.0.0.190513.2 19.2.2.0.0.190513.2
    --------------------------------------------------------------------------------
    [root@exadata01_db01]#
    
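If cell-status.sh is not at hand, a rough equivalent of its grid disk section can be pulled from a database node with dcli and CellCLI. The sketch below is a minimal, hypothetical helper (summarize_griddisks is our own name, not an Oracle tool). It assumes you pipe in the output of `dcli -g cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus"`, where cell_group is a file listing your cells, one per line, and it prints, for each cell that answered, the number of grid disks and how many of them are not ONLINE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): summarise grid disk status per cell.
# Expected stdin, assumed to come from:
#   dcli -g cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus"
# i.e. lines shaped like "<cell>: <griddisk_name> <asmmodestatus>".
# Output: "<cell> <total_griddisks> <not_online>", sorted by cell name.
summarize_griddisks() {
  awk '
    NF == 3 {                          # skip anything else, e.g. ssh error lines
      cell = $1
      sub(/:$/, "", cell)              # strip the "cell:" prefix added by dcli
      total[cell]++
      if ($3 != "ONLINE") bad[cell]++  # OFFLINE, SYNCING, UNUSED... all count
    }
    END { for (c in total) printf "%s %d %d\n", c, total[c], bad[c] + 0 }
  ' | sort
}

# Usage (on a database node, cell_group being your own list of cells):
#   dcli -g cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus" | summarize_griddisks
```

Lines that do not have exactly three fields (such as an ssh error for an unreachable cell) are ignored, so a cell that is down, like our cel02, simply does not show up in the summary.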

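In the same spirit, exa-versions.sh -c only lists the cells that answered. A hypothetical helper like the one below (missing_cells is our own name) compares a dcli group file with the output of `dcli -g cell_group -l root "imageinfo -ver"` and prints the cells that did not report a version, which is a quick way to spot a lost cell such as cel02. It assumes dcli prefixes each output line with the cell name followed by a colon; any line whose payload does not look like a version number is treated as "no answer".

```shell
#!/usr/bin/env bash
# Hypothetical helper: list cells from a dcli group file that did not report
# an image version.
#   $1    : group file, one cell hostname per line (assumed dcli -g format)
#   stdin : output of: dcli -g cell_group -l root "imageinfo -ver"
#           i.e. "<cell>: <version>" lines (assumed format)
missing_cells() {
  local group_file=$1
  awk -F': *' '
    NR == FNR { if ($1 != "") expected[$1] = 1; next }  # first file: the group file
    $2 ~ /^[0-9]+\.[0-9]+/ { answered[$1] = 1 }         # stdin: only real versions count
    END { for (c in expected) if (!(c in answered)) print c }
  ' "$group_file" - | sort
}

# Usage:
#   dcli -g cell_group -l root "imageinfo -ver" | missing_cells cell_group
```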
2/ Preparation

Before jumping into re-imaging your cell, you need to gather some information, as the installer will ask for it. Here is how to find it from a surviving cell and/or a database node:
  • DNS servers:
    [root@exadata01_db01]# grep -i ^nameserver /etc/resolv.conf
    nameserver 10.200.200.4
    nameserver 10.200.200.5
    [root@exadata01_db01]#
    
  • NTP servers: you will find them in the /etc/ntp.conf file
    [root@exadata01_db01]# grep -i ^server /etc/ntp.conf
    server 10.248.1.1 prefer iburst burst minpoll 4 maxpoll 4
    [root@exadata01_db01]#
    
  • Timezone:
    [root@exadata01_db01]# ls -l /etc/localtime
    lrwxrwxrwx 1 root root 37 Jun 16 23:06 /etc/localtime -> ../usr/share/zoneinfo/America/Chicago
    [root@exadata01_db01]#
    
  • Network configuration information:
    -- From a database node
    # ping exadata01_cel02                                                       # This is eth0
    # grep cel02 /etc/hosts
    192.168.1.3 exadata01_cel02-priv1.domain.com exadata01_cel02-priv1         # This is ib0
    192.168.1.4 exadata01_cel02-priv2.domain.com exadata01_cel02-priv2         # This is ib1
    # 
    
    -- From a surviving cell
    # grep -i gateway /etc/sysconfig/network-scripts/ifcfg-eth0                  # The gateway
    # ifconfig ib0                                                               # To get the netmask and broadcast
    # ifconfig ib1                                                               # To get the netmask and broadcast (same as ib0)
    # ifconfig eth0                                                              # To get the netmask and broadcast
    
    -- So you end up with a configuration like this one for the lost cel02:
    ib0        : 192.168.1.3
    ib1        : 192.168.1.4
    netmask    : 255.255.252.0 
    broadcast  : 192.168.1.255
    
    eth0       : 10.1.2.3
    netmask    : 255.255.254.0
    broadcast  : 10.1.2.255
    gateway    : 10.1.2.1
    
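The DNS, NTP and timezone lookups above can be grouped in a small function run on a database node or a surviving cell. This is a sketch, not an Oracle tool: collect_reimage_info and its optional root-prefix argument are our own inventions (the prefix only exists so the function can be pointed at a copy of /etc), and it assumes /etc/localtime is a symlink into the zoneinfo tree, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DNS servers, NTP servers and timezone
# that the cell installer asks for.
#   $1 (optional): path prefix, hypothetical, defaults to "" (the live /etc)
collect_reimage_info() {
  local root=${1:-}
  # Same lines as "grep -i ^nameserver /etc/resolv.conf", keeping only the IPs
  printf 'DNS servers : %s\n' "$(awk 'tolower($1) == "nameserver" {print $2}' "$root/etc/resolv.conf" | paste -sd, -)"
  # Same lines as "grep -i ^server /etc/ntp.conf", keeping only the servers
  printf 'NTP servers : %s\n' "$(awk 'tolower($1) == "server" {print $2}' "$root/etc/ntp.conf" | paste -sd, -)"
  # /etc/localtime -> ../usr/share/zoneinfo/<Region>/<City>; prints nothing
  # if /etc/localtime is a regular file instead of a symlink
  printf 'Timezone    : %s\n' "$(readlink "$root/etc/localtime" | sed 's#.*zoneinfo/##')"
}

# Usage: collect_reimage_info
```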

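ifconfig on the surviving cell already prints the netmask and broadcast, but since these values are typed by hand into the installer, it can be worth checking that they agree with each other: the network is the IP AND the netmask, and the broadcast is the network with all host bits set. The minimal bash sketch below (network_and_broadcast is a hypothetical helper name) does that arithmetic. For instance, a 255.255.252.0 mask puts 192.168.1.3 in 192.168.0.0/22, whose broadcast is 192.168.3.255, and a 255.255.254.0 mask puts 10.1.2.3 in 10.1.2.0/23, whose broadcast is 10.1.3.255; if what you noted down does not match, re-check the ifconfig output of the surviving cell.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive network and broadcast from an IPv4 address and netmask.

# Dotted quad -> 32-bit integer (octets are assumed to have no leading zeros)
ip_to_int() {
  local IFS=.
  set -- $1                      # split "a.b.c.d" into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# 32-bit integer -> dotted quad
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# $1: IP address, $2: netmask
network_and_broadcast() {
  local ip mask net bcast
  ip=$(ip_to_int "$1")
  mask=$(ip_to_int "$2")
  net=$(( ip & mask ))                         # clear the host bits
  bcast=$(( net | (~mask & 0xFFFFFFFF) ))      # set all host bits (32-bit wide)
  echo "network=$(int_to_ip "$net") broadcast=$(int_to_ip "$bcast")"
}

# Usage: network_and_broadcast 192.168.1.3 255.255.252.0
```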
3/ Procedure

Now that we have all the information needed to re-image our lost cell, here is how the procedure goes:
  • 1/ We mount the ISO image of our version through the lost cell's ILOM
  • 2/ We boot from it
  • 3/ We fill in the required information gathered earlier
  • 4/ We add the disks back to ASM
  • 5/ And we are all done!

So we first have to download the right ISO for our version: go to My Oracle Support note 888828.1 and look at the Supplemental README for your version.

In this Supplemental README, you will find the patch containing the cell image for your version.

Download this image and save it on a jump server that has network access to the lost cell's ILOM.
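Before copying the image around, it does not hurt to check that the download is intact. This is a minimal sketch, assuming you noted the SHA-256 digest displayed for the file on the My Oracle Support download page and that sha256sum (GNU coreutils) is available on the jump server; verify_download is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with the expected one.
# The comparison is case-insensitive because digests may be shown in upper case.
#   $1: downloaded file, $2: expected SHA-256 digest
verify_download() {
  local file=$1 expected actual
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file matches the expected SHA-256 digest"
  else
    echo "MISMATCH: $file is $actual, expected $expected" >&2
    return 1
  fi
}

# Usage (path and digest are placeholders):
#   verify_download /path/to/downloaded_patch.zip <SHA-256 digest from the download page>
```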


Once the ISO is downloaded onto your jump server, go to part 2 to move forward!

Quick links: part 1 / part 2 / part 3
