1/ Introduction
This happened with a cell recently upgraded to 19.2.2.0.0.190513.2. Here is a summary of the situation:
- Everything else in the Exadata was working fine; our DATA diskgroup has HIGH redundancy, so we were still in good shape even with this cell missing
- The cell-status.sh script was showing this output, which looked pretty good except that cel02 was missing:

[root@exadata01_db01]# ./cell-status.sh
Cluster is a X5-2 Elastic Rack HC 8TB

    Cell Disks    |       FlashDisk       |       HardDisk        |
                  |  Nb | Normal | Errors |  Nb | Normal | Errors |
---------------------------------------------------------------------------
  exadata01_cel01 |   4 |      4 |      0 |  12 |     12 |      0 |
  exadata01_cel03 |   4 |      4 |      0 |  12 |     12 |      0 |
  exadata01_cel04 |   4 |      4 |      0 |  12 |     12 |     14 |
  exadata01_cel05 |   4 |      4 |      0 |  12 |     12 |      0 |
---------------------------------------------------------------------------

    Grid Disks    |         DATA          |        DBFS_DG        |         RECO          |
                  |  Nb | Online | Errors |  Nb | Online | Errors |  Nb | Online | Errors |
------------------------------------------------------------------------------------------------------
  exadata01_cel01 |  12 |     12 |      0 |  10 |     10 |      0 |  12 |     12 |      0 |
  exadata01_cel03 |  12 |     12 |      0 |  10 |     10 |      0 |  12 |     12 |      0 |
  exadata01_cel04 |  12 |     12 |     14 |  10 |     10 |      0 |  12 |     12 |      0 |
  exadata01_cel05 |  12 |     12 |      0 |  10 |     10 |      0 |  12 |     12 |      0 |
------------------------------------------------------------------------------------------------------
-- : Unused disks | xx : Not ONLINE disks | : asmDeactivationOutcome is NOT yes
[root@exadata01_db01]#

- The cell versions were good as well, except that cel02 was not reachable any more:

[root@exadata01_db01]# ./exa-versions.sh -c
Cluster is a X5-2 Elastic Rack HC 8TB
-- Cells
exadata01_cel01      exadata01_cel03      exadata01_cel04      exadata01_cel05
--------------------------------------------------------------------------------
19.2.2.0.0.190513.2  19.2.2.0.0.190513.2  19.2.2.0.0.190513.2  19.2.2.0.0.190513.2
--------------------------------------------------------------------------------
[root@exadata01_db01]#

- The cell was no longer reachable through SSH, only through the ILOM
- Here is a status of what was left on the cell:

-sh-4.1# df -h
Filesystem      Size    Used  Available  Use%  Mounted on
devtmpfs       47.1G  256.0K      47.1G    0%  /dev
tmpfs          47.1G       0      47.1G    0%  /dev/shm
/dev/sr0      124.8M  124.8M          0  100%  /mnt/iso
-sh-4.1#
2/ Preparation
Before jumping into re-imaging your cell, you need to gather some information, as the installer will ask for it. Here is how to find it from a surviving cell and/or a database node:
NTP servers: you will find them in the /etc/ntp.conf file

[root@exadata01_db01]# grep -i ^server /etc/ntp.conf
server 10.248.1.1 prefer iburst burst minpoll 4 maxpoll 4
[root@exadata01_db01]#

Timezone:

[root@exadata01_db01]# ls -l /etc/localtime
lrwxrwxrwx 1 root root 37 Jun 16 23:06 /etc/localtime -> ../usr/share/zoneinfo/America/Chicago
[root@exadata01_db01]#

Network configuration information:

-- From a database node
# ping exadata01_cel02                                                # This is eth0
# grep cel02 /etc/hosts
192.168.1.3 exadata01_cel02-priv1.domain.com exadata01_cel02-priv1   # This is ib0
192.168.1.4 exadata01_cel02-priv2.domain.com exadata01_cel02-priv2   # This is ib1
#
-- From a surviving cell
# grep -i gateway /etc/sysconfig/network-scripts/ifcfg-eth0          # The gateway
# ifconfig ib0                                                        # To get the netmask and broadcast
# ifconfig ib1                                                        # To get the netmask and broadcast (same as ib0)
# ifconfig eth0                                                       # To get the netmask and broadcast
-- So you end with a configuration like this one for the lost cel02:
ib0       : 192.168.1.3
ib1       : 192.168.1.4
netmask   : 255.255.252.0
broadcast : 192.168.1.255
eth0      : 10.1.2.3
netmask   : 255.255.254.0
broadcast : 10.1.2.255
gateway   : 10.1.2.1

DNS servers:

[root@exadata01_db01]# grep -i ^nameserver /etc/resolv.conf
nameserver 10.200.200.4
nameserver 10.200.200.5
[root@exadata01_db01]#
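If you prefer to collect the DNS, NTP and timezone values in one go rather than running each command by hand, the lookups above can be wrapped in a few small shell functions. This is only a minimal sketch: the function names are hypothetical (they are not part of any Exadata tooling), and it assumes the files have the same layout as the ones shown above.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not an Oracle tool.
# Each function takes an optional file path and defaults to the usual location.

# Print the DNS servers listed in a resolv.conf-style file, one per line.
dns_servers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Print the NTP servers listed in an ntp.conf-style file, one per line.
ntp_servers() {
    awk '$1 == "server" { print $2 }' "${1:-/etc/ntp.conf}"
}

# Print the timezone name that the localtime symlink points to.
# Assumes the file is a symlink into a .../zoneinfo/ directory.
timezone_name() {
    readlink "${1:-/etc/localtime}" | sed 's|.*zoneinfo/||'
}
```

Run on a surviving database node, calling `dns_servers`, `ntp_servers` and `timezone_name` without arguments reads the default files; keep the output somewhere safe, since the installer will ask for these values.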
3/ Procedure
Now that we have all the information needed to re-image our lost cell, we need to understand how the procedure will go:
- 1/ We mount the ISO image of our version on the lost cell ILOM
- 2/ We boot on it
- 3/ We fill the required information we got earlier
- 4/ We add the disks back to ASM
- 5/ And we are all done!
So we first have to download the right ISO for our version. To do that, go to note 888828.1 and look for the Supplemental README of your version.
In this Supplemental README, you will find the patch containing the cell image for your version.
Download this image and save it on a jump server that has network access to the lost cell's ILOM.
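Since the ISO is going to be booted on the cell, it can be worth checking that the copy on the jump server is intact before going further. The sketch below is an assumption on my side, not a step from the Oracle procedure: `verify_iso` is a hypothetical helper, and it assumes you have a SHA-256 checksum for the file from wherever you downloaded it and that `sha256sum` is available on the jump server.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 with an expected value.
# Usage: verify_iso <iso_file> <expected_sha256>
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
verify_iso() {
    iso_file=$1
    # Normalise the expected value to lowercase so either case is accepted.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')

    if [ ! -f "$iso_file" ]; then
        echo "missing file: $iso_file" >&2
        return 2
    fi

    # sha256sum prints "<hash>  <file>"; keep only the hash.
    actual=$(sha256sum "$iso_file" | awk '{ print $1 }')

    if [ "$actual" = "$expected" ]; then
        echo "checksum OK: $iso_file"
    else
        echo "checksum MISMATCH: $iso_file" >&2
        return 1
    fi
}
```

You would call it with the path of the downloaded image and the checksum you were given; a mismatch usually means a truncated or corrupted download, so download the file again before mounting it on the ILOM.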
Once the ISO is downloaded and on your jump server, go to part 2 to move forward!