
Showing posts with label patching. Show all posts

Grid Infrastructure Out of Place Patching (aka GI OOP)

Out of Place patching has been the standard for database patching for years now (I have described it in detail here), but for some reason people hold back from doing Out of Place patching of Grid Infrastructure: they usually patch GI In Place and only upgrade GI Out of Place (you cannot do an In Place upgrade anyway :)). I describe below how to easily perform a GI OOP patching.

To start on the right foot, a quick reminder of the concept and the required steps of an Out of Place patching:
  1. Your system is running on a source version home, let's say /u01/app/19.0.0.0/grid
  2. You prepare the future, already patched, target home, let's say /u01/app/19.11.0.0/grid
  3. On the day of the maintenance, you stop what is running on the source home and restart it on the target home
  4. If, for any reason, something goes wrong, you just have to restart everything on the source home
This can be represented with the below image:






For the purpose of this blog, I will use the below homes in the examples:
  • The source GI home: /u01/app/19.0.0.0/grid
  • The target GI Home: /u01/app/19.11.0.0/grid

1. Prepare your target home

Preparing the target home means building a GI home containing the patches you want; here, I go with GI 19.11 plus the latest OPatch, the latest GI JDK and patch 31602782. To achieve this, you can clone a source GI home or, what I prefer and recommend, create a gold image of your target home. Oracle has (or had) a note listing already prepared gold images per version, but this note has more or less disappeared recently, so I gave up on that one. Besides, building your own gold image is easy and a very good way to understand how all of this works. To build my target GI 19.11 gold image, you first need to get:
  • The base GI 19c version which is 19.3: GI_gold_193_V982068-01.zip -- from edelivery.oracle.com
  • The GI 19.11 patch: GI_1911_p32545008_190000_Linux-x86-64.zip
  • The latest opatch: opatch_p6880880_122010_Linux-x86-64.zip
  • The latest GI JDK: GIJDK_April2021_p32490416_190000_Linux-x86-64.zip (this one is no longer the latest, but it was when I built this gold image)
  • The patch 31602782: p31602782_1911000DBRU_Linux-x86-64.zip
Note: you may want to have a look at this blog to find the notes from which to download GI, the critical patches, the GI JDK, etc. Note 2: you do not have to apply the latest GI JDK; it is here to show that you can apply any one-off patch on top of the RU in your target gold image. GI tends to have many critical issues, and therefore many patches, so it is better to know how to deal with them. Here is what it looks like once you have the files on your server:
[root@target gioop]# pwd
/u01/stage/gioop
[root@target gioop]# ls -ltr
GI_1911_p32545008_190000_Linux-x86-64.zip               <= GI 19.11 patch
GIJDK_April2021_p32490416_190000_Linux-x86-64.zip       <= April JDK
opatch_p6880880_122010_Linux-x86-64.zip                 <= Latest opatch
GI_gold_193_V982068-01.zip                              <= GI gold image
p31602782_1911000DBRU_Linux-x86-64.zip                  <= Patch 31602782
[root@target gioop]#
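Before going further, it can be worth checking that every file listed above is actually in the staging directory. A minimal sketch, assuming bash and the file names used in this example (adapt the list to your own downloads); check_stage is a hypothetical helper, not an Oracle tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that all the zip files needed to build the
# GI 19.11 gold image are present (and non-empty) in a staging directory.
# The file names are the ones used in this example; adapt them.
REQUIRED_ZIPS=(
  GI_gold_193_V982068-01.zip
  GI_1911_p32545008_190000_Linux-x86-64.zip
  opatch_p6880880_122010_Linux-x86-64.zip
  GIJDK_April2021_p32490416_190000_Linux-x86-64.zip
  p31602782_1911000DBRU_Linux-x86-64.zip
)

check_stage() {
  local stage_dir="$1" missing=0 f
  for f in "${REQUIRED_ZIPS[@]}"; do
    if [[ ! -s "$stage_dir/$f" ]]; then
      echo "MISSING: $stage_dir/$f"
      missing=$((missing + 1))
    fi
  done
  # Return code = number of missing files (0 means everything is there)
  return "$missing"
}
```

For example, check_stage /u01/stage/gioop && echo "All files staged" before starting to unzip anything.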
Unzip the 19.3 gold image:
[root@target gioop]# mkdir temp
[root@target gioop]# unzip -q GI_gold_193_V982068-01.zip -d temp/.
[root@target gioop]#
Unzip the GI 19.11 RU, the GI JDK and the 31602782 patch (and any number of other one-off patches):
[root@target gioop]# unzip -o -q GI_1911_p32545008_190000_Linux-x86-64.zip
[root@target gioop]# unzip -o -q GIJDK_April2021_p32490416_190000_Linux-x86-64.zip
[root@target gioop]# unzip -o -q p31602782_1911000DBRU_Linux-x86-64.zip
[root@target gioop]#
Very importantly, everything from now on has to be done as the oracle user (the grid owner), not as root, so give the files the correct ownership; you should then have the situation below:
[root@target gioop]# chown -R oracle:oinstall /u01/stage/gioop
[root@target gioop]# ls -ltr
oracle oinstall       4096 Apr 20 07:17 32545008                                           <= GI 19.11 patch
oracle oinstall       2477 Apr 22 16:16 PatchSearch.xml
oracle oinstall 2523672126 May  7 11:28 GI_1911_p32545008_190000_Linux-x86-64.zip
oracle oinstall  125203135 May  7 11:28 GIJDK_April2021_p32490416_190000_Linux-x86-64.zip
oracle oinstall  120761121 May  7 11:28 opatch_p6880880_122010_Linux-x86-64.zip
oracle oinstall 2889184573 May  7 12:26 GI_gold_193_V982068-01.zip
oracle oinstall       4096 May  7 12:28 temp                                               <= GI gold image 
oracle oinstall       4096 May  7 12:33 32490416                                           <= GI JDK
oracle oinstall       4096 Apr 25 21:04 31602782                                           <= Patch 31602782
[root@target gioop]#
Start by upgrading opatch to the latest version:
[root@target gioop]# su - oracle
[oracle@target:]/home/oracle => cd /u01/stage/gioop/temp
[oracle@target:]/u01/stage/gioop/temp => ./OPatch/opatch version
OPatch Version: 12.2.0.1.17
OPatch succeeded.
[oracle@target:]/u01/stage/gioop/temp => unzip -o -q ../opatch_p6880880_122010_Linux-x86-64.zip
[oracle@target:]/u01/stage/gioop/temp =>./OPatch/opatch version
OPatch Version: 12.2.0.1.24
OPatch succeeded.
[oracle@target:]/u01/stage/gioop/temp =>
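If you script this step, you may want to stop when OPatch is older than the minimum version required by the RU readme. A minimal sketch, assuming GNU sort -V is available; opatch_version_of, version_ge and opatch_at_least are hypothetical helpers, and the required version you pass must come from the readme of the RU you apply:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that the OPatch of a home is recent enough.
# Relies on GNU sort -V (version sort).

# Extract "12.2.0.1.24" from the output of "opatch version"
opatch_version_of() {
  "$1/OPatch/opatch" version | awk '/^OPatch Version:/ {print $3}'
}

# version_ge A B -> true if A >= B
version_ge() {
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

opatch_at_least() {
  local home="$1" required="$2" current
  current=$(opatch_version_of "$home") || return 2
  if version_ge "$current" "$required"; then
    echo "OPatch $current >= $required: OK"
  else
    echo "OPatch $current < $required: upgrade OPatch first"
    return 1
  fi
}
```

For example, opatch_at_least /u01/stage/gioop/temp 12.2.0.1.24 after unzipping the new OPatch.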
We can now patch the gold image with GI 19.11, the GI JDK and the patch 31602782; we can do all of this in a single command line:
[oracle@target:]/u01/stage/gioop/temp => ./gridSetup.sh -silent -printtime -waitForCompletion -noCopy -applyRU /u01/stage/gioop/32545008 -applyOneOffs /u01/stage/gioop/31602782,/u01/stage/gioop/32490416
Preparing the home to patch...
Applying the patch /u01/stage/gioop/32545008...
Successfully applied the patch.
Applying the patch /u01/stage/gioop/31602782...
Successfully applied the patch.
Applying the patch /u01/stage/gioop/32490416...
Successfully applied the patch.
The log can be found at: /u01/app/oraInventory/logs/GridSetupActions2021-05-07_01-12-59PM/installerPatchActions_2021-05-07_01-12-59PM.log
Launching Oracle Grid Infrastructure Setup Wizard...
[FATAL] [INS-40426] Grid installation option has not been specified.       <== you can ignore this error
   ACTION: Specify the valid installation option.
[oracle@target:]/u01/stage/gioop/temp =>
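With several one-off patches, the -applyOneOffs argument is just a comma-separated list of the unzipped patch directories. A minimal sketch that builds, and only prints, the command used above so you can review it before running it from the unzipped gold image directory; build_patch_cmd is a hypothetical helper, the gridSetup.sh options are the ones shown in this post:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the gridSetup.sh command that applies an RU and
# any number of one-off patches to a home which is not attached yet.
# Usage: build_patch_cmd <ru_dir> [oneoff_dir ...]
build_patch_cmd() {
  local ru="$1"; shift
  local oneoffs cmd
  # Join the remaining arguments with commas, as -applyOneOffs expects
  oneoffs=$(IFS=,; echo "$*")
  cmd="./gridSetup.sh -silent -printtime -waitForCompletion -noCopy -applyRU $ru"
  if [[ -n "$oneoffs" ]]; then
    cmd+=" -applyOneOffs $oneoffs"
  fi
  echo "$cmd"
}
```

For example, build_patch_cmd /u01/stage/gioop/32545008 /u01/stage/gioop/31602782 /u01/stage/gioop/32490416 prints the exact command line used above.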
Before continuing, we need to temporarily attach the home to the central inventory:
[oracle@target:]/home/oracle => /u01/app/19.0.0.0/grid/oui/bin/runInstaller -attachHome ORACLE_HOME=/u01/stage/gioop/temp ORACLE_HOME_NAME=gold_gi1911
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 24575 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/AttachHome2021-05-07_01-53-14PM.log
'AttachHome' was successful.
[oracle@target:]/home/oracle =>
Note that the home has to be attached to use opatch and to create a gold image, but you cannot apply the RU or the one-off patches once the home is attached:
[oracle@target:]/u01/stage/gioop/temp => ./gridSetup.sh -silent -printtime -waitForCompletion -noCopy -applyRU /u01/stage/gioop/32545008 -applyOneOffs /u01/stage/gioop/31602782,/u01/stage/gioop/32490416
[INS-32826] The software home (/u01/stage/gioop/temp) is already registered in the central inventory. Refer to patch readme instructions on how to apply.
[oracle@target:]/u01/stage/gioop/temp =>
We now have a prepared target home with our target version, located in a temporary directory. We can verify the list of patches installed in this home:
[oracle@target:]/u01/stage/gioop/temp => ./OPatch/opatch lspatches -oh /u01/stage/gioop/temp
31602782;SAME INSTANCE SLAVE PARSE FAILURE FLOOD CONTROL
32490416;JDK BUNDLE PATCH 19.0.0.0.210420
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
32584670;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32584670)
32579761;OCW RELEASE UPDATE 19.11.0.0.0 (32579761)
32576499;ACFS RELEASE UPDATE 19.11.0.0.0 (32576499)
32545013;Database Release Update : 19.11.0.0.210420 (32545013)

OPatch succeeded.
[oracle@target:]/u01/stage/gioop/temp =>
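Before creating the gold image, this verification can be made automatic: every patch you meant to apply should show up in opatch lspatches. Note that the GI RU 32545008 itself does not appear in the list above, its sub-patches do (32545013, 32579761, etc.). A minimal sketch reading the lspatches output on stdin; check_patches is a hypothetical helper and the patch numbers are the ones of this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "opatch lspatches" output on stdin and report
# which of the expected patch numbers are missing.
# Usage: ./OPatch/opatch lspatches -oh <home> | check_patches 32545013 31602782 32490416
check_patches() {
  local installed missing=0 p
  # lspatches lines look like "<patch_number>;<description>"
  installed=$(awk -F';' '/^[0-9]+;/ {print $1}')
  for p in "$@"; do
    if ! grep -qx "$p" <<< "$installed"; then
      echo "MISSING patch $p"
      missing=1
    fi
  done
  return "$missing"
}
```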
We will now create our own gold image, which we can easily copy and deploy on all the other systems (dev, QA, DR, prod, etc.):
[oracle@target:]/u01/stage/gioop/temp =>  ./gridSetup.sh -silent -createGoldImage -destinationLocation /u01/stage/gioop/
Launching Oracle Grid Infrastructure Setup Wizard...
Successfully Setup Software.
Gold Image location: /u01/stage/gioop/grid_home_2021-05-07_01-59-02PM.zip
[oracle@target:]/u01/stage/gioop/temp =>
You can now save the prepared gold image /u01/stage/gioop/grid_home_2021-05-07_01-59-02PM.zip on a central repository server, as this is the image you will use on all your systems: this is your future GI!

To keep your systems clean, let's detach the temporary home:
[oracle@target:]/u01/stage/gioop/temp => /u01/app/19.0.0.0/grid/oui/bin/runInstaller -detachHome ORACLE_HOME=/u01/stage/gioop/temp ORACLE_HOME_NAME=gold_gi1911
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 24575 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
[oracle@target:]/u01/stage/gioop/temp =>


2. Switch the home

Create the target GI directory on all the servers:
[root@exadb01 ~]# cat ~/dbs_group
exadb01
exadb02
. . .
exadb08
[root@exadb01 ~]# dcli -g ~/dbs_group -l root "df -h /u01"
Filesystem                    Size  Used Avail Use% Mounted on
exadb01: /dev/mapper/VGExaDb-LVDbOra1  250G  125G  125G  50% /u01          <= check that you have enough disk space on each node
. . .
[root@exadb01 ~]# dcli -g ~/dbs_group -l root "mkdir -p /u01/app/19.11.0.0/grid; chown -R oracle:oinstall /u01/app/19.11.0.0/grid"
[root@exadb01 ~]# 
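Checking the df output by eye is fine for a few nodes; with more nodes, a small filter can do it. A minimal sketch, assuming you run df -Pk (POSIX output in 1K blocks, easier to parse than df -h) through dcli, which prefixes each line with the node name; check_free_space is a hypothetical helper and the threshold is yours to choose (the unzipped 19.11 home takes about 10 GB in this example):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the output of
#   dcli -g ~/dbs_group -l root "df -Pk /u01"
# on stdin and report the nodes with less than <min_gb> GB available.
# Returns 1 if at least one node is short on space, 0 otherwise.
check_free_space() {
  local min_gb="$1"
  awk -v min_kb="$((min_gb * 1024 * 1024))" '
    BEGIN { short = 0 }
    # Data lines: "<node>: <fs> <1K-blocks> <used> <available> <use%> <mount>"
    # Header lines have a non-numeric 5th field and are skipped
    $5 ~ /^[0-9]+$/ {
      node = $1
      sub(/:$/, "", node)
      if ($5 + 0 < min_kb + 0) {
        printf "%s: only %d GB free\n", node, $5 / 1048576
        short = 1
      }
    }
    END { exit short }'
}
```

For example: dcli -g ~/dbs_group -l root "df -Pk /u01" | check_free_space 20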
Unzip the previously prepared gold image (on one node only!):
[oracle@exadb01:]/home/oracle => unzip -q /u01/stage/gioop/GI_gold_1911_2021-05-07_01-59-02PM.zip -d /u01/app/19.11.0.0/grid
[oracle@exadb01:]/home/oracle => dcli -g ~/dbs_group -l oracle "du -sh /u01/app/19.11.0.0/grid"
exadb01: 9.9G      /u01/app/19.11.0.0/grid      <== your gold image unzipped here 
exadb02: 4.0K      /u01/app/19.11.0.0/grid      <== empty directory here
. . .
exadb08: 4.0K      /u01/app/19.11.0.0/grid      <== empty directory here
[oracle@exadb01:]/home/oracle =>
Something important to avoid issues during the patching process: verify that the ASM password file and the ASM spfile are stored in ASM (if not, you'll find a quick procedure here on how to move them to ASM):
[root@exadb01 ~]# . oraenv <<< +ASM1
ORACLE_SID = [root] ? The Oracle base has been set to /u01/app/oracle
[root@exadb01 ~]# asmcmd spget
+DATA/mycluster/ASMPARAMETERFILE/registry.253.1045914043
[root@exadb01 ~]# asmcmd pwget --asm
+DATA/orapwASM
[root@exadb01 ~]#
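To make this check part of a pre-maintenance script, the rule is simply that both paths must start with a "+diskgroup" prefix. A minimal sketch, to be run with the ASM environment set as above; is_in_asm and check_asm_files are hypothetical helpers wrapping the two asmcmd commands shown, and they only check the path format, not that the file is healthy:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that the ASM spfile and password file
# returned by asmcmd are stored in ASM (path starting with "+<diskgroup>").
is_in_asm() {
  [[ "$1" == +?* ]]
}

check_asm_files() {
  local spfile pwfile rc=0
  spfile=$(asmcmd spget)
  pwfile=$(asmcmd pwget --asm)
  is_in_asm "$spfile" || { echo "ASM spfile not in ASM: $spfile"; rc=1; }
  is_in_asm "$pwfile" || { echo "ASM password file not in ASM: $pwfile"; rc=1; }
  return "$rc"
}
```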
Prepare a response file such as this one:
[oracle@exadb01:+ASM1]/home/oracle => cat /u01/stage/gioop/1911oop_response.rsp
oracle.install.responseFileVersion=/oracle/install/rspfmt_crsinstall_response_schema_v19.0.0
oracle.install.option=CRS_SWONLY
ORACLE_BASE=/u01/app/oracle
oracle.install.asm.OSDBA=oinstall
oracle.install.asm.OSOPER=oinstall
oracle.install.asm.OSASM=oinstall
oracle.install.crs.config.ClusterConfiguration=STANDALONE
[oracle@exadb01:+ASM1]/home/oracle =>
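If you deploy this on many clusters, generating the response file avoids copy/paste mistakes. A minimal sketch, assuming the same ORACLE_BASE and oinstall group as above (both are optional parameters here); write_swonly_rsp is a hypothetical helper and the keys are exactly those of the response file above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the software-only (CRS_SWONLY) response file
# shown above. Defaults are the values of this example; adapt them.
# Usage: write_swonly_rsp <rsp_file> [oracle_base] [os_group]
write_swonly_rsp() {
  local rsp="$1" base="${2:-/u01/app/oracle}" group="${3:-oinstall}"
  cat > "$rsp" <<EOF
oracle.install.responseFileVersion=/oracle/install/rspfmt_crsinstall_response_schema_v19.0.0
oracle.install.option=CRS_SWONLY
ORACLE_BASE=$base
oracle.install.asm.OSDBA=$group
oracle.install.asm.OSOPER=$group
oracle.install.asm.OSASM=$group
oracle.install.crs.config.ClusterConfiguration=STANDALONE
EOF
}
```

For example, write_swonly_rsp /u01/stage/gioop/1911oop_response.rsp recreates the file used in this post.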
Run gridSetup.sh with this response file; this only copies the software across all the nodes and does NOT modify anything else:
[oracle@exadb01:]/u01/app/19.11.0.0/grid => ./gridSetup.sh -silent -responseFile /u01/stage/gioop/1911oop_response.rsp -waitForCompletion
Launching Oracle Grid Infrastructure Setup Wizard...

[WARNING] [INS-41813] OSDBA for ASM, OSOPER for ASM, and OSASM are the same OS group.
   CAUSE: The group you selected for granting the OSDBA for ASM group for database access, and the OSOPER for ASM group for startup and shutdown of Oracle ASM, is the same group as the OSASM group, whose members have SYSASM privileges on Oracle ASM.
   ACTION: Choose different groups as the OSASM, OSDBA for ASM, and OSOPER for ASM groups.
[WARNING] [INS-41874] Oracle ASM Administrator (OSASM) Group specified is same as the inventory group.
   CAUSE: Operating system group oinstall specified for OSASM Group is same as the inventory group.
   ACTION: It is not recommended to have OSASM group same as inventory group. Select any of the group other than the inventory group to avoid incorrect configuration.
The response file for this session can be found at:
 /u01/app/19.11.0.0/grid/install/response/grid_2021-05-10_10-54-29AM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/GridSetupActions2021-05-10_10-54-29AM/gridSetupActions2021-05-10_10-54-29AM.log

As a root user, execute the following script(s):
        1. /u01/app/19.11.0.0/grid/root.sh

Execute /u01/app/19.11.0.0/grid/root.sh on the following nodes:
[exadb01]
As instructed, run this root.sh script:
[root@exadb01 ~]# /u01/app/19.11.0.0/grid/root.sh
Check /u01/app/19.11.0.0/grid/install/root_exadb01.domain.com_2021-05-10_11-03-09-927603750.log for the output of root script
[root@exadb01 ~]# cat /u01/app/19.11.0.0/grid/install/root_exadb01.domain.com_2021-05-10_11-03-09-927603750.log
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/19.11.0.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Cluster or Grid Infrastructure for a Stand-Alone Server execute the following command as oracle user:
/u01/app/19.11.0.0/grid/gridSetup.sh
This command launches the Grid Infrastructure Setup Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

[root@exadb01 ~]#
OK, this was the last step to be done before the actual maintenance. The next steps have to be done during a maintenance window only, as GI will be switched to the new home node by node: all the resources running on the old GI home are stopped and restarted on the new GI home. I recommend using the rac-status.sh script to check the status of all the cluster resources before switching the home, and doing the same after the switch to make sure that your maintenance left the cluster in the same state as before:
[oracle@exadb01:]/home/oracle => /u01/app/19.11.0.0/grid/gridSetup.sh -silent -switchGridHome
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2021-05-10_11-05-43AM.log

As a root user, execute the following script(s):
        1. /u01/app/19.11.0.0/grid/root.sh

Execute /u01/app/19.11.0.0/grid/root.sh on the following nodes:
[exadb01, exadb02, exadb03, exadb04, exadb05, exadb06, exadb07, exadb08]

Run the scripts on the local node first. After successful completion, run the scripts in sequence on all other nodes.

Successfully Setup Software.
[oracle@exadb01:]/home/oracle =>
Now, strictly follow the instructions and run the root.sh scripts as instructed: do NOT run them concurrently on multiple nodes, and note that they take some time to run:
[root@exadb01 ~]# /u01/app/19.11.0.0/grid/root.sh
Check /u01/app/19.11.0.0/grid/install/root_exadb01.domain.com_2021-05-10_11-11-21-158300990.log for the output of root script
[root@exadb01 ~]#
And so on, on all the nodes, one by one... and you are done! Hmm, not exactly: you need to update /etc/oratab on each node, as the patching removes the ASM entry; once fixed, it should point to the new home:
[root@exadb01 ~]# grep ASM /etc/oratab
+ASM1:/u01/app/19.11.0.0/grid:N
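Fixing oratab by hand on 8 nodes is error prone. A minimal sketch of a helper that points the +ASMn entry to a given home and adds the entry back if it is missing, assuming GNU sed (-i) and the usual SID:HOME:FLAG oratab format; set_asm_oratab is a hypothetical helper which keeps a timestamped backup of the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure the +ASMn line of an oratab file points to
# a given GI home, updating it if present or appending it if missing.
# Usage (as root, on each node): set_asm_oratab /etc/oratab +ASM1 /u01/app/19.11.0.0/grid
set_asm_oratab() {
  local oratab="$1" sid="$2" home="$3"
  # Keep a backup before touching the file
  cp -p "$oratab" "$oratab.bak.$(date +%Y%m%d%H%M%S)"
  if grep -q "^${sid}:" "$oratab"; then
    # Replace only the home field, keep the flag (N/Y) as it is
    sed -i "s|^${sid}:[^:]*:|${sid}:${home}:|" "$oratab"
  else
    echo "${sid}:${home}:N" >> "$oratab"
  fi
}
```

Use the right +ASMn on each node; after a rollback, the same helper can point the entry back to /u01/app/19.0.0.0/grid.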
You can also have a look at the inventory, where you can see the old and the new GI homes:
[root@exadb01 ~]# dcli -g ~/dbs_group -l root "grep -i grid /u01/app/oraInventory/ContentsXML/inventory.xml"
exadb01: <HOME NAME="OraGI19Home1" LOC="/u01/app/19.0.0.0/grid" TYPE="O" IDX="1">                     <== old
exadb01: <HOME NAME="OraGI19Home2" LOC="/u01/app/19.11.0.0/grid" TYPE="O" IDX="19" CRS="true"/>       <== new
. . .
exadb08: <HOME NAME="OraGI19Home1" LOC="/u01/app/19.0.0.0/grid" TYPE="O" IDX="1">                     <== old
exadb08: <HOME NAME="OraGI19Home2" LOC="/u01/app/19.11.0.0/grid" TYPE="O" IDX="14" CRS="true"/>       <== new
[root@exadb01 ~]#
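Rather than reading the XML by eye on every node, a small filter can extract the home flagged CRS="true". A minimal sketch, assuming the layout shown in the output above (one HOME element per line); crs_home_from_inventory is a hypothetical helper, not an Oracle tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the location of the home flagged CRS="true"
# in the central inventory (default path is the one used in this post).
crs_home_from_inventory() {
  local inv="${1:-/u01/app/oraInventory/ContentsXML/inventory.xml}"
  # Keep the <HOME .../> line flagged CRS="true", then extract its LOC value
  grep '<HOME .*CRS="true"' "$inv" | sed 's/.*LOC="\([^"]*\)".*/\1/'
}
```

After the switch, it should print /u01/app/19.11.0.0/grid on every node.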
Now you are all done! Run a last check with rac-status.sh to make sure that everything is running as expected, and you can then use the same gold image and procedure on all your GIs!

3. The Rollback procedure

In case something goes wrong during or after the switch to your new home, you need a tested rollback procedure, and the beauty of Out of Place patching is that the old home is still on the system, untouched. You then just have to switch back to the old home.
Note that the chown -R oracle:oinstall below (or to whatever your grid owner is) is mandatory: the switch itself is run as the grid owner (oracle here), and root.sh will put the correct privileges back in place later on.
[root@exadb01 ~]# dcli -g ~/dbs_group -l root "chown -R oracle:oinstall /u01/app/19.0.0.0/grid"           <== this is mandatory
[root@exadb01 ~]# su - oracle
[oracle@exadb01:]/home/oracle => /u01/app/19.0.0.0/grid/gridSetup.sh -silent -switchGridHome
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2021-05-10_12-25-25PM.log

As a root user, execute the following script(s):
        1. /u01/app/19.0.0.0/grid/root.sh

Execute /u01/app/19.0.0.0/grid/root.sh on the following nodes:
[exadb01, exadb02, exadb03, exadb04, exadb05, exadb06, exadb07, exadb08]

Run the scripts on the local node first. After successful completion, run the scripts in sequence on all other nodes.

Successfully Setup Software.
[oracle@exadb01:]/home/oracle =>
Same as before, now run the root.sh script on each node, one by one; do not run them concurrently:
[root@exadb01 ~]# /u01/app/19.0.0.0/grid/root.sh
Check /u01/app/19.0.0.0/grid/install/root_exadb01.domain.com_2021-05-10_12-36-17-716165992.log for the output of root script
[root@exadb01 ~]#
. . .
[root@exadb01 ~]# ssh exadb08
Last login: Mon May 10 11:44:34 2021 from exadb01.domain.com
[root@exadb08 ~]# /u01/app/19.0.0.0/grid/root.sh
Check /u01/app/19.0.0.0/grid/install/root_exadb08.domain.com_2021-05-10_12-49-56-584075655.log for the output of root script
[root@exadb08 ~]#
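Whether you switch forward or roll back, the root.sh runs must be strictly sequential, local node first. A minimal sketch of that loop, assuming passwordless root SSH between the DB nodes (as dcli already requires) and that root.sh returns a non-zero exit code when it fails (always check the log it points to anyway); run_root_sh_sequentially and DRY_RUN are hypothetical names, and nothing is executed unless DRY_RUN=0:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run root.sh of a GI home on each node listed in a
# dbs_group file, strictly one node after the other, in file order
# (put the local node first), stopping at the first failure.
# With DRY_RUN=1 (the default) it only prints the ssh commands.
run_root_sh_sequentially() {
  local group_file="$1" home="$2" node
  while read -r node || [[ -n "$node" ]]; do
    [[ -z "$node" ]] && continue
    echo "### $node: $home/root.sh"
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
      echo "ssh root@$node $home/root.sh"
    # -n: do not let ssh read the node list from the loop's stdin
    elif ! ssh -n "root@$node" "$home/root.sh"; then
      echo "root.sh failed on $node, stopping here" >&2
      return 1
    fi
  done < "$group_file"
}
```

For example, DRY_RUN=0 run_root_sh_sequentially ~/dbs_group /u01/app/19.0.0.0/grid from exadb01 for the rollback shown above.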
Now, fix your oratab back to the old home, run rac-status.sh to check that everything is back up and running as expected, and you are all done!

My Exadata patches download cheat sheet

I have wanted to share for a while the cheat sheet I use to download all the patches I need when patching an Exadata, from the storage up to the GI. Indeed, looking for the correct mix of patches sometimes looks like:
So let's try to clarify this and have a look, with some explanations, at everything needed to patch an Exadata; you will also find a short summary at the end of this blog.

  • A note about switches and cells patches:
  • Since 19.3.0.0.0, the switch patch and the cell patch are delivered as two different patches; before 19.3.0.0.0 (up to 19.2.22.0.0), a single patch contained both.

  • Switches -- monthly:
  • Note 888828.1 gives you the link to the switch patches, for example Patch 32957082: EXADATA 21.2.2.0.0 SWITCH PATCH; the patchmgr for this patch (not the same as the others) is included in the patch, which covers both the InfiniBand and the RoCE switches. You can find more info on how to patch the IB switches here and the RoCE switches here.

  • Cells -- monthly:
  • Note 888828.1 gives you the link to the cell patch, for example Patch 33120692: EXADATA RELEASE UPDATE 21.2.2.0.0 (MOS NOTE 2781458.1); the patchmgr for this patch (not the same as the others) is included in the patch. More info on the procedure to patch the cells here.

  • Database nodes (physical and VM) -- monthly:
  • Note 888828.1 provides you with this patch number. Note that the database nodes and the VMs share the same patch whether the hypervisor is Xen or KVM. Since Exadata 19.3, you can have VMs running with KVM as the hypervisor; in this case, the same patch also applies to the KVM hosts; for example: Patch 32957083: EXADATA COMPUTE NODE 21.2.2.0.0 OL7 BASE REPO ISO. This is the zip of an ISO containing RPMs; do not unzip it, patchmgr will do it for you. This patch does NOT include patchmgr, which has to be downloaded separately (see below). More info on how to patch the DB nodes here.

  • dom0s -- monthly:
  • Note 888828.1 provides you with this patch number. Note that Xen dom0s require a dedicated patch like Patch 32957084: EXADATA COMPUTE NODE 21.2.2.0.0 DOM0 BASE REPO ISO. This is the zip of an ISO containing RPMs; do not unzip it, patchmgr will do it for you. This patch does NOT include patchmgr, which has to be downloaded separately (see below).

  • patchmgr -- quite often but not regularly:
  • patchmgr for the DB nodes, VMs, KVM hosts and Xen dom0s has to be downloaded separately from Patch 21634633: DBSERVER.PATCH.ZIP ORCHESTRATOR PLUS DBNU - ARU PLACEHOLDER. It is the same patchmgr for the Xen dom0s and the DB nodes/VMs, even if they do not share the same patch :)

  • ksplice -- quite often but not regularly:
  • If, for any reason (for example, the latest patch you apply does not include a fix you need), you have to apply some extra ksplice patches, you'll find the documentation in Uptrack: HOWTO: Install ksplice kernel updates for Exadata Database Nodes (Doc ID 2207063.1); you'll need a user on https://linux.oracle.com (as far as I remember, it is not the same as your MOS user...) to download the required ksplice patch.

  • JDK -- monthly:
  • If you patch the JDK on the DB nodes, VMs, hypervisors and cells (Oracle always ships patches with an old JDK so, depending on your compliance requirements, the JDK from a brand new patch is already... not compliant :)), you'll find it in JDK: Supported Java SE Downloads on MOS (Doc ID 1439822.1); search for Oracle JDK 8 Update XXX.

  • Grid Infrastructure - quarterly:
  • You will find the link to the GI version you want to install in Note 888828.1 (example: Patch 32895426: GI RELEASE UPDATE 19.12.0.0.0). Be careful: Oracle maintains 3 GI versions at a time, backporting some fixes to the two previous versions, so each quarter you end up with 3 different current versions following this naming convention: X.Y.0, X.Y.1 and X.Y.2. The ".2" is then supposed to be the most mature and final version of the "X.Y" GI version, except when a version is so buggy that Oracle has to release a ".1" very quickly after the ".0", like GI 19.10, which now has a 19.10.3 version; but this is supposed to be exceptional. To sum up, here are the GI versions available in July 2021:
    • 19.12.0.0.210720
    • 19.11.1.0.210720
    • 19.10.3.0.210720
    Once you have your GI, you are not done yet: this piece of software has many bugs, and you have to closely study the notes below to find out which patches are included, which new critical issues have been found, etc.:
    • Oracle Database 19c Important Recommended One-off Patches (555.1)
    • This note is about "Database", but Oracle has taken the bad habit of more or less including GI in DB... sometimes :)
    • Grid Infrastructure 19 Release Updates and Revisions Bugs Fixed Lists (Doc ID 2523221.1)
    • Oracle Database 19c Release Update & Release Update Revision July 2021 Known Issues (Doc ID 19202107.9)
    • There is a new note like this every 3 months (for every GI release), and it can be updated at any time with new critical issues and patches to apply. For example, April's was Oracle Database 19c Release Update & Release Update Revision April 2021 Known Issues (Doc ID 19202104.9). More info on GI patching here.


  • GI JDK -- quarterly:
  • If you have compliance requirements: as mentioned earlier, the JDK shipped with GI is rather old from a compliance point of view, so you will need to patch it; you'll find the patch in JDK and PERL Patches for Oracle Database Home and Grid Home (Doc ID 2584628.1), paragraph 2.1 Latest JDK Patches for Database.

  • RPMs -- not regularly:
  • If you want to install some specific (non-default) RPM (I like to install screen for example), you can find them here:
    • OL7: https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/index.html
    • OL6: https://yum.oracle.com/repo/OracleLinux/OL6/latest/x86_64/index.html
    • Mainly for the xen dom0s.
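Coming back to the GI version naming described above, the version string itself tells you whether you are looking at an RU (X.Y.0) or one of its revisions (X.Y.1, X.Y.2, ...). A minimal sketch decoding it, assuming the five-field 19c format of the July 2021 list above, with the last field read as a YYMMDD date (210720 for July 2021); describe_gi_version is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a 19c RU/RUR version string such as
# 19.11.1.0.210720 following the X.Y.Z naming described above:
#   Y    = RU level (19.11)
#   Z    = 0 for the RU itself, >0 for a Release Update Revision (RUR)
#   last = date of the release in YYMMDD format
describe_gi_version() {
  local v="$1" major ru rev fourth date when
  IFS=. read -r major ru rev fourth date <<< "$v"
  # Expect exactly 5 numeric fields
  if [[ -z "$date" || ! "$major$ru$rev$fourth$date" =~ ^[0-9]+$ ]]; then
    echo "unexpected version format: $v" >&2
    return 1
  fi
  when="20${date:0:2}-${date:2:2}-${date:4:2}"
  if [[ "$rev" == 0 ]]; then
    echo "$major.$ru RU, dated $when"
  else
    echo "$major.$ru RUR $rev, dated $when"
  fi
}
```

For example, describe_gi_version 19.10.3.0.210720 prints "19.10 RUR 3, dated 2021-07-20".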

And this should be all !

To sum up, the note numbers:

Happy patching !

IB Switches: May the --force be with you !

Patching InfiniBand switches is usually really hassle-free, but one day you may face a (very) reluctant IB switch. Note that this blog is part of a more general Exadata patching troubleshooting blog.
This journey started with failed IB switch prerequisite checks:
FAILED : DONE: Initiate pre-upgrade validation check on InfiniBand switch(es).
ERROR : FAILED run of command:/patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/patchmgr -ibswitches /root/ib_group -upgrade -ibswitch_precheck
INFO : upgrade attempted on nodes in file /root/ib_group: [exa-ib1 exa-ib2 exa-ib3]
Looking at patchmgr.trc, I could find:
[patchmgr_send_notification_to_all_nodes][702]  Arguments: Failed 808 ibswitch
And in upgradeIBSwitch.trc:
[TRACE][/patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/upgradeIBSwitch.sh - 1740][copyToIBSwitch][1740]   Arguments: exa-ib1 xcp /usr/local/bin/xcp
[WARNING][/patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/upgradeIBSwitch.sh - 1749][copyToIBSwitch][]  [CMD: scp xcp root@\[exa-ib1\]:/usr/local/bin/xcp] [CMD_STATUS: 1]
    ----- START STDERR -----
    xcp: No such file or directory
    ----- END STDERR -----
[TRACE][/patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/upgradeIBSwitch.sh - 1740][copyToIBSwitch][1740]   Arguments: exa-ib1 libxcp.so.1 /usr/local/lib/libxcp.so.1
[WARNING][/patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/upgradeIBSwitch.sh - 1749][copyToIBSwitch][]  [CMD: scp libxcp.so.1 root@\[exa-ib1\]:/usr/local/lib/libxcp.so.1] [CMD_STATUS: 1]
    ----- START STDERR -----
    libxcp.so.1: No such file or directory
It looked like patchmgr was unable to copy xcp and libxcp.so.1 to the switches, so it could have been a passwordless SSH connectivity issue: patchmgr tries to connect back to the database node used to patch the switch, which may be impossible depending on what is in the switch /etc/hosts or in your SSH security configuration (/etc/ssh/sshd_config); you can find notes about this on MOS, like Exadata: Patchmgr fails during the InfiniBand patching precheck. (Doc ID 2356026.1). But I knew this, and I was able to SSH to my switch, which could also properly SSH back, on any network interface, to the DB node I was patching from; I could also manually scp the famous xcp and libxcp.so.1. Everything was supposed to be OK.
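In a long upgradeIBSwitch.trc, these failed copies are easy to miss. A minimal sketch that lists every traced command with a non-zero status, assuming the trace keeps the "[CMD: ...] [CMD_STATUS: n]" format shown in the excerpt above; failed_cmds is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the commands that returned a non-zero status in
# a patchmgr trace such as upgradeIBSwitch.trc, based on the
# "[CMD: ...] [CMD_STATUS: n]" pattern seen in the trace excerpt.
failed_cmds() {
  sed -n 's/.*\[CMD: \(.*\)] \[CMD_STATUS: \([1-9][0-9]*\)].*/status=\2 cmd=\1/p' "$1"
}
```

Run it against the upgradeIBSwitch.trc of your patchmgr run; with the excerpt above, it would report the two failed scp commands.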

Checking further, I found that this reluctant switch was running a very old version:
[root@exadb01 ~]# ./exa-versions.sh -I ~/ib_group
       Cluster is a X5-2 Quarter Rack HC 8TB
         -- Infiniband Switches
       exa-ib1        exa-ib2        exa-ib3
----------------------------------------------------
       2.1.8-1       2.2.15-1        2.2.15-1
----------------------------------------------------
[root@exadb01 ~]#
Indeed, Note 888828.1 lists switch firmware 2.1.8-1 as "Supplied with Exadata 12.1.2.3.x", 12.1.2.3 having been released in April 2016 (keep in mind that patchmgr was released in version 12.2.1.1.0, which shipped IB switch version 2.2.4-3, so **after** this 2.1.8-1; more on that later). It was indeed an old version, and it also meant that I was probably not the first one to face this issue on this switch, and that nobody had resolved it before :)

After investigating all of this with Oracle Support, who basically wanted to be sure that the SSH configuration was working, I was pointed to the manual way of patching an IB switch (the way switches were patched before patchmgr) described here: https://docs.oracle.com/cd/E76424_01/html/E76431/z400029a1775330.html#scrolltoc. It is pretty straightforward: you load the package, the switch installs it and reboots, and that's it; so I gave it a go (you cannot directly upgrade to 2.2.16-1, which was my target version; you first have to upgrade to 2.2.7-2):
-> load -source fhttp://10.11.12.13/patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/sundcs_36p_repository_upgrade_2.1_to_2.2.7_2.pkg
Downloading firmware image. This will take a few minutes.
Error: Couldn't connect to server
-> load -source http://10.11.12.13/patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/sundcs_36p_repository_upgrade_2.1_to_2.2.7_2.pkg
Downloading firmware image. This will take a few minutes.
Error: Couldn't connect to server
which failed miserably; I then realized that this could not work: there is no FTP or HTTP server running on the database server from which I wanted to load that package onto the switch. The (very good) MOS engineer told me there was no other way: FTP or HTTP. Wow, FTP? Really? You mean that old buddy running on port 21? No way I can run this on my DB node; FTP is a bit like the Nokia 3210: you remember it was great, but no way you can use it nowadays :D

As I could obviously not install an FTP or an HTTP server anywhere close to that switch, I tried to scp the package to the switch itself and load it from there, locally:
-> load -source ftp://10.20.21.22/tmp/sundcs_36p_repository_2.2.7_2.pkg
Error: Insufficient disk space/memory. Firmware update requires minimum of
120 MB space in /tmp directory
80 MB space in / filesystem
120 MB of free memory 
It also failed miserably, as there is obviously not enough space on the switch to store a 180 MB package; this was starting to get tough:
  • patchmgr fails at pre-requisites
  • No way to have a FTP server to load the package to the switch
  • No way to have a HTTP server to load the package to the switch
  • Not enough space to copy the package to the switch to load it locally
  • Not sure how patchmgr manages it, but it can somehow load the package onto the switch, as this is what it usually does (I have to check the patchmgr code to see what that magic trick is)

But still, I had to patch this switch, even if it was actually looking a bit like this to me:

So we (the MOS engineer and I) thought that, as patchmgr had been released after this switch version, it may simply not know about that version, and the prerequisites could fail just because of that. So I tried the upgrade (of that switch only), ignoring the prerequisites with the --force option:
[root@exadb01 patch_switch_20.1.6.0.0.210113]# ./patchmgr -ibswitches ~/ib1 -upgrade --force yes
. . .
[INFO     ] Package will be downloaded at firmware update time via scp  <== a clue about how patchmgr does it -- but where does it find the disk space ? this is another story :)
[SUCCESS  ] Execute plugin check for Patching on exa-ib1
[INFO     ] Starting upgrade on exa-ib1 to 2.2.7_2. Please give upto 15 mins for the process to complete. DO NOT INTERRUPT or HIT CTRL+C during the upgrade
[INFO     ] Additional firmware load required. Starting secondary firmware load. DO NOT INTERRUPT or HIT CTRL+C
[INFO     ] Rebooting exa-ib1 to complete the firmware update. Wait for 15 minutes before continuing. DO NOT MANUALLY REBOOT THE INFINIBAND SWITCH
. . . looking good so far . . . 
[INFO     ] Validating the current firmware on the InfiniBand Switch
[SUCCESS  ] Firmware verification on InfiniBand switch exa-ib1
[INFO     ] Finished post-update validation on exa-ib1
[FAIL     ] Post-update validation on exa-ib1
[ERROR    ] Failed to upgrade exa-ib1 to 2.2.7-2. Cannot proceed with upgrading switch to 2.2.16_1
[FAIL     ] Update switch exa-ib1 to 2.2.16_1
[INFO     ] Aborting the process. Not going to try anymore switches. Retry after resolving the problems.
[FAIL     ] Overall status
OK, so it seems that the upgrade to 2.2.7-2 was OK but the post-update steps were KO, maybe also because the original version of the switch was too old. I could verify that the version was now good:
[root@exa-ib1 ~]# version
SUN DCS 36p version: 2.2.7-2 <================ looks good
Build time: Nov 2 2017 09:21:37
. . .
[root@exa-ib1 ~]#
Okay, now I could run the 2.2.16-1 upgrade prerequisites on the switch, which were OK:
----- InfiniBand switch update process ended 2021-02-12 12:19:33 +1100 -----
2021-02-12 12:19:33 +1100 1 of 1 :SUCCESS: Initiate pre-upgrade validation check on InfiniBand switch(es).
2021-02-12 12:19:33 +1100 :SUCCESS: Completed run of command: /patches/20.1.6.0.0/patch_switch_20.1.6.0.0.210113/patchmgr -ibswitches /root/ib1 -upgrade -ibswitch_precheck
2021-02-12 12:19:33 +1100 :INFO : upgrade attempted on nodes in file /root/ib1: [exa-ib1] 
And then I could upgrade all 3 switches in a row, patchmgr now taking care of the version difference to bring all of them to the target version, 2.2.16-1, which worked like a charm!
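patchmgr handled the two hops on its own here (2.2.7-2 first, then 2.2.16-1). To see beforehand which hops a switch would need, a small helper like the one below can be used. This is a sketch, not patchmgr's own logic, and treating 2.2.7-2 as the mandatory intermediate version is an assumption based only on the log above. It also shows a classic trap: compared as plain text, "2.2.16" sorts before "2.2.7", so each field has to be compared as a number.

```shell
#!/usr/bin/env bash
# Sketch only: NOT how patchmgr decides; it just illustrates a two-hop path.
# Assumption: 2.2.7-2 is the intermediate version (taken from the log above).

# Normalize "2.2.7_2" / "2.2.7-2" to "2.2.7.2" so every separator is a dot.
normalize_version() {
  printf '%s\n' "$1" | sed 's/[-_]/./g'
}

# version_lt A B: return 0 when version A is strictly lower than version B.
# Each field is compared numerically, so 2.2.7 < 2.2.16 (unlike a text sort).
version_lt() {
  awk -v a="$(normalize_version "$1")" -v b="$(normalize_version "$2")" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1  # equal versions: A is not lower than B
  }'
}

# upgrade_path CURRENT INTERMEDIATE TARGET: print the versions to go through.
upgrade_path() {
  local current=$1 intermediate=$2 target=$3
  if version_lt "$current" "$intermediate"; then
    echo "$intermediate"
  fi
  if version_lt "$current" "$target"; then
    echo "$target"
  fi
}

# Example: upgrade_path <current_version> 2.2.7-2 2.2.16-1
```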

I could then finally patch this InfiniBand switch that was so reluctant to be patched, thanks to the --force option, needed because it was older than patchmgr. May the --force be with you!

I finally found my top Oracle 12c-19c database feature!

Every Oracle database version comes with tons of new features (or renamed features, to kind of re-release a feature which was not really well implemented in a previous version :)) which are heavily advertised, but when you think back about these features, which ones were really the top new feature of each version?

I can easily name what I consider (this is indeed very subjective) to be the best feature of each version up to 11g but, honestly, for the 12c family (12, 18 and 19), I had nothing in mind (maybe also because I have been doing less database work for years now). Let's start with my top feature for each version:
  • Oracle 7: CBO -- I haven't worked that much with Oracle 7 though but well, CBO is a huge feature

  • Oracle 8: RMAN -- We finally had something more integrated and more efficient than BEGIN/END BACKUP, which was kind of a hassle. I also remember funny stories from job interviews: "Me: do you know RMAN?", "Candidate: Hermann Maier? Yes, he is very good at ski racing", "Me: OK, we'll call you" :D

  • Oracle 8i: if 8i is a different version than 8 (some were saying that at the time), I would say Java in the database! Haha, joking obviously; not sure this was the best idea ever, as we still cannot really patch it online 20 years later and it is still full of vulnerabilities we have to fix every single month :) I would then say Partitioning: 8 had partitioning but it was very basic; it started to be really usable in 8i. If 8i is not a different version than 8, then I go with RMAN for 8/8i

  • Oracle 9i: dbms_metadata.get_ddl -- how awesome it was when get_ddl was released! What a pain it was to generate DDL in previous versions: querying tons of different data dictionary views, doing an export and cleaning the DDL out of the export file, using some graphical tools, ... there was no easy and 100% reliable way; get_ddl was clearly a relief, long live get_ddl! :)

  • Oracle 10g: AWR -- no need to argue here, there was clearly a before and an after in terms of performance investigations and tuning thanks to AWR

  • Oracle 11g: Snapshot Standby -- we could already do this in 10g with restore points (I remember I had scripted this in the past), but Snapshot Standbys made it easy (and official!) to open a read-write copy of production to developers and resync it at night; excellent feature. I could also have said Exadata! But at the time Exadata was released, it was still kind of confidential and it took time for people to use / trust Exadata, so for me, Snapshot Standby stays the top 11g feature

Then 12c came and I couldn't find anything really appealing to me. I was thinking about the online datafile move, which is indeed a very cool feature, but Oracle had mainly addressed these kinds of datafile placement issues in previous versions with OMF, db_create_file_dest, etc... so even if this is a very cool feature, it would have had a bigger impact if implemented in 8 than in 12. One would say CDB/PDB; it is indeed the major architecture change, but I personally don't really like it, and making it the default makes everything more complicated for 99% of us for the sake of maybe the 1% who really need this feature, so for me, it is not really the best 12c family feature. Also, it is not totally integrated with CRS, which is why I cannot show the PDBs in rac-status.sh. It should be coming with GI 21c (as GI 20c has not been released); we'll see.

And suddenly, with 19c, came my top 12c family feature:
  • Services can now automatically failback to the preferred node using -failback yes !!
Oh boy, I have always been waiting for this feature! Everyone who has done patching / upgrades (everyone?) knows how messy it is with the services: they restart on the available nodes when you patch the preferred node, application teams want to run on a specific node because their application is still not RAC-friendly or for whatever reason, etc... You can easily see which services have moved during a maintenance using rac-status, but when it comes to rebalancing everything to the preferred node(s), this is another story. Indeed, CRS does not show the preferred / available nodes information with crsctl, so you have to use srvctl config service to get it, knowing that you need to use the srvctl command from the database home (and not from the GI home!), as using the wrong srvctl version would fail... In short, it is a big mess for something which should be easy, in my opinion.

But this was before the -failback yes feature! You just have to set it on a service like this:
srvctl modify service -db DB_NAME -service SERVICE_NAME -failback yes
Then, at the next restart of the server, GI or instance, the service will automatically be rebalanced to the preferred node, gracefully, without disconnecting anyone. Very cool, right?!
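With many services, a loop saves some typing. A minimal sketch, not one of the 2 scripts mentioned below: enable_failback is a hypothetical helper, it assumes the 19c long option names (-db, -service) of srvctl modify service, and by default it only prints the commands so you can review them before running them with the database home's srvctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable automatic failback on several services of one
# database. DRY_RUN=1 (the default) only prints the srvctl commands; run them
# (DRY_RUN=0) as the database home owner, with the database home's srvctl.
enable_failback() {
  local db=$1 svc
  shift
  for svc in "$@"; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "srvctl modify service -db $db -service $svc -failback yes"
    else
      srvctl modify service -db "$db" -service "$svc" -failback yes || return 1
    fi
  done
}

# Usage (placeholder names): enable_failback DB_NAME SERVICE_1 SERVICE_2
```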

To help set that up, I then wrote 2 scripts to easily take advantage of this feature:
Enjoy !

Exadata: patchmgr -- dbnodeupdate.sh backup failed on one or more nodes

Patchmgr is nice enough to back up a database node system before patching it, which is a very cool feature. You can also only back up (and not patch) a database node (patchmgr -backup) or patch a database node without backing it up (patchmgr -nobackup), but the default is to back up the system just before patching it, which is perfectly fine.

But any feature can eventually fail, and the whole patching session does not start if the backup fails, as in the log below:
2021-02-03 07:31:32 +1100        :Working: dbnodeupdate.sh running a backup on 2 node(s).
2021-02-03 07:34:26 +1100        :ERROR  : dbnodeupdate.sh backup failed on one or more nodes 
SUMMARY OF ERRORS FOR dbnode01:
dbnode01: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log
SUMMARY OF ERRORS FOR dbnode02:
dbnode02: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log

You can then check the logfile pointed to by patchmgr in the error stack:
[root@dbnode01 ~]# cat /var/log/cellos/dbserver_backup.sh.log
Feb 03 07:34:09  [INFO]
Feb 03 07:34:09  [INFO] Start to backup root /dev/VGExaDb/LVDbSys1 and boot partitions
Feb 03 07:34:09  [INFO] Check for Volume Group "VGExaDb"
Feb 03 07:34:09  [INFO] Check for the LVM root partition "VGExaDb/LVDbSys1"
Feb 03 07:34:10  [INFO] LVM snapshot Logical Volume
Feb 03 07:34:10  [ERROR] LVM snapshot Logical Volume "VGExaDb/LVDbSys1Snap" already exists
Feb 03 07:34:10  [ERROR] Remove snapshot manually and then restart this utility
[root@dbnode01 ~]#
The issue is very clear here: the LVM snapshot used for the backup already exists, most likely because something went wrong during a previous patching session. The log even gives us the solution, which is to remove it manually, an easy procedure:
[root@dbnode01 ~]# lvm lvscan | grep -i snap
inactive Snapshot '/dev/VGExaDb/LVDbSys1Snap' [10.00 GiB] inherit
[root@dbnode01 ~]# lvremove -f /dev/VGExaDb/LVDbSys1Snap
Logical volume "LVDbSys1Snap" successfully removed
[root@dbnode01 ~]# lvm lvscan | grep -i snap
[root@dbnode01 ~]#
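To spot such a leftover snapshot before launching patchmgr, the lvscan output can be filtered. A minimal sketch (find_snapshots is a hypothetical helper) assuming the lvscan line format shown above, where the snapshot path is the third field between quotes; it only lists paths and leaves the lvremove decision to you:

```shell
#!/usr/bin/env bash
# Sketch: list LVM snapshot volumes from "lvm lvscan" output (read on stdin),
# e.g. a leftover from a previous patching session. It only prints the paths;
# removing them (lvremove) stays a manual decision.
# Assumes the lvscan line format shown above:
#   inactive Snapshot '/dev/VGExaDb/LVDbSys1Snap' [10.00 GiB] inherit
find_snapshots() {
  # Field 2 is "Snapshot" on snapshot lines; strip the quotes around field 3.
  awk -v q="'" '$2 == "Snapshot" { gsub(q, "", $3); print $3 }'
}

# Usage on a DB node: lvm lvscan | find_snapshots
```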

Now that the LVM snapshot is removed, you can restart your database node patching, which will now be successful!

Exadata: make a cell or a DB node blink !

Hardware issues happen, and when they happen on Exadata systems, you need an Oracle Field Engineer to go into your datacenter and replace the faulty part. What also happens is that you have many Exadatas in your datacenter, your CMDB is not really up to date and, as a result, the Field Engineer struggles to locate the Exadata component containing the faulty part.
Fortunately, the ILOMs have a very cool feature: you can make a cell or a DB node blink! (which then makes it easy to locate)

First of all, connect to the ILOM and check the /SYS/LOCATE property:
-> show /SYS/LOCATE
/SYS/LOCATE
    Targets:
    Properties:
        type = Indicator
        ipmi_name = LOCATE
        value = Off            <=== OFF
    Commands:
        cd
        set
        show
->
Let's make it blink:
-> set /SYS/LOCATE value=fast_blink
Set 'value' to 'fast_blink' [Fast Blink]
-> show /SYS/LOCATE
/SYS/LOCATE
    Targets:
    Properties:
        type = Indicator
        ipmi_name = LOCATE
        value = Fast Blink     <=== It blinks !
    Commands:
        cd
        set
        show
->
Once the Field Engineer has located your blinking component, it is time to stop the blinking; I know it is very fun, so let it blink a couple more minutes and then stop it:
-> set /SYS/LOCATE value=off
Set 'value' to 'off' [Off]
-> show /SYS/LOCATE
/SYS/LOCATE
    Targets:
    Properties:
        type = Indicator
        ipmi_name = LOCATE
        value = Off            <=== OFF
    Commands:
        cd
        set
        show
->

This is very useful, I use it a lot!

Exadata: Hack patchmgr

Do not try this at home; this blog is for educational (and fun) purposes only.

Let's take as an example that weird patchmgr behavior when patching Exadata:
  • 1/ patchmgr uses the /etc/hosts IP of a host to start the patching
  • 2/ patchmgr uses the DNS ip of a host when waiting for a host to reboot
So if the DNS IP happens to be blocked by a firewall, for example (this would most likely be due to a wrong configuration, but let's assume this is the situation we are in), then patchmgr will wait forever for the host to be back after reboot, and you will be in trouble.

To work around this, you can comment the DNS server(s) out of the /etc/resolv.conf file on the host you start patchmgr from (with no DNS server, patchmgr cannot use the DNS IP to ping the host it waits for after reboot, so it uses the /etc/hosts IP instead). But, by doing that, the whole host is unable to use DNS during this patch session, and this is not what you want; indeed, if you use an external server to start patchmgr, you will most likely create an incident on this system.

Another way is to ... "adapt" patchmgr so that it does not use the DNS IP when pinging the patched host while waiting for it to come back online after reboot. And this is kind of easy, as patchmgr is a shell script; great news, right?
Looking into the patchmgr code, you will find that patchmgr uses the host command to resolve the target server IP:
host_name_or_ip=$(host -t A $target | awk '/has address/ {print $NF; exit}')
And if you look at man host, you'll see that host resolves using DNS:
host - DNS lookup utility
So this is where the described issue comes from; we would not have this issue if patchmgr resolved the host from the /etc/hosts file, using for example a simple ping:
host_name_or_ip=$(ping -qc1 $target | head -1 | awk -F "[()]" '{print $2}')
Note that resolving from /etc/hosts before DNS is the default, as you can see in /etc/nsswitch.conf:
# grep hosts /etc/nsswitch.conf
hosts:          files dns    <=== we resolve using files before DNS
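To make the difference between the two approaches concrete, here is a small sketch running both awk extractions on sample outputs (hostnames and IPs are made up, the function names are hypothetical), so the parsing can be checked without any network access:

```shell
#!/usr/bin/env bash
# Sketch: the two IP extractions discussed above, applied to sample outputs
# (hostnames and IPs are made up) so the awk parsing can be checked offline.

# Original patchmgr approach: parse "host -t A <name>" output (DNS only).
ip_from_host_output() {
  awk '/has address/ {print $NF; exit}'
}

# Modified approach: parse the first line of "ping -qc1 <name>" output.
# ping resolves through /etc/nsswitch.conf, so /etc/hosts comes before DNS.
ip_from_ping_output() {
  head -n 1 | awk -F '[()]' '{print $2}'
}

echo 'cel01.domain.com has address 10.1.2.3' | ip_from_host_output
# 10.1.2.3
echo 'PING cel01.domain.com (10.1.2.4) 56(84) bytes of data.' | ip_from_ping_output
# 10.1.2.4
```

Also worth knowing: getent hosts <name> follows the same nsswitch.conf order without sending any packet.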
All that said, we can then update patchmgr to resolve using ping instead of host:
#host_name_or_ip=$(host -t A $target | awk '/has address/ {print $NF; exit}')
host_name_or_ip=$(ping -qc1 $target | head -1 | awk -F "[()]" '{print $2}')
And let's run a test (just a precheck) to see how it goes:
# ./patchmgr -dbnodes ~/dbs_group -precheck -nomodify_at_prereq -target_version 19.3.12.0.0.200905 -iso_repo ../p31720221_193000_Linux-x86-64.zip -allow_active_network_mounts
2020-11-05 15:22:45 +1100        :ERROR  : Incorrect md5sum of /patches/dbserver_patch_20.200911/patchmgr
#
Hey, patchmgr has detected that we have modified it, clever! Indeed, patchmgr checks the md5 of all its files before starting a patching session; these md5 values are saved in the md5sum_files.lst file, so we just have to get the md5 of our "patched" patchmgr and update md5sum_files.lst with it:
# md5sum patchmgr
8e75bf3c1cae3e2d75229c85e84f8e0e  patchmgr
# cp md5sum_files.lst md5sum_files.lst.orig
# vi md5sum_files.lst
# grep patchmgr md5sum_files.lst
8e75bf3c1cae3e2d75229c85e84f8e0e  patchmgr
40acc292fec697492dd40a6938fb60c4  patchmgr_functions
#
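Instead of editing md5sum_files.lst with vi, the entry can be refreshed with a small function. A sketch under the assumption that the list uses the standard md5sum format ("<md5>  <file>") shown above; update_md5_entry is a hypothetical helper, and the same warning applies: unsupported, educational only.

```shell
#!/usr/bin/env bash
# Sketch: refresh one entry of md5sum_files.lst after editing a file, instead
# of editing the list with vi. Assumes the list uses the standard md5sum
# format "<md5>  <file>" shown above, and that FILE is given exactly as it
# appears in the list (e.g. "patchmgr", not "./patchmgr").
update_md5_entry() {
  local file=$1 list=${2:-md5sum_files.lst} sum
  [ -f "$file" ] && [ -f "$list" ] || return 1
  sum=$(md5sum "$file" | awk '{print $1}')
  # Keep a copy of the original list once, like the manual cp above.
  [ -f "$list.orig" ] || cp -p "$list" "$list.orig"
  # Rewrite only the line whose file name field is exactly $file.
  awk -v f="$file" -v s="$sum" '$2 == f { $0 = s "  " f } { print }' \
    "$list" > "$list.tmp" && mv "$list.tmp" "$list"
}

# Usage, from the patch directory: update_md5_entry patchmgr
```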

And you are now good to go, you can now run your patched patchmgr !

Again, do not try this at home; this blog is for educational (and fun) purposes only. Nothing here would be supported by Oracle, but it is still good to know! :)

Exadata: Repair a corrupted / broken RPM database

Storage servers and DB nodes are Linux machines which use RPM as their package manager. RPM keeps a database of the installed packages, etc... It happened a few times that this database became corrupted after an Exadata patching, and you usually discover it during your next patching session with this kind of complaint from the prerequisites:
2020-11-12 13:12:15 +1100        :FAILED : Check space and state of cell services.
Note that the error message is "Check space and state of cell services", so the first thing is to check whether you have a space issue or not (dcli -g ~/cell_group -l root "df -h /"; / needs 3 GB for patching). Once you have verified it is not that, you know that the issue is with the state of your component (cell or DB node).
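The space part of that check can be scripted. A minimal sketch (has_free_gb is a hypothetical helper): it uses POSIX df -P output, where available space is the fourth field on a single line, and the 3 GB figure for / from the text above. To look at all the cells at once, dcli -g ~/cell_group -l root "df -Pk /" gives the raw values.

```shell
#!/usr/bin/env bash
# Sketch: succeed when MOUNT has at least NEEDED_GB available. "df -P" keeps
# one line per filesystem (available space is field 4) and -k reports it in
# 1K blocks. The 3 GB threshold for / comes from the text above.
has_free_gb() {
  local mount=$1 needed_gb=$2 avail_kb
  avail_kb=$(df -Pk "$mount" | awk 'NR == 2 {print $4}')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( needed_gb * 1024 * 1024 )) ]
}

# Usage: has_free_gb / 3 && echo "enough space on /" || echo "less than 3 GB free on /"
```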

You then have to check the hostname.log of the culprit in the patchmgr directory (all the "hostname.log" logfiles usually have a very similar size, so if one is bigger, a simple ls will most likely point you to the culprit) and you may find:
cel02: [ERROR] Can not continue. Runtime configuration is not consistent with values configured in /opt/oracle.cellos/cell.conf.
cel02: [ERROR] Run ipconf to correct the inconsistencies. Failed check: /root/_cellupd_dpullec_/_p_/ipconf -check-consistency -at-runtime -semantic -verbose
This is not excessively verbose, but it gives you a good hint and the command which has failed; you can then re-execute this command and see how it goes:
[root@cel02 ~]# /root/_cellupd_dpullec_/_p_/ipconf -check-consistency -at-runtime -semantic -verbose
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
[Info]: ipconf command line: /root/_cellupd_dpullec_/_p_/ipconf.pl -check-consistency -at-runtime -semantic -verbose -nocodes
Logging started to /var/log/cellos/ipconf.log
[Warning]: File not found /etc/ntp.conf
. . . many more info not useful in this scenario . . .

We can clearly see 2 issues here:
  • Missing /etc/ntp.conf: this can be ignored, it is documented in note 2689297.1
  • A RPM issue

We can confirm the RPM issue just by querying the RPM database:
[root@cel02 ~]# rpm -qa
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
[root@cel02 ~]#
So here we have to rebuild this corrupted RPM database, and the good news is that it can be done 100% online with no disruption:
[root@cel02 ~]# mkdir /var/lib/rpm/backup
[root@cel02 ~]# cp -a /var/lib/rpm/__db* /var/lib/rpm/backup/
[root@cel02 ~]# rm -f /var/lib/rpm/__db*
[root@cel02 ~]# rpm --rebuilddb
[root@cel02 ~]#
Easy, right ? you can now verify the good health of your RPM database:
[root@cel02 ~]# rpm -qa | wc -l
455
[root@cel02 ~]#
You can revalidate the whole configuration:
[root@cel02 ~]# cellcli -e alter cell validate configuration
Cell cel02 successfully altered
[root@cel02 ~]#

And you are good to go, everything is now fixed and clean, your pre-requisites and future patch will now be working !
Note that this example is with a cell but it works the same way with a database node (and any Linux server).
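The rebuild steps above can be wrapped in a function. A sketch, assuming the default /var/lib/rpm location: rebuild_rpmdb is a hypothetical helper that uses a timestamped backup directory (so a second run does not overwrite the first backup) and passes --dbpath explicitly; the RPM variable only exists so the final command can be swapped for echo when testing.

```shell
#!/usr/bin/env bash
# Sketch of the RPM database repair shown above:
#   1. copy the Berkeley DB environment files (__db*) to a backup directory
#   2. remove them
#   3. rebuild the RPM database
# RPM can point to another command for testing (e.g. RPM=echo).
rebuild_rpmdb() {
  local dbdir=${1:-/var/lib/rpm}
  local backup
  backup="$dbdir/backup.$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup" || return 1
  cp -a "$dbdir"/__db* "$backup"/ 2>/dev/null
  rm -f "$dbdir"/__db*
  "${RPM:-rpm}" --rebuilddb --dbpath "$dbdir"
}

# Usage (as root): rebuild_rpmdb && rpm -qa | wc -l
```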

Exadata: reinstall a broken system RPM

The Exadata compute nodes and storage servers are Linux servers with specific software which optimizes everything for the Oracle database, the Oracle Clusterware, etc... It means that the software is a set of RPMs like on any other Red Hat system, and when we patch these servers, Oracle provides an ISO (in note 888828.1) containing the new version of all these RPMs, and patchmgr updates all of them. Usually, our job stops here.

But... it may happen that an RPM needs to be reinstalled because it has become corrupted or broken. Let's take as an example the Exadata Management Package, which provides the ability to monitor the Exadata compute nodes and storage cells. This Exadata Management Package software is located in /etc/oracle/dbserver, so if you remove this directory (don't do that :)), the dbmcli command won't work any more and you will be in trouble (for patching and more).

The situation will then be that the RPM is still installed at the system level but dbmcli is not there any more:
[root@node01]# rpm -qa | grep exadata-dbmmgmt
exadata-dbmmgmt-19.2.8.0.0.191119-1.noarch
[root@node01]# dbmcli
dbmcli: command not found
To fix this situation, you need to reinstall this RPM and to start with, you need to get it for your current version (imageinfo or exa-versions.sh will give you the current version of your system). If you have patched your Exadata yourself, you will have the patch already somewhere; if not, you'll find it in note 888828.1.

To get the ISO out of the patch, you need to unzip the patch (something patchmgr does for us when patching):
[root@node01]# unzip -q p30398147_192000_Linux-x86-64.zip
[root@node01]# ls -ltr exa*
-rw-rw-r-- 1 root root 1421932544 Nov 20  2019 exadata_ol7_base_repo_19.2.8.0.0.191119.iso
Now, we will mount the ISO:
[root@node01]# mkdir /mnt
[root@node01]# mount -o loop exadata_ol7_base_repo_19.2.8.0.0.191119.iso /mnt
mount: /dev/loop0 is write-protected, mounting read-only
We can now copy the RPM somewhere for future use and unmount the ISO (we are done with it; you can delete it):
[root@node01]# cp /mnt/x86_64/exadata-dbmmgmt-19.2.8.0.0.191119-1.noarch.rpm /tmp/.
[root@node01]# umount /mnt
Let's now uninstall the broken RPM:
[root@node01]# rpm -e exadata-dbmmgmt-19.2.8.0.0.191119-1.noarch --nodeps
tee: /opt/oracle/dbserver_19.2.8.0.0.191119/.install_log.txt: No such file or directory
2020-11-03 15:00:05 +1100: Pre uninstallation steps in progress ...
Uninstalling version 19.2.8.0.0.191119.
. . . many errors as the directory has been dropped . . .
And reinstall the good RPM:
[root@node01]# rpm -ivh /tmp/exadata-dbmmgmt-19.2.8.0.0.191119-1.noarch.rpm
Preparing...                          ################################# [100%]
2020-11-03 15:00:24 +1100: Pre Installation steps in progress ...
2020-11-03 15:00:24 +1100: This is a fresh install.
Updating / installing...
   1:exadata-dbmmgmt-19.2.8.0.0.191119################################# [100%]
2020-11-03 15:00:34 +1100: Post Installation steps in progress ...
. . .
[root@node01]#
In the specific case of the management package, we also need to restart the services (dbmcli -e alter dbserver restart services all) and you are good to go, as if /etc/oracle/dbserver had never been dropped!

The important thing to understand here is that the patches provided in note 888828.1 to patch your Exadata servers are ISOs containing RPMs, which can also be used to reinstall some RPMs in case something goes wrong on your systems, and this can be very, very useful!
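The reinstall steps above can be chained for any package of the ISO. A sketch: reinstall_from_iso and the default /mnt/exa_iso mount point are hypothetical, and it assumes the RPM sits in the ISO's x86_64/ directory with a file name of <package>.rpm, as in the example. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the reinstall steps above for any package of the Exadata ISO.
# Assumptions: the RPM is in the ISO's x86_64/ directory (as in the example)
# and its file name is "<package>.rpm". DRY_RUN=1 (default) prints commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

reinstall_from_iso() {
  local iso=$1 pkg=$2 mnt=${3:-/mnt/exa_iso}  # hypothetical default mount point
  local rpmfile="$pkg.rpm"
  run mkdir -p "$mnt" || return 1
  run mount -o loop,ro "$iso" "$mnt" || return 1
  run cp "$mnt/x86_64/$rpmfile" /tmp/ || { run umount "$mnt"; return 1; }
  run umount "$mnt" || return 1
  # --nodeps as in the example above: other packages depend on this one.
  run rpm -e --nodeps "$pkg" || return 1
  run rpm -ivh "/tmp/$rpmfile"
}

# Usage (as root, DRY_RUN=0 to really run it):
#   reinstall_from_iso exadata_ol7_base_repo_19.2.8.0.0.191119.iso \
#     exadata-dbmmgmt-19.2.8.0.0.191119-1.noarch
```

For the management package, the services still need to be restarted afterwards with dbmcli, as described above.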

Exadata: ILOM hostname change

I am not sure about the root cause of this one, but it was very weird to find that an ILOM had a wrong hostname (yes, a wrong hostname); below are the logs from a cell patching prerequisites session:
cel01: [ERROR] Details:
cel01: ILOM hostname cel01.domain.com must match cel01-ilom.domain.com in /opt/oracle.cellos/cell.conf  : FAILED
cel01: [Info]: Consistency check FAILED
Indeed, the ILOM had the same hostname as the cell which obviously is not what we want:
[root@db01 ~]# ssh cel01-ilom
Password:
Hostname: cel01
-> show /SP hostname
/SP
    Properties:
        hostname = cel01         <===== bad, this is the cell hostname !
->
The cool thing with ILOMs is that we can change this online with no disruption:
-> set /sp hostname=cel01-ilom
Set 'hostname' to 'cel01-ilom'
-> show /SP hostname
/SP
    Properties:
        hostname = cel01-ilom   <===== good
->
One more easy online fix (for a weird issue :))!

Exadata: ILOM NTP change / fix

A recent cell patching showed me this error during the pre-requisites phase:
cel01: ILOM use NTP servers "disabled" must match "enabled" in /opt/oracle.cellos/cell.conf              : FAILED
cel01: ILOM first NTP server 0.0.0.0 must have non-empty value                                           : FAILED
This was due to a bad NTP configuration on the ILOM:
[root@db01 ~]# ssh cel01-ilom
Password:
Hostname: cel01
-> show /sp/clients/ntp
/SP/clients/ntp
    Targets:
        server
    Properties:         <===== nothing here
->
You can set the NTP servers as below:
-> set /SP/clients/ntp/server/1 address=10.11.12.13
Set 'address' to '10.11.12.13'
-> set /SP/clients/ntp/server/2 address=10.11.12.14
Set 'address' to '10.11.12.14'
->
And a quick check to verify the setting:
-> show /sp/clients/ntp/server/1
/SP/clients/ntp/server/1
    Targets:
    Properties:
        address = 10.11.12.13   <===== good       
-> show /sp/clients/ntp/server/2
/SP/clients/ntp/server/2
    Targets:
    Properties:
        address = 10.11.12.14   <===== good
->
This is good but not enough as we also have to check that the use of NTP is enabled:
-> show /SP/clock usentpserver
/SP/clock
    Targets:
    Properties:
        usentpserver = disabled     <===== disabled (bad)
->
Let's enable it:
-> set /SP/clock usentpserver=enabled
Set 'usentpserver' to 'enabled'
-> show /SP/clock usentpserver
/SP/clock
    Targets:
    Properties:
        usentpserver = enabled     <===== enabled (good)
->

An easy and 100% online fix !

Exadata: ILOM DNS change / fix

I recently faced this error during a cell patching session:
cel01: [ERROR] Details:
cel01: ILOM DNS server 0.0.0.0 defined in system configiuration            : FAILED
cel01: [Info]: Consistency check FAILED
You can also find it in the output of list alerthistory:
FAILED [Warning]: ILOM DNS server(s) could not be retrieved [Info]: Consistency check FAILED
This happens because the ILOM had no DNS defined as we can see below:
[root@db01 ~]# ssh cel01-ilom
Password:
-> show /sp/clients/dns
/SP/clients/dns
    Targets:
    Properties:
        auto_dns = enabled
        nameserver = (none)       <==== no DNS server
        retries = 1
        searchpath = (none)       <==== no search path
        timeout = 5
    Commands:
        cd
        set
        show
->
We can see in the log above that neither the DNS server nor the search domain is defined. We can then define them as below (this is a 100% online operation):
-> set /sp/clients/dns nameserver=10.11.12.13
Set 'nameserver' to '10.11.12.13'
-> set /sp/clients/dns searchpath=domain.com
Set 'searchpath' to 'domain.com'
-> 
And a quick check to verify that the config is now correct:
-> show /sp/clients/dns
/SP/clients/dns
    Targets:
    Properties:
        auto_dns = enabled
        nameserver = 10.11.12.13  <==== good
        retries = 1
        searchpath = domain.com   <==== good
        timeout = 5
    Commands:
        cd
        set
        show
-> 
An easy and 100% online fix !
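When checking many ILOMs (hostname, NTP, DNS), parsing the show output helps spot bad values. A sketch: ilom_prop and ilom_dns_ok are hypothetical helpers reading the show output on stdin; how you capture that output (copy/paste, or ssh if your ILOM firmware accepts commands that way) is left open and should be verified on your side. The same ilom_prop call works for hostname under /SP or usentpserver under /SP/clock.

```shell
#!/usr/bin/env bash
# Sketch: extract a property value from ILOM "show" output read on stdin, to
# spot settings like "nameserver = (none)" or "usentpserver = disabled" when
# checking many ILOMs. Capturing the output is left to you (verify what your
# ILOM firmware supports).
ilom_prop() {
  awk -v n="$1" '$1 == n && $2 == "=" {
    sub(/^[^=]*=[ \t]*/, ""); sub(/[ \t]+$/, ""); print; exit
  }'
}

# ilom_dns_ok: succeed when both a nameserver and a search path are set.
ilom_dns_ok() {
  local out ns sp
  out=$(cat)
  ns=$(printf '%s\n' "$out" | ilom_prop nameserver)
  sp=$(printf '%s\n' "$out" | ilom_prop searchpath)
  [ -n "$ns" ] && [ "$ns" != "(none)" ] && [ -n "$sp" ] && [ "$sp" != "(none)" ]
}
```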

Exadata: how to extend /u01

Marketers are very good at convincing everyone that resources (CPU, disks, memory) are now infinite, which is mostly true on a slide but less true in real life, and you may one day face this situation with your Exadata:
[root@exa01_db01 ~]# dcli -g ~/dbs_group -l root df -h /u01
exa01_db01: 99G   90G  4.2G  96% /u01
exa01_db02: 99G   88G  6.2G  94% /u01
exa01_db03: 99G   87G  6.5G  94% /u01
exa01_db04: 99G   90G  3.6G  97% /u01
exa01_db05: 99G   92G  1.9G  99% /u01
exa01_db06: 99G   92G  2.0G  98% /u01
exa01_db07: 99G   91G  2.8G  98% /u01
exa01_db08: 99G   85G  8.5G  91% /u01
[root@exa01_db01 ~]#
I won't explore here the reasons why /u01 is full (make sure your logfiles are properly rotated / purged) but rather how to extend the /u01 filesystem on an Exadata.

A word on /u01

Whether you install Exadata on your own or you re-image it, you cannot influence the size of /u01 during these steps. /u01 is created with a 100 GB size, end of the story.

Space available for /u01

Having a closer look at /u01, we can see that it resides on the /dev/mapper/VGExaDb-LVDbOra1 Logical Volume:
[root@exa01_db01 ~]# df -h /u01
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbOra1   99G   90G  4.2G  96% /u01
[root@exa01_db01 ~]#
This LVDbOra1 Logical Volume is in the VGExaDb Volume Group:
[root@exa01_db01 ~]# lvs
  LV                 VG      Attr       LSize  
  LVDbOra1           VGExaDb -wi-ao---- 100.00G
  LVDbSwap1          VGExaDb -wi-ao----  24.00G
  LVDbSys1           VGExaDb -wi-ao----  30.00G
  LVDbSys2           VGExaDb -wi-a-----  30.00G
  LVDoNotRemoveOrUse VGExaDb -wi-a-----   1.00G
[root@exa01_db01 ~]#
This VGExaDb Volume Group contains 2 Physical Volumes:
[root@exa01_db01 ~]# pvs
  PV         VG      Fmt  Attr PSize   PFree
  /dev/sda2  VGExaDb lvm2 a--u 557.36G 372.36G
  /dev/sda3  VGExaDb lvm2 a--u   1.09T   1.09T
[root@exa01_db01 ~]#
and (good news), there's a lot of free space available here:
[root@exa01_db01 ~]# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  VGExaDb   2   5   0 wz--n- 1.63T 1.45T
[root@exa01_db01 ~]#

Extend /u01

So we just have to extend the logical volume and then the filesystem, both online operations:
[root@exa01_db01 ~]#  lvextend -L +100G /dev/mapper/VGExaDb-LVDbOra1
  Size of logical volume VGExaDb/LVDbOra1 changed from 100.00 GB (25600 extents) to 200.00 GB (51200 extents).
  Logical volume LVDbOra1 successfully resized.
[root@exa01_db01 ~]# resize2fs /dev/mapper/VGExaDb-LVDbOra1
resize2fs 1.43-WIP (20-Jun-2013)
Filesystem at /dev/mapper/VGExaDb-LVDbOra1 is mounted on /u01; on-line resizing required
old_desc_blocks = 7, new_desc_blocks = 13
Performing an on-line resize of /dev/mapper/VGExaDb-LVDbOra1 to 52428800 (4k) blocks.
The filesystem on /dev/mapper/VGExaDb-LVDbOra1 is now 52428800 blocks long.
[root@exa01_db01 ~]#
And you're done !
[root@exa01_db01 ~]# df -h /u01
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbOra1  197G   94G   94G  50% /u01
[root@exa01_db01 ~]#
Note that this has to be done on each node.
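A per-node sketch that checks the free space of the volume group before extending. extend_u01, vg_free_gb and enough_free are hypothetical helpers, the VG/LV names are the ones from the example, and it uses lvextend -r (--resizefs), which resizes the filesystem in the same step as the logical volume instead of a separate resize2fs. By default it only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: extend /u01 by ADD_GB on the local node, only if VGExaDb has enough
# free space. VG/LV names are the ones from the example above. lvextend -r
# (--resizefs) grows the filesystem in the same step as the LV; the two-step
# lvextend + resize2fs shown above does the same. DRY_RUN=1 (default) prints.
vg_free_gb() {
  vgs --noheadings --nosuffix --units g -o vg_free "$1" | tr -d ' '
}

# enough_free FREE_GB NEEDED_GB: decimal values allowed (e.g. 1484.80).
enough_free() {
  awk -v f="$1" -v n="$2" 'BEGIN { exit !(f + 0 >= n + 0) }'
}

extend_u01() {
  local add_gb=${1:-100} vg=VGExaDb lv=/dev/mapper/VGExaDb-LVDbOra1 free
  free=$(vg_free_gb "$vg")
  if [ -z "$free" ] || ! enough_free "$free" "$add_gb"; then
    echo "Not enough free space in $vg (${free:-unknown}G) to add ${add_gb}G" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "lvextend -r -L +${add_gb}G $lv"
  else
    lvextend -r -L "+${add_gb}G" "$lv"
  fi
}

# Run it on each node, e.g. after copying it there: DRY_RUN=0 extend_u01 100
```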


Hope it helps !

Exadata: re-image a Cell Storage Server to 19c (OS configuration)

Now that we have gathered the required information to re-image our lost cell in part 1 and configured the system to boot on the ISO in part 2, let's connect to the console and do the OS configuration with the installer.

5/ OS configuration


5.0/ Connect to the console

Connect to the console to see the installer logs:
[root@exadata01_db01]# ssh exadata01_cel02-ilom
Password:
Oracle(R) Integrated Lights Out Manager
Version 4.0.4.36 r128807
Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: exadata01_cel02-ilom
-> set /sp/cli timeout=0
Set 'timeout' to '0'
-> start /sp/console
Are you sure you want to start /SP/console (y/n)? y
Serial console started.  To stop, type ESC (
[200446.510699] usb 2-1.7: new high-speed USB device number 3 using ehci-pci
[200446.603855] usb 2-1.7: New USB device found, idVendor=0430, idProduct=a101
[200446.611631] usb 2-1.7: New USB device strings: Mfr=1, Product=2, SerialNumber=3
. . .
Note here the life-saver option set /sp/cli timeout=0, which disables the console timeout (15 minutes by default). As the whole re-image takes 1h30 ~ 2h, it is not handy at all to be disconnected every 15 minutes and lose the history; also, as some steps take 20+ minutes, you would be under the impression that the installation is stuck, since you cannot see anything at the console after reconnecting following a timeout. So set /sp/cli timeout=0 is a must-have option.

5.1/ Set eth0

After around 15 minutes, you will be asked to configure eth0; use the information collected before to fill in this section:
IP Address of this host: 10.1.2.3
Netmask of this host: 255.255.254.0
Default gateway: 10.1.2.1
A very important thing here: neither backspace nor CTRL+H seems to work, so be very careful when entering this information, as if you mess it up, you'll have to restart the whole process. If someone knows how to correct a typo here, please let me know in the comments; I haven't found how.
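Since a typo means starting over, it can be worth checking the values before typing them. A sketch in bash (is_ipv4 and is_netmask are hypothetical helpers): is_ipv4 checks dotted-quad syntax and octet range, and is_netmask also checks that the mask is a contiguous run of ones followed by zeros, like 255.255.254.0.

```shell
#!/usr/bin/env bash
# Sketch: as typos cannot be fixed at this prompt, check the values before
# typing them. is_ipv4 checks dotted-quad syntax and octet range; is_netmask
# also checks that the mask is contiguous ones followed by zeros.
is_ipv4() {
  local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$' IFS=. octet
  local -a parts
  [[ $1 =~ $re ]] || return 1
  read -r -a parts <<< "$1"
  for octet in "${parts[@]}"; do
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}

is_netmask() {
  is_ipv4 "$1" || return 1
  local IFS=. a b c d n inv
  read -r a b c d <<< "$1"
  n=$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
  # Inverting a valid mask gives 2^k - 1, so inv & (inv + 1) must be 0.
  inv=$(( ~n & 0xFFFFFFFF ))
  (( (inv & (inv + 1)) == 0 ))
}

# Usage: is_ipv4 10.1.2.3 && is_netmask 255.255.254.0 && echo "looks valid"
```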

5.2/ DNS servers

It may take 45 minutes ~ 1 hour to reach this configuration point after the previous one:
Nameserver:
Add more nameservers (y/n) [n]: y
Nameserver: 10.200.200.4
Add more nameservers (y/n) [n]: y
Nameserver: 10.200.200.5
Add more nameservers (y/n) [n]: y
Nameserver: 10.89.1.28
Add more nameservers (y/n) [n]: n

5.3/ Timezone

1) Andorra
2) United Arab Emirates
3) Afghanistan
4) Antigua & Barbuda
5) Anguilla
6) Albania
7) Armenia
8) Angola
9) Antarctica
10) Argentina
11) Samoa (American)
12) Austria
13) Australia
14) Aruba
15) Åland Islands
16) Azerbaijan
Select country by number, [f]irst, [b]ack, [n]ext, [l]ast: 233
Selected country: United States (US). Now choose a zone
For a US timezone, it is 233.

5.4/ NTP

Note the screen bug here, where the NTP server IP you type overwrites the beginning of the line:
The current NTP server(s):
Do you want to change it (y/n) [n]: y
48.1.1qualified hostname or ip address for NTP server. Press enter if none: 10.2
Continue adding more ntp servers (y/n) [n]: n

5.5/ Configure network interfaces

Network interfaces
Name  Bonding  Speed    Status  IP address  Netmask  Gateway  Net type  Hostname
ib0                     UNCONF
ib1                     UNCONF
eth0                    UNCONF
eth1                    UNCONF
eth2                    UNCONF
eth3                    UNCONF
Select interface name to configure or press Enter to continue: ib0
Selected interface. ib0
IP address or none: 192.168.1.3
Netmask: 255.255.252.0
Fully qualified hostname or none: exadata01_cel02-priv1.domain.com
Continue configuring or re-configuring interfaces? (y/n) [y]: y

Network interfaces
Name  Bonding  Speed    Status  IP address    Netmask       Gateway  Net type  Hostname
ib0                     UP      192.168.1.3 255.255.252.0          Private   exadata01_cel02-priv1.domain.com
ib1                     UNCONF
eth0                    UNCONF
eth1                    UNCONF
eth2                    UNCONF
eth3                    UNCONF
Select interface name to configure or press Enter to continue: ib1
Selected interface. ib1
IP address or none: 192.168.1.4
Netmask: 255.255.252.0
Fully qualified hostname or none: exadata01_cel02-priv2.domain.com
Continue configuring or re-configuring interfaces? (y/n) [y]: y

Network interfaces
Name  Bonding  Speed    Status  IP address    Netmask       Gateway  Net type  Hostname
ib0                     UP      192.168.1.3 255.255.252.0          Private   exadata01_cel02-priv1.domain.com
ib1                     UP      192.168.1.4 255.255.252.0          Private   exadata01_cel02-priv2.domain.com
eth0                    UNCONF
eth1                    UNCONF
eth2                    UNCONF
eth3                    UNCONF
Select interface name to configure or press Enter to continue: eth0
Selected interface. eth0
IP address or none: 10.1.2.3
Netmask: 255.255.254.0
Gateway (IP address or none) or none: 10.1.2.1
Link speed (default,10000,25000):
[Warning]: Invalid value. Try again
Link speed (default,10000,25000): default
Fully qualified hostname or none: exadata01_cel02.domain.com
Continue configuring or re-configuring interfaces? (y/n) [y]: y

Network interfaces
Name  Bonding  Speed    Status  IP address    Netmask       Gateway     Net type   Hostname
ib0                     UP      192.168.1.3 255.255.252.0             Private    exadata01_cel02-priv1.domain.com
ib1                     UP      192.168.1.4 255.255.252.0             Private    exadata01_cel02-priv2.domain.com
eth0           default  UP      10.1.2.3 255.255.254.0 10.1.2.1 Management exadata01_cel02.domain.com
eth1                    UNCONF
eth2                    UNCONF
eth3                    UNCONF
Select interface name to configure or press Enter to continue:

5.6/ Canonical hostname

Select canonical hostname from the list below
1: exadata01_cel02-priv1.domain.com
2: exadata01_cel02-priv2.domain.com
3: exadata01_cel02.domain.com
Canonical fully qualified domain name: 3

5.7/ Default gateway

Select default gateway interface from the list below
1: eth0
Default gateway interface: 1

5.8/ Summary

Network interfaces
Name  State  Speed    Status  IP address    Netmask       Gateway     Net type   Hostname
ib0   Linked          UP      192.168.1.3 255.255.252.0             Private    exadata01_cel02-priv1.domain.com
ib1   Linked          UP      192.168.1.4 255.255.252.0             Private    exadata01_cel02-priv2.domain.com
eth0  Linked default  UP      10.1.2.3 255.255.254.0 10.1.2.1 Management exadata01_cel02.domain.com
eth1  Linked          UNCONF
eth2  Linked          UNCONF
eth3  Linked          UNCONF
Is this correct (y/n) [y]: y
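Once the cell has rebooted, the addresses above can be compared with what the system actually configured. `ip -o -4 addr show` reports them in address/prefix form, so the netmasks need converting first. A minimal bash sketch; `mask_to_prefix` and `cidr` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn "IP + netmask" from the summary above into the
# address/prefix form printed by `ip -o -4 addr show`.

# mask_to_prefix NETMASK: print the prefix length (255.255.254.0 -> 23).
# Returns 1 for malformed or non-contiguous masks such as 255.0.255.0.
mask_to_prefix() {
  local IFS=. octet bits=0 partial=0
  local -a o
  read -r -a o <<< "$1"
  (( ${#o[@]} == 4 )) || return 1
  for octet in "${o[@]}"; do
    # After a partial (non-255) octet, every following octet must be 0
    if (( partial )); then
      [[ $octet == 0 ]] || return 1
      continue
    fi
    case $octet in
      255) bits=$((bits + 8)); continue ;;
      254) bits=$((bits + 7)) ;;
      252) bits=$((bits + 6)) ;;
      248) bits=$((bits + 5)) ;;
      240) bits=$((bits + 4)) ;;
      224) bits=$((bits + 3)) ;;
      192) bits=$((bits + 2)) ;;
      128) bits=$((bits + 1)) ;;
      0) ;;
      *) return 1 ;;
    esac
    partial=1
  done
  echo "$bits"
}

# cidr IP NETMASK: print IP/prefix (10.1.2.3 255.255.254.0 -> 10.1.2.3/23).
cidr() {
  local prefix
  prefix=$(mask_to_prefix "$2") || return 1
  echo "$1/$prefix"
}
```

For example, `cidr 10.1.2.3 255.255.254.0` prints 10.1.2.3/23, which should match the fourth field of `ip -o -4 addr show dev eth0` on the cell; ib0 and ib1 should similarly show 192.168.1.3/22 and 192.168.1.4/22.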

5.9/ ILOM settings

Let's check / update the ILOM settings:
Do you want to configure basic ILOM settings (y/n) [y]: y
Loading basic configuration settings from ILOM ...
ILOM Fully qualified hostname [exadata01_cel02-ilom.domain.com]:
Inet protocol (IPv4,IPv6) [IPv4]:
ILOM IP address [10.1.2.144]:
ILOM Netmask [255.255.254.0]:
ILOM Gateway or none [10.1.2.1]:
ILOM Nameserver (multiple IPs separated by a comma) or none [10.200.200.4]:
ILOM Use NTP Servers (enabled/disabled) [enabled]:
1]:  First NTP server. Fully qualified hostname or ip address or none [10.248.1.1
ILOM Second NTP server. Fully qualified hostname or ip address or none [none]:
ILOM Vlan id or zero for non-tagged VLAN (0-4079) [0]:

Basic ILOM configuration settings:
Hostname             : exadata01_cel02-ilom.domain.com
IP Address           : 10.1.2.144
Netmask              : 255.255.254.0
Gateway              : 10.1.2.1
DNS servers          : 10.200.200.4
Use NTP servers      : enabled
First NTP server     : 10.248.1.1
Second NTP server    : none
Timezone (read-only) : America/Chicago
VLAN id              : 0
Is this correct (y/n) [y]: y

5.10/ An ignorable error

You can ignore this strange error; at this point, just . . . wait . . .:
[Info]:  Updating runtime sysctl configuration for ib1: net.ipv6.conf.ib1.disable_ipv6=1
[Info]:  Updating runtime sysctl configuration for eth0: net.ipv6.conf.eth0.disable_ipv6=1
[Info]: Adjust settings for IB interfaces in /etc/sysctl.conf
[INFO     ] /opt/oracle.cellos/cellFirstboot.sh: done
exadata01_cel02 login: [ 1948.671627]   MST::  : get_space_support_status 438: At least one SPACE is not supported
Also, you will be disconnected from the ILOM, as it is rebooted as well, so just reconnect to the console as you did previously.

5.11/ All good!

The process has now finished; you can log in with the default welcome1 password and see that your system is back!
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -mode first_boot -force -quiet -all
Run validation beginfirstboot - PASSED
Run validation ipmisettings - PASSED
Run validation misceachboot - PASSED
Run validation celldstatus - PASSED
Run validation calibration - PASSED
Run validation saveconfig - BACKGROUND RUN
2019-08-01 02:05:25 -0500 The first boot completed with SUCCESS
2019-08-01 02:05:25 -0500 2019-08-01 02:05:25 -0500 [FACTORY_TEST_END] Post installation tests ended with success
2019-08-01 02:05:25 -0500 2019-08-01 02:05:25 -0500 [FACTORY_COMPLETE] Imaging ended with success 

exadata01_cel02 login: root
Password:
Last failed login: Sun Jul 14 18:42:13 CDT 2019 on ttyS0
Last login: Sun Jul 14 18:43:21 on ttyS0
[root@exadata01_cel02]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         47G     0   47G   0% /dev
tmpfs            47G     0   47G   0% /dev/shm
tmpfs            47G  4.1M   47G   1% /run
tmpfs            47G     0   47G   0% /sys/fs/cgroup
/dev/md5        9.8G  3.2G  6.1G  34% /
/dev/md7        2.9G  1.3G  1.5G  46% /opt/oracle
/dev/md4        244M   52M  177M  23% /boot
/dev/md11       4.6G   61M  4.3G   2% /var/log/oracle
tmpfs           9.4G     0  9.4G   0% /run/user/0
/dev/sdm1       7.3G  1.8G  5.2G  25% /mnt/usb.mrdiag
[root@exadata01_cel02]#

5.12/ Password, SSH, reboot and checks

I recommend changing the default passwords (root, celladmin, cellmon) to ones of your choice; you also have to regenerate the SSH key:
[root@exadata01_cel02]# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@exadata01_cel02]# ssh-keygen
. . .
[root@exadata01_cel02]#
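If you prefer, the three default passwords can be changed in one go. A minimal bash sketch to run as root on the cell, assuming the three account names listed above; `change_default_passwords` and the `PASSWD_CMD` override are hypothetical, and the override only exists to dry-run the loop:

```shell
#!/usr/bin/env bash
# Hypothetical helper: change the default passwords of the accounts named above,
# one interactive passwd prompt per account (run as root on the cell).

CELL_ACCOUNTS=(root celladmin cellmon)

change_default_passwords() {
  # PASSWD_CMD: hypothetical override (defaults to passwd), only used to dry-run
  # the loop, e.g. PASSWD_CMD=echo
  local cmd=${PASSWD_CMD:-passwd} user
  for user in "${CELL_ACCOUNTS[@]}"; do
    echo "== ${user}"
    # Stop at the first account whose password change fails
    "$cmd" "$user" || return 1
  done
}
```

Run `change_default_passwords` and answer the usual passwd prompts for each account; `PASSWD_CMD=echo change_default_passwords` only prints the account names.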
At this point, I also like to reboot to make sure everything comes back online properly:
[root@exadata01_cel02]# reboot
It is then worth doing a few checks:
[root@exadata01_cel02]# cellcli
CellCLI: Release 19.2.2.0.0 - Production on Thu Aug 01 02:16:05 CDT 2019

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

CellCLI> list griddisk

CellCLI> list physicaldisk
         252:0           NAAAAA                  normal
         252:1           NBBBBB                  normal
         252:2           NCCCCC                  normal
         252:3           NDDDDD                  normal
         252:4           NEEEEE                  normal
         252:5           NFFFFF                  normal
         252:6           NGGGGG                  normal
         252:7           NHHHHH                  normal
         252:8           NJJJJJ                  normal
         252:9           NKKKKK                  normal
         252:10          NLLLLL                  normal
         252:11          NMMMMM                  normal
         FLASH_10_1      PHLE111111111P4BGN-1    normal
         FLASH_10_2      PHLE222222222P4BGN-2    normal
         FLASH_4_1       PHLE333333333P4BGN-1    normal
         FLASH_4_2       PHLE444444444P4BGN-2    normal
         FLASH_5_1       PHLE555555555P4BGN-1    normal
         FLASH_5_2       PHLE666666666P4BGN-2    normal
         FLASH_6_1       PHLE777777777P4BGN-1    normal
         FLASH_6_2       PHLE888888888P4BGN-2    normal
         M2_SYS_0        PHYHXXXXXXFA240J        normal
         M2_SYS_1        PHYHYYYYYYYR240J        normal

CellCLI> exit
quitting
[root@exadata01_cel02]# imageinfo

Kernel version: 4.1.12-124.26.12.el7uek.x86_64 #2 SMP Wed May 8 22:25:03 PDT 2019 x86_64
Cell version: OSS_19.2.2.0.0_LINUX.X64_190513.2
Cell rpm version: cell-19.2.2.0.0_LINUX.X64_190513.2-1.x86_64

Active image version: 19.2.2.0.0.190513.2
Active image kernel version: 4.1.12-124.26.12.el7uek
Active image activated: 2019-08-01 02:05:25 -0500
Active image status: success
Active system partition on device: /dev/md24p5
Active software partition on device: /dev/md24p7

Cell boot usb partition: /dev/md25p1
Cell boot usb version: 19.2.2.0.0.190513.2

Inactive image version: undefined
Rollback to the inactive partitions: Impossible
[root@exadata01_cel02]#
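The checks above are visual; they can also be turned into pass/fail tests. A minimal bash sketch, assuming the output formats shown above (the status as the last column of `list physicaldisk`, and an `Active image status:` line in `imageinfo`) and that `cellcli -e` prints only the disk rows; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: pass/fail versions of the visual checks above.
# Both read the command output on stdin.

# check_physicaldisks: expects `cellcli -e "list physicaldisk"` output; prints every
# disk whose last column is not "normal" and fails if any is found or none is listed.
check_physicaldisks() {
  awk '
    NF { n++ }
    NF && $NF != "normal" { bad++; print "not normal: " $0 }
    END {
      if (n == 0) { print "no disk listed"; exit 1 }
      exit (bad > 0)
    }'
}

# check_imageinfo: expects `imageinfo` output; succeeds only if the
# "Active image status" line reads "success".
check_imageinfo() {
  awk -F': *' '
    $1 == "Active image status" { status = $2; sub(/[ \t\r]+$/, "", status) }
    END {
      if (status != "success") {
        print "active image status: " (status == "" ? "unknown" : status)
        exit 1
      }
    }'
}
```

On the cell: `cellcli -e "list physicaldisk" | check_physicaldisks && imageinfo | check_imageinfo && echo "cell looks healthy"`.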

6/ Bring the disks back into ASM

Now, you have to add the disks back into ASM. Depending on how your system crashed and/or why you had to re-image your storage cell, there may be different scenarios; in our case, we had to recreate the cell disks CD_00 and CD_01 (as they were missing), recreate their grid disks and add all the disks back into ASM:
-- Recreate celldisks CD_00 and CD_01
CellCLI> create celldisk CD_00_exadata01_cel02 lun=0_0
CellDisk CD_00_exadata01_cel02 successfully created

CellCLI> create celldisk CD_01_exadata01_cel02 lun=0_1
CellDisk CD_01_exadata01_cel02 successfully created

CellCLI>

-- Recreate the grid disks on these cell disks
CellCLI> create griddisk DATAC1_CD_00_exadata01_cel02 celldisk=CD_00_exadata01_cel02, size=5.6953125T
GridDisk DATAC1_CD_00_exadata01_cel02 successfully created

CellCLI> create griddisk RECOC1_CD_00_exadata01_cel02 celldisk=CD_00_exadata01_cel02, size=1.42388916015625T
GridDisk RECOC1_CD_00_exadata01_cel02 successfully created

CellCLI> create griddisk DATAC1_CD_01_exadata01_cel02 celldisk=CD_01_exadata01_cel02, size=5.6953125T
GridDisk DATAC1_CD_01_exadata01_cel02 successfully created

CellCLI> create griddisk RECOC1_CD_01_exadata01_cel02 celldisk=CD_01_exadata01_cel02, size=1.42388916015625T
GridDisk RECOC1_CD_01_exadata01_cel02 successfully created

CellCLI>
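When more cell disks are missing, these commands can be generated rather than typed. A minimal bash sketch; the function name is hypothetical, the grid disk sizes are copied from this example and must match the grid disks still present on the other cell disks (`list griddisk attributes name, size` on a healthy cell shows them), and it assumes cell disk CD_NN sits on LUN 0_N as CD_00 and CD_01 do here, which is worth confirming with `list lun` first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CellCLI commands used above for missing cell disks.
# ASSUMPTIONS: sizes copied from this example; CD_NN lives on LUN 0_N.

CELL=exadata01_cel02
DATA_SIZE=5.6953125T
RECO_SIZE=1.42388916015625T

# recreate_disk_cmds NN...: for each two-digit cell disk number, print the
# create celldisk command followed by its DATAC1 and RECOC1 grid disks.
recreate_disk_cmds() {
  local n
  for n in "$@"; do
    # 10# so that 08 or 09 are not rejected as invalid octal numbers
    echo "create celldisk CD_${n}_${CELL} lun=0_$((10#$n))"
    echo "create griddisk DATAC1_CD_${n}_${CELL} celldisk=CD_${n}_${CELL}, size=${DATA_SIZE}"
    echo "create griddisk RECOC1_CD_${n}_${CELL} celldisk=CD_${n}_${CELL}, size=${RECO_SIZE}"
  done
}
```

Review the output first, then run each line, for example with `recreate_disk_cmds 00 01 | while IFS= read -r cmd; do cellcli -e "$cmd" < /dev/null; done`.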

-- Added the disks back into ASM
SQL> alter diskgroup DATA add disk
  'o/*/DATAC1_CD_00_exadata01_cel02',
  'o/*/DATAC1_CD_01_exadata01_cel02',
  'o/*/DATAC1_CD_02_exadata01_cel02' force,
  'o/*/DATAC1_CD_03_exadata01_cel02' force,
  'o/*/DATAC1_CD_04_exadata01_cel02' force,
  'o/*/DATAC1_CD_05_exadata01_cel02' force,
  'o/*/DATAC1_CD_06_exadata01_cel02' force,
  'o/*/DATAC1_CD_07_exadata01_cel02' force,
  'o/*/DATAC1_CD_08_exadata01_cel02' force,
  'o/*/DATAC1_CD_09_exadata01_cel02' force,
  'o/*/DATAC1_CD_10_exadata01_cel02' force,
  'o/*/DATAC1_CD_11_exadata01_cel02' force
  rebalance power 32;

Diskgroup altered.

SQL> alter diskgroup RECO add disk
  'o/*/RECOC1_CD_00_exadata01_cel02',
  'o/*/RECOC1_CD_01_exadata01_cel02',
  'o/*/RECOC1_CD_02_exadata01_cel02' force,
  'o/*/RECOC1_CD_03_exadata01_cel02' force,
  'o/*/RECOC1_CD_04_exadata01_cel02' force,
  'o/*/RECOC1_CD_05_exadata01_cel02' force,
  'o/*/RECOC1_CD_06_exadata01_cel02' force,
  'o/*/RECOC1_CD_07_exadata01_cel02' force,
  'o/*/RECOC1_CD_08_exadata01_cel02' force,
  'o/*/RECOC1_CD_09_exadata01_cel02' force,
  'o/*/RECOC1_CD_10_exadata01_cel02' force,
  'o/*/RECOC1_CD_11_exadata01_cel02' force
  rebalance power 32;

Diskgroup altered.

SQL>
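Note that FORCE is only used for CD_02 to CD_11: those grid disks survived the re-image and most likely still carry their old ASM header, which ASM refuses to overwrite without FORCE, whereas CD_00 and CD_01 are brand new. A minimal bash sketch that prints the same kind of statement so it can be reviewed before being pasted into SQL*Plus; `add_disks_sql` and its argument order are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ALTER DISKGROUP ... ADD DISK statement like the ones above.
# add_disks_sql DISKGROUP GRIDDISK_PREFIX CELL NEW_DISKS LAST_DISK POWER
#   NEW_DISKS: space-separated two-digit numbers of brand-new grid disks (no FORCE);
#   every other disk from 00 to LAST_DISK gets FORCE.
add_disks_sql() {
  local dg=$1 prefix=$2 cell=$3 new=" $4 " last=$5 power=$6
  local i n sep
  echo "alter diskgroup ${dg} add disk"
  for (( i = 0; i <= 10#$last; i++ )); do
    printf -v n '%02d' "$i"
    # No trailing comma after the last disk
    sep=","
    (( i == 10#$last )) && sep=""
    if [[ $new == *" $n "* ]]; then
      echo "'o/*/${prefix}_CD_${n}_${cell}'${sep}"
    else
      echo "'o/*/${prefix}_CD_${n}_${cell}' force${sep}"
    fi
  done
  echo "rebalance power ${power};"
}
```

For instance, `add_disks_sql RECO RECOC1 exadata01_cel02 "00 01" 11 32` prints the same statement as the RECO one above, ready to paste into SQL*Plus connected to the ASM instance. The rebalance can then be followed with `select * from gv$asm_operation;`, which returns no rows once it has completed.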



And you are all done now!


Quick links: part 1 / part 2 / part 3

