
How to Patch Exadata / Upgrade Exadata to 18c and 19c -- Part 2 -- Cells, IB and DB Servers patching

3/ The patching procedure

3.1/ Patching the Cells (aka Storage Servers)


3.1.0/ Information

  • All actions have to be executed as root
  • Patching a cell takes around 1h30 (it may take longer under heavy I/O activity: we have seen cell patching sessions last around 3 hours per cell on a heavily I/O loaded Exadata)
  • You can connect to a cell console to see what is happening during the patch application (see the procedure on how to connect to an ILOM console). Once connected, you will see everything happening on the server console, such as the reboot sequence:
[root@myclusterdb01 dbserver_patch_5.170131]# ssh root@myclustercel01-ilom
 Password:
 Oracle(R) Integrated Lights Out Manager
 Version 3.1.2.20.c r86871
 Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.
 > start /sp/console
 Are you sure you want to start /SP/console (y/n)? y
 Serial console started. To stop, type ESC (

3.1.1/ Set disk_repair_time to 24h

[grid@myclusterdb01 ~]$ . oraenv <<< `grep "^+ASM" /etc/oratab | awk -F ":" '{print $1}'`
[grid@myclusterdb01 ~]$ sqlplus / as sysasm
SQL> set lines 200
SQL> col attribute for a30
SQL> col value for a50

-- Check the current setting for each diskgroup
SQL> select dg.name as diskgroup, a.name as attribute, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name = 'disk_repair_time' ;

-- For each diskgroup set disk_repair_time to 24h
SQL> alter diskgroup XXXXX SET ATTRIBUTE 'disk_repair_time' = '24h' ;

-- Verify the new setting for each diskgroup
SQL> select dg.name as diskgroup, a.name as attribute, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name = 'disk_repair_time' ;
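With several diskgroups, typing one ALTER DISKGROUP per diskgroup is error-prone. The bash function below is a minimal sketch that only prints the statements for the diskgroup names you pass it, so you can review them before running them. gen_repair_time_sql is a hypothetical helper name, and the diskgroup names DATAC1 and RECOC1 used below are hypothetical too: take the real ones from the query above.

```shell
#!/usr/bin/env bash
# Print one "alter diskgroup" statement per diskgroup name.
# Usage: gen_repair_time_sql VALUE DISKGROUP [DISKGROUP ...]
gen_repair_time_sql() {
  local value=$1
  if [[ -z $value || $# -lt 2 ]]; then
    echo "usage: gen_repair_time_sql VALUE DISKGROUP..." >&2
    return 1
  fi
  shift  # the remaining arguments are the diskgroup names
  local dg
  for dg in "$@"; do
    printf "alter diskgroup %s set attribute 'disk_repair_time' = '%s';\n" "$dg" "$value"
  done
}
```

For example, as the grid user with the ASM environment set: gen_repair_time_sql 24h DATAC1 RECOC1 | sqlplus -s / as sysasm. The same function works with 3.6h when setting the value back after the patching.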

3.1.2/ Check the Version of Each Cell Before Patching as well as the cell and grid disk status

All versions should be the same on each cell at this point.
[root@myclusterdb01 ~]# ./exa-versions.sh -c
                     Cluster is a X6-2 Half Rack HC 8TB

 myclustercel01        myclustercel02      myclustercel03         myclustercel04      myclustercel05
----------------------------------------------------------------------------------------------------
12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
----------------------------------------------------------------------------------------------------
[root@myclusterdb01 ~]#
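Rather than eyeballing the version row, a small bash check can do the comparison. This is a sketch assuming the version row is the only line of the exa-versions.sh output starting with a digit, as in the cell output above; all_same_version is a hypothetical helper name. The optional second argument lets you reuse it after the patching to check that every cell runs the target version.

```shell
#!/usr/bin/env bash
# Succeed if all whitespace-separated versions on a line are identical,
# and, when a target is given, equal to that target.
# Usage: all_same_version "VERSION_ROW" [TARGET_VERSION]
all_same_version() {
  local -a versions
  read -r -a versions <<< "$1"
  (( ${#versions[@]} > 0 )) || return 1   # empty row: nothing to check
  local ref=${2:-${versions[0]}}          # compare to target or first field
  local v
  for v in "${versions[@]}"; do
    [[ $v == "$ref" ]] || return 1
  done
  return 0
}
```

For example: row=$(./exa-versions.sh -c | grep -E '^[0-9]'), then all_same_version "$row" before the patching, or all_same_version "$row" 18.1.9.0.0.181006 after it.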
Also, let's check and save the status of the cell disks and grid disks before the maintenance:
[root@exadatadb01]# ./cell-status.sh | tee ~/cell_status_before_patching


3.1.3/ Apply the Patch

A few notes:
  • You may use screen instead of nohup if it is installed
  • You can skip the -patch_check_prereq step as it should already have been done previously, but I personally like to run it right before the patch to be absolutely sure.
  • You can also use the -smtp_to and the -smtp_from options to receive email notifications: -smtp_from "dba@mycompany.com" -smtp_to "myteam@mycompany.com dba@myclient.com"
  • Ensure you are connected to the database server node 1 (myclusterdb01)
-- Check that ~/cell_group contains the same cells as those from the exa-versions.sh script
[root@flccssdbadm01 ~]# cat ~/cell_group

-- Apply the patch
[root@flccssdbadm01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch/patch_18.1.9.0.0.181006
[root@flccssdbadm01 ~]# ./patchmgr -cells ~/cell_group -reset_force
[root@flccssdbadm01 ~]# ./patchmgr -cells ~/cell_group -cleanup
[root@flccssdbadm01 ~]# ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling
[root@flccssdbadm01 ~]# nohup ./patchmgr -cells ~/cell_group -patch -rolling &
[root@flccssdbadm01 ~]# ./patchmgr -cells ~/cell_group -cleanup
You can then follow the patch in the nohup.out file (tail -f nohup.out). You can also check what is happening on the console or check in the patchmgr.out file.
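The cat ~/cell_group check above relies on a visual comparison. A minimal bash sketch that compares a group file with an expected list of hostnames, ignoring order and blank lines, could look like this; check_group_file is a hypothetical helper name, and the expected hostnames are the ones shown in the exa-versions.sh header.

```shell
#!/usr/bin/env bash
# Compare a patchmgr group file with the hostnames you expect in it.
# Order and blank lines are ignored; a missing, extra or duplicated
# host makes the check fail.
# Usage: check_group_file GROUP_FILE HOST [HOST ...]
check_group_file() {
  local file=$1
  shift
  if [[ ! -r $file ]]; then
    echo "cannot read $file" >&2
    return 2
  fi
  local actual expected
  # Trim surrounding whitespace and drop empty lines before sorting
  actual=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$file" | sort)
  expected=$(printf '%s\n' "$@" | sort)
  if [[ $actual != "$expected" ]]; then
    echo "$file does not match the expected hosts (< expected, > file):" >&2
    diff <(printf '%s\n' "$expected") <(printf '%s\n' "$actual") >&2
    return 1
  fi
  return 0
}
```

For example: check_group_file ~/cell_group myclustercel0{1..5} && echo "cell_group OK".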

Non-Rolling Manner

You may also want to apply this patch in a non-rolling manner. While this is faster, it requires a complete downtime of all the databases running on the Exadata. To do so, stop the clusterware and the cell services, then remove the "-rolling" option from the previous patchmgr command lines:
  • Stop the clusterware
    [root@myclusterdb01 ~]# crsctl stop cluster -all
    [root@myclusterdb01 ~]# crsctl stop crs
    [root@myclusterdb01 ~]# crsctl check crs
    -- If the cluster does not stop properly at this step, use the -f option: crsctl stop crs -f

  • Stop the cells
    You can stop the cell services on the cells to be patched using the following command on each cell:
    [root@myclustercel01 ~]# cellcli -e 'alter cell shutdown services all'

    Or use dcli to run it on all the cells at once:
    [root@myclusterdb01 ~]# dcli -g ~/cell_group -l root "cellcli -e alter cell shutdown services all"

  • Apply the patch
    [root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -reset_force
    [root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -cleanup
    [root@myclusterdb01 ~]# nohup ./patchmgr -cells ~/cell_group -patch &
    [root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -cleanup
    

3.1.4/ Check the Version of Each Cell After Patching

All versions should be the same on each cell at this point.
[root@myclusterdb01 ~]# ./exa-versions.sh -c
                     Cluster is a X6-2 Half Rack HC 8TB

 myclustercel01        myclustercel02      myclustercel03         myclustercel04      myclustercel05
----------------------------------------------------------------------------------------------------
18.1.9.0.0.181006   18.1.9.0.0.181006   18.1.9.0.0.181006   18.1.9.0.0.181006   18.1.9.0.0.181006
----------------------------------------------------------------------------------------------------
[root@myclusterdb01 ~]#
Check and save the status of the cell disks and grid disks after the maintenance and compare with the status before the maintenance to be sure everything came back online properly:
[root@exadatadb01]# ./cell-status.sh | tee ~/cell_status_after_patching
[root@exadatadb01]# diff ~/cell_status_before_patching ~/cell_status_after_patching

3.1.5/ Set disk_repair_time back to 3.6h

[grid@myclusterdb01 ~]$ . oraenv <<< `grep "^+ASM" /etc/oratab | awk -F ":" '{print $1}'`
[grid@myclusterdb01 ~]$ sqlplus / as sysasm
SQL> set lines 200
SQL> col attribute for a30
SQL> col value for a50

-- Check the current setting for each diskgroup
SQL> select dg.name as diskgroup, a.name as attribute, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name = 'disk_repair_time' ;

-- For each diskgroup set disk_repair_time to 3.6h
SQL> alter diskgroup XXXXX SET ATTRIBUTE 'disk_repair_time' = '3.6h' ;

-- Verify the new setting for each diskgroup
SQL> select dg.name as diskgroup, a.name as attribute, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name = 'disk_repair_time' ;
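The verification query can also be checked automatically. The bash sketch below reads the query output on stdin and fails if any diskgroup row does not show the expected value. It assumes each diskgroup is printed on one line as diskgroup, attribute, value (as with the column settings above); header and separator lines are skipped, and check_repair_time is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read the disk_repair_time query output on stdin and check every row.
# Fails if a row has another value, or if no row was found at all.
# Usage: ... | check_repair_time EXPECTED_VALUE
check_repair_time() {
  local expected=$1 dg attr value seen=0 bad=0
  while read -r dg attr value _; do
    [[ $attr == disk_repair_time ]] || continue   # skip headers, separators
    seen=$((seen + 1))
    if [[ $value != "$expected" ]]; then
      echo "$dg: disk_repair_time=$value (expected $expected)" >&2
      bad=1
    fi
  done
  (( seen > 0 && bad == 0 ))
}
```

For example, feed it the query output (saved to a file or piped from sqlplus -s) with the value you expect: 3.6h here, 24h before the patching.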


3.2/ Patching the IB Switches


3.2.0/ Information

  • Patching an IB Switch takes around 45 minutes
  • All steps have to be executed as root
  • It is a 100% online operation
  • I've become accustomed to using the database node 1 (myclusterdb01) to patch the IB Switches, which is why I have deployed the root SSH keys from the DB node 1 to the IB Switches in the pre-requisites section
  • Be sure to apply patch 26678971 on your switches before proceeding
  • Please find a procedure to create the ib_group file.

3.2.1/ Check the Version of Each IB Switch Before the Patch

[root@myclusterdb01 ~]# ./exa-versions.sh -i
     -- Infiniband Switches

     myclustersw-ib2    myclustersw-ib3
     ----------------------------------------
         2.2.7-1             2.2.7-1
     ----------------------------------------
[root@myclusterdb01 ~]#
If your IB Switches run a version lower than 2.2.8, apply patch 26678971 as described in Preventing an Infiniband Switch from becoming un-bootable due to Real Time Clock corruption (Doc ID 2302714.1). This patch takes about a second to apply, can be applied online and is mandatory, as your Switch could otherwise become unbootable! It has to be applied on each Switch.
[root@exadb01 ~]# scp /tmp/patch_bug_26678971 root@switch1:/tmp/. 
[root@exadb01 ~]# ssh switch1
[root@switch1 ~]# cd /tmp
[root@switch1 ~]# ./patch_bug_26678971
. . .
[root@switch1 ~]# exit
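The "version lower than 2.2.8" test can be scripted with GNU sort -V, which orders version strings numerically field by field (so 2.2.11-2 correctly sorts after 2.2.8). This is a sketch; version_lt and needs_rtc_patch are hypothetical helper names, and 2.2.8 is the threshold given above.

```shell
#!/usr/bin/env bash
# Succeed if version $1 sorts strictly before version $2 (GNU sort -V).
version_lt() {
  [[ $1 != "$2" ]] &&
    [[ $(printf '%s\n' "$1" "$2" | sort -V | head -n 1) == "$1" ]]
}

# Succeed if an IB switch firmware version still needs patch 26678971.
needs_rtc_patch() {
  version_lt "$1" 2.2.8
}
```

For example, needs_rtc_patch 2.2.7-1 succeeds for the switches shown above, while needs_rtc_patch 2.2.11-2 fails once they are upgraded.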

3.2.2/ Apply the Patch

A few notes:
  • You can use screen instead of nohup if it is installed on your system
  • Be sure to be connected to the myclusterdb01 server
  • The patch has been copied to /tmp/IB_PATCHING during the prerequisites phase
-- Verify that the ib_group file contains the same switches as those shown by the exa-versions.sh script
[root@flccssdbadm01 ~]# cat ~/ib_group

-- Apply the patch
[root@flccssdbadm01 ~]# cd /tmp/IB_PATCHING/patch_18.1.9.0.0.181006
[root@flccssdbadm01 ~]# ./patchmgr -ibswitches ~/ib_group -ibswitch_precheck -upgrade
[root@flccssdbadm01 ~]# nohup ./patchmgr -ibswitches ~/ib_group -upgrade &
You may face the error message below during the prerequisites check and the patch application (this bug is confirmed to affect the April 2019 patch):
 ----- InfiniBand switch update process ended 2019-05-11 15:49:37 -0500 -----
2019-05-11 15:49:37 -0500 1 of 1 :SUCCESS: DONE: Initiate pre-upgrade validation check on InfiniBand switch(es).
./patchmgr: line 423: g_working_ibswitch_list: No such file or directory
This can be safely ignored as documented in PATCHMGR SUCCEEDS WITH 'G_WORKING_IBSWITCH_LIST: NO SUCH FILE OR DIRECTORY' ERROR (Doc ID 2523025.1).

3.2.3/ Check the Version of Each IB Switch After the Patch

[root@myclusterdb01 ~]# ./exa-versions.sh -i
     -- Infiniband Switches

     myclustersw-ib2    myclustersw-ib3
     ----------------------------------------
         2.2.11-2             2.2.11-2
     ----------------------------------------
[root@myclusterdb01 ~]#

3.2.4/ Drop the temporary directory used for patching

[root@myclusterdb01 ~]# rm -fr /tmp/IB_PATCHING

3.2.roce/ Patching the ROCE Switches


3.2.roce.0/ Information

  • From X8M, Exadata is no longer shipped with InfiniBand switches but with ROCE switches
  • From X8M, the ibhosts command no longer works, so please have a look at this post to see how to manage the dbs_group, cell_group and roce_group files from X8M.
  • Patching the ROCE switches is a 100% online operation
  • You have to use the script /opt/oracle.SupportTools/RoCE/setup_switch_ssh_equiv.sh to set up passwordless connections to the Switches

3.2.roce.1/ The paint does not seem to be dry (< July 2020)

Please note that this paragraph applies to Exadata versions older than July 2020 and is roughly described in Doc ID 2634626.1. When you set up the passwordless connectivity to the ROCE switches using the script /opt/oracle.SupportTools/RoCE/setup_switch_ssh_equiv.sh, it sets up passwordless connectivity from the root user on the DB node you execute the script from to the admin user on the ROCE switches, as we can see in the code:
  USER="admin"

if [[ ${USER} == "root" ]]; then
  echo "Cannot setup ssh equivalency for '${USER}' on cisco switch, please use a different user"
  exit 1
fi
And in /opt/oracle.SupportTools/RoCE/verify_roce_cables.py:
  def get_lldp_ne(switch):
    cmd = "ssh admin@%s sh lldp ne" % switch
    print "Enter %s admin password" % switch

This totally makes sense; as far as I know, it works like that on all Cisco switches.

The only issue here is that patchmgr is usually run as root; if it is not, a specific option has to be specified:
[ERROR] Must run as root. Or use --log_dir option to specify alternate working directory as non-root user
and when we look at patchmgr code:
      if [[ ! -z $g_roceswitch_count && $g_roceswitch_count -ne 0 ]]; then
      # Check password equivalency
      for g_roce_switch in $(cat $g_working_roceswitch_list | sed 's/:.*//g')
      do
        ret=$(roce_switch_api/switch_ssh $g_roce_switch "echo" 2>/dev/null 1>/dev/null; echo $?)
        if [[ $ret -gt 0 ]]
        then
            while :
            do
              echo -e -n "\033[40;1;33m\n[NOTE     ]\033[0m Password equivalency is NOT setup for user '`whoami`' to $g_roce_switch from '`hostname`'. Set it up? (y/n): "
              read resp
              resp=$(echo $resp |  tr '[:upper:]' '[:lower:]')
              if [[ $resp == "y" ]]
              then
                #roce_switch_api/setup_switch_ssh_equiv.sh $g_roce_switch | tee $g_wip_stdout
                roce_switch_api/setup_switch_ssh_equiv.sh $g_roce_switch
So clearly, patchmgr verifies the passwordless connection to the ROCE switches for the user returned by whoami, which should then be admin here, and not root as we usually do. The only workaround we found was to create an OS user named admin and then use the --log_dir option, as specified in the error message shown earlier.
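Given this behaviour, it can be worth checking which user you are about to run patchmgr as before launching it. A minimal bash sketch: roce_preflight is a hypothetical helper name, and the expected switch user defaults to admin, the user hardcoded in setup_switch_ssh_equiv.sh shown above.

```shell
#!/usr/bin/env bash
# Succeed only if patchmgr would run as the same user that has
# passwordless SSH to the ROCE switches (admin by default).
# Usage: roce_preflight [OS_USER] [SWITCH_USER]
roce_preflight() {
  local os_user=${1:-$(whoami)}
  local switch_user=${2:-admin}
  if [[ $os_user == "$switch_user" ]]; then
    return 0
  fi
  echo "patchmgr checks SSH equivalency for '$os_user', but the switches" \
       "expect '$switch_user': run it as '$switch_user' with --log_dir" >&2
  return 1
}
```

For example, once connected as the admin OS user: roce_preflight && ./patchmgr --roceswitches ~/roce_group --upgrade --log_dir /tmp.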

3.2.roce.2/ Check the Version of Each ROCE Switch Before the Patch

[root@x8m_01]# ./exa-versions.sh -r
        Cluster is a X8M-2 Elastic Rack HC 14TB
            -- ROCE Switches
            
   x8m_01_roce_01        x8m_01_roce_02
----------------------------------------
     7.0(3)I7(7)         7.0(3)I7(7)
----------------------------------------
[root@x8m_01]#

3.2.roce.3/ Apply the patch

Note the admin user in the prompt here.
[admin@x8m_01]$ pwd
/patches/APR2020/30783929/Infrastructure/19.3.7.0.0/ExadataStorageServer_InfiniBandSwitch/patch_19.3.7.0.0.200428
[admin@x8m_01]$ ./patchmgr --roceswitches ~/roce_group --upgrade --log_dir /tmp
. . .
[admin@x8m_01]$

3.2.roce.4/ Check the Version of Each ROCE Switch After the Patch

[root@x8m_01]# ./exa-versions.sh -r
        Cluster is a X8M-2 Elastic Rack HC 14TB
            -- ROCE Switches
            
   x8m_01_roce_01        x8m_01_roce_02
----------------------------------------
     7.0(3)I7(7)         7.0(3)I7(7)
----------------------------------------
[root@x8m_01]#

3.3 Patching the DB nodes (aka Compute Nodes)


3.3.0 - Information

  • All actions must be done as root
  • Patching a database node takes 45 minutes to one hour
  • It is not possible to start the patch from a database node that will be patched (which makes sense). The official way to apply this patch in a rolling manner is to:
    • 1/ Start the patch from the database node 1 to patch all the other nodes
    • 2/ Once done, copy patchmgr and the ISO file to an already patched node and then start the patch to patch the remaining node (node 1 in my example)

    To improve this way of patching the database nodes, I use a cell to start the patch; I can then patch all the database servers in one patchmgr session, which is easier and more efficient.

  • I use /tmp to store patchmgr and the ISO on cell node 1, as /tmp exists on 100% of Unix boxes. An important thing to keep in mind is that /tmp on the cells is regularly purged, as described in this documentation. The dbnodeupdate.zip file could then be deleted by this purge mechanism if too much time passes between the moment you copy patchmgr and the moment you use it, and you would then not be able to launch patchmgr, as dbnodeupdate.zip is mandatory. There are a few workarounds, though:
    • 1/ Copy patchmgr and the ISO file outside of /tmp. I do not recommend this.
    • 2/ Copy patchmgr and the ISO file just before you run the prerequisites and before you apply the patch.
    • 3/ Directories with SAVE in their name are ignored by the purge, so you could create a /tmp/SAVE directory to put patchmgr and the ISO file in; this won't survive a reboot though.
    I use a mix of points 2 and 3.
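Since patchmgr cannot patch the node it is started from, a quick guard before launching it helps. The bash sketch below fails if the current host (or the host given as second argument) appears in the group file. It assumes the group file lists one short hostname per line, as returned by hostname -s; host_not_in_group is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Fail if the launching host is listed in the patchmgr group file.
# Usage: host_not_in_group GROUP_FILE [HOST]
host_not_in_group() {
  local file=$1 host=${2:-$(hostname -s)}
  if [[ ! -r $file ]]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # -x: whole-line match, -F: fixed string (no regex surprises)
  if grep -qxF -- "$host" "$file"; then
    echo "$host is listed in $file: start patchmgr from a host that is not patched" >&2
    return 1
  fi
  return 0
}
```

For example, on myclustercel01: host_not_in_group ~/dbs_group before running the patchmgr commands below.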

3.3.1 - Check the Image Versions Before the Patch

You may want to have the same version for each node here.
[root@myclusterdb01 ~]# ./exa-versions.sh -d
                     Cluster is a X6-2 Half Rack HC 8TB

 myclusterdb01        myclusterdb02      myclusterdb03         myclusterdb04
--------------------------------------------------------------------------------
12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506   12.2.1.1.7.180506
--------------------------------------------------------------------------------
[root@myclusterdb01 ~]#

3.3.2 - Save the status of the resources before the maintenance

This is an important step. You have to know exactly what is running on your system before patching the database nodes, so that you can compare it with the post-maintenance status, make sure everything is back to normal and avoid any unpleasant surprises.
I use the rac-status.sh script to easily achieve this goal. Note that this script has to be executed on a database node (not on a cell server).
[root@myclusterdb01 ~]# ./rac-status.sh -a | tee ~/status_before_patching
You can also use rac-status to follow the patching procedure.

3.3.3 - Apply the Patch

A few notes:
  • Note: if your source version is > 12.1.2.1.1, you can use the -allow_active_network_mounts parameter to patch all the DB nodes without having to take care of the NFS mounts. Otherwise, if you have NFS mounted, you will get some error messages; you can ignore them at this stage, as we will unmount the NFS manually before patching the DB nodes.
  • You may use screen instead of nohup if it is installed
  • You can skip the -patch_check_prereq step as it should already have been done previously, but I personally like to run it right before the patch to be 100% sure.
  • Be sure to be connected to the cell node 1 (myclustercel01)

Copy patchmgr and the ISO

Whether you choose the rolling or the non-rolling manner, you first have to copy patchmgr and the ISO file to cell node 1 (do not unzip the ISO file).
[root@myclusterdb01 ~]# ssh root@myclustercel01 rm -fr /tmp/SAVE
[root@myclusterdb01 ~]# ssh root@myclustercel01 mkdir /tmp/SAVE
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataDatabaseServer_OL6/p28666206_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp ~/dbs_group root@myclustercel01:~/.
[root@myclusterdb01 ~]# ssh root@myclustercel01
[root@myclustercel01 ~]# cd /tmp/SAVE
[root@myclustercel01 ~]# unzip -q p21634633_*_Linux-x86-64.zip
This should create a dbserver_patch_5.180405.1 directory (the name may be slightly different if you use a different patchmgr than the one shipped with the Bundle)
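As /tmp on the cells is purged, it can be worth checking that the staged files are still there right before running patchmgr. A minimal bash sketch, assuming dbnodeupdate.zip sits directly in the unzipped dbserver_patch_* directory and that the ISO zip is given as a name or glob pattern relative to the staging directory; check_dbserver_staging is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check that patchmgr's dbnodeupdate.zip and the ISO zip are still staged.
# Usage: check_dbserver_staging STAGING_DIR ISO_ZIP_PATTERN
check_dbserver_staging() {
  local dir=$1 iso=$2 rc=0
  if [[ -z $dir || -z $iso ]]; then
    echo "usage: check_dbserver_staging STAGING_DIR ISO_ZIP_PATTERN" >&2
    return 2
  fi
  # compgen -G succeeds if the glob pattern matches at least one path
  if ! compgen -G "$dir/dbserver_patch_*/dbnodeupdate.zip" > /dev/null; then
    echo "missing $dir/dbserver_patch_*/dbnodeupdate.zip (purged?)" >&2
    rc=1
  fi
  if ! compgen -G "$dir/$iso" > /dev/null; then
    echo "missing ISO zip matching $dir/$iso" >&2
    rc=1
  fi
  return "$rc"
}
```

For example: check_dbserver_staging /tmp/SAVE 'p28666206_*_Linux-x86-64.zip'. The pattern is quoted so that the function, not the calling shell, expands it.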

Rolling Manner

The rolling manner patches the nodes one by one: only one node is unavailable at a time, while all the other nodes remain up and running. This method of patching is almost online, and can be 100% online with good service rebalancing.
-- Double check that the ~/dbs_group file contains the same database nodes as those shown by the exa-versions.sh script
[root@myclustercel01 ~]# cat ~/dbs_group

-- Apply the patch
[root@myclustercel01 ~]# cd /tmp/SAVE/dbserver_patch_*
[root@myclustercel01 ~]# ./patchmgr -dbnodes ~/dbs_group -precheck  -iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 18.1.9.0.0.181006 -allow_active_network_mounts
[root@myclustercel01 ~]# nohup ./patchmgr -dbnodes ~/dbs_group -upgrade -iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 18.1.9.0.0.181006 -allow_active_network_mounts -rolling &

Non-Rolling Manner

In a non-rolling manner, patchmgr patches all the nodes in parallel. It is then quicker, but a complete downtime is required.
-- Double check that the ~/dbs_group file contains the same database nodes as those shown by the exa-versions.sh script
[root@myclustercel01 ~]# cat ~/dbs_group

-- Apply the patch
[root@myclustercel01 ~]# cd /tmp/SAVE/dbserver_patch_5.180405.1
[root@myclustercel01 ~]# ./patchmgr -dbnodes ~/dbs_group -precheck  -iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 18.1.9.0.0.181006 -allow_active_network_mounts
[root@myclustercel01 ~]# nohup ./patchmgr -dbnodes ~/dbs_group -upgrade -iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 18.1.9.0.0.181006 -allow_active_network_mounts &

3.3.4 - Check the Image Version on Each Node

Be sure that each node now runs the expected version.
[root@myclusterdb01 ~]# ./exa-versions.sh -d
                     Cluster is a X6-2 Half Rack HC 8TB

 myclusterdb01        myclusterdb02      myclusterdb03         myclusterdb04
-------------------------------------------------------------------------------
18.1.9.0.0.181006   18.1.9.0.0.181006   18.1.9.0.0.181006   18.1.9.0.0.181006
-------------------------------------------------------------------------------
[root@myclusterdb01 ~]#

3.3.5 - Verify the status of the resources after the DB nodes patching

Here, we check the status of the running resources and compare it with the status saved before the maintenance to be sure everything is back to its pre-maintenance state.
[root@exadatadb01]# ./rac-status.sh -a | tee ~/status_after_patching
[root@exadatadb01]# diff ~/status_before_patching ~/status_after_patching

All the infrastructure components (Cells, DB Nodes and IB Switches) are now patched, so we can continue with the patching of the software components (Grid and Databases ORACLE_HOMEs) in Part 3 of this blog.


Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 coming soon / Part 6

8 comments:

  1. When patching DBnodes, the option : "-rolling" is not necessary with either dbnodeupdate or patchmgr. You can patch only ONE node at the same time.

    1. Hi Anonymous,

      If you do not specify -rolling, it will go non rolling then patch all the nodes in the "dbs_group" file in parallel, please have a look at this patchmgr non rolling log:

      [root@exadata_cel01 dbserver_patch_5.170930]# nohup ./patchmgr -dbnodes ~/dbs_group2357 -upgrade -iso_repo /tmp/SAVE/p26774227_122110_Linux-x86-64.zip -target_version 12.2.1.1.3.171017 &
      [root@exadata_cel01 dbserver_patch_5.170930]# tail -f nohup.out
      . . .
      2018-01-27 10:14:45 -0600 :SUCCESS: DONE: Initiate prepare steps on node(s).
      2018-01-27 10:14:45 -0600 :Working: DO: Initiate update on 5 node(s).
      2018-01-27 10:14:45 -0600 :Working: DO: dbnodeupdate.sh running a backup on 5 node(s).
      2018-01-27 10:20:00 -0600 :SUCCESS: DONE: dbnodeupdate.sh running a backup on 5 node(s).
      2018-01-27 10:20:00 -0600 :Working: DO: Initiate update on node(s)
      2018-01-27 10:20:21 -0600 :SUCCESS: DONE: Get information about any required OS upgrades from node(s).
      2018-01-27 10:20:21 -0600 :Working: DO: dbnodeupdate.sh running an update step on all nodes.
      |||||
      2018-01-27 10:29:24 -0600 :INFO : exadata_db02 is ready to reboot.
      2018-01-27 10:29:24 -0600 :INFO : exadata_db03 is ready to reboot.
      2018-01-27 10:29:24 -0600 :INFO : exadata_db05 is ready to reboot.
      2018-01-27 10:29:24 -0600 :INFO : exadata_db06 is ready to reboot.
      2018-01-27 10:29:24 -0600 :INFO : exadata_db07 is ready to reboot.
      2018-01-27 10:29:24 -0600 :SUCCESS: DONE: dbnodeupdate.sh running an update step on all nodes.
      2018-01-27 10:29:24 -0600 :SUCCESS: DONE: dbnodeupdate.sh running an update step on all nodes.
      2018-01-27 10:29:24 -0600 :Working: DO: Initiate reboot on node(s)
      2018-01-27 10:29:56 -0600 :SUCCESS: DONE: Initiate reboot on node(s)
      2018-01-27 10:29:56 -0600 :Working: DO: Waiting to ensure node(s) is down before reboot.
      2018-01-27 10:31:04 -0600 :SUCCESS: DONE: Waiting to ensure node(s) is down before reboot.
      2018-01-27 10:31:04 -0600 :Working: DO: Waiting to ensure node(s) is up after reboot.
      2018-01-27 10:35:07 -0600 :SUCCESS: DONE: Waiting to ensure node(s) is up after reboot.
      2018-01-27 10:35:07 -0600 :Working: DO: Waiting to connect to node(s) with SSH. During Linux upgrades this can take some time.
      2018-01-27 10:54:36 -0600 :SUCCESS: DONE: Waiting to connect to node(s) with SSH. During Linux upgrades this can take some time.
      2018-01-27 10:54:36 -0600 :Working: DO: Wait for node(s) is ready for the completion step of update.
      2018-01-27 10:55:44 -0600 :SUCCESS: DONE: Wait for node(s) is ready for the completion step of update.
      2018-01-27 10:56:10 -0600 :Working: DO: Initiate completion step from dbnodeupdate.sh on node(s)
      2018-01-27 11:50:53 -0600 :SUCCESS: DONE: Initiate completion step from dbnodeupdate.sh on exadata_db02
      |||||
      2018-01-27 12:01:06 -0600 :SUCCESS: DONE: Initiate completion step from dbnodeupdate.sh on exadata_db03
      2018-01-27 12:11:19 -0600 :SUCCESS: DONE: Initiate completion step from dbnodeupdate.sh on exadata_db05
      2018-01-27 12:20:15 -0600 :SUCCESS: DONE: Initiate completion step from dbnodeupdate.sh on exadata_db06
      2018-01-27 12:29:05 -0600 :SUCCESS: DONE: Initiate completion step from dbnodeupdate.sh on exadata_db07
      2018-01-27 12:38:01 -0600 :SUCCESS: DONE: Initiate update on node(s).
      2018-01-27 12:38:01 -0600 :SUCCESS: DONE: Initiate update on 5 node(s).

      => you can see here that the 5 nodes are patched at the same time in parallel

  3. does asm rebalance happens during exadata rolling cell patching?

    1. Hi,

      No, there is no rebalance during cells patching. Rebalance happens when a disk is offline for more than disk_repair_time and we increase disk_repair_time before the cells patching to avoid the offline disks being dropped and a rebalance to happen.

      Fred

  4. Hi Fred,

    Pls see this https://docs.oracle.com/en/engineered-systems/exadata-database-machine/dbmmn/updating-exadata-software.html#GUID-7DD62002-AC56-4A1E-93E4-11E759F9F369

    "Note:Prior to Oracle Exadata System Software release 19.3.9, you must run patchmgr as a non-root user for patching RoCE Network Fabric switches."

    I have done it on an earlier version from a user called dbmadmin and it worked fine. It was mentioned somewhere that you should use this user. I can't recall where. And I think this user is created by default on DB nodes.

    Thanks.

    1. Dbmadmin didn't work for us, admin did.

      I will include this doc in the blog, this is very interesting, thanks !!, another documented bug which becomes a feature :)

    2. Ok, this is the note that talks about it Procedure for upgrading the RoCE switch firmware (Doc ID 2634626.1)

