Exadata: patchmgr -- dbnodeupdate.sh backup failed on one or more nodes

Patchmgr is nice enough to back up a database node's system before patching it, which is a very cool feature. You can also only back up (and not patch) a database node (patchmgr -backup) or patch a database node without backing it up (patchmgr -nobackup), but the default is to back up the system just before patching it, which is perfectly fine.
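
As a side note, both behaviors are driven from the patchmgr command line. Here is a minimal sketch (assuming a dbs_group file listing the database nodes; any other options you normally pass are represented by "..." and are left out):
# back up only, without patching
patchmgr -dbnodes dbs_group -backup ...
# patch without taking a backup first
patchmgr -dbnodes dbs_group -nobackup ...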

But every feature can eventually fail, and the whole patching session does not start if the backup fails, as shown in the log below:
2021-02-03 07:31:32 +1100        :Working: dbnodeupdate.sh running a backup on 2 node(s).
2021-02-03 07:34:26 +1100        :ERROR  : dbnodeupdate.sh backup failed on one or more nodes 
SUMMARY OF ERRORS FOR dbnode01:
dbnode01: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log
SUMMARY OF ERRORS FOR dbnode02:
dbnode02: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log

You can then check the log file pointed to by patchmgr in the error stack:
[root@dbnode01 ~]# cat /var/log/cellos/dbserver_backup.sh.log
Feb 03 07:34:09  [INFO]
Feb 03 07:34:09  [INFO] Start to backup root /dev/VGExaDb/LVDbSys1 and boot partitions
Feb 03 07:34:09  [INFO] Check for Volume Group "VGExaDb"
Feb 03 07:34:09  [INFO] Check for the LVM root partition "VGExaDb/LVDbSys1"
Feb 03 07:34:10  [INFO] LVM snapshot Logical Volume
Feb 03 07:34:10  [ERROR] LVM snapshot Logical Volume "VGExaDb/LVDbSys1Snap" already exists
Feb 03 07:34:10  [ERROR] Remove snapshot manually and then restart this utility
[root@dbnode01 ~]#
The issue is very clear here: the LVM snapshot used for the backup already exists, most likely because something went wrong during a previous patching session. We even have the solution, which is to remove the snapshot manually, and it is an easy procedure:
[root@dbnode01 ~]# lvm lvscan | grep -i snap
inactive Snapshot '/dev/VGExaDb/LVDbSys1Snap' [10.00 GiB] inherit
[root@dbnode01 ~]# lvremove -f /dev/VGExaDb/LVDbSys1Snap
Logical volume "LVDbSys1Snap" successfully removed
[root@dbnode01 ~]# lvm lvscan | grep -i snap
[root@dbnode01 ~]#
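
Since the backup failed on more than one node, it is worth checking all of the database nodes at once. A quick sketch using dcli (assuming root SSH equivalence is already in place and a dbs_group file lists the database nodes, as for patchmgr; dbnode02 below is just an example of a node still showing a leftover snapshot):
[root@dbnode01 ~]# dcli -g dbs_group -l root "lvm lvscan | grep -i snap"
[root@dbnode01 ~]# dcli -c dbnode02 -l root "lvremove -f /dev/VGExaDb/LVDbSys1Snap"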

Now that the LVM snapshot is removed, you can restart the database node patching, which will now be successful!
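
If you want to be extra safe before relaunching the whole patching, a backup-only run (the patchmgr -backup option mentioned above) can be used to confirm that the snapshot problem is gone; a sketch, again assuming a dbs_group file and omitting your usual options:
patchmgr -dbnodes dbs_group -backup ...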

3 comments:

  1. Hello,
    thank You for another great blog post.
    Could You recommend how to protect yourself from an unstable network session during patching?
    I can imagine screen would work, but it has limited logging capabilities (an on-screen buffer or something); do You use any adjustments to .screenrc to make it patch-friendly?
    Regards.
    Greg

    1. Hi Greg,

      You can modify the screen scrollback buffer either at startup or inside an active session.
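
      For example, something like this (a sketch; the values and the log file path are arbitrary, adjust them to your needs):

      # ~/.screenrc
      defscrollback 100000              # bigger scrollback buffer
      deflog on                         # log each window to a file
      logfile /tmp/patchmgr-screen.log

      # or per session at startup: larger history plus logging enabled
      screen -h 100000 -L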

      I do not use screen because it is not installed by default, so I just use nohup:

      nohup patchmgr -dbnodes .... & (and tail -f nohup.out)

      And all is fine. Screen is a good tool, but you have to learn how to use it, it has its own buffer commands, it is not installed by default, and it is more useful for sharing sessions between people (DBA1 starts something in screen, walks away, and DBA2 can grab that session back to continue working on what DBA1 started), which is not really what you need with patchmgr. Patchmgr needs no interaction, just to run in the background, so nohup is the way to go -- and nohup is available by default, so you will find it on any system.

      Actually, I also tee it like this to have a log file with everything that has been done (I use a script that checks a few things and then starts patchmgr):

      nohup patchmgr -dbnodes .... | tee -a /patches/mypatch.log & (and tail -f /patches/mypatch.log)

      Regards,
