Exadata: patchmgr -- dbnodeupdate.sh backup failed on one or more nodes

Patchmgr is nice enough to back up a database node's system before patching it, which is a very cool feature. You can also only back up (and not patch) a database node (patchmgr -backup) or patch a database node without backing it up (patchmgr -nobackup), but the default is to back up the system just before patching it, which is perfectly fine.
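
As a side note, both behaviors are driven from the patchmgr command line. Here is a minimal sketch (assuming a dbs_group file listing the database nodes; any other options you normally pass are represented by "..." and are left out):
# back up only, without patching
patchmgr -dbnodes dbs_group -backup ...
# patch without taking a backup first
patchmgr -dbnodes dbs_group -nobackup ...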

But every feature can eventually fail, and the whole patching session does not start if the backup fails, as shown in the log below:
2021-02-03 07:31:32 +1100        :Working: dbnodeupdate.sh running a backup on 2 node(s).
2021-02-03 07:34:26 +1100        :ERROR  : dbnodeupdate.sh backup failed on one or more nodes 
SUMMARY OF ERRORS FOR dbnode01:
dbnode01: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log
SUMMARY OF ERRORS FOR dbnode02:
dbnode02: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log

You can then check the log file pointed to by patchmgr in the error stack:
[root@dbnode01 ~]# cat /var/log/cellos/dbserver_backup.sh.log
Feb 03 07:34:09  [INFO]
Feb 03 07:34:09  [INFO] Start to backup root /dev/VGExaDb/LVDbSys1 and boot partitions
Feb 03 07:34:09  [INFO] Check for Volume Group "VGExaDb"
Feb 03 07:34:09  [INFO] Check for the LVM root partition "VGExaDb/LVDbSys1"
Feb 03 07:34:10  [INFO] LVM snapshot Logical Volume
Feb 03 07:34:10  [ERROR] LVM snapshot Logical Volume "VGExaDb/LVDbSys1Snap" already exists
Feb 03 07:34:10  [ERROR] Remove snapshot manually and then restart this utility
[root@dbnode01 ~]#
The issue is very clear here: the LVM snapshot used for the backup already exists, most likely because something went wrong during a previous patching session. We even have the solution, which is to remove the snapshot manually, and it is an easy procedure:
[root@dbnode01 ~]# lvm lvscan | grep -i snap
inactive Snapshot '/dev/VGExaDb/LVDbSys1Snap' [10.00 GiB] inherit
[root@dbnode01 ~]# lvremove -f /dev/VGExaDb/LVDbSys1Snap
Logical volume "LVDbSys1Snap" successfully removed
[root@dbnode01 ~]# lvm lvscan | grep -i snap
[root@dbnode01 ~]#
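
Since the backup failed on more than one node, it is worth checking all of the database nodes at once. A quick sketch using dcli (assuming root SSH equivalence is already in place and a dbs_group file lists the database nodes, as for patchmgr; dbnode02 below is just an example of a node still showing a leftover snapshot):
[root@dbnode01 ~]# dcli -g dbs_group -l root "lvm lvscan | grep -i snap"
[root@dbnode01 ~]# dcli -c dbnode02 -l root "lvremove -f /dev/VGExaDb/LVDbSys1Snap"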

Now that the LVM snapshot is removed, you can restart the database node patching, which will now be successful!
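
If you want to be extra safe before relaunching the whole patching, a backup-only run (the patchmgr -backup option mentioned above) can be used to confirm that the snapshot problem is gone; a sketch, again assuming a dbs_group file and omitting your usual options:
patchmgr -dbnodes dbs_group -backup ...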

3 comments:

  1. Hello,
    thank You for another great blog post.
    Could You recommend how to protect yourself from an unstable network session during patching?
    I can imagine screen would work, but it has limited logging capabilities (an on-screen buffer or something); do You use any adjustments to .screenrc to make it patch-friendly?
    Regards.
    Greg

    1. Hi Greg,

      You can modify the screen scrollback buffer either at startup or inside an active session.
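
      For example, something like this (a sketch; the values and the log file path are arbitrary, adjust them to your needs):

      # ~/.screenrc
      defscrollback 100000              # bigger scrollback buffer
      deflog on                         # log each window to a file
      logfile /tmp/patchmgr-screen.log

      # or per session at startup: larger history plus logging enabled
      screen -h 100000 -L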

      I do not use screen because it is not installed by default, so I just use nohup:

      nohup patchmgr -dbnodes .... & (and tail -f nohup.out)

      And all is fine. Screen is a good tool, but you have to learn how to use it, it has its own buffer commands, it is not installed by default, and it is more useful for sharing sessions between people (DBA1 starts something in screen, walks away, and DBA2 can grab that session back to continue working on what DBA1 started), which is not really what you need with patchmgr. Patchmgr needs no interaction, just to run in the background, so nohup is the way to go -- and nohup is available by default, so you will find it on any system.

      Actually, I also tee it like this to have a log file with everything that has been done (I use a script that checks a few things and then starts patchmgr):

      nohup patchmgr -dbnodes .... | tee -a /patches/mypatch.log & (and tail -f /patches/mypatch.log)

      Regards,
