Patchmgr is nice enough to back up a database node system before patching it, which is a very cool feature. You can also only back up (and not patch) a database node (patchmgr -backup) or patch a database node without backing it up (patchmgr -nobackup) -- but the default is to back up the system just before patching it, which is perfectly fine.
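As a quick illustration, the three modes look roughly like the below (dbs_group is a placeholder file listing the database nodes; other mandatory options such as the target version and ISO repository are omitted here and depend on your patchmgr version):
./patchmgr -dbnodes dbs_group -backup              # backup only, no patching
./patchmgr -dbnodes dbs_group -upgrade             # patch, with the default backup taken first
./patchmgr -dbnodes dbs_group -upgrade -nobackup   # patch without taking the backup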
But every feature can eventually fail, and the whole patching session does not start if the backup fails, as in the log below:
2021-02-03 07:31:32 +1100 :Working: dbnodeupdate.sh running a backup on 2 node(s).
2021-02-03 07:34:26 +1100 :ERROR : dbnodeupdate.sh backup failed on one or more nodes
SUMMARY OF ERRORS FOR dbnode01:
dbnode01: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log
SUMMARY OF ERRORS FOR dbnode02:
dbnode02: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log
You can then check the logfile pointed to by patchmgr in the error stack:
[root@dbnode01 ~]# cat /var/log/cellos/dbserver_backup.sh.log
Feb 03 07:34:09 [INFO]
Feb 03 07:34:09 [INFO] Start to backup root /dev/VGExaDb/LVDbSys1 and boot partitions
Feb 03 07:34:09 [INFO] Check for Volume Group "VGExaDb"
Feb 03 07:34:09 [INFO] Check for the LVM root partition "VGExaDb/LVDbSys1"
Feb 03 07:34:10 [INFO] LVM snapshot Logical Volume
Feb 03 07:34:10 [ERROR] LVM snapshot Logical Volume "VGExaDb/LVDbSys1Snap" already exists
Feb 03 07:34:10 [ERROR] Remove snapshot manually and then restart this utility
[root@dbnode01 ~]#
The issue is very clear here: the LVM snapshot used for the backup already exists, most likely because something went wrong during a previous patching session. The log even gives us the solution, which is to remove the snapshot manually, and this is an easy procedure:
[root@dbnode01 ~]# lvm lvscan | grep -i snap
  inactive Snapshot '/dev/VGExaDb/LVDbSys1Snap' [10.00 GiB] inherit
[root@dbnode01 ~]# lvremove -f /dev/VGExaDb/LVDbSys1Snap
  Logical volume "LVDbSys1Snap" successfully removed
[root@dbnode01 ~]# lvm lvscan | grep -i snap
[root@dbnode01 ~]#
Now that the LVM snapshot has been removed, you can restart the database node patching, which will now be successful!
Hello,
Thank you for another great blog post.
Could you recommend how to protect yourself from an unstable network session during patching?
I can imagine screen would work, but it has limited logging capabilities (the on-screen buffer or something). Do you use any adjustments to .screenrc to make it patch friendly?
Regards.
Greg
Hi Greg,
You can modify the screen scrollback buffer, either at startup or inside an active session.
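For example (the 50000-line value is just an arbitrary example):
screen -h 50000                              # start screen with a 50000-line scrollback buffer
echo "defscrollback 50000" >> ~/.screenrc    # or make it the default for new sessions
Ctrl-a : scrollback 50000                    # or change it inside an already running session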
I do not use screen because it is not installed by default, so I just use nohup:
nohup patchmgr -dbnodes .... & (and tail -f nohup.out)
And all is fine. Screen is a good tool, but you have to learn how to use it, it has its own buffer command capabilities, and it is not installed by default. It is more useful for sharing sessions between people (DBA1 starts something in screen and goes away, then DBA2 can grab that session back to continue working on what DBA1 started), which is not really what you need with patchmgr. Patchmgr needs no interaction, just to run in the background, so nohup is the way to go -- and nohup is there by default, so you will find it on any system.
Actually, I also tee it like this to have a logfile with everything that has been done (I use a script that checks things and then starts patchmgr):
nohup patchmgr -dbnodes .... | tee -a /patches/mypatch.log & (and tail -f /patches/mypatch.log)
Regards,
Thank you.
Regards.
Greg