Exadata storage servers and DB nodes are Linux machines that use RPM as their package manager. RPM keeps a database of the installed packages. A few times, this database has become corrupted after an Exadata patching session, and you usually only discover it during the next patching session, when the prerequisites check complains like this:
2020-11-12 13:12:15 +1100 :FAILED : Check space and state of cell services.
Note that the error message is "Check space and state of cell services", so the first thing to check is whether you have a space issue (dcli -g ~/cell_group -l root "df -h /"; / needs 3 GB for patching). Once you have ruled that out, you know that the issue is with the state of your component (cell or DB node).

You then have to check the hostname.log of the culprit in the patchmgr directory. All the "hostname.log" logfiles usually have a very similar size, so if one is bigger, a simple ls will most likely point you to the culprit. In that log you may find:
cel02: [ERROR] Can not continue. Runtime configuration is not consistent with values configured in /opt/oracle.cellos/cell.conf.
cel02: [ERROR] Run ipconf to correct the inconsistencies. Failed check: /root/_cellupd_dpullec_/_p_/ipconf -check-consistency -at-runtime -semantic -verbose

This is not excessively verbose, but it gives you a good hint along with the command that failed. You can then re-execute this command and see how it goes:
[root@cel02 ~]# /root/_cellupd_dpullec_/_p_/ipconf -check-consistency -at-runtime -semantic -verbose
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
[Info]: ipconf command line: /root/_cellupd_dpullec_/_p_/ipconf.pl -check-consistency -at-runtime -semantic -verbose -nocodes
Logging started to /var/log/cellos/ipconf.log
[Warning]: File not found /etc/ntp.conf
. . .
many more info not useful in this scenario
. . .
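The "spot the bigger logfile with a simple ls" step described earlier can be sketched as a tiny helper. This is only an illustration: the function name largest_log is hypothetical, and it assumes the per-host logs all end in .log and have no spaces or newlines in their names.

```shell
# largest_log DIR
# Print the path of the biggest *.log file in DIR (current directory by
# default). Hypothetical helper, not part of patchmgr.
largest_log() {
    # ls -S sorts by size, largest first; head keeps only the top entry.
    ls -S "${1:-.}"/*.log 2>/dev/null | head -n 1
}
```

Run from the patchmgr directory, `largest_log .` prints the logfile most likely to belong to the failing host. It only compares sizes, so you still need to open the file and read the errors.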
We can clearly see 2 issues here:
- Missing /etc/ntp.conf: this can be ignored, it is documented in note 2689297.1
- An RPM issue
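The Berkeley DB messages in that output are a recognizable signature. As a minimal sketch (the function name rpmdb_corrupt_in is hypothetical, and the two patterns are simply taken from the errors shown above), a small filter can flag them in any captured output or logfile:

```shell
# rpmdb_corrupt_in
# Read text on stdin and exit 0 if it contains one of the rpmdb error
# signatures seen in this post, 1 otherwise. Hypothetical helper.
rpmdb_corrupt_in() {
    grep -Eq 'DB_RUNRECOVERY|cannot open Packages database'
}
```

For example, `rpmdb_corrupt_in < cel02.log && echo "RPM database needs a rebuild"` checks a saved logfile (cel02.log is an assumed name here). Other corruption modes may print different messages, so a clean result from this filter is not proof of a healthy database.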
We can confirm the RPM issue just by querying the RPM database:
[root@cel02 ~]# rpm -qa
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
[root@cel02 ~]#

So here we have to rebuild this corrupted RPM database, and the good news is that it can be done 100% online with no disruption:
[root@cel02 ~]# mkdir /var/lib/rpm/backup
[root@cel02 ~]# cp -a /var/lib/rpm/__db* /var/lib/rpm/backup/
[root@cel02 ~]# rm -f /var/lib/rpm/__db*
[root@cel02 ~]# rpm --rebuilddb
[root@cel02 ~]#

Easy, right? You can now verify the good health of your RPM database:
[root@cel02 ~]# rpm -qa | wc -l
455
[root@cel02 ~]#

You can revalidate the whole configuration:
[root@cel02 ~]# cellcli -e alter cell validate configuration
Cell cel02 successfully altered
[root@cel02 ~]#
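If you hit this on several hosts, the backup-and-clean part of the rebuild above can be wrapped in a small function. This is a sketch under stated assumptions, not an Oracle-provided tool: the name backup_rpm_env is hypothetical, it uses mkdir -p (unlike the plain mkdir above) so that it can be re-run, and it takes the RPM directory as an argument so it can be tried on a scratch copy first.

```shell
# backup_rpm_env [DBDIR]
# Copy the Berkeley DB environment files (__db*) of an RPM database
# directory into DBDIR/backup, then remove the originals.
# DBDIR defaults to /var/lib/rpm. Hypothetical helper.
backup_rpm_env() {
    dbdir="${1:-/var/lib/rpm}"
    mkdir -p "$dbdir/backup" || return 1
    # Expand the glob into this function's positional parameters.
    set -- "$dbdir"/__db*
    # If nothing matched, the pattern stays literal: nothing to do.
    [ -e "$1" ] || return 0
    cp -a "$@" "$dbdir/backup/" || return 1
    rm -f "$@"
}

# On the affected host, as root, followed by the same commands as above:
#   backup_rpm_env /var/lib/rpm && rpm --rebuilddb && rpm -qa | wc -l
```

The function deliberately stops before rpm --rebuilddb, so the rebuild itself stays an explicit step you run and check by hand.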
And you are good to go: everything is now fixed and clean, and your prerequisites check and future patching will now work!
Note that this example is with a cell but it works the same way with a database node (and any Linux server).