An Unknown DBA blog: Exadata: Repair a corrupted / broken RPM database

Storage servers and DB nodes are Linux machines that use RPM as their package manager. RPM keeps a database of the installed packages, and it has happened a few times that this database became corrupted after an Exadata patching. You usually discover this during your next patching session, when the prerequisite checks complain like this:

2020-11-12 13:12:15 +1100        :FAILED : Check space and state of cell services.

Note that the error message is "Check space and state of cell services", so the first thing to check is whether you have a space issue (dcli -g ~/cell_group -l root "df -h /"; / needs 3 GB free for patching). Once you have verified that space is not the problem, you know the issue is with the state of your component (cell or DB node).
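The space half of this check can be scripted on a node. A minimal sketch, assuming a POSIX df is available; free_kb and has_enough_space are hypothetical helper names, and the 3 GB threshold is the figure quoted above:

```shell
# Hypothetical helpers, not part of any Exadata tooling.

# free_kb MOUNT: print the available space on MOUNT in 1 KB blocks.
# -P forces the single-line POSIX output format, -k forces 1024-byte units.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_enough_space AVAIL_KB REQUIRED_GB: succeed when AVAIL_KB covers REQUIRED_GB.
has_enough_space() {
    [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example: / needs 3 GB free for patching.
if has_enough_space "$(free_kb /)" 3; then
    echo "/ has at least 3 GB free"
else
    echo "/ has less than 3 GB free"
fi
```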

You then have to check the hostname.log of the culprit in the patchmgr directory. The "hostname.log" logfiles usually all have a very similar size, so if one is bigger, a simple ls will most likely point you to the culprit. In that file you may find:
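The size comparison can be done in one line. A minimal sketch, assuming the per-host logfiles end in .log and sit directly in the patchmgr directory; largest_log is a hypothetical helper name:

```shell
# largest_log DIR: print the biggest *.log file in DIR (hypothetical helper).
# ls -S sorts by size, largest first; -d lists matching directories as
# entries instead of descending into them; head keeps only the top entry.
largest_log() {
    ls -dS "$1"/*.log 2>/dev/null | head -n 1
}

# Example, run from the patchmgr directory:
largest_log .
```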

cel02: [ERROR] Can not continue. Runtime configuration is not consistent with values configured in /opt/oracle.cellos/cell.conf.
cel02: [ERROR] Run ipconf to correct the inconsistencies. Failed check: /root/_cellupd_dpullec_/_p_/ipconf -check-consistency -at-runtime -semantic -verbose

This is not excessively verbose, but it gives you a good hint along with the command that failed. You can then re-execute this command and see how it goes:

[root@cel02 ~]# /root/_cellupd_dpullec_/_p_/ipconf -check-consistency -at-runtime -semantic -verbose
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
[Info]: ipconf command line: /root/_cellupd_dpullec_/_p_/ipconf.pl -check-consistency -at-runtime -semantic -verbose -nocodes
Logging started to /var/log/cellos/ipconf.log
[Warning]: File not found /etc/ntp.conf
. . . many more info not useful in this scenario . . .

We can clearly see 2 issues here:

Missing /etc/ntp.conf: this can be ignored, it is documented in note 2689297.1
An RPM database issue

We can confirm the RPM issue just by querying the RPM database:

[root@cel02 ~]# rpm -qa
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 23845/140082556999744 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm
[root@cel02 ~]#
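If you have several nodes to check, the same test can be scripted by looking for the error strings shown above. A minimal sketch; rpmdb_looks_corrupted is a hypothetical helper name, and matching on these two messages is an assumption based on the output in this post, not an exhaustive list of RPM database errors:

```shell
# rpmdb_looks_corrupted: read command output on stdin and succeed when it
# contains one of the Berkeley DB / rpmdb errors seen in this post.
rpmdb_looks_corrupted() {
    grep -Eq 'DB_RUNRECOVERY|cannot open Packages database'
}

# Example: send only the stderr of rpm -qa into the check (stdout is discarded).
# The command -v guard simply skips the example on machines without rpm.
if command -v rpm >/dev/null 2>&1; then
    if rpm -qa 2>&1 >/dev/null | rpmdb_looks_corrupted; then
        echo "RPM database needs a rebuild"
    else
        echo "RPM database answers queries"
    fi
fi
```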

So here we have to rebuild this corrupted RPM database, and the good news is that it can be done 100% online with no disruption:

[root@cel02 ~]# mkdir /var/lib/rpm/backup
[root@cel02 ~]# cp -a /var/lib/rpm/__db* /var/lib/rpm/backup/
[root@cel02 ~]# rm -f /var/lib/rpm/__db*
[root@cel02 ~]# rpm --rebuilddb
[root@cel02 ~]#
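The first three commands only touch the __db* files, which are Berkeley DB environment files; the package data itself lives in other files of /var/lib/rpm and is left alone. Those steps can be wrapped in a small function with a couple of guards. A minimal sketch; reset_rpm_db_env is a hypothetical helper name, and the timestamped backup directory is a choice made here, not something the procedure requires:

```shell
# reset_rpm_db_env [DIR]: back up and remove the __db* files in DIR
# (default /var/lib/rpm), then print the backup directory.
reset_rpm_db_env() {
    local rpm_dir="${1:-/var/lib/rpm}"
    local backup_dir
    backup_dir="$rpm_dir/backup.$(date +%Y%m%d%H%M%S)"

    # Stop if the glob matches nothing: there is nothing to back up or remove.
    set -- "$rpm_dir"/__db*
    [ -e "$1" ] || { echo "no __db* files in $rpm_dir" >&2; return 1; }

    mkdir "$backup_dir" || return 1
    # Remove the originals only if the copy succeeded.
    cp -a "$@" "$backup_dir"/ || return 1
    rm -f "$@"
    echo "$backup_dir"
}

# Usage as root on the affected node (not executed here):
#   reset_rpm_db_env && rpm --rebuilddb
```

Keeping the copy and the removal in one function means the originals are never deleted unless the backup exists.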

Easy, right? You can now verify the good health of your RPM database:

[root@cel02 ~]# rpm -qa | wc -l
455
[root@cel02 ~]#

You can revalidate the whole configuration:

[root@cel02 ~]# cellcli -e alter cell validate configuration
Cell cel02 successfully altered
[root@cel02 ~]#

And you are good to go: everything is now fixed and clean, and your prerequisite checks and future patching will now work!
Note that this example is with a cell but it works the same way with a database node (and any Linux server).
