ExaWatcher: Manage the archives destination

ExaWatcher is a tool installed by default on Exadata (database nodes and storage cells); it replaced OSWatcher starting with 11.2.3.3. It collects system information (ps, top, vmstat, etc.) that is useful for troubleshooting when an issue occurs.

The thing with ExaWatcher is that the information it collects, even compressed, can use a lot of space, especially as it writes under / by default, and one day you'll end up in the situation below, where your / is a bit too full:
[root@exadatadb01]# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       30G   24G  4.0G  86% /
[root@exadatadb01]# du -sh /opt/oracle.ExaWatcher/
2.9G    /opt/oracle.ExaWatcher/
[root@exadatadb01]#
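
Before deciding what to purge or move, it can help to see which collectors take the most room. Here is a minimal sketch, assuming the archives sit under /opt/oracle.ExaWatcher/archive with one subdirectory per collector (as the Top.ExaWatcher and Iostat.ExaWatcher paths later in this post suggest). largest_subdirs is a hypothetical helper name, not something shipped with ExaWatcher:

```shell
# largest_subdirs [DIR] [COUNT]
# Hypothetical helper: list the COUNT biggest entries under DIR, sizes in KB.
# DIR defaults to the ExaWatcher archive directory used in this post.
largest_subdirs() {
    dir="${1:-/opt/oracle.ExaWatcher/archive}"
    count="${2:-10}"
    # du -sk: one summarized size (in KB) per entry; sort -rn: biggest first
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example (not run here): largest_subdirs /opt/oracle.ExaWatcher/archive 5
```

The sizes are printed in KB so that sort -rn can order them numerically.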

You may then want to purge the ExaWatcher archives and/or move these files somewhere else, to an NFS share for example.
Let's start by having a look at the ExaWatcher configuration file, which is located as shown below:
[root@exadatadb01]# locate ExaWatcher.conf
/opt/oracle.ExaWatcher/ExaWatcher.conf
[root@exadatadb01]#

In it, you'll find the ResultDir directive, which specifies where ExaWatcher writes its files:
<ResultDir> /opt/oracle.ExaWatcher/archive
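
If you want to read that value from a script rather than by eye, something along these lines should do. It is only a sketch: it assumes the "<ResultDir> /path" layout shown above, and get_result_dir is a hypothetical helper name:

```shell
# get_result_dir [CONF]
# Hypothetical helper: print the active (uncommented) ResultDir value.
# Lines such as "#<ResultDir> ..." are skipped because their first field
# is not exactly "<ResultDir>".
get_result_dir() {
    conf="${1:-/opt/oracle.ExaWatcher/ExaWatcher.conf}"
    awk '$1 == "<ResultDir>" { print $2 }' "$conf"
}

# Example (not run here): get_result_dir /opt/oracle.ExaWatcher/ExaWatcher.conf
```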

We then just have to stop ExaWatcher, update the configuration file and restart ExaWatcher. Before this, we will create new directories to hold the ExaWatcher files on an NFS share (/nfs), one directory per node:
[root@exadatadb01]# for i in `cat ~/dbs_group`
> do
> mkdir -p /nfs/exawatcherlogs/$i
> done
[root@exadatadb01]# ls -l /nfs/exawa*/*
/nfs/exawatcherlogs/exadatadb01:
total 0
/nfs/exawatcherlogs/exadatadb02:
total 0
/nfs/exawatcherlogs/exadatadb03:
total 0
/nfs/exawatcherlogs/exadatadb04:
total 0
[root@exadatadb01]#

Now, stop ExaWatcher:
[root@exadatadb01]# pwd
/opt/oracle.ExaWatcher
[root@exadatadb01]# ./ExaWatcher.sh --stop
[INFO     ] Stopping ExaWatcher: Post processing vmstat data file...
[1572217463][2019-10-27 19:04:23][INFO][/opt/oracle.ExaWatcher/ExecutorExaWatcher.pl][exadataLogger::Logger][] VmstatPostProcessing for /oracle/nasdev_backup1/exawatcherlogs_/Vmstat.ExaWatcher/2019_10_27_18_58_00_VmstatExaWatcher_.mycompany.com.dat.

[INFO     ] Stopping ExaWatcher: Zipping unzipped ExaWatcher data files...
[INFO     ] Stopping ExaWatcher: All unzipped ExaWatcher results have been zipped accordingly.
[root@exadatadb01]#

Update ExaWatcher.conf with the lines below (node 1 as an example):
# Fred Denis -- Oct 28th 2019 -- ticket 123456
#<ResultDir> /opt/oracle.ExaWatcher/archive
<ResultDir> /nfs/exawatcherlogs/exadatadb01
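
If you have several nodes to update, the same edit can be scripted. The sketch below is one possible way to do it, not an Oracle-provided procedure: set_result_dir is a hypothetical helper that keeps a .bak copy, comments out the active ResultDir line and writes the new one right after it. It does not add the "who/when/ticket" comment line, which you may still want to add by hand:

```shell
# set_result_dir CONF NEW_DIR
# Hypothetical helper: comment out the active <ResultDir> line(s) in CONF and
# add a new <ResultDir> line pointing to NEW_DIR. A copy is kept in CONF.bak.
set_result_dir() {
    conf="$1"
    new_dir="$2"
    cp -p "$conf" "$conf.bak" || return 1
    awk -v new="$new_dir" '
        $1 == "<ResultDir>" {
            print "#" $0                      # keep the old value as a comment
            if (!replaced) {                  # write the new directive only once
                print "<ResultDir> " new
                replaced = 1
            }
            next
        }
        { print }
    ' "$conf.bak" > "$conf"
}

# Example (not run here), with ExaWatcher stopped first:
# set_result_dir /opt/oracle.ExaWatcher/ExaWatcher.conf /nfs/exawatcherlogs/exadatadb01
```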

Move the archives to the new directory (node 1 as an example):
$ mv /opt/oracle.ExaWatcher/archive/* /nfs/exawatcherlogs/exadatadb01/.

Restart ExaWatcher (the OSWatcher legacy is still visible in the script name):
[root@exadatadb01]# ps -ef | grep -i exawatch
root     368051 148216  0 19:09 pts/0    00:00:00 grep -i exawatch
[root@exadatadb01]# /opt/oracle.cellos/validations/bin/vldrun.pl -script oswatcher
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -script oswatcher
Run validation oswatcher - PASSED
[root@exadatadb01]#

Give it a few seconds to start, then check the path ExaWatcher is now writing to:
[root@exadatadb01]# ps -ef | grep -i exawatch
. . .
5 -n 720 | sed 's/[ ?]*$//g' | grep --regexp "top - [0-9][0-9]:[0-9][0-9]:[0-9][0-9] up.*load average.*" -A 1000 2>/dev/null >> /nfs/exawatcherlogs/exadatadb01/Top.ExaWatcher/2019_10_27_22_16_23_TopExaWatcher_exadatadb01.mycompany.com.dat
root     233057 232998  0 22:16 pts/0    00:00:00 sh -c /usr/bin/iostat -t -x -p  5  720 2>/dev/null >> /nfs/exawatcherlogs/exadatadb01/Iostat.ExaWatcher/2019_10_27_22_16_23_IostatExaWatcher_exadatadb01.mycompany.com.dat
root     233081 232998  0 22:16 pts/0    00:00:00 sh -c
. . .
[root@exadatadb01]#

Just because ExaWatcher now writes to an NFS share does not mean you want to keep its archives on disk forever. If you look at ExaWatcher.conf, you will find the SpaceLimit directive as shown below:
<SpaceLimit> 6047
# Hard limit: 600MB
#       Exadata Cell node: 600MB
#       Exadata DB node/non-Exadata:
#           20% of the file system capacity if mounted on "/"
#           80% of the file system capacity if mounted on other
#   At anytime, the limit will be set to the lower of the specified
#   or the hard limit.
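
For a database node, the comments above boil down to a "lower of the two" rule. As a rough illustration only (it mirrors those comments, not ExaWatcher's actual code), a hypothetical effective_limit_mb helper could compute the result from the configured value, the file system size and the mount point:

```shell
# effective_limit_mb SPECIFIED_MB FS_SIZE_MB MOUNT_POINT
# Hypothetical helper for a database node: the hard limit is 20% of the file
# system capacity when the archives live on "/", 80% on any other mount point.
# The effective limit is the lower of the specified value and that hard limit.
effective_limit_mb() {
    specified="$1"
    fs_size="$2"
    mount_point="$3"
    if [ "$mount_point" = "/" ]; then
        pct=20
    else
        pct=80
    fi
    hard=$(( fs_size * pct / 100 ))
    if [ "$specified" -lt "$hard" ]; then
        echo "$specified"
    else
        echo "$hard"
    fi
}

# To feed it real values, "df -Pk DIR" prints the size in KB in column 2 and
# the mount point in column 6 of its second line.
```

Storage cells are not covered here since, per the same comments, their hard limit is a flat 600MB.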
ExaWatcher keeps the archives within a limit of 3 GB on a database server and 600 MB on a storage server, so you can leave it like this if it suits your needs. I personally prefer deleting files older than a month, so I add a find to a /etc/cron.daily/oracle file, which I use to complement my logrotate configurations and ensure an efficient purge. Also, I am sure that find has no bug and will work 100% of the time whereas, sometimes, the Oracle tools may be a bit buggy . . .
find /nfs/exawatcherlogs/exadatadb* -type f -mtime +30 -delete
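
If you prefer something a bit more defensive in the cron.daily file, the same find can be wrapped in a small function. This is just a sketch under my own assumptions: purge_exawatcher_archives is a hypothetical name, and the directory check is only there to avoid running find when the NFS path is missing:

```shell
#!/bin/sh
# purge_exawatcher_archives BASE_DIR [DAYS]
# Hypothetical helper: delete regular files older than DAYS days (default 30)
# under BASE_DIR, and refuse to run if BASE_DIR does not exist.
purge_exawatcher_archives() {
    base="$1"
    days="${2:-30}"
    if [ ! -d "$base" ]; then
        echo "purge_exawatcher_archives: $base not found, nothing purged" >&2
        return 1
    fi
    # -mtime +N matches files last modified more than N days ago
    find "$base" -type f -mtime +"$days" -delete
}

# Call to put at the end of the cron.daily script (left commented out here):
# purge_exawatcher_archives /nfs/exawatcherlogs 30
```

Passing the parent directory instead of a glob means any new node directory created under it is picked up without touching the script.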


And you'll then keep some space free on your precious / forever. Easy peasy!
