
Some bash tips -- 2 -- Use trap to make your scripts more robust

This post is part of a list of bash tips I find useful in every script -- the whole list can be found here.

Now that we know how to efficiently use tempfiles in shell scripts, many of our scripts will look like this:
# 1- Create a tempfile and send an email about the start of this script execution
$  TEMPFILE=$(mktemp)
$  echo "Starting script" | mailx -s "Starting email" maintenance@company.com
# 2- Then many things are happening
$  do_something > "${TEMPFILE}"
$  cat "${TEMPFILE}" | awk . . .  
$  many other things happen . . . 
# 3- When done, we send an email to say we are done and we remove the tempfile
$  echo "End of script" | mailx -s "Script has finished" maintenance@company.com
$  rm -f "${TEMPFILE}"

The above script may seem nice but it has room for improvement. Sections 1 and 3 are useful but mostly cosmetic; section 2 contains the steps which really perform what the script has been written for. If something goes wrong during section 2, section 3 may never be executed: you will never receive the end-of-script email, and the tempfile you created will never be removed, which may lead to unexpected issues later on. This "something" which could break the script during section 2 could be:
  • Someone kills the script
  • You manually start the script and CTRL-C it
  • An error is not properly handled in the script and it exits earlier than expected
  • Many other things . . .

Let's write a simple script to simulate what I describe above (the whole code of this example can be found here):
TEMPFILE=$(mktemp)
printf "\033[1;36m%s\033[m\n" "Tempfile is ${TEMPFILE}"
printf "\033[1;36m%s\033[m\n" "Sending an email to let people know the script is starting"

printf "\033[1;36m%s\033[m\n" "I am PID $$"
printf "\033[1;36m%s\033[m\n" "A first sleep"
sleep 20
printf "\033[1;36m%s\033[m\n" "A second sleep"
sleep 10

printf "\033[1;36m%s\033[m\n" "Delete tempfile"
rm -f "${TEMPFILE}"
printf "\033[1;36m%s\033[m\n" "Check if tempfile still exists"
ls -ltr "${TEMPFILE}"
printf "\033[1;36m%s\033[m\n" "Sending an email to let people know the script is finished"
Now, let's execute the script and CTRL-C it:
$ ./trap_example1.sh
Tempfile is /tmp/tmp.2efUbu4C9F
Sending an email to let people know the script is starting
I am PID 82
A first sleep
^C
$ ls -ltr /tmp/tmp.2efUbu4C9F
-rw------- 1 fred fred 0 Aug 24 21:20 /tmp/tmp.2efUbu4C9F
$
So here, clearly, the tempfile still exists after the execution and the final email has not been sent. Not really something you want.
Now, let's kill the script (from another session):
$ ./trap_example1.sh
Tempfile is /tmp/tmp.5cSkTvMlpw
Sending an email to let people know the script is starting
I am PID 99
A first sleep
Terminated
$ ls -ltr /tmp/tmp.5cSkTvMlpw
-rw------- 1 fred fred 0 Aug 24 21:24 /tmp/tmp.5cSkTvMlpw
$
PS: I killed the script like this:
$ kill 99
$
Here, same as with CTRL-C, the tempfile still exists and the last email has not been sent. Not really good.

This is where trap comes into play: it can catch signals sent to the script (kill, CTRL-C, ...) as well as the exit of the script itself, and run some code in reaction. You can see that when we kill the process, the shell reports Terminated, which corresponds to the TERM signal (SIGTERM); we can then trap this signal and execute a function as below (full code can be found here):
on_term() {
  printf "\033[1;31m%s\033[m\n" "I have been killed !"
}
trap on_term TERM
Let's execute and kill the script again:
$ ./trap_example2.sh
Tempfile is /tmp/tmp.AcJQwh3rx5
Sending an email to let people know the script is starting
I am PID 251
A first sleep
I have been killed !
A second sleep
Delete tempfile
Check if tempfile still exists
ls: cannot access '/tmp/tmp.AcJQwh3rx5': No such file or directory
Sending an email to let people know the script is finished
$
The above example shows that we have correctly trapped the kill event (bash runs the handler once the command in progress, here the first sleep, has finished), but the script then carries on, which is not what we want: we have killed the script, so it should stop. To achieve this, we have to modify the on_term() function like this (full code can be found here):
on_term() {
  printf "\033[1;31m%s\033[m\n" "I have been killed !"
  exit 123
}
trap on_term TERM
Try it again:
$ ./trap_example3.sh
Tempfile is /tmp/tmp.xTt0VMz86u
Sending an email to let people know the script is starting
I am PID 297
A first sleep
I have been killed !
$ echo $?
123
$ ls -ltr /tmp/tmp.xTt0VMz86u
-rw------- 1 fred fred 0 Aug 24 21:57 /tmp/tmp.xTt0VMz86u
$
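The difference between the two on_term() variants can also be reproduced in a few seconds, without waiting on sleep. This is only a minimal sketch: each child bash sends SIGTERM to itself, and the "trapped" and "still running" messages are made up for the illustration.

```shell
#!/usr/bin/env bash
# Each child bash installs a TERM handler, sends itself SIGTERM,
# then tries to carry on with one more command.

# Handler without exit: the script resumes once the handler returns.
out_continue=$(bash -c 'trap "echo trapped" TERM; kill -TERM $$; echo "still running"')
rc_continue=$?

# Handler with exit: the script stops with the status chosen in the handler.
out_exit=$(bash -c 'trap "echo trapped; exit 123" TERM; kill -TERM $$; echo "still running"')
rc_exit=$?

echo "handler without exit (status ${rc_continue}):"
echo "${out_continue}"
echo "handler with exit (status ${rc_exit}):"
echo "${out_exit}"
```

Only the first variant should print "still running"; the second one should stop right after the handler and report status 123.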
The script is now better: it traps the kill signal and exits properly. BUT . . . the tempfile is not deleted and the final email is not sent. We then have to define an on_exit() function triggered on EXIT, a pseudo-signal that bash raises whenever the script exits, and remove that piece of code from the end of the script itself (full code can be found here):
on_exit() {
  printf "\033[1;36m%s\033[m\n" "Delete tempfile"
  rm -f "${TEMPFILE}"
  printf "\033[1;36m%s\033[m\n" "Check if tempfile still exists"
  ls -ltr "${TEMPFILE}"
  printf "\033[1;36m%s\033[m\n" "Sending an email to let people know the script is finished"
}
on_term() {
  printf "\033[1;31m%s\033[m\n" "I have been killed !"
  exit 123
}
trap on_term TERM
trap on_exit EXIT
In this version, on_exit() runs whenever the script exits, so the tempfile is deleted and the final email is sent; it also runs when we terminate (kill) the script, because on_term() calls exit. Looks far better, right ?
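To check this behaviour quickly, here is a small self-contained sketch (not one of the trap_example scripts): a child bash creates a tempfile, installs the same two traps, then kills itself, and the parent looks at what is left behind. The messages and the child_script variable are only there for the demonstration.

```shell
#!/usr/bin/env bash
# The child script is kept in a variable so the whole demo fits in one file.
child_script='
TEMPFILE=$(mktemp)
on_exit() { rm -f "${TEMPFILE}"; echo "cleanup done"; }
on_term() { echo "killed"; exit 123; }
trap on_term TERM
trap on_exit EXIT
echo "${TEMPFILE}"      # first line of output: the tempfile path
kill -TERM $$           # simulate "kill PID" from another session
echo "never reached"
'

output=$(bash -c "${child_script}")
rc=$?
tempfile=$(printf '%s\n' "${output}" | head -n 1)

echo "${output}"
echo "child exit status: ${rc}"
if [ -e "${tempfile}" ]; then
  echo "tempfile is still there"
else
  echo "tempfile has been removed"
fi
```

The exit in on_term() is what fires the EXIT trap, so the cleanup happens even though the script was killed.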

The thing now is that CTRL-C is not handled by on_term(), as CTRL-C sends the Interrupt signal (SIGINT, kill -2), not TERM; let's add an on_int() function to trap CTRL-C, ask the user to confirm that the CTRL-C was intended, quit if the user replies "y" but continue if the user replies "n" (let's say they hit CTRL-C by mistake) -- (full code can be found here):
on_int() {
  printf "\n\033[1;31m%s\033[m\n" "You hit CTRL-C, are you sure ? (y/n)"
  read -r answer
  if [[ ${answer} = "y" ]]; then
    printf "\033[1;31m%s\033[m\n" "OK, lets quit then"
    exit 130   # 128 + 2 (SIGINT); exit statuses above 255 wrap around, so 456 would be reported as 200
  else
    printf "\033[1;31m%s\033[m\n" "OK, lets continue then"
  fi
}
trap on_int INT
Let's see it in action:
$ ./trap_example5.sh
^C
You hit CTRL-C, are you sure ? (y/n)
y
OK, lets quit then
Delete tempfile
Check if the tempfile still exists
ls: cannot access '/tmp/tmp.Kq7ccbIDTp': No such file or directory
Sending an email to let people know the script is finished
$ ./trap_example5.sh
Tempfile is /tmp/tmp.RVO82EVSG6
Sending an email to let people know the script is starting
I am PID 617
A first sleep
^C
You hit CTRL-C, are you sure ? (y/n)
n
OK, lets continue then
A second sleep
Delete tempfile
Check if the tempfile still exists
ls: cannot access '/tmp/tmp.RVO82EVSG6': No such file or directory
Sending an email to let people know the script is finished
$
We can see that the tempfile is correctly deleted and the email is sent correctly whether we stop the script with CTRL-C or not. It starts to look very good !

To be complete, we need to decide which signals we want to trap and react to. All the available signals can be listed with the trap -l or kill -l commands; the ones most commonly used when people try to stop / kill a process are (usually in this order):
  1. kill PID (the default kill option is -15 which is SIGTERM)
  2. kill -2 PID (Signal is SIGINT -- Interrupt -- CTRL-C)
  3. kill -11 PID (Signal is SIGSEGV -- Segmentation fault)
  4. kill -9 PID (Signal is SIGKILL -- for understandable reasons, this signal is not "trappable")
So to be totally complete with this script, we should trap all of these signals except SIGKILL, which cannot be trapped; here we will assume that we want to react to kill -11 the same way as we did for the default kill -15 (full code can be found here):
trap on_term TERM SEGV
trap on_exit EXIT
trap on_int INT
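If you are unsure which name goes with which number, bash's kill builtin can translate one into the other. A minimal sketch, assuming the script runs under bash (a standalone kill binary may format its output differently):

```shell
#!/usr/bin/env bash
# Translate the signal numbers used in this post into the names trap expects.
for number in 15 2 11 9; do
  printf 'kill -%-2s sends SIG%s\n' "${number}" "$(kill -l "${number}")"
done

# The same lookup, stored in a variable for later use.
term_name=$(kill -l 15)
echo "The default kill signal is ${term_name}"
```

Using the names in trap (TERM, INT, SEGV), as done above, keeps the script readable and does not depend on the numbering of a given system.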


We now have a very robust script which will remove the tempfile and send the final email whatever happens to the script (short of an untrappable kill -9). You can add whatever you want into these on_*() functions; I have some functions which update a status in some MySQL tables, you can also kill all the background jobs started during the execution of the script (kill $(jobs -p)), etc... the possibilities are infinite.
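As an illustration of the kill $(jobs -p) idea, here is a hedged sketch of an on_exit() which records the exit status and stops any background job still running. The sleep 30 stands in for a real background task, and exit 7 is an arbitrary status picked for the demo.

```shell
#!/usr/bin/env bash
child_script='
on_exit() {
  rc=$?                          # status the script is about to exit with
  pids=$(jobs -p)                # background jobs still running
  if [ -n "${pids}" ]; then
    kill ${pids} 2>/dev/null     # unquoted on purpose: one PID per word
    wait                         # reap them so that none is left behind
  fi
  echo "exit status was ${rc}"
}
trap on_exit EXIT
sleep 30 >/dev/null 2>&1 &       # stand-in for a long-running background job
echo "$!"                        # first line of output: PID of that job
exit 7
'

output=$(bash -c "${child_script}")
rc=$?
job_pid=$(printf '%s\n' "${output}" | head -n 1)

# kill -0 sends no signal; it only tests whether the PID still exists.
if kill -0 "${job_pid}" 2>/dev/null; then
  job_state="still running"
else
  job_state="gone"
fi
echo "${output}"
echo "child exit status: ${rc}, background job: ${job_state}"
```

Reading $? on the first line of on_exit() is what lets the final email (or a status table) tell a success from a failure.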

Another cool thing which can be done with trap is to use the SIGUSR1 signal (kill -10 on most Linux systems; the number differs on some other systems, so kill -USR1 is safer) to run a function of your own. A very useful application is a function which reports which step the script is at, or shows the number of lines inserted in a table, etc... a kind of dbms_application_info we use in the Oracle world:
show_info() {
  printf "\033[1;36m%s\033[m\n" "Show info about the script, select count(*) from a log table, etc ..."
}
trap show_info USR1
And you can then "query" your script to get information out of it live ! -- classy, right ? :) (full code can be found here):
$ ./trap_example7.sh
Tempfile is /tmp/tmp.Uc9bAMKWC0
Sending an email to let people know the script is starting
I am PID 752
A first sleep
Show info about the script, select count(*) from a log table, etc ...
A second sleep
Delete tempfile
Check if the tempfile still exists
ls: cannot access '/tmp/tmp.Uc9bAMKWC0': No such file or directory
Sending an email to let people know the script is finished
$
And this is the way of using this USR1 signal to query your script:
$ kill -10 752
$
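A slightly more concrete version of show_info() could keep track of the current step in a variable. The sketch below is only an illustration: the step names, the current_step and last_report variables are invented for it, and the script signals itself (by name, with kill -USR1) instead of being queried from another session.

```shell
#!/usr/bin/env bash
current_step="not started"
last_report=""

# Report which step the script is currently at.
show_info() {
  last_report="Currently at step: ${current_step}"
  printf '%s\n' "${last_report}"
}
trap show_info USR1

for step in "load data" "transform data" "write results"; do
  current_step="${step}"
  # Stand-in for "kill -USR1 <PID>" typed in another session.
  kill -USR1 $$
done
```

Because the handler runs in the script's own shell, it sees the live value of current_step each time the signal arrives.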

As you have seen in this post, trap is the way to go to make scripts robust; try it and you will surely adopt it !

