
Some bash tips -- 4 -- Prevent concurrent executions of a script

Now that you have followed many tips to improve your shell scripting skills :), you will write useful scripts that end up being used frequently, and one day you will have to prevent concurrent executions of one of them -- this often happens with backup scripts, for example.

A bad solution I have often seen implemented is to count the number of running processes of your script, like this:
$ cat backup.sh
#!/bin/bash
ps -ef | grep backup.sh | grep -v grep
$ ./backup.sh
fred       412   167  0 22:51 tty3     00:00:00 /bin/bash ./backup.sh
$
Here, we grep the running execution(s) of the script (backup.sh) and exclude the grep itself; people implementing this solution usually consider that a concurrent execution is running if the number of processes found is >= 1. But . . . what if an extra execution of the script starts at the same time? What if a script named another_backup.sh is running? We would then grep 2 processes, which is more than 1, and wrongly conclude that a concurrent execution is already running.
$ ./backup.sh
fred       397   167  0 22:52 tty3     00:00:00 /bin/bash ./another_backup.sh  <== ?
fred       400   167  0 22:52 tty3     00:00:00 /bin/bash ./backup.sh
$
We could obviously grep ^backup.sh$ to be sure not to match another_backup.sh, but you cannot prevent someone else from running their own backup.sh from another location to back up something completely different; on top of that, what if another user also runs the script (which, let's say, is allowed):
$ ./backup.sh
root       452   433  0 22:59 tty4     00:00:00 /bin/bash ./backup.sh  <== another execution by another user
fred       454   167  0 22:59 tty3     00:00:00 /bin/bash ./backup.sh
$
. . . in short, this is not the best solution to prevent concurrent executions of a script.

A good and robust solution is to:
  1. Check in a "lockfile" the process number of the previous execution
  2. Verify if the previous process is still running
  3. If yes, we stop here
  4. If not, we save the process number (PID) of the current execution in the "lockfile" and we continue executing the script
As we saw earlier, ps | grep is not the best way of checking point 2; a stronger way is to use kill with the -0 option, which returns 0 if the process is running and something different from 0 if the process no longer exists:
$ kill -0 167
$ echo $?
0  <== process exists
$ kill -0 168
-bash: kill: (168) - No such process  <== process does not exist any more
$ echo $?
1
$
A complete piece of code using a lockfile and kill -0 to check if the process still exists is as below:
      TS="date "+%Y-%m-%d_%H:%M:%S""   # A timestamp for a nice outut in a logfile
LOCKFILE="${HOME}/.lockfile"           # The lockfile
if [[ -s ${LOCKFILE} ]]; then          # File exists and is not empty
    if ! kill -0 $(cat ${LOCKFILE}) 2> /dev/null; then    # pid does not exist
        echo "$($TS) [WARNING] The lockfile ${LOCKFILE} exists but the pid it refers to ($(cat ${LOCKFILE})) does not exist any more, we can then safely ignore it and continue."
    else                                                  # pid exists
        echo "$($TS) [ERROR] Concurrent execution detected; cannot continue."
        exit 2
    fi
fi
echo $$ > "${LOCKFILE}"                # Update the lockfile with the current PID
. . .
There you go, you have a very robust way to avoid two concurrent executions of the same script. If you want to allow different users to run the same script concurrently, put the lockfile in the home directory ($HOME) of each user; if you want to prevent any other user from executing the same script as you, just use a lockfile located in a directory which can be accessed (and written) by everyone -- easy-peasy.
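For illustration, here is a minimal sketch of the "shared lockfile" variant combined with an automatic cleanup of the lockfile when the script ends (using a trap on EXIT, as described in a previous tip); the path /tmp/backup.sh.lock is just an example, and the lockfile must be writable by every user allowed to run the script:
#!/bin/bash
LOCKFILE="/tmp/backup.sh.lock"        # Shared location: blocks every user, not only the current one
if [[ -s ${LOCKFILE} ]] && kill -0 $(cat ${LOCKFILE}) 2> /dev/null; then
    echo "Concurrent execution detected (pid $(cat ${LOCKFILE})); cannot continue."
    exit 2
fi
echo $$ > "${LOCKFILE}"               # Save the current PID in the lockfile
trap 'rm -f "${LOCKFILE}"' EXIT       # Optional: remove the lockfile when the script exits
. . .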

I have used this mechanism more than once with great success -- I strongly recommend using it to prevent concurrent executions of the same script.


< Previous bash tip / Next bash tip (coming soon) >

Some bash tips -- 3 -- Exit codes management in pipelines

This blog is part of a list of bash tips I find useful for every script -- the whole list can be found here.

Return codes (aka exit codes) are heavily used in any language to know whether a command has succeeded or failed. In shell scripts, the return code of the last executed command is accessible in the $? variable: it is 0 if the command completed successfully and different from 0 if the command failed, as we can see in the example below:
$ echo "blabla" > /tmp/iexist
$ cat /tmp/iexist
blabla
$ echo $?
0
$ cat /tmp/idontexist
cat: /tmp/idontexist: No such file or directory
$ echo $?
1
$
A return code can easily be tested and different actions triggered or a message shown depending on the success or the failure of a command:
$ cat /tmp/iexist
blabla
$ if [ $? -eq 0 ]; then echo "Success"; else echo "Failure"; fi
Success
$ cat /tmp/idontexist
cat: /tmp/idontexist: No such file or directory
$ if [ $? -eq 0 ]; then echo "Success"; else echo "Failure"; fi
Failure
$
In the above example, we can clearly see (and test) that the cat command succeeds when a file exists and fails when a file does not exist.

So far so good, but now comes the scenario where you also need to sort the output of the file you just cat:
$ cat /tmp/iexist | sort
blabla
$ if [ $? -eq 0 ]; then   echo "Success"; else   echo "Failure"; fi
Success
$ cat /tmp/idontexist | sort
cat: /tmp/idontexist: No such file or directory
$ if [ $? -eq 0 ]; then   echo "Success"; else   echo "Failure"; fi
Success   <== ?? what ??
$
We can see that in this scenario, $? returns 0 even though the file I cat does not exist! This is because $? contains the return code of the last command of the pipeline, which in my case is 0 as it is the return code of the sort command!

So how do we get the return code of the cat and not the sort? Well, bash has a mechanism for this: the return codes of all the commands of a pipeline are stored in the PIPESTATUS array, which you can check instead of $? when a command line contains one or more pipes:
$ cat /tmp/iexist | sort
blabla
$ echo ${PIPESTATUS[0]}":"${PIPESTATUS[1]}
0:0
$ cat /tmp/idontexist | sort
cat: /tmp/idontexist: No such file or directory
$ echo ${PIPESTATUS[0]}":"${PIPESTATUS[1]}
1:0
$
Here, ${PIPESTATUS[0]} (the first element of the PIPESTATUS array) contains the return code of the first command which is the cat and ${PIPESTATUS[1]} the return code of the sort command.

PIPESTATUS is a special bash array variable; you can print all of its elements at once, as shown in the more complex example below:
$ cat /tmp/iexist | tac | sort | grep "something" | sort | tac
$ echo ${PIPESTATUS[@]}
0 0 0 1 0 0
      ^
      |
grep grepped nothing so it returned 1
$
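One thing to be aware of when using PIPESTATUS in a script: it is overwritten by the very next command you execute, so copy it into your own array right after the pipeline if you need to test its elements later; a minimal sketch (the rc variable name is just an example):
cat /tmp/iexist | sort | grep "something"
rc=( "${PIPESTATUS[@]}" )        # Copy immediately: any subsequent command overwrites PIPESTATUS
if [[ ${rc[0]} -ne 0 ]]; then    # Return code of the cat (the first command of the pipeline)
    echo "cat has failed with exit code ${rc[0]}"
fi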
Alternatively, you can also use the set -o pipefail directive at the top of your script to make a pipeline return the rightmost non-zero exit code of its commands; you won't know exactly which one has failed, but you will know that something in the pipeline has failed -- which may be enough in some cases:
$ cat /tmp/idontexist | sort
cat: /tmp/idontexist: No such file or directory
$ echo $?
0  <== default behavior
$ set -o pipefail
$ cat /tmp/idontexist | sort
cat: /tmp/idontexist: No such file or directory
$ echo $?
1  <== You don't know which part has failed but the overall result of the pipe is failure
$
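In a script, pipefail combines nicely with a simple test on the whole pipeline; a minimal sketch (the destination file /tmp/sorted is just an example):
#!/bin/bash
set -o pipefail
if ! cat /tmp/idontexist | sort > /tmp/sorted; then
    echo "Something in the pipeline has failed" >&2
    exit 1
fi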

PIPESTATUS and pipefail are good to know when writing robust scripts -- enjoy!


< Previous bash tip / Next bash tip >

Some bash tips -- 2 -- Use trap to make your scripts more robust

This blog is part of a list of bash tips I find useful for every script -- the whole list can be found here.

Now that we know how to efficiently use tempfiles in shell scripts, many of our scripts will look like this:
# 1- Create a tempfile and send an email about the start of this script execution
TEMPFILE=$(mktemp)
echo "Starting script" | mailx -s "Starting email" maintenance@company.com
# 2- Then many things are happening
do_something > "${TEMPFILE}"
cat "${TEMPFILE}" | awk . . .
# . . . many other things happen . . .
# 3- When done, we send an email to say we are done and we remove the tempfile
echo "End of script" | mailx -s "Script has finished" maintenance@company.com
rm -f "${TEMPFILE}"

The above script may seem nice, but it has room for improvement; indeed, if something happens during section "2", in between the steps which really perform what the script has been written for (sections 1 and 3 are useful but mostly cosmetic), then section 3 may never be executed: you will never receive the end-of-script email nor remove the tempfile you created, which may lead to unexpected issues later on. This "something" which could break the script during section "2" could be:
  • Someone kills the script
  • You manually start the script and CTRL-C it
  • An error is not properly handled in the script and it exits earlier than expected
  • Many other things . . .

Let's write a simple script to simulate what I describe above (the whole code of this example can be found here):
TEMPFILE=$(mktemp)
printf "\033[1;36m%s\033[m\n" "Tempfile is ${TEMPFILE}"
printf "\033[1;36m%s\033[m\n" "Sending an email to let people know the script is starting"

printf "\033[1;36m%s\033[m\n" "I am PID $$"
printf "\033[1;36m%s\033[m\n" "A first sleep"
sleep 20
printf "\033[1;36m%s\033[m\n" "A second sleep"
sleep 10

printf "\033[1;36m%s\033[m\n" "Delete tempfile"
rm -f "${TEMPFILE}"
printf "\033[1;36m%s\033[m\n" "Check if tempfile still exists"
ls -ltr "${TEMPFILE}"
printf "\033[1;36m%s\033[m\n" "Sending an email to let people know the script is finished"
Now, let's execute the script and CTRL-C it:
$ ./trap_example1.sh
Tempfile is /tmp/tmp.2efUbu4C9F
Sending an email to let people know the script is starting
I am PID 82
A first sleep
^C
$ ls -ltr /tmp/tmp.2efUbu4C9F
-rw------- 1 fred fred 0 Aug 24 21:20 /tmp/tmp.2efUbu4C9F
$
So here, clearly, the tempfile still exists after the execution and the final email is not sent. Not really something you want.
Now, let's kill the script (from another session):
$ ./trap_example1.sh
Tempfile is /tmp/tmp.5cSkTvMlpw
Sending an email to let people know the script is starting
I am PID 99
A first sleep
Terminated
$ ls -ltr /tmp/tmp.5cSkTvMlpw
-rw------- 1 fred fred 0 Aug 24 21:24 /tmp/tmp.5cSkTvMlpw
$
PS: I killed the script like this:
$ kill 99
$
Here, same as with CTRL-C, the tempfile still exists and the last email has not been sent. Not really good.

This is where trap comes into play, as it is able to catch external signals (kill, CTRL-C, normal exit, ...) and do something in reaction to these signals. You can see that when we kill the process, the message shown is Terminated; we can then trap this signal and execute a function, as below (full code can be found here):
on_term() {
  printf "\033[1;31m%s\033[m\n" "I have been killed !"
}
trap on_term TERM
Let's execute and kill the script again:
$ ./trap_example2.sh
Tempfile is /tmp/tmp.AcJQwh3rx5
Sending an email to let people know the script is starting
I am PID 251
A first sleep
I have been killed !
A second sleep
Delete tempfile
Check if tempfile still exists
ls: cannot access '/tmp/tmp.AcJQwh3rx5': No such file or directory
Sending an email to let people know the script is finished
$
The above example shows that we have indeed trapped the kill event, but the execution of the script continues, which is not what we want as ... we have killed the script; it should stop and not continue when killed; to achieve this, we have to modify the on_term() function like this (full code can be found here):
on_term() {
  printf "\033[1;31m%s\033[m\n" "I have been killed !"
  exit 123
}
trap on_term TERM
Try it again:
$ ./trap_example3.sh
Tempfile is /tmp/tmp.xTt0VMz86u
Sending an email to let people know the script is starting
I am PID 297
A first sleep
I have been killed !
$ echo $?
123
$ ls -ltr /tmp/tmp.xTt0VMz86u
-rw------- 1 fred fred 0 Aug 24 21:57 /tmp/tmp.xTt0VMz86u
$
The script is now better: it traps the kill signal and exits properly. BUT . . . the tempfile is not deleted and the final email is not sent; we then have to define an on_exit() function triggered on the EXIT signal, which bash raises whenever a script finishes -- and remove that piece of code from the body of the script itself (full code can be found here):
on_exit() {
  printf "\033[1;36m%s\033[m\n" "Delete tempfile"
  rm -f "${TEMPFILE}"
  printf "\033[1;36m%s\033[m\n" "Check if tempfile still exists"
  ls -ltr "${TEMPFILE}"
  printf "\033[1;36m%s\033[m\n" "Sending an email to let people know the script is finished"
}
on_term() {
  printf "\033[1;31m%s\033[m\n" "I have been killed !"
  exit 123
}
trap on_term TERM
trap on_exit EXIT
In this version, the on_exit() function will be executed at every exit of the script, deleting the tempfile and sending the final email; it will also be triggered when we terminate (kill) the script -- as we also exit in that scenario. Looks far better, right?

The thing now is that CTRL-C won't be trapped by the above code, as CTRL-C sends the Interrupt signal (SIGINT, kill -2); let's add an on_int() function to trap CTRL-C, ask the user for confirmation to be sure they wanted to CTRL-C, quit if the user replies "y" but continue if the user replies "n" (let's say they hit CTRL-C by mistake) -- (full code can be found here):
on_int() {
  printf "\n\033[1;31m%s\033[m\n" "You hit CTRL-C, are you sure ? (y/n)"
  read answer
  if [[ ${answer} = "y" ]]; then
    printf "\033[1;31m%s\033[m\n" "OK, lets quit then"
    exit 456
  else
    printf "\033[1;31m%s\033[m\n" "OK, lets continue then"
  fi
}
trap on_int INT
Let's see it in action:
$ ./trap_example5.sh
^C
You hit CTRL-C, are you sure ? (y/n)
y
OK, lets quit then
Delete tempfile
Check if the tempfile still exists
ls: cannot access '/tmp/tmp.Kq7ccbIDTp': No such file or directory
Sending an email to let people know the script is finished
$ ./trap_example5.sh
Tempfile is /tmp/tmp.RVO82EVSG6
Sending an email to let people know the script is starting
I am PID 617
A first sleep
^C
You hit CTRL-C, are you sure ? (y/n)
n
OK, lets continue then
A second sleep
Delete tempfile
Check if the tempfile still exists
ls: cannot access '/tmp/tmp.RVO82EVSG6': No such file or directory
Sending an email to let people know the script is finished
$
We can see that the tempfile is correctly deleted and the email is sent correctly whether we stop the script with CTRL-C or not. It starts to look very good !

To be 100% complete, we need to check which signals we want to trap and react to; all the available signals can be listed with the trap -l or kill -l commands; the ones most commonly used when someone tries to stop / kill a process are (usually tried in this order):
  1. kill PID (the default signal is -15, which is SIGTERM)
  2. kill -2 PID (Signal is SIGINT -- Interrupt -- CTRL-C)
  3. kill -11 PID (Signal is SIGSEGV -- Segmentation fault)
  4. kill -9 PID (Signal is SIGKILL -- for understandable reasons, this signal is not "trappable")
So to be totally complete with this script, we should trap all these signals; here we will assume that we want to react to kill -11 the same way as we did for the default kill -15 (full code can be found here):
trap on_term TERM SEGV
trap on_exit EXIT
trap on_int INT
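Putting all the pieces together, a minimal skeleton of such a script could look like the sketch below (the work and email parts are simplified, and this on_int() quits immediately instead of asking for confirmation):
#!/bin/bash
TEMPFILE=$(mktemp)

on_exit() {
  rm -f "${TEMPFILE}"                  # Always clean up the tempfile
  echo "End of script" | mailx -s "Script has finished" maintenance@company.com
}
on_term() {
  echo "I have been killed !"
  exit 123                             # Exiting here also triggers on_exit()
}
on_int() {
  echo "You hit CTRL-C, quitting."
  exit 456
}
trap on_term TERM SEGV
trap on_exit EXIT
trap on_int INT

echo "Starting script" | mailx -s "Starting email" maintenance@company.com
# . . . the actual work of the script . . .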


We now have a very robust script which will remove the tempfile and send the final email whatever happens to the script. You can add whatever you want into these on_*() functions; I have some functions which update statuses in some MySQL tables, and you can also kill all the background jobs started during the execution of the script (kill $(jobs -p)), etc ... the possibilities are ... infinite.
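For example, an on_exit() which also kills the background jobs started by the script could look like this sketch (adapt the cleanup to your own needs):
on_exit() {
  rm -f "${TEMPFILE}"
  if [[ -n "$(jobs -p)" ]]; then       # Kill any background jobs started during the execution
    kill $(jobs -p) 2> /dev/null
  fi
}
trap on_exit EXIT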

Another cool thing which can be done with trap is to use the SIGUSR1 signal (kill -10) to trigger a function of your own; a very useful application is a function which tells you which step the script is at, shows the number of lines inserted in a table, etc ... a kind of the dbms_application_info we use in the Oracle world:
show_info() {
  printf "\033[1;36m%s\033[m\n" "Show info about the script, select count(*) from a log table, etc ..."
}
trap show_info USR1
And you can then "query" your script to get information out of it live! -- classy, right? :) (full code can be found here):
$ ./trap_example7.sh
Tempfile is /tmp/tmp.Uc9bAMKWC0
Sending an email to let people know the script is starting
I am PID 752
A first sleep
Show info about the script, select count(*) from a log table, etc ...
A second sleep
Delete tempfile
Check if the tempfile still exists
ls: cannot access '/tmp/tmp.Uc9bAMKWC0': No such file or directory
Sending an email to let people know the script is finished
$
And this is how you use the USR1 signal to query your script (from another session, using its PID):
$ kill -10 752
$

As you can see in this blog, trap is the way to go to make your scripts robust; try it, and for sure you will adopt it!


< Previous bash tip / Next bash tip >

Some bash tips -- 0 -- Introduction / menu

I have always been coding, more or less, producing some useful scripts mainly for the Oracle community. Over the past year and more, I have coded more for a big project to move data from Teradata to Google Cloud and also participated in this Google open source project.

Coding more, and for more intensively used scripts, made me face new problems and find new solutions / tips which I now apply in every one of my scripts. Indeed, even if a script is not heavily used every 5 minutes to move TBs of data from a system A to a system B, it still deserves to be nice and robust.

The tips below do not pretend to be exhaustive, nor a kind of ultimate bash guide, nor super obfuscated rocket science; they are just a few tips I found very useful to know and to use in every script you write, to make your scripts nice, easily maintainable and robust:

Some bash tips -- 4 -- Prevent concurrent executions of a script

Now that you have followed many tips to improve your shell scripting skills :), you will write useful scripts which will then be frequentl...