An Unknown DBA blog: swid -- Schedule WIth Dependencies (and parallelism) !

We sometimes expect things to be easy, but often they are not, or they simply don't exist. Scheduling with dependencies is one such case: many people ask about it on various forums because cron does not provide this feature. Some software does allow scheduling with dependencies, but it is far more complex / expensive / difficult / overwhelming / you name it . . . than what you need when you just want to schedule some jobs on a server. Some of these tools also have performance limitations when it comes to complex branching dependencies, as I explain in this post.
It was precisely while building this Airflow alternative in Google Cloud that I thought again about all the people stuck with no easy way to schedule jobs with dependencies, so .. I made it once and for all, hoping to help many.

The need

So the need here is to be able to schedule some jobs with dependencies, as shown in the graph below:

Some explanations of what we want to achieve in the above example:

1/ A start step starts the job chain
2/ 4 steps are started in parallel (do_stuff1, do_stuff2, do_stuff3 and do_stuff4)
3/ A step named do_stuff5 is started after all the 4 do_stuff[1-4] are finished; do_stuff5 has to wait for all the previous steps to finish before being executed
4/ An end step finishes the chain once do_stuff5 has finished

Note that each step can be anything: an email we send, a shell script starting a backup, a script executing some SQL against any database, a Machine Learning model being trained, ... anything. Also note that I showed some parallel executions on top of the dependency chain here, as you may want to do parallel processing as well -- but it is not mandatory; everything can be run serially.

The brain

I made the swid.sh script for this purpose. swid is easy and relies on makefiles: make has been executing arbitrary code with dependencies since ... 1976 (and handles parallelism too), and given how many people and tools use make every day, we can be sure that it is a very powerful tool. Another asset is that make exists for virtually every system and is most likely already available on yours.
I won't explain how makefiles work here, but feel free to read this post, where I wrote an introduction to them.
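To give a rough idea of why make fits this need, here is a minimal sketch of a makefile for the chain described in "The need". It is only an illustration and not what swid.sh actually generates: the helper names emit_rule and print_chain_makefile are hypothetical, and each recipe simply echoes the step name instead of running a real command.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of swid.sh.

# emit_rule TARGET [PREREQ...]: print one make rule whose recipe echoes the target name.
emit_rule() {
    target=$1
    shift
    # "$*" joins the remaining arguments (the prerequisites) with spaces.
    printf '%s: %s\n' "$target" "$*"
    # A recipe line in a makefile must start with a tab character.
    printf '\t@echo "running %s"\n' "$target"
}

# Print a makefile for: start -> do_stuff1..4 (parallel) -> do_stuff5 -> end
print_chain_makefile() {
    # No file is named after a step, so every target is declared phony.
    printf '.PHONY: start do_stuff1 do_stuff2 do_stuff3 do_stuff4 do_stuff5 end\n'
    emit_rule end do_stuff5
    emit_rule do_stuff5 do_stuff1 do_stuff2 do_stuff3 do_stuff4
    for step in do_stuff1 do_stuff2 do_stuff3 do_stuff4; do
        emit_rule "$step" start
    done
    emit_rule start
}

print_chain_makefile
```

Saving that output to a file (say chain.mk, an arbitrary name) and running "make -j 4 -f chain.mk end" lets make run up to 4 of the do_stuff[1-4] recipes at the same time, while do_stuff5 only starts once all four are done.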

The config

To use swid and define your job dependencies, you need to write a config file, which I tried to make as easy as possible. It is composed of 3 sections (the job dependency file described below is the one that runs the jobs shown in the schema above):

- config:

This "- config:" section is optional and contains (as of now) only 2 possible parameters:

on_success: defines the action to perform when all the jobs are successfully executed
on_failure: defines the action to perform if an error happens during the execution of the job chain

Here is an example of what this section looks like:

- config:                                                         # Format is "parameter:value"
        on_success      :       echo "all good !"
        on_failure      :       echo "oh no, it failed !!"
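As a rough sketch of what these two parameters mean (an assumption about the behaviour, not the actual swid.sh code), a wrapper could run the job chain and then pick the action based on the exit status. The function name run_with_hooks is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration of on_success / on_failure; not taken from swid.sh.

# run_with_hooks CHAIN_CMD ON_SUCCESS ON_FAILURE
# Runs CHAIN_CMD, then ON_SUCCESS if it exited with 0, otherwise ON_FAILURE.
run_with_hooks() {
    chain_cmd=$1
    on_success=$2
    on_failure=$3
    if sh -c "$chain_cmd"; then
        sh -c "$on_success"
    else
        # Keep the exit status of the chain so the caller still sees the failure.
        status=$?
        sh -c "$on_failure"
        return "$status"
    fi
}

# "exit 0" stands in for a job chain that succeeded.
run_with_hooks 'exit 0' 'echo "all good !"' 'echo "oh no, it failed !!"'
```

Returning the original exit status after the failure action is a design choice of this sketch: it lets whatever launched the chain (cron, for instance) still notice that something went wrong.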

- names:

This section is mandatory and is used to define aliases for your commands, as shown below; this keeps the configuration file clean and readable when it comes to writing the job dependencies:

- names:                                                          # Format is "alias:command_to_execute"
        start           :       ~/swid/examples/ISleep.sh 1
        do_stuff1       :       ~/swid/examples/ISleep.sh 2
        do_stuff2       :       ~/swid/examples/ISleep.sh 6               # This is a comment
        do_stuff3       :       ~/swid/examples/ISleep.sh 4
        do_stuff4       :       ~/swid/examples/ISleep.sh 8
        do_stuff5       :       ~/swid/examples/ISleep.sh 3
#       do_stuff6       :       ~/swid/examples/ISleep.sh 14      # This wont be executed as it is commented out
        end:    ~/swid/examples/ISleep.sh 1

In this example, all the aliases use the same script I made (available in the examples directory of this git repository), which just sleeps for the number of seconds given as a parameter. In real use, of course, each alias can execute a different script / command.

- dependencies:

This section is mandatory and is the one we use to define the dependencies between each step using the aliases we have defined in the - names: section:

- dependencies:                                                         # Format is "alias:dependency1 dependency2"
        start           :                                                       # First step to execute, no dependency to anything
        do_stuff1       :       start                                           # Depends on "start"
        do_stuff2       :       start                                           # Depends on "start"
        do_stuff3       :       start                                           # Depends on "start"
        do_stuff4       :       start                                           # Depends on "start"
        do_stuff5       :       do_stuff1 do_stuff2 do_stuff3 do_stuff4         # Depends on do_stuff1, do_stuff2, do_stuff3 and do_stuff4
        end: do_stuff5

You can see that defining the dependencies is very simple: the syntax is just "alias:dependency1 dependency2". The first "start" step has no dependency because, being the first step to be executed, it depends on nothing.

More information on the job dependency file:

Empty lines are ignored
Lines starting with "#" are ignored
Everything which is after a "#" is a comment
You can add spaces or tabs before or after the alias or the ":" separator to make your job dependency file nice and readable
The sections have to start with "- " to be detected by swid.sh
There can be as many dependencies as you wish. make is very robust: the other tool I made based on makefiles executes hundreds of steps with dependencies all day long in production, and make has never shown any sign of weakness.
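The parsing rules listed above can be sketched with a small awk filter. This is only a guess at one way to apply them, not the parser used by swid.sh; the function name normalize_job_file and the "section|alias|value" output format are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical illustration of the format rules; not the actual swid.sh parser.

# normalize_job_file: read a job dependency file on stdin and print one
# "section|alias|value" line per entry.
normalize_job_file() {
    awk '
        { sub(/#.*/, "") }                  # everything after "#" is a comment
        { gsub(/^[ \t]+|[ \t]+$/, "") }     # trim leading and trailing blanks
        $0 == "" { next }                   # skip empty and comment-only lines
        /^- / {                             # a section starts with "- "
            section = substr($0, 3)
            sub(/[ \t]*:.*$/, "", section)  # keep the name, drop the ":"
            next
        }
        {
            pos = index($0, ":")            # split on the first ":" only
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            printf "%s|%s|%s\n", section, key, value
        }
    '
}

# Small demo on a shortened version of the example file.
normalize_job_file <<'EOF'
- names:                      # Format is "alias:command_to_execute"
        start     :   ~/swid/examples/ISleep.sh 1

#       skipped   :   ~/swid/examples/ISleep.sh 14
- dependencies:
        start     :
        end: do_stuff5
EOF
```

Note that a "#" inside a command would be cut off too, which is consistent with the rule that everything after a "#" is a comment.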

A first execution

Let's show how to execute this job dependency file (everything to run it is available in the examples directory of the git repository):

You can see in the above example that the do_stuff[1-4] steps run in parallel and that do_stuff5 waits for the last of the do_stuff[1-4] steps to finish before starting, which respects both the dependencies and the parallelism.

Options:

swid has a few options; I won't describe all of them in detail as they are self-explanatory and well explained in the help, which can be displayed with the -h option:

Extra notes

Feel free to git clone the git repository and try with the files provided in the examples directory
swid.sh can indeed be cronned
Temporary makefiles are kept $RETENTION days in a tmp directory for debugging and educational purposes
Logfiles are kept $RETENTION days in a logs directory
Tempfiles and logfiles are purged keeping $RETENTION days at the end of each execution
Don't be scared by the ~ 300 lines of code of swid: the smart part takes 10 lines, the rest is variable verification and making things pretty . .
Feel free to reach out to me for any question about swid

1 comment:

Rodrigo Jorge, July 8, 2020 at 8:01 AM
Fred, this tool is amazing! I never thought of using makefile to do flow and dependencies control in shellscripts and it makes perfect sense. Thank you for that contribution. If you don't mind, I will fork the repository and maybe one day suggest improvements =D
