An Unknown DBA blog: Some bash tips -- 18 -- paste

This post is part of a list of shell tips that are good to know; the whole list can be found here.

I really like finding a real use for a Unix command I have heard of and keep somewhere in my quiver, but have never really used, either because the opportunity never came up or because I never found the right combo to make it powerful. Let's explore the power of paste, which we will end up combining with fold, shuf and column.

First, let's generate a simple list of numbers, one number per line, using seq:

$ seq 1 7
1
2
3
4
5
6
7
$

Now let's say you want to organize these numbers in two columns. Hmmm, not that easy, right? This is where paste shines:

$ seq 1 7 | paste - -
1       2
3       4
5       6
7
$

See these 2 hyphens in paste - -? They are the number of columns you want paste to organize your data in; you want 4 columns? Just use 4 hyphens:

$ seq 1 7 | paste - - - - 
1       2       3       4
5       6       7
$
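
Typing the hyphens by hand gets tedious for wide tables. Here is a minimal sketch of a wrapper that builds them for you; the name ncols is my own invention for illustration, not a standard command:

```shell
# ncols N: arrange stdin into N tab-separated columns.
# "ncols" is a hypothetical helper name, not an existing tool.
ncols() {
  n=$1
  hyphens=""
  # Build one "-" operand per requested column.
  while [ "$n" -gt 0 ]; do
    hyphens="$hyphens -"
    n=$((n - 1))
  done
  # Left unquoted on purpose so the shell splits it into N operands.
  paste $hyphens
}

seq 1 6 | ncols 3
# 1       2       3
# 4       5       6
```

The columns in the output are separated by TABs, exactly as with the hand-typed hyphens.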

It also makes something that comes up often very easy: turning rows into columns. paste has the -s/--serial option for that; see below:

$ seq 1 7 | paste -s
1       2       3       4       5       6       7
$

Pretty cool, right? A CSV-style output would be a good idea as well, and paste has you covered: by default paste uses a TAB as separator, which we can change using -d/--delimiters:

$ seq 1 7 | paste -s -d ","
1,2,3,4,5,6,7
$
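
A classic use of this, sketched here on the assumption that the input contains only integers: pick + as the delimiter and let the shell do the arithmetic. I add an explicit - operand because POSIX lists the file operand as required, so not every paste may read standard input without it:

```shell
# Join the numbers with "+" to build an arithmetic expression.
expr_str=$(seq 1 7 | paste -s -d '+' -)
# Let the shell evaluate the expression.
total=$(( $expr_str ))
echo "$expr_str = $total"
# 1+2+3+4+5+6+7 = 28
```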

And what about a CSV with 2 columns per line?

$ seq 1 7 | paste - -  -d ","
1,2
3,4
5,6
7,
$
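
The two hyphens work because both refer to the same standard input: paste reads one line for the first column, the next line for the second column, then starts a new output line. A small sketch with made-up key/value data shows the same idea with = as the delimiter:

```shell
# Keys and values arrive on alternating lines (sample data invented for this example).
pairs=$(printf '%s\n' name PROD1 type ADW | paste -d '=' - -)
echo "$pairs"
# name=PROD1
# type=ADW
```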

Note that all of that works from a file as well:

$ seq 1 7 > test_file
$ cat test_file | paste - -  -d ","
1,2
3,4
5,6
7,
$
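
The cat is not strictly needed: the shell can redirect the file to standard input, and paste also accepts file names directly, in which case it glues the files side by side, line by line. A short sketch, using temporary files whose names are only for illustration:

```shell
# Work in a throwaway directory so nothing is left behind.
tmp=$(mktemp -d)
seq 1 6 > "$tmp/numbers"
seq 1 3 > "$tmp/first3"
printf '%s\n' a b c > "$tmp/letters"

# Two-column CSV read from a file, without cat.
two_cols=$(paste -d ',' - - < "$tmp/numbers")
echo "$two_cols"
# 1,2
# 3,4
# 5,6

# Two files side by side: line N of each file ends up on output line N.
side_by_side=$(paste -d ',' "$tmp/letters" "$tmp/first3")
echo "$side_by_side"
# a,1
# b,2
# c,3

rm -rf "$tmp"
```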

Cool, but what about a real-life example? Let's say we want to create a nice table-like list of our OCI Autonomous Databases. The command to get that would be:

$ oci search resource structured-search --limit 1000 --query-text "query AutonomousDatabase resources return allAdditionalFields where lifecyclestate != 'TERMINATED'"
$

This generates a JSON document with tons of (too much) information, and JSON is not really human readable. We can first use jq to keep only the information we are interested in:

$ oci search resource structured-search --limit 1000 --query-text "query AutonomousDatabase resources return allAdditionalFields where lifecyclestate != 'TERMINATED'" | jq -r '.data.items[] | ."additional-details".dbName,."additional-details".ecpuCount,."additional-details".workloadType, ."additional-details".dataStorageSizeInTBs,."lifecycle-state", (."time-created" | gsub("[.][0-9].*$";""))'
PROD2
32.0
ATP
8
AVAILABLE
2024-09-02T07:49:32
PROD1
64.0
ADW
0
AVAILABLE
2023-12-06T02:20:58
TEST1
48.0
ADW
0
AVAILABLE
2023-11-29T02:07:21
$

We do get the information we need, but it is not really readable. You have probably already guessed how to make it readable: paste! We have 6 values per Autonomous Database, so just use 6 hyphens in the paste command:

$ oci search resource structured-search --limit 1000 --query-text "query AutonomousDatabase resources return allAdditionalFields where lifecyclestate != 'TERMINATED'" | jq -r '.data.items[] | ."additional-details".dbName,."additional-details".ecpuCount,."additional-details".workloadType, ."additional-details".dataStorageSizeInTBs,."lifecycle-state", (."time-created" | gsub("[.][0-9].*$";""))' | paste - - - - - -
PROD2        32.0    ATP     8       AVAILABLE       2024-09-02T07:49:32
PROD1        64.0    ADW     0       AVAILABLE       2023-12-06T02:20:58
TEST1        48.0    ADW     0       AVAILABLE       2023-11-29T02:07:21
$
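
If you want to try this without an OCI tenancy, here is a small offline sketch. The printf stands in for the oci | jq output (fake_jq_output is a name I made up), and its values are copied from the listing above; the number of hyphens must match the number of fields jq prints per database:

```shell
# Stand-in for the jq output: 6 lines per database, 2 databases.
fake_jq_output() {
  printf '%s\n' \
    PROD2 32.0 ATP 8 AVAILABLE 2024-09-02T07:49:32 \
    PROD1 64.0 ADW 0 AVAILABLE 2023-12-06T02:20:58
}

# Six hyphens, one per field, give one database per output line.
table=$(fake_jq_output | paste - - - - - -)
echo "$table"
```

If the jq filter is later changed to print more or fewer fields, the hyphens have to be adjusted accordingly, otherwise the rows drift out of alignment.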

Oh, this is much better! Some will tell me that we can also do that within jq using join or @tsv, and they would be correct, but paste also works with non-JSON output.

You already know that you can easily add -d "," to get that output as CSV, but let's keep it text based: I want that list quickly from the terminal on my VM when I need it, and I would like the output to be nicer. This is where column comes into play! column builds nicely aligned columns (adapting each column's width to its content) and can also add column separators to make the output look like a table (for readability, I won't show the whole long oci | jq command again):

$ oci search . . . | paste - - - - - - | sort | column -t -o " | "
PROD1 | 64.0 | ADW | 0 | AVAILABLE | 2023-12-06T02:20:58
PROD2 | 32.0 | ATP | 8 | AVAILABLE | 2024-09-02T07:49:32
TEST1 | 48.0 | ADW | 0 | AVAILABLE | 2023-11-29T02:07:21
$

column can also add a header for each column:

$ oci search . . . | paste - - - - - - | sort | column -t -o " | " -N "DBName,ECPU,Type,TBs,Status,TimeCreated"
DBName | ECPU | Type | TBs | Status    | TimeCreated
PROD1  | 64.0 | ADW  | 0   | AVAILABLE | 2023-12-06T02:20:58
PROD2  | 32.0 | ATP  | 8   | AVAILABLE | 2024-09-02T07:49:32
TEST1  | 48.0 | ADW  | 0   | AVAILABLE | 2023-11-29T02:07:21
$

Last but not least, it is also easy to hide columns; let's hide the ECPU (column 2) and the TBs (column 4) numbers (by the way, I contacted support because the TBs value for my ADWs is 0, which looks like a bug):

$ oci search . . . | paste - - - - - - | sort | column -t -o " | " -N "DBName,ECPU,Type,TBs,Status,TimeCreated" -H2,4
DBName | Type | Status    | TimeCreated
PROD1  | ADW  | AVAILABLE | 2023-12-06T02:20:58
PROD2  | ATP  | AVAILABLE | 2024-09-02T07:49:32
TEST1  | ADW  | AVAILABLE | 2023-11-29T02:07:21
$

Super cool and super easy!

Now, let's look at another good combo using paste. We sometimes need to anonymize data (when writing a blog post, for example) and this can be a bit painful; what if a tool could automatically shuffle the characters of whatever we want to anonymize? Let's go back to the sequence of 7 numbers I opened with and use the shuf command, which shuffles the lines of its input:

$ seq 1 7 | shuf
4
7
2
5
1
6
3
$

This is exactly the kind of input that paste can serialize; let's do it:

$ seq 1 7 | shuf | paste -s -d ''
1547362
$

Note: the output is different each time, as shuf shuffles the data differently on every run. This is very cool; we now just need something that transforms a string into one character per line (indeed, we won't use seq but real data to be anonymized), a kind of unpaste command. As usual, those crazy Unix creators thought about everything, and this command is... fold, which wraps lines at a given width, which we can set to 1 for our needs:

$ echo "1234567"
1234567
$ echo "1234567" | fold -w 1
1
2
3
4
5
6
7
$
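
fold and paste are roughly inverse operations, which a width larger than 1 makes easy to see. A sketch with an arbitrary 16-digit string: fold cuts it into chunks of 4 and paste -s glues the chunks back together with a dash:

```shell
# Cut the string into 4-character lines, then join those lines with "-".
grouped=$(echo "1234567812345678" | fold -w 4 | paste -s -d '-' -)
echo "$grouped"
# 1234-5678-1234-5678
```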

We now have the combo we wanted:

$ echo "1234567" | fold -w 1 | shuf | paste -s -d ''
5347162
$ echo "1234567" | fold -w 1 | shuf | paste -s -d ''
3746215
$ echo "1234567" | fold -w 1 | shuf | paste -s -d ''
7253416
$
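
Wrapped in a function, the combo is easier to reuse. This is only a sketch, and shuffle_chars and sorted_chars are names I made up for it. I use -d '\0' here, which POSIX defines as an empty delimiter, since I am not sure every paste implementation accepts -d ''. The last line checks that the result is only a rearrangement: sorting the characters of the output gives back the sorted input, so the same characters are all still there:

```shell
# shuffle_chars STRING: print the characters of STRING in random order.
# (hypothetical helper name; needs fold, shuf and paste)
shuffle_chars() {
  printf '%s\n' "$1" | fold -w 1 | shuf | paste -s -d '\0' -
}

# sorted_chars STRING: print the characters of STRING in sorted order.
sorted_chars() {
  printf '%s\n' "$1" | fold -w 1 | sort | paste -s -d '\0' -
}

shuffled=$(shuffle_chars "1234567")
echo "$shuffled"            # different on each run
sorted_chars "$shuffled"    # prints 1234567
```

Keep in mind what that check implies: the length and the set of characters survive the shuffle, only their order is lost.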

You now have a string anonymizer; if you use it on an OCID, for example:

$ echo "anuwcljt6ubcb2aa6hv52fnv343cvqhvwit7fedl4q7beqd2gw2fkhr4mhma" | fold -w 1 | shuf | paste -s -d ''
4cqah43jdkv23uqu2l22thmlwvrbqbawhnf4ncevewfgh5f6atmb7dvc6i7a
$
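
A full OCID starts with a readable dot-separated prefix, followed by the unique string. If you want the result to still look like an OCID, one possible sketch is to shuffle only the part after the last dot. The function name and the sample OCID below are invented for illustration, and the function assumes its argument contains at least one dot:

```shell
# anonymize_ocid OCID: keep everything up to the last dot, shuffle the rest.
# (hypothetical helper; the OCID used below is not a real one)
anonymize_ocid() {
  prefix=${1%.*}      # text before the last dot
  unique=${1##*.}     # text after the last dot
  shuffled_part=$(printf '%s\n' "$unique" | fold -w 1 | shuf | paste -s -d '\0' -)
  printf '%s.%s\n' "$prefix" "$shuffled_part"
}

sample_ocid="ocid1.autonomousdatabase.oc1.phx.abcdefgh12345678"
anonymized=$(anonymize_ocid "$sample_ocid")
echo "$anonymized"    # same prefix, shuffled unique part
```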

Well, good luck finding the original one!

And all of this for about 120 KB:

$ du -sh /usr/bin/shuf
48K     /usr/bin/shuf
$ du -sh /usr/bin/fold
36K     /usr/bin/fold
$ du -sh /usr/bin/paste
36K     /usr/bin/paste
$

Now that you've learnt how to make some cool combos with paste, fold, shuf and column, ask your colleagues the question below: