Unix-like systems are full of text files, and we very often end up with 2 lists of stuff where we need to find which elements of the first list are not in the second list, or which elements are in both lists. The quickest way to achieve this is to use the comm command.
Let's use 2 files as an example; these are the users with uid < 10 that I took from an OCI instance, and from which I randomly removed some users for the sake of explaining comm.
[fred@onehost]$ cat list1
root
bin
adm
lp
sync
shutdown
halt
mail
[fred@onehost]$ cat list2
root
bin
daemon
adm
lp
halt
mail
[fred@onehost]$
Before we start comparing the files with comm, we need to know what is, AFAIK, the only comm requirement: the files have to be sorted. If you do not sort the files, you will get the error below (unless you specify --nocheck-order, but then the output will be wrong, so I am not sure this option is worth mentioning):
comm: file 2 is not in sorted order
As a side note, the correct way to sort a file in place is the one below:
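[fred@onehost]$ sort -o list1 list1
Indeed, redirecting the sorted output to the very file being read is NOT a good idea: the shell truncates list1 before it gets read, so you will most likely end up with an empty file. Do NOT do as below: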
[fred@onehost]$ cat list1 | sort > list1     <== do NOT do that
Now that our files are sorted, we can comm them; let's start with no option:
[fred@onehost]$ comm list1 list2
                adm
                bin
        daemon
                halt
                lp
                mail
                root
shutdown
sync
[fred@onehost]$
The default output shows 3 columns:
- The first column contains the elements which are in the first file only
- The second column contains the elements which are in the second file only
- The third column contains the elements which are in both files
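The column layout described above can be made explicit with a small sketch. It relies on the fact that comm prefixes column 2 with one TAB and column 3 with two TABs; the temporary directory, the labels and the `labelled` variable are only illustrative, and the lists are the example ones from this post, already sorted.

```shell
#!/bin/sh
# Sketch: label each line of comm's default output with the column it belongs to.
# Assumption: the two example lists from this post, recreated (sorted) in a temp dir.
tmpdir=$(mktemp -d)
printf '%s\n' adm bin halt lp mail root shutdown sync > "$tmpdir/list1"
printf '%s\n' adm bin daemon halt lp mail root > "$tmpdir/list2"

# Column 3 starts with two TABs, column 2 with one TAB, column 1 with none;
# test for the longest prefix first, strip it, then print a label.
labelled=$(comm "$tmpdir/list1" "$tmpdir/list2" | awk '
  /^\t\t/ { sub(/^\t\t/, ""); print "in both files: " $0; next }
  /^\t/   { sub(/^\t/, "");   print "only in list2: " $0; next }
          {                   print "only in list1: " $0 }')

printf '%s\n' "$labelled"
rm -rf "$tmpdir"
```

With the example lists, this labels shutdown and sync as being only in list1, daemon as being only in list2, and the six other users as being in both files.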
As outlined in this comment below, we can also sort the files on the fly when executing the comm command (note that this won't modify the original files):
[fred@onehost]$ comm <(sort list1) <(sort list2)
                adm
                bin
        daemon
                halt
                lp
                mail
                root
shutdown
sync
[fred@onehost]$
You can also use the sort -u option to remove the duplicates while sorting the files:
[fred@onehost]$ comm <(sort -u list1) <(sort -u list2)
                adm
                bin
        daemon
                halt
                lp
                mail
                root
shutdown
sync
[fred@onehost]$
I am no big fan of this output, even if I can think of some use for it. You can, for example, use the --total option to show a count for each column, as well as the --output-delimiter option, which you can set to a semicolon to get CSV-like output to paste into a spreadsheet tool for a nice show-off to management:
[fred@onehost]$ comm list1 list2 --total --output-delimiter ";"
;;adm
;;bin
;daemon
;;halt
;;lp
;;mail
;;root
shutdown
sync
2;1;6;total
[fred@onehost]$
Now for what I consider to be the most useful comm options. Their use is a bit counterintuitive, as the options hide information instead of showing it (I did this kind of thing in rac-status, it is very powerful):
- -1: do not show column 1
- -2: do not show column 2
- -3: do not show column 3
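So, to show only the elements which are in the first file but not in the second one (column 1), we need to:
- hide column 2
- hide column 3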
[fred@onehost]$ comm -23 list1 list2
shutdown
sync
[fred@onehost]$
Following the same principle, -12 will only show the 3rd column, which contains the elements common to both files:
[fred@onehost]$ comm -12 list1 list2
adm
bin
halt
lp
mail
root
[fred@onehost]$
And yes, -123 would show nothing at all :)
[fred@onehost]$ comm -123 list1 list2
[fred@onehost]$
And this works very fast, even on very big files; see how easy it is?
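To put the options above together, here is a minimal sketch of the three set operations, starting from the unsorted example lists. It sorts into separate copies instead of using the bash-only <( ) syntax so that it also runs with a plain sh; the temporary directory and the variable names (only_in_1, only_in_2, in_both) are just illustrative.

```shell
#!/bin/sh
# Sketch: "only in list1", "only in list2" and "in both" with comm.
# Assumption: the unsorted example lists from this post, recreated in a temp dir.
tmpdir=$(mktemp -d)
printf '%s\n' root bin adm lp sync shutdown halt mail > "$tmpdir/list1"
printf '%s\n' root bin daemon adm lp halt mail > "$tmpdir/list2"

# Sort into separate copies so the originals stay untouched
# (sort -u also removes duplicate lines).
sort -u "$tmpdir/list1" > "$tmpdir/sorted1"
sort -u "$tmpdir/list2" > "$tmpdir/sorted2"

only_in_1=$(comm -23 "$tmpdir/sorted1" "$tmpdir/sorted2")  # hide columns 2 and 3
only_in_2=$(comm -13 "$tmpdir/sorted1" "$tmpdir/sorted2")  # hide columns 1 and 3
in_both=$(comm -12 "$tmpdir/sorted1" "$tmpdir/sorted2")    # hide columns 1 and 2

# The variables are left unquoted on purpose, to print each list on one line.
echo "Only in list1:" $only_in_1
echo "Only in list2:" $only_in_2
echo "In both lists:" $in_both
rm -rf "$tmpdir"
```

The -13 combination is not shown earlier in this post, but it follows the same principle as -23 and -12: hide columns 1 and 3 to keep only what is specific to the second file.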
Now that you know comm, think back to how you were doing this before... Yeah, no, there is no coming back from comm: comm is awesome!
in bash: comm <(sort list1) <(sort list2)