One of the most useful commands available in Bash, sort, is also one of the most underrated. It sorts input data, whether it comes from a file or from another command's output, either alphabetically or numerically. With its many options, sort greatly simplifies and speeds up the processing and analysis of any kind of text.
Bash Sort Command
The basic usage is extremely simple: type the command name followed by the name of the file. Without options, sort compares whole lines character by character, from the beginning of the line, and outputs them in ascending (alphabetical) order.
Note! sort does not modify the file. It only prints the sorted result to the screen (standard output).
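A minimal sketch of the basic call, using a hypothetical file fruits.txt created in a scratch directory. LC_ALL=C is an added assumption so that the order follows plain byte (ASCII) values; other locales may order some characters differently. The second command confirms that the file itself is left untouched.

```shell
export LC_ALL=C     # plain byte (ASCII) order, for reproducible output
cd "$(mktemp -d)"   # scratch directory, so no real files are touched

printf '%s\n' cherry apple banana > fruits.txt   # hypothetical sample file

sort fruits.txt                # apple, banana, cherry
cat fruits.txt                 # still cherry, apple, banana: the file is unchanged
sort fruits.txt > sorted.txt   # to keep the result, redirect it to another file
```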
By the way, fields are separated by blanks (spaces or tabs) by default. This can be changed with a special flag, but more on that later.
sort also works with pipes, which let Bash pass the output of one utility to the input of another. This is useful when working with system information. For example, you can compare the first five lines of the /etc/passwd file as they appear in the file and after sorting.
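The /etc/passwd comparison can be reproduced as follows; the exact lines depend on your system. LC_ALL=C is an added assumption for plain byte order.

```shell
export LC_ALL=C   # plain byte (ASCII) order

head -n 5 /etc/passwd          # the first five lines, in file order
head -n 5 /etc/passwd | sort   # the same five lines, sorted
```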
If you run the utility without specifying a file, it waits for input from the keyboard. When you press Ctrl + D (end of input), it prints the lines you typed, sorted according to the given options (if any), and exits.
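Typing lines and then pressing Ctrl + D is the same as feeding them on standard input, so in a script a here-document can stand in for the keyboard. The words below are arbitrary examples.

```shell
export LC_ALL=C   # plain byte (ASCII) order

# Equivalent to running "sort", typing these three lines, and pressing Ctrl+D
sort <<'EOF'
pear
fig
apple
EOF
# prints: apple, fig, pear
```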
The real power of the utility lies in its flags, which give you much finer control over how data is ordered.
Extended Use of Sort Command
There are many options that extend what the command can do. We will look at them using a sample file, money.txt, that lists debtors and the amounts they owe.
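The original sample file is not reproduced here, so the following is a hypothetical stand-in for money.txt with four fields per line: surname, amount owed, month, and day. The values are invented for illustration, and LC_ALL=C is assumed for plain byte order.

```shell
export LC_ALL=C     # plain byte (ASCII) order, for reproducible output
cd "$(mktemp -d)"   # scratch directory

# Hypothetical money.txt: surname, amount owed, month, day (invented values)
cat > money.txt <<'EOF'
Johnson 250 Mar 3
Adams 1200 Jan 15
Brown 75 Feb 20
Clark 980 Jan 2
EOF

sort money.txt   # Adams, Brown, Clark, Johnson: whole lines, so in effect by surname
```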
You already know the usual ranking by the first field. To sort in reverse (here, descending) order, add the -r (or --reverse) flag.
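With a hypothetical money.txt (surname, amount, month, day; invented values), -r simply reverses the default order:

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical money.txt: surname, amount owed, month, day (invented values)
printf '%s\n' 'Johnson 250 Mar 3' 'Adams 1200 Jan 15' 'Brown 75 Feb 20' 'Clark 980 Jan 2' > money.txt

sort -r money.txt   # Johnson, Clark, Brown, Adams
```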
Of course, you will not always want to sort by the first field. To list the debtors from the largest debt to the smallest, we need to add the -k (or --key) flag and specify the second field.
However, something is wrong here. The order of the list has changed, but it is not ordered by the amount owed. The reason is that sort compares the field as text; numbers need their own flag, but more on that later.
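A sketch with a hypothetical money.txt (surname, amount, month, day; invented values) showing why -k2 alone does not order by amount: the field is compared as text, character by character, so 1200 comes before 250 because 1 precedes 2.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical money.txt: surname, amount owed, month, day (invented values)
printf '%s\n' 'Johnson 250 Mar 3' 'Adams 1200 Jan 15' 'Brown 75 Feb 20' 'Clark 980 Jan 2' > money.txt

sort -k2 money.txt    # Adams (1200), Johnson (250), Brown (75), Clark (980): text order
sort -rk2 money.txt   # Clark, Brown, Johnson, Adams: reversed text order, still not by amount
```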
Consider another file, in which each surname is preceded by the person's first initial and a dot.
If you sort it by the second field, the lines will, naturally, be ordered by the initials rather than by the surnames.
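To sort by surname, add the position of the starting character after the field number, separated by a dot. Here the surname starts at character 3 (character 1 is the initial and character 2 is the dot), so the key is -k2.3. Strictly speaking, unless -b is in effect, GNU sort counts characters from the blank that precedes the field, so the precise form is -k2.3b; plain -k2.3 still works in this case only because it starts at the dot, which is the same in every line.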
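A sketch with a hypothetical initials.txt. Its first field is assumed to be a row number (the original file is not shown), and its second field is initial, dot, surname. The b in -k2.3b makes the character count start at the first non-blank character of the field.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical initials.txt: row number, then initial + dot + surname
printf '%s\n' '1 M.Taylor' '2 A.Wilson' '3 J.Baker' > initials.txt

sort -k2 initials.txt      # A.Wilson, J.Baker, M.Taylor: ordered by the initial
sort -k2.3b initials.txt   # J.Baker, M.Taylor, A.Wilson: ordered by the surname
```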
Now consider a different file. Here first names and surnames are separated by a space, so they are separate fields, and some surnames occur more than once.
If you simply sort by the second field, people with the same surname are still arranged in order (ascending or descending) by the fields to the right of it, here the first name, which is quite logical. This is because -k2 defines a key that runs from the second field to the end of the line.
If the first names should not be taken into account, limit the key to the second field alone by adding the end field after a comma: -k2,2. (GNU sort still breaks any remaining ties by comparing entire lines, unless you also add -s, --stable.)
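A sketch with a hypothetical namesakes.txt. The layout (row number, surname, first name, amount owed) is an assumption and the values are invented. The comments list the row numbers in output order.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical namesakes.txt: row number, surname, first name, amount owed
printf '%s\n' '1 Smith John 2000' '2 Brown Anna 500' '3 Smith Alice 90' '4 Smith Bob 150' > namesakes.txt

sort -k2 namesakes.txt     # rows 2 3 4 1: the Smiths are ordered by first name
sort -k2,2 namesakes.txt   # rows 2 1 3 4: names are not in the key; ties fall back to whole lines
```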
The utility also supports secondary sorting by another field: just add another -k flag with the appropriate parameters. For example, to order namesakes by the amount they owe (ascending or descending), specify that key after the first one. The first key must be limited (as in -k2,2); otherwise it already extends to the end of the line and the second key has no effect.
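A sketch with the same assumed layout (row number, surname, first name, amount owed; invented values). Adding a second key for the amount in field 4 orders the Smiths by debt, but without a numeric flag the amounts are still compared as text.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical namesakes.txt: row number, surname, first name, amount owed
printf '%s\n' '1 Smith John 2000' '2 Brown Anna 500' '3 Smith Alice 90' '4 Smith Bob 150' > namesakes.txt

sort -k2,2 -k4 namesakes.txt   # rows 2 4 1 3: Smith amounts in text order 150, 2000, 90
```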
But, as we found out earlier, numbers are not handled the same way as letters, so let us look at them now.
Handling numbers has a few pitfalls. By default, sort treats digits as ordinary characters rather than as numeric values. It therefore compares them one character at a time by their position in the ASCII table (in the C locale).
For example, we have a file with numbers from 1 to 15.
If you sort them without options, you get the following order: 1, 10, 11, 12, 13, 14, 15, 2, 3, 4, 5, 6, 7, 8, 9.
Generally, that is not what we expected. Why? Because the utility compares the first characters first. In the ASCII table, 1 comes before all the other digits present, so every number that starts with 1 goes to the front, and those numbers are then ordered by their following characters in the same way. This is why 10 ends up before 2.
To rank numbers correctly, add the -n (or --numeric-sort) option.
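The 1-to-15 example can be reproduced without a file by piping the output of seq into sort. LC_ALL=C is an added assumption for plain byte order.

```shell
export LC_ALL=C   # plain byte (ASCII) order

seq 15 | sort      # 1 10 11 12 13 14 15 2 3 4 5 6 7 8 9: digits compared as text
seq 15 | sort -n   # 1 2 3 ... 15: compared as numbers
```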
Let us return to our first debtors. We need to sort them by the amount owed in descending order, and now we know how: combine -n and -r with the key -k2.
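With a hypothetical money.txt (surname, amount, month, day; invented values), the combined flags give the expected largest-to-smallest order:

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical money.txt: surname, amount owed, month, day (invented values)
printf '%s\n' 'Johnson 250 Mar 3' 'Adams 1200 Jan 15' 'Brown 75 Feb 20' 'Clark 980 Jan 2' > money.txt

sort -nrk2 money.txt   # Adams (1200), Clark (980), Johnson (250), Brown (75)
```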
Now it’s time to highlight a few nuances of working with sort:
- Short (single-letter) forms of flags can be combined after a single hyphen, for example -nr instead of -n -r.
- Flags can be given in any order. What matters is that a flag's argument comes right after that flag (it can also be attached to it directly). For example, to specify the field you can write -k 2 or -k2, or combine it with other flags as -nrk2, as long as k comes last in the group, since everything after it is taken as its argument.
However, when a key carries its own ordering letter, such as n in -k2n, it ignores the global ordering flags. For example, sort -rk2n money.txt sorts in ascending order, not descending, because the key 2n does not inherit -r. Write -nrk2 or -k2nr instead.
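A sketch of the pitfall with a hypothetical money.txt (surname, amount, month, day; invented values). It compares the misleading -rk2n with two correct spellings.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical money.txt: surname, amount owed, month, day (invented values)
printf '%s\n' 'Johnson 250 Mar 3' 'Adams 1200 Jan 15' 'Brown 75 Feb 20' 'Clark 980 Jan 2' > money.txt

sort -rk2n money.txt        # Brown, Johnson, Clark, Adams: ascending, key "2n" ignores -r
sort -k2nr money.txt        # Adams, Clark, Johnson, Brown: descending
sort -n -r -k 2 money.txt   # same as -k2nr, with every flag written separately
```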
Finally, back to the example of debtors with shared surnames, whom we wanted to sort secondarily by the amount owed. Adding the letter n right after the fourth field's number (-k4n) makes that key numeric. A global -n would also apply to the surname key, so attaching n to the fourth key keeps it where it belongs and makes the command clearer.
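A sketch with a hypothetical namesakes.txt in the assumed layout (row number, surname, first name, amount owed; invented values). The comments list row numbers in output order.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical namesakes.txt: row number, surname, first name, amount owed
printf '%s\n' '1 Smith John 2000' '2 Brown Anna 500' '3 Smith Alice 90' '4 Smith Bob 150' > namesakes.txt

sort -k2,2 -k4n namesakes.txt    # rows 2 3 4 1: Smiths by amount 90, 150, 2000
sort -k2,2 -k4nr namesakes.txt   # rows 2 1 4 3: Smiths by amount 2000, 150, 90
```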
Sort Numbers in Mathematical Form
A rather useful option is -g (--general-numeric-sort), which ranks numbers written in scientific (exponential) notation. For floating-point numbers, the decimal separator is the dot, not the comma (in the C and English locales).
For example, suppose a file contains numbers such as 12.12 and 10e3.
If you use the -n option, 10e3 is placed before 12.12. This is, of course, wrong: 10e3 means 10 × 10³, which is obviously larger than 12.12. The -n option stops reading at the letter e and treats the value as 10. With the -g option, everything falls into place.
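A sketch with a hypothetical numbers.txt containing 12.12, 10e3, and 3. LC_ALL=C is assumed so that the dot is the decimal separator.

```shell
export LC_ALL=C     # C locale: "." is the decimal separator
cd "$(mktemp -d)"   # scratch directory
printf '%s\n' 12.12 10e3 3 > numbers.txt   # hypothetical sample

sort -n numbers.txt   # 3, 10e3, 12.12: -n stops at the "e" and reads 10e3 as 10
sort -g numbers.txt   # 3, 12.12, 10e3: -g reads 10e3 as 10000
```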
Sort by Month
The program can even sort by the month of the year. In the first file, a date is listed next to each amount. To sort by month, use the -M (--month-sort) flag, attaching it to the key of the field that holds the month. It recognizes month abbreviations such as Jan and Feb.
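A sketch with a hypothetical money.txt whose third field is an English month abbreviation (surname, amount, month, day; invented values). -M takes month names from the locale; under LC_ALL=C these are Jan to Dec.

```shell
export LC_ALL=C     # C locale: English month abbreviations
cd "$(mktemp -d)"   # scratch directory
# Hypothetical money.txt: surname, amount owed, month, day (invented values)
printf '%s\n' 'Johnson 250 Mar 3' 'Adams 1200 Jan 15' 'Brown 75 Feb 20' 'Clark 980 Jan 2' > money.txt

sort -k3 money.txt    # months alphabetically: Feb, Jan, Jan, Mar
sort -k3M money.txt   # months in calendar order: Jan, Jan, Feb, Mar
```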
But we would also like the entries within the same month to be in the right order. For that, we can use secondary sorting, which we are already familiar with.
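Continuing with a hypothetical money.txt (surname, amount, month, day; invented values): a month key limited to field 3, followed by a numeric key on the day in field 4, orders entries within the same month.

```shell
export LC_ALL=C     # C locale: English month abbreviations
cd "$(mktemp -d)"   # scratch directory
# Hypothetical money.txt: surname, amount owed, month, day (invented values)
printf '%s\n' 'Johnson 250 Mar 3' 'Adams 1200 Jan 15' 'Brown 75 Feb 20' 'Clark 980 Jan 2' > money.txt

sort -k3,3M -k4,4n money.txt   # Clark (Jan 2), Adams (Jan 15), Brown (Feb 20), Johnson (Mar 3)
```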
By default, as you know, blanks serve as field separators. To use a different separator, specify it with the -t (--field-separator=SEPARATOR) flag.
For example, the fields of /etc/passwd are separated by colons, so you cannot sort it by anything other than the first field without this option. Let us take the last 5 entries of the file, use the UID (the third field) as the key, and the colon as the separator.
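One way to write this command is shown below; the output depends on your system. The key is limited to field 3 and marked numeric so that UIDs compare as numbers.

```shell
export LC_ALL=C   # plain byte (ASCII) order

# Split fields at ":" and sort the last five entries numerically by UID (field 3)
tail -n 5 /etc/passwd | sort -t: -k3,3n
```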
The -c (--check) option checks whether the data is already sorted according to the given conditions. If it is, nothing is printed. Otherwise, sort reports the first out-of-order line, with the file name and line number, and exits with a nonzero status.
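A sketch with a hypothetical money.txt (surname, amount, month, day; invented values). GNU sort prints a "disorder" message on standard error for the first bad line, and the exit status can be tested in scripts.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical money.txt: surname, amount owed, month, day (invented values)
printf '%s\n' 'Johnson 250 Mar 3' 'Adams 1200 Jan 15' 'Brown 75 Feb 20' 'Clark 980 Jan 2' > money.txt

# Not sorted by amount: sort reports the first out-of-order line and fails
if ! sort -c -k2n money.txt; then echo "money.txt is not sorted by amount"; fi

# Sorted output passes the same check silently (exit status 0)
sort -k2n money.txt | sort -c -k2n && echo "sorted"
```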
Remove Duplicate Lines
If a field contains duplicate values (as it did with the surnames), the repeated lines can be hidden with the -u (--unique) flag, which outputs only the first of the lines whose keys compare equal. But if we just specify the second field, we will not get the expected result.
The reason is that the key then runs from the second field to the end of the line, and no two lines are identical over that whole span. To hide lines that share a surname, limit the key to the second field itself by adding ,2 after it (that is, -k2,2).
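A sketch with a hypothetical namesakes.txt in the assumed layout (row number, surname, first name, amount owed; invented values), comparing an open-ended key with a limited one:

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
# Hypothetical namesakes.txt: row number, surname, first name, amount owed
printf '%s\n' '1 Smith John 2000' '2 Brown Anna 500' '3 Smith Alice 90' '4 Smith Bob 150' > namesakes.txt

sort -u -k2 namesakes.txt     # all 4 lines: no two lines match from field 2 to the end
sort -u -k2,2 namesakes.txt   # 2 lines: Brown and a single Smith
```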
The earlier examples used “ideal” files in which everything is neatly filled in. In practice, however, a field may be preceded by extra spaces, and sort treats them as characters when comparing (the space, by the way, comes before letters in the ASCII table). There may even be several spaces in a row. To ignore such leading spaces, use the -b (--ignore-leading-blanks) flag.
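A sketch with a hypothetical padded.txt in which "banana" is preceded by two spaces instead of one. Without -b, the extra space becomes part of the second field and sorts before any letter.

```shell
export LC_ALL=C     # plain byte (ASCII) order
cd "$(mktemp -d)"   # scratch directory
printf '%s\n' 'x  banana' 'y apple' > padded.txt   # hypothetical: two spaces before "banana"

sort -k2 padded.txt      # "x  banana" first: the extra space sorts before "a"
sort -b -k2 padded.txt   # "y apple" first: leading blanks are ignored
```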
These are the most commonly used options of the Bash sort command. There are others, but they are more specialized and therefore used less often. You can read about them in man sort.