Notes:

(1). This manual translates only the useful parts of info sort. To view the complete content, run info sort.

(2). In the translation, text in parentheses marked "Note" is my own addition, not original content, meant to aid understanding and explanation.

(3). The sort command used in this article runs on CentOS 7.2 and is version sort (GNU coreutils) 8.22. Some options, such as "--debug", are not supported on CentOS 6.

(4). If you do not yet understand how sort handles fields and how its sorting mechanism works, it is strongly recommended not to read man sort.

(5). For complete usage of the sort command, see: The king of text sorting: mastering the sort command
<http://www.cnblogs.com/f-ck-need-u/p/7442886.html>.

Collection of my translations: http://www.cnblogs.com/f-ck-need-u/p/7048359.html
<http://www.cnblogs.com/f-ck-need-u/p/7048359.html#mytranslations>

7.1 'sort': Sort text files
===========================


The sort command sorts, merges, or compares all the lines of the given files (more than one file may be given). If no input file is given, or the input file is "-", it reads standard input. By default, sort writes its results to standard output.

Syntax:

sort [OPTION]... [FILE]...

sort has 3 operation modes: sort (the default), merge, and check whether the input is already sorted. The following 3 options change the operation mode:

'-c'
'--check'
'--check=diagnose-first'

Check whether the given file is already sorted. If unsorted input is detected, sort prints a diagnostic message containing the first out-of-order line and exits with status 1. Otherwise it exits successfully. At most one input file can be given.

'-C'
'--check=quiet'
'--check=silent'

Like "-c", but no diagnostic message is printed. If the file is sorted, sort exits successfully; otherwise it exits with status 1. At most one input file can be given.
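As a minimal sketch of the two check modes, the exit status can be inspected as follows. The sample input is made up for illustration, and LC_ALL=C is set only to keep the ordering predictable.

```shell
# Hypothetical unsorted input: "b" is out of order after "c".
# -c reports the first out-of-order line on stderr and exits with 1.
status_c=0
diag=$(printf 'a\nc\nb\n' | LC_ALL=C sort -c 2>&1) || status_c=$?

# -C performs the same check but prints nothing.
status_quiet=0
printf 'a\nc\nb\n' | LC_ALL=C sort -C || status_quiet=$?

# Already sorted input: exit status 0.
status_sorted=0
printf 'a\nb\nc\n' | LC_ALL=C sort -C || status_sorted=$?

echo "diagnostic from -c: $diag"
echo "$status_c $status_quiet $status_sorted"   # prints "1 1 0"
```

The exact wording of the diagnostic depends on the sort version, so scripts should rely on the exit status rather than on the message.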

'-m'
'--merge'

Merge the given files as a group. Each input file must already be sorted individually. Sorting always works in place of merging, but merging is provided because it is faster in the cases where it applies.
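A small sketch of a merge, assuming two already sorted temporary files (the file names "odd" and "even" are invented for this example):

```shell
# Create two individually sorted input files in a scratch directory.
dir=$(mktemp -d)
printf '1\n3\n5\n' > "$dir/odd"
printf '2\n4\n6\n' > "$dir/even"

# -m only interleaves the inputs; it does not sort each file again.
merged=$(LC_ALL=C sort -m "$dir/odd" "$dir/even")
printf '%s\n' "$merged"   # prints 1 through 6, one number per line

rm -r "$dir"
```

If an input file is not actually sorted, "-m" does not repair it, which is why each file should be sorted (or checked with "-c") beforehand.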

 


The comparison rules of sort are as follows: the keys are compared in the order given on the command line, each according to the ordering options attached to it, until a difference is found or no keys are left. If no key is given (Note: a key is the value specified with -k), the entire line is used as the key. Finally, when all the given keys compare equal, sort by default compares the entire lines once more (Note: in ascending order), as if no ordering options had been given; "-r" still applies here and yields descending order. This comparison is called the "last sort" (a last-resort comparison). The "-s" option disables the "last sort", so that lines whose keys all compare equal keep their original relative order. The "-u" option also disables the "last sort".

Unless otherwise specified, all comparisons use the character collating sequence specified by "LC_COLLATE".

 

Exit status:

0 if no error occurred
1 if invoked with "-c" or "-C" and the input is not sorted
2 if an error occurred

If the environment variable "TMPDIR" is set, sort uses its value as the directory for temporary files instead of the default "/tmp". The "-T" option overrides the value of the environment variable.

The following options affect the ordering of the output lines. They can be specified as global options or as part of a key. If no key is given, the global options apply to the comparison of entire lines; otherwise each key inherits the global options, unless the key specifies options of its own (Note: in which case the global options are overridden for that key).

For portability, it is recommended to specify global options before "-k" (or "--key").
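The inheritance rule can be sketched with a made-up three-line input. This reflects the behavior of GNU sort as described above; LC_ALL=C is used only to make the output predictable.

```shell
input='a 1
b 1
c 2'

# The key has no options of its own, so it inherits the global -n and -r.
inherited=$(printf '%s\n' "$input" | LC_ALL=C sort -n -r -k 2,2)
printf '%s\n' "$inherited"   # c 2, b 1, a 1

# The key carries its own "n", so it inherits nothing from the global -r.
# Here -r only reverses the whole-line tie-break between equal keys.
own_opts=$(printf '%s\n' "$input" | LC_ALL=C sort -r -k 2,2n)
printf '%s\n' "$own_opts"    # b 1, a 1, c 2
```

In the second command the numeric key still sorts in ascending order, which is easy to overlook when "-r" is written as a global option.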

'-b'
'--ignore-leading-blanks'

Ignore leading blanks (spaces and tabs) when locating the key. Without this option, blanks affect the character positions given to "-k" (Note: for example, the second character designated by "-k 2.2" may itself be a blank).
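A minimal sketch of the effect, assuming the default field separator and a made-up input in which the first line has two blanks before its second field:

```shell
input='x  b
y a'

# Without b, the keys are "  b" and " a" (leading blanks included),
# and a blank sorts before a letter in the C locale.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)
printf '%s\n' "$plain"      # "x  b" first, then "y a"

# With b, the keys are "b" and "a".
skipped=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2b)
printf '%s\n' "$skipped"    # "y a" first, then "x  b"
```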

'-f'
'--ignore-case'


Treat lowercase characters as their uppercase equivalents, so that, for example, "b" and "B" compare equal. When used with the "-u" option (Note: duplicate lines are output only once), the lowercase equivalent lines are discarded (Note: in other words, the line with uppercase characters is the one that is output). (There is currently no way to discard the uppercase equivalent lines instead, not even with "-r", because "-r" only reverses the final result and does not affect the sorting process.)
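As a small sketch with invented input, the following shows "-f" treating "b" and "B" as equal, after which the whole-line comparison decides their relative order (in the C locale "B" sorts before "b"):

```shell
# "b" and "B" compare equal under -f; the tie is then broken by
# comparing the entire lines byte by byte.
folded=$(printf 'b\nB\na\n' | LC_ALL=C sort -f)
printf '%s\n' "$folded"   # a, B, b
```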

'-h'
'--human-numeric-sort'
'--sort=human-numeric'


Sort human-readable sizes. Sort first by sign (positive > 0 > negative), then by size suffix (0<k=K<M<G<T...), and finally by numeric value. It does not matter whether the suffixes denote powers of 1000 or 1024, because the values are assumed to be consistently scaled to the nearest suffix (Note: the suffix is compared before the value, so both 999M and 1023M sort before 1G).
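The three comparison stages can be sketched with a few made-up sizes:

```shell
# Sign first, then suffix, then numeric value.
sizes=$(printf '1G\n1023M\n2K\n-1\n0\n' | LC_ALL=C sort -h)
printf '%s\n' "$sizes"   # -1, 0, 2K, 1023M, 1G
```

Note that 1023M sorts before 1G purely because of its suffix; sort does not convert the values to a common unit.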

'-M'
'--month-sort'
'--sort=month'

Sort by month name abbreviation.
An initial string, consisting of any amount of blanks, followed by a month
name abbreviation, is folded to UPPER case and compared in the order 'JAN' <
'FEB' < ... < 'DEC'. Invalid names compare low to valid names.
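A minimal sketch with invented input, assuming the C locale so that the English abbreviations apply:

```shell
# Month names are folded to upper case; "xyz" is not a month name
# and therefore sorts before all valid names.
months=$(printf 'feb\nJAN\nxyz\ndec\n' | LC_ALL=C sort -M)
printf '%s\n' "$months"   # xyz, JAN, feb, dec
```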

'-n'
'--numeric-sort'
'--sort=numeric'

Sort numerically. An empty string "" or "\0" is treated as an empty number, which compares as 0. Numeric comparison is exact; there is no rounding.

(Note:
Numeric sorting differs from the default collation in that, as soon as the key contains a non-numeric character such as a blank, a letter, or a special character, the comparison stops right there (internally, sort considers that nothing further matches). In other words, "-k 2" and "-k 2n" are different. Although both keys extend to the end of the line, the former compares from the second field all the way to the end of the line in character-set order, while the latter effectively looks at the second field only, because the separator between the second and third fields ends the numeric comparison.

So for input such as "abc 100 200", with a space as the field separator, "-k 2n" yields the key "100 200", but the blank makes the comparison end within the second field. For "abc 100\0200 200", sorting with "-k 2n" looks as if it compared 100200, but only 100 is compared. That is, if another line has 110 in its second field, the seemingly larger 100200 sorts before 110. Test command:
echo -e "b 100:200 200\na 110 300" | tr ':' '\0'|sort -t ' ' -k2n -k1
So "-n" can never cross the boundary of a key, whereas the default collation does work across the whole key.)
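For comparison, a small sketch of the default ordering versus "-n" on the same made-up input:

```shell
# Default collation compares character by character, so "10" sorts before "2".
by_chars=$(printf '10\n9\n2\n' | LC_ALL=C sort)
printf '%s\n' "$by_chars"    # 10, 2, 9

# -n compares the numeric values.
by_value=$(printf '10\n9\n2\n' | LC_ALL=C sort -n)
printf '%s\n' "$by_value"    # 2, 9, 10
```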

'-r'
'--reverse'

Reverse the result of the comparison, so that lines with greater key values appear earlier. (Note: "-r" does not change how the sorting is done; it reverses the sorted output, so only the final result is affected.)

'-k POS1[,POS2]'
'--key=POS1[,POS2]'

Specify a sort key, i.e. the start and end positions of the part of each line to sort on (if POS2 is omitted, the key ends at the end of the line).


POS has the form "F[.C][OPTS]", where F is the field number and C is the character position within the field. Fields and character positions are counted from 1. A character position of 0 in POS2 denotes the last character of the POS2 field. If ".C" is omitted from POS1, it defaults to 1 (the first character of the field); if it is omitted from POS2, it defaults to 0 (the last character of the field). OPTS are ordering options; they override the global options, so that the key can be sorted with its own ordering options. Keys can span multiple fields.

(Note: it makes no difference whether OPTS is attached to POS1 or POS2, because one "-k" specifies one key, and OPTS on either position applies to the whole key. The "b" option is the exception; see below.)

Example: to sort on the second field, use "--key=2,2" (-k 2,2). The "--debug" option helps to view, analyze, and determine which part of each line is used for sorting.
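The difference between a key limited to one field and a key that runs to the end of the line can be sketched as follows (made-up input, default field separator):

```shell
input='x 2 b
y 2 a
z 1 c'

# Key is the second field only; equal keys fall back to whole-line order.
one_field=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)
printf '%s\n' "$one_field"     # z 1 c, x 2 b, y 2 a

# Key runs from the second field to the end of the line,
# so the third field takes part in the comparison.
to_line_end=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)
printf '%s\n' "$to_line_end"   # z 1 c, y 2 a, x 2 b
```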

'--debug'

Show the part of each line that is used for sorting. Additional information is also printed.

'-o OUTPUT-FILE'
'--output=OUTPUT-FILE'


Write the sorted output to OUTPUT-FILE. Normally, sort reads all input before opening OUTPUT-FILE, so you can safely save the result back to an input file, as in "sort -o file1 file1" and "cat file1 | sort -o file1". However, with the "-m" option sort may open the output file before reading all the input, so the following command is not safe:
"cat file1 | sort -m -o file1 -"
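A minimal sketch of sorting a file in place, using a temporary file created only for this example:

```shell
# Create a scratch file with unsorted content.
file=$(mktemp)
printf 'b\nc\na\n' > "$file"

# Safe without -m: sort reads all of the input before opening the output file.
LC_ALL=C sort -o "$file" "$file"

in_place=$(cat "$file")
printf '%s\n' "$in_place"   # a, b, c

rm -f "$file"
```

By contrast, "sort file > file" is not safe, because the shell truncates the file before sort starts reading it.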

'-s'
'--stable'

Prevent sort from performing the "last sort". This option has no effect when no keys are specified and no global ordering options other than "-r" are given.
(Note: the last sort: when the keys compare equal, sort finally compares the entire lines by default, i.e. alphabetically and in ascending order. This is called the "last sort".
If no options are specified, everything is already the default comparison, so no last sort is needed. If the global "-r" option is specified, it reverses the final result, so it also affects the result of the "last sort".)
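The effect of the "last sort" and of "-s" can be sketched with two made-up lines whose keys compare equal:

```shell
input='b 1
a 1'

# Keys are equal, so the last sort orders the whole lines: "a 1" first.
with_last_sort=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2n)
printf '%s\n' "$with_last_sort"   # a 1, b 1

# -s disables the last sort: the input order is kept.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 2,2n)
printf '%s\n' "$stable"           # b 1, a 1
```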

'-t SEPARATOR'
'--field-separator=SEPARATOR'

When locating keys in each line, use the character SEPARATOR as the field separator. By default, fields are separated by the empty string between a non-blank character and a blank character.

So, given the input line " foo bar", the default splitting yields two fields, " foo" and " bar" (Note: the empty strings in question are at the start of the line and right after "foo"). The field separator is not part of the fields it separates, so "sort -t ' '" splits " foo bar" into 3 fields: an empty field, "foo", and "bar". However, keys that extend to the end of the line, like "-k 2", or keys that cover a range of fields, like "-k 2,3", retain the field separators between their endpoints.
(Note: taking sort -t ' ' as an example, "-k 2" actually denotes "foo bar": it extends to the end of the line and the separator in the middle is retained. "-k 1,2" actually denotes " foo": the key is explicitly ended at the end of the second field, and the separator in the middle is still retained.)
To specify the NUL character as the field separator, use "\0", for example "sort -t '\0'".
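The two ways of splitting fields can be sketched with a made-up input whose first line starts with a blank:

```shell
input=' b x
a c'

# Default splitting: the second fields are " x" and " c", so "a c" sorts first.
default_split=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)
printf '%s\n' "$default_split"

# With -t ' ' the first line has an empty first field, and the second
# fields are "b" and "c", so " b x" sorts first.
space_split=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k 2,2)
printf '%s\n' "$space_split"
```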

'--parallel=N'

Set the number of sorts run in parallel to N. By default, N is the number of available processors, limited to a maximum of 8, because the performance gains diminish beyond that.

'-u'
'--unique'

Normally, output only the first of a run of lines that compare equal. This option also disables the "last sort" (Note: see the translation above).

"sort -u" and "sort | uniq" are equivalent, but the equivalence does not extend to arbitrary additional options. For example, "sort -n -u" checks only the leading numeric part for uniqueness, whereas with "sort -n | uniq" the lines are sorted numerically and uniq then checks the uniqueness of entire lines.
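A small sketch of the difference, using made-up lines that share the same leading number:

```shell
input='1 a
1 b
2 c'

# -n -u: "1 a" and "1 b" compare equal numerically; only the first is kept.
unique_numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u)
printf '%s\n' "$unique_numeric"   # 1 a, 2 c

# uniq compares entire lines, so all three lines survive.
piped_uniq=$(printf '%s\n' "$input" | LC_ALL=C sort -n | uniq)
printf '%s\n' "$piped_uniq"       # 1 a, 1 b, 2 c
```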

'-z'
'--zero-terminated'

Delimit lines with "\0" (the NUL byte) instead of newline.
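A minimal sketch with NUL-delimited input; tr is used here only to make the result readable:

```shell
# Two NUL-terminated items, sorted and then converted to lines for display.
items=$(printf 'b\0a\0' | LC_ALL=C sort -z | tr '\0' '\n')
printf '%s\n' "$items"   # a, b
```

This form is typically combined with tools that can emit NUL-terminated names, such as "find -print0".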

 

A key specified with "-k" may be followed by option letters such as "bfhgnr", in which case that key does not inherit the global options. Except for "b", every option applies to the entire key, whether it is written on POS1 or on POS2. The "b" option applies independently to POS1 or POS2, but when it is inherited from the global "-b" it applies to both. If input lines contain leading blanks and "-t" is not used, "-k" is typically combined with "-b" or with an option that implicitly ignores leading blanks (g, h, n); otherwise the leading blanks can make the fields very confusing.

If the field or character position specified in POS falls beyond the end of the line or field, the key is empty. If the "-b" option is specified, the ".C" part is counted from the first non-blank character of the field.

Here are some examples illustrating combinations of options:

*   Sort numerically, in descending (reverse) order.
sort -n -r
*   Sort alphabetically, ignoring the first and second fields as well as the leading blanks of the third field. This uses a single key that starts at the first non-blank character of the third field and extends to the end of the line. The whole key is compared alphabetically.
sort -k 3b
*   Sort numerically on the second field, and break ties by sorting alphabetically on the third and fourth characters of the fifth field. Use ":" as the field separator.
sort -t : -k 2,2n -k 5.3,5.4
(Note: whenever you want to sort on a single field only, it is recommended to specify both its start and end positions.)

Note that if "-k 2n" had been written instead of "-k 2,2n", the key would extend from the second field all the way to the end of the line and serve as the primary numeric key, with the secondary key "-k 5.3,5.4" sorted alphabetically on top of it. In most cases, letting a numeric key span more than one field is not the expected behavior.

Note also that the "n" option was attached to the end position of the first key. This is equivalent to "-k 2n,2" or "-k 2n,2n". All modifiers except "b" apply to the whole key, whether they are written on POS1 or on POS2.

(Note: because the "n" option cannot cross the key boundary, writing "-k 2n" gives the same result here. The following two commands, however, are different:
sort -t : -k 2 -k 5.3,5.4n
sort -t : -k 2,2 -k 5.3,5.4n

Because the default collation does span the key, the key of the first command runs from the second field to the end of the line, so the whole of that key is compared by character first, and only then is the key "5.3,5.4" compared numerically.
Another example: even when the field of the primary key comes after the field of the secondary key, the secondary key still crosses the primary key, because it is compared by character-set collation.
sort -t : -k 5n -k 2)
*   Sort /etc/passwd on the fifth field, ignoring leading blanks. Lines whose fifth fields compare equal are further sorted on the numeric user ID in the third field. The field separator is ":".
sort -t : -k 5b,5 -k 3,3n /etc/passwd
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
sort -t : -b -k 5,5 -k 3,3n /etc/passwd

The three commands above are equivalent. The first specifies that POS1 of the first key ignores leading blanks and that the second key is sorted numerically. In the other two commands, keys that lack options inherit the global options. Inheritance works correctly here because "-k 5b,5b" and "-k 5b,5" are equivalent.

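*   Sort a set of log files, with the IPv4 address as the primary key and the timestamp as the secondary key. If two lines have identical primary and secondary keys, output them in the relative order in which they were read. The log files contain lines roughly like the following:
4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1
211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2

The fields are separated by exactly one space. The IPv4 addresses are sorted lexicographically, e.g. 212.61.52.2 sorts before 212.129.233.201 because 61 is less than 129.
sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 file*.log |
sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n

This example cannot be done with a single sort command, because the IPv4 address needs "." as the separator while the timestamp needs a space. It is therefore split into two sort commands: the first sorts by timestamp and the second by IPv4 address. The first sort command uses "-k" to isolate each field, sorting by year first, then by month, then by day, and finally by "hour:minute:second". Except for the "hour:minute:second" key, none of the keys needs an end position, because the "n" and "M" options compare only a leading prefix and cannot cross field boundaries. The second sort command sorts the IPv4 addresses lexicographically. It uses "-s" so that lines with equal primary keys keep the order established by the secondary key; the first sort command uses "-s" so that the combination of the two sorts is stable.

(Note: since the n option stops at the first non-numeric character, it is tempting to shorten the second sort command to the one below. The two are not strictly equivalent, though: "-n" accepts "." as the decimal point in locales where that is the decimal separator, so "-k1" reads "4.150" as a single number. The explicit "-k 1,1n" form above is the safe one.)
sort -s -t '.' -n -k1 -k2 -k3 -k4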
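The two-pass log sort can be tried on a few inline lines instead of files. This is only a sketch: the third log line ("message 0") is invented for the example, and LC_ALL=C is assumed so that the English month abbreviations are recognized.

```shell
# Input is deliberately given in the wrong order.
sorted=$(printf '%s\n' \
  '211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2' \
  '4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1' \
  '4.150.156.3 - - [01/Mar/2004:06:31:51 +0000] message 0' |
  # First pass: year, month, day, then hour:minute:second.
  LC_ALL=C sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 |
  # Second pass: the four address components, numerically; -s keeps
  # the timestamp order among lines with the same address.
  LC_ALL=C sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n)

printf '%s\n' "$sorted"   # message 0, message 1, message 2
```

The two lines for 4.150.156.3 come out in timestamp order (March before April), followed by the line for 211.24.3.231.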