sed series articles:

sed Practice Series (1): Showy Moves, the Beginner's Chapter <http://www.cnblogs.com/f-ck-need-u/p/7488469.html>
sed Practice Series (2): Inner Technique (info sed, translated and annotated) <http://www.cnblogs.com/f-ck-need-u/p/7478188.html>
sed Practice Series (3): Advanced sed, the Sliding Window Technique <http://www.cnblogs.com/f-ck-need-u/p/7496916.html>
sed Practice Series (4): Tricky Problems in sed <http://www.cnblogs.com/f-ck-need-u/p/7499309.html>


1. Using variables and variable substitution in sed


When sed is used in a script, you will very likely need to reference a shell variable inside sed, or even use variable substitution on the sed command line. Many people have run into this and could not get the quotation marks into the right place. This is not a sed problem but a property of the shell. Working out how to solve the quoting problem for sed helps a great deal in understanding shell quoting in general: master this one typical case and you have mastered the whole category, so you will not get confused later with awk, mysql, or other tools that do their own parsing.

For example, suppose I want to print the last 5 lines of a.txt. You might write the following commands:
total=`wc -l <a.txt`
sed -n '$((total-4)),$p' a.txt
Unfortunately, this reports an error. On the one hand, "$" is a special symbol in sed: in an address expression it stands for the last line of the input stream. "$(())" also contains "$", so sed tries to parse that symbol. On the other hand, the "$(())" part is meant to be computed by the shell, not by sed, so it must be exposed to the shell for the shell to parse it.

First, a word about single quotes, double quotes, and no quotes in the shell.

* Single quotes: every character inside single quotes is literal. Note that a single quote cannot appear inside single quotes, not even when escaped with a backslash.
* Double quotes: every character inside double quotes is literal except "\", "$", and "`" (backquote). When "!" is used to reference a history command, the exclamation mark is an exception too.
* No quotes: almost the same as double quotes, except that brace expansion and tilde expansion are also performed.
The description above, especially for double quotes, is not complete, but it is enough here. These are only the surface meanings. What quoting really does is decide which "words" on the command line the shell must parse and which are literal text the shell leaves alone. For details, see:
The process by which the shell parses the command line, and the eval command <http://www.cnblogs.com/f-ck-need-u/p/7426371.html>.
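The quoting rules above can be checked by looking at what the shell actually hands to a command. The following is a minimal sketch; `show_args` is a hypothetical helper introduced only for this illustration, and the value of `total` is made up.

```shell
# show_args is a hypothetical helper: it prints each argument it receives inside <>.
show_args() { printf '<%s>' "$@"; printf '\n'; }

total=12   # made-up value for the demonstration

# Single quotes: the shell expands nothing, the text is passed through verbatim.
single=$(show_args '$((total-4)),$p')
# Double quotes: the arithmetic runs, and \$ stays a literal "$".
double=$(show_args "$((total-4)),\$p")
# Unquoted part followed directly by a single-quoted part: still one word.
mixed=$(show_args $((total-4))',$p')

echo "$single"
echo "$double"
echo "$mixed"
```

The last two forms produce the same single argument, which is why differently quoted spellings of one sed command can be equivalent.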


Clearly, every character in single quotes is literal and the shell parses nothing inside them: a variable in single quotes is not expanded, command substitution and arithmetic are not performed, pathname expansion does not happen, and so on. In short, to the shell the characters in single quotes are all ordinary characters. If some characters must be parsed by the command being run, which has its own parser, they must be single-quoted. For example, "$", "!", and "{}" have special meaning in sed; for sed to parse them you must single-quote them, otherwise the result is an error or an ambiguity. In the 3 sed statements below, the symbols must all be in single quotes to give the correct result.
sed '$d' filename
sed '1!d' filename
sed -n '2{p;q}' filename
Conversely, if you want special characters to be parsed by the shell, they must not be in single quotes. You can use double quotes or no quotes at all, even though the unquoted form may look odd. For example, the arithmetic expansion "$(())" above is meant for the shell, so it must be exposed to the shell. The correct statements are:
sed -n $((total-4))',$p' a.txt
sed -n "$((total-4))"',$p' a.txt
sed -n "$((total-4)),\$p" a.txt
To the eye, the quoting in these statements looks strange. But the shell does not care whether something is ugly or pretty; it is rigid and splits the command line strictly by its own rules.
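To make the difference concrete, here is a small sketch that runs the failing and the working forms against a throwaway file. The temporary file and its 8 sample lines are assumptions made for the demonstration; they stand in for a.txt.

```shell
# A temporary 8-line file stands in for a.txt (sample data, not from the article).
tmp=$(mktemp)
printf 'line%s\n' 1 2 3 4 5 6 7 8 > "$tmp"

total=$(wc -l < "$tmp")

# Wrong: inside single quotes the shell never evaluates $((...)),
# so sed receives the text verbatim and rejects it.
wrong_status=0
sed -n '$((total-4)),$p' "$tmp" >/dev/null 2>&1 || wrong_status=$?

# Right: the arithmetic is exposed to the shell, the "$" of "$p" is kept for sed.
last5=$(sed -n "$((total-4))"',$p' "$tmp")

echo "exit status of the single-quoted form: $wrong_status"
echo "$last5"
rm -f "$tmp"
```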

So the interaction between sed and the shell comes down to a few conclusions:

* Whatever the shell must parse goes unquoted or in double quotes;
* When a character is special to both the shell and the command being run, and you want sed to parse it, put it in single quotes or escape it with a backslash inside double quotes;
* For characters that are special to neither, any quoting will do.
Therefore, a statement that uses command substitution to print the last 5 lines with sed is:
sed -n `expr $(wc -l <a.txt) - 4`',$p' a.txt
In this statement, `expr $(wc -l <a.txt) - 4` must be parsed by the shell, so it must not be in single quotes. The "$" in "$p" must be parsed by sed as the last line, so it must be in single quotes to keep the shell from parsing it.

A more complicated case is variable substitution inside a sed regular expression. For example, print a.txt from the line that begins with the string held in the variable str through the last line.
str="abc"
sed -n /^$str/',$p' a.txt
Because that part is unquoted, the shell replaces $str with "abc" as intended. The command can be written in many ways:
sed -n '/^'$str'/,$p' a.txt
sed -n "/^$str"'/,$p' a.txt
sed -n "/^$str/,\$p" a.txt
sed -n "/^$str/,"'$'p a.txt
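As a quick check that these spellings really are equivalent, the sketch below runs several of them on a throwaway file. The file contents are made-up sample data. The variable is double-quoted here, a small departure from the forms above, so that a value containing spaces would still reach sed as one word.

```shell
# Sample data standing in for a.txt.
tmp=$(mktemp)
printf '%s\n' 'xyz 1' 'abc 2' 'def 3' > "$tmp"

str="abc"

# Unquoted variable glued to a single-quoted tail.
v0=$(sed -n /^$str/',$p' "$tmp")
# Single-quoted pieces around a double-quoted variable.
v1=$(sed -n '/^'"$str"'/,$p' "$tmp")
# Everything in double quotes, with the sed "$" escaped from the shell.
v2=$(sed -n "/^$str/,\$p" "$tmp")

echo "$v1"
rm -f "$tmp"
```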
Here is a harder use of symbols in sed: replace the password field of the last line of /etc/shadow with "$1$123456$wOSEtcyiP2N/IfIl15W6Z0".
# tail -n 1 /etc/shadow
userX:$6$hS4yqJu7WQfGlk0M$Xj/SCS5z4BWSZKN0raNncu6VMuWdUVbDScMYxOgB7mXUj./dXJN0zADAXQUMg0CuWVRyZUu6npPLWoyv8eXPA.::0:99999:7:::
The replacement statements are as follows:
old_pass="$(tail -n 1 /etc/shadow | cut -d':' -f2)"
new_pass='$1$123456$wOSEtcyiP2N/IfIl15W6Z0'
sed -n '$'s%$old_pass%$new_pass%p /etc/shadow
Because old_pass and new_pass contain "/" and "$", the "s" command uses "%" as its delimiter instead. Look closely at old_pass as well: it contains ".", which is a regular expression metacharacter, so the pattern could also match other text.
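One way to keep metacharacters such as "." from matching unintended text is to escape the variable before it is placed in the "s" command. The sketch below is only an illustration under stated assumptions: `escape_regex` and `escape_repl` are hypothetical helpers, not part of sed, the shadow-style lines are made-up sample data, and the work is done on a temporary file rather than on /etc/shadow.

```shell
# Hypothetical helpers (not from the article).
# escape_regex: backslash-escape basic-regex metacharacters and the "%" delimiter.
escape_regex() { printf '%s\n' "$1" | sed 's/[][\.*^$%]/\\&/g'; }
# escape_repl: backslash-escape "\", "&" and the "%" delimiter for the replacement side.
escape_repl() { printf '%s\n' "$1" | sed 's/[\&%]/\\&/g'; }

# Made-up sample data in a temporary file; the real /etc/shadow is not touched.
shadow_copy=$(mktemp)
printf '%s\n' 'userA:$6$aXbXc:17000:0:99999:7:::' \
              'userX:$6$hS4y.Qfk/abc.:17000:0:99999:7:::' > "$shadow_copy"

old_pass=$(tail -n 1 "$shadow_copy" | cut -d: -f2)
new_pass='$1$123456$wOSEtcyiP2N/IfIl15W6Z0'

# '$' is kept for sed (last line); the rest is double-quoted so the shell expands it.
result=$(sed -n '$'"s%$(escape_regex "$old_pass")%$(escape_repl "$new_pass")%p" "$shadow_copy")

echo "$result"
rm -f "$shadow_copy"
```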


2. Back-references that stop working

When the alternation operator "|" is used in a regular expression and the group in parentheses () does not take part in the match, a back-reference to that group does not work. For example, (a)\1u|b\1 matches only lines containing "aau" and does not match lines containing "ba": in the second alternative the group that \1 refers to took no part in the match, so \1 is invalid there, while in the first alternative \1 is valid.

This is a property of regular expression matching, not just of sed; other tools that use the basic and extended regex engines behave the same way.

In addition, a back-reference used in the "s" command cannot refer to a group outside the "s" command. For example:
echo "ab3456cd" | sed -r "/(ab)/s/([0-9]+)/\1/"
The result is ab3456cd rather than ababcd, and if you reference \2 instead, you get the error "invalid reference \2 on 's' command's RHS".
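The scoping rule for the "s" command can be seen side by side in the sketch below. The second command is one possible way to get "ababcd", shown here as an assumption about the intended result: the group is captured inside the "s" pattern itself instead of in the address.

```shell
# The group in the address /(ab)/ is invisible to "s"; \1 is the digit group,
# so the digits are replaced by themselves.
inside=$(echo "ab3456cd" | sed -r '/(ab)/s/([0-9]+)/\1/')

# Capturing "ab" inside the "s" pattern makes \1 refer to it.
fixed=$(echo "ab3456cd" | sed -r 's/(ab)[0-9]+/\1\1/')

echo "$inside"
echo "$fixed"
```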


3. How the "-i" option saves the file

sed creates a temporary file, writes its output to that file, and then renames the temporary file to the name of the source file. As a result, sed ignores the read-only attribute of the file.

Whether files may be renamed, moved in, or deleted is controlled by the permissions of the directory that contains them. If the directory is read-only, sed cannot save its result with the "-i" option, even if the file itself is writable.
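The rename behaviour can be observed through the inode number, which changes when the temporary file takes over the original name. This is a minimal sketch assuming GNU sed (the "-i" syntax differs on BSD sed) and a temporary directory created only for the demonstration.

```shell
# Work in a throwaway directory that we are allowed to write to.
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/f.txt"
chmod 444 "$dir/f.txt"            # make the file itself read-only

inode_before=$(ls -i "$dir/f.txt" | awk '{print $1}')
sed -i 's/world/sed/' "$dir/f.txt"   # still succeeds: only the directory must be writable
inode_after=$(ls -i "$dir/f.txt" | awk '{print $1}')
content=$(cat "$dir/f.txt")

echo "inode before: $inode_before, after: $inode_after"
echo "$content"
rm -rf "$dir"
```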


4. Greedy matching problem


Greedy matching means that when a regular expression can match several candidates, the longest one is taken. The simplest example: given the data "abcdsbaz", the regular expression "a.*b" can match both "ab" and "abcdsb", and because matching is greedy it takes the longest, "abcdsb".
echo "abcdbaz" | grep -o "a.*b"
abcdb
One shortcoming of basic and extended regular expressions is that they cannot turn greedy matching off. More complete implementations, such as Perl regular expressions or those of other programming languages, let you put a "?" after a repetition such as "*" or "+" to request lazy matching explicitly, for example "a.*?b".
echo "abcdbaz" | grep -P -o "a.*?b"
ab
To get around greedy matching with basic or extended regular expressions, the only option is a trick: the negated bracket expression "[^]". For the example above:
echo "abcdbaz" | grep -o "a[^b]*b"
ab

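This trick is a workaround rather than real lazy matching. Basic and extended regex engines always try for the longest match first and only then give characters back, which is called backtracking. For example, when "abcdsbaz" is matched against "a.*b", ".*" first runs to the end of the line and then backs off one character at a time until it reaches the last "b", giving "abcdsb". The negated bracket yields the shortest result only because "[^b]*" cannot pass the first "b", so "a[^b]*b" stops at "ab"; the trick therefore works only when the terminator is a single character.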
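The same contrast can be reproduced with sed itself. In this small sketch, "&" in the replacement stands for the matched text, so the square brackets in the output mark exactly what each pattern matched.

```shell
# Greedy: ".*" extends to the last "b" on the line.
greedy=$(echo "abcdbaz" | sed -r 's/a.*b/[&]/')
# Negated bracket: "[^b]*" cannot cross a "b", so the match ends at the first one.
short=$(echo "abcdbaz" | sed -r 's/a[^b]*b/[&]/')

echo "$greedy"
echo "$short"
```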

Another example: each line of the /etc/passwd file has the following format:
root:x:0:0:root:/root:/bin/bash
How can sed print a greeting for every user in /etc/passwd, in a form like "hello root", "hello nobody"?

First you have to extract the first column of the file, the user name. Since every line uses colons as field separators, a regular expression that captures only the first field has to get around greedy matching. The statement is as follows:
sed -r 's/^([^:]*):.*/hello \1/' /etc/passwd
Note that sed has only the basic and extended regex engines, so there is no lazy quantifier to fall back on: each "[^:]*" is still greedy and stays short only because it cannot cross a colon.

What about the first two fields of /etc/passwd? Just repeat the greed-defeating group as a whole.
sed -r 's/^([^:]*):([^:]*):.*/hello \1 \2/' /etc/passwd
The third field?
sed -r 's/^([^:]*:){2}([^:]*):.*/hello \2/' /etc/passwd
The third and fifth fields? There is no shortcut; the fourth field has to be written out explicitly.
sed -r 's/^([^:]*:){2}([^:]*):([^:]*):([^:]*):.*/hello \2 \4/' /etc/passwd
The third through fifth fields? That is simpler: repeat 3 times.
sed -r 's/^([^:]*:){2}(([^:]*:){3}).*/hello \2/' /etc/passwd

In that last result, though, fields 3 to 5 necessarily come with their ":" separators. Want to get rid of them? Forget it! sed is not good at handling fields: working around greedy matching makes the expressions hard to read, and it is not efficient. Using sed to process fields is simply asking for trouble.
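For comparison, field-oriented tools do the same job with far less effort. The sketch below swaps in cut and awk, which the article does not use here, and feeds all three commands one sample passwd-style line instead of the real /etc/passwd.

```shell
# One sample line in /etc/passwd format (sample data).
line='root:x:0:0:root:/root:/bin/bash'

# sed: fields 3 to 5 come out with every ":" attached, including the trailing one.
with_sed=$(echo "$line" | sed -r 's/^([^:]*:){2}(([^:]*:){3}).*/hello \2/')
# cut: select fields 3 through 5 by number; no trailing separator.
with_cut=$(echo "$line" | cut -d: -f3-5)
# awk: pick fields 3 and 5 directly.
with_awk=$(echo "$line" | awk -F: '{print "hello", $3, $5}')

echo "$with_sed"
echo "$with_cut"
echo "$with_awk"
```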


5. The entanglement of the sed commands "a" and "N"

The "a" command queues the supplied text in memory; when the pattern space is output, the queued text is appended to the end of the output stream.

For example, insert the line "matched successful" after the line that matches "ccc".
echo -e "aaa\nbbb\nccc\nddd" | sed '/ccc/a matched successful'
aaa
bbb
ccc
matched successful
ddd
Used on its own like this, the "a" command works smoothly and nothing goes wrong. But what happens when it is combined with "N"?
echo -e "aaa\nbbb\nccc\nddd" | sed '/ccc/{a\
matched successful
;N}'
aaa
bbb
matched successful
ccc
ddd

Wasn't the text supposed to be appended at the end? How did it get ahead of the matching line? Even allowing for "N" reading the next line, shouldn't it be added after "ddd"? To really understand this, you must be clear about how sed outputs the pattern space; see sed Practice Series (1): Showy Moves, the Beginner's Chapter <http://www.cnblogs.com/f-ck-need-u/p/7488469.html>. Here is a brief description of the output mechanism of the "N" command.


Whether sed reads the next line automatically or through the "n" or "N" command, any read is preceded by output of the pattern space. When "N" is about to read the next line, it first checks whether there is a next line to read. If there is, it locks the pattern space, performs the automatic output and clearing of the pattern space, unlocks the pattern space, appends a newline "\n" to its end, and finally reads the next line and appends it after that newline. Because the pattern space is locked, the automatic output emits an empty stream and the pattern space cannot be cleared. Note that this is not the same as suppressing output. The visible result of emitting an empty stream is the same as that of suppressed output, but emitting an empty stream is still an output action with an output stream, whereas suppressed output has no output action at all. If there is no next line to read, sed automatically outputs the pattern space, clears it, and exits. The process can be described as follows:
if [ "$line" -ne "$last_line_num" ]; then
    lock pattern_space;
    auto_print;
    remove_pattern_space;
    unlock pattern_space;
    append "\n" to pattern_space;
    read next_line to pattern_space;
else
    auto_print;
    remove_pattern_space;
    exit;
fi

Back to combining the "a" and "N" commands. The text queued by "a" ends up before the matching line because of that empty output stream. When "N" prepares to read the next line it performs an output action, even though what it outputs is empty. The "a" command is waiting for sed's output stream, and as soon as there is one, the queued text follows it and is appended to its end. So "matched successful" is appended to the tail of the empty stream; after that "N" reads the next line, and finally the pattern space content "ccc\nddd" is output. That is how the unexpected result above comes about.
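The effect of command order can be checked directly. The sketch below assumes GNU sed and uses printf instead of echo -e for the input. The second command, which runs "N" before "a", is shown as one possible way to get the text after "ddd"; it is not taken from the article.

```shell
input=$(printf '%s\n' aaa bbb ccc ddd)

# "a" first, then "N": the queued text is flushed when N reads the next line,
# so it lands before "ccc".
a_then_n=$(printf '%s\n' "$input" | sed '/ccc/{a\
matched successful
N}')

# "N" first, then "a": the text is queued after the read and is flushed
# at the end of the cycle, after "ccc" and "ddd" are printed.
n_then_a=$(printf '%s\n' "$input" | sed '/ccc/{N;a\
matched successful
}')

echo "$a_then_n"
echo "---"
echo "$n_then_a"
```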


6. The tangle of exclamation-mark negation in sed

You know that "!" negates, but you may not have noticed that the exclamation mark can go after the address expression and can also go directly in front of the command. Both negate, but they mean different things and give different results.

* An exclamation mark after the address expression filters lines: lines that satisfy the condition do not run the command, and lines that do not satisfy it do.
* An exclamation mark in front of the command means that lines satisfying the condition do not run the command, and lines not satisfying it do not run it either (because they were never matched). Here it is the command that is filtered out for the lines in the pattern space.
Suppose the file a.txt contains 3 lines:
djkaldahsdf
abcskdf2das
chhdsjaj
Consider the following three sed scripts:

* (1). /^abc/!{d}
* (2). /^abc/{!d}
* (3). /^abc/!d

Example (1) puts the exclamation mark after the address expression, so lines that do not begin with "abc" run the "d" delete command. Lines that begin with "abc" do not match the negated address expression, so the "d" command that follows is not run for them. In other words, this script deletes every line except those beginning with "abc", so only line 2 is output.

Example (2) puts the exclamation mark in front of the command, not after the address expression. Lines beginning with "abc" therefore do not run the "d" command, and lines not beginning with "abc" fail the address condition and do not run "d" either. In other words, the "d" command in this script is redundant: no line is deleted, so all lines are output.

Example (3) is equivalent to example (1): address matching takes precedence over command execution, so the exclamation mark is treated directly as part of the address expression.

In every case, for lines that do not satisfy the address expression (an exclamation mark after the address counts as part of it), none of the following commands run; such lines are simply auto-printed, and whether that happens is controlled by the "-n" option.
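The three scripts can be run against the sample data to confirm the description. This sketch assumes GNU sed and writes the three lines to a temporary file standing in for a.txt.

```shell
# Temporary file standing in for a.txt.
tmp=$(mktemp)
printf '%s\n' djkaldahsdf abcskdf2das chhdsjaj > "$tmp"

r1=$(sed '/^abc/!{d}' "$tmp")   # (1) negated address: delete every line not starting with abc
r2=$(sed '/^abc/{!d}' "$tmp")   # (2) negated command inside the block: d never runs
r3=$(sed '/^abc/!d' "$tmp")     # (3) same as (1)

echo "(1): $r1"
echo "(2):"
echo "$r2"
echo "(3): $r3"
rm -f "$tmp"
```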


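7. sed hangs with the CPU at 100%

Some people may have run into this problem, especially when sed processes a database file exported in UTF-8 format.

The cause is a character-set problem; to be precise, the local environment (locale) and the encoding of the file do not agree.

If this happens, set the LC_COLLATE and LC_CTYPE environment variables to C. You can also simply set LANG=C or LC_ALL=C.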
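The setting can be applied to a single sed invocation by prefixing the command, which leaves the rest of the session untouched. The sketch below does not reproduce the hang; it only shows the prefix form and the byte-wise matching that the C locale gives. The input, a word ending in the lone byte 0xE9 (not valid UTF-8 on its own), is made-up sample data.

```shell
# \351 is the single byte 0xE9. In the C locale "." matches any single byte,
# so the last byte on the line is replaced regardless of the file's encoding.
fixed=$(printf 'caf\351\n' | LC_ALL=C sed 's/.$/E/')

echo "$fixed"
```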