Basic regular expressions: https://www.cnblogs.com/f-ck-need-u/p/9621130.html

Perl regular expressions: https://www.cnblogs.com/f-ck-need-u/p/9648439.html

1. Every pattern in a regular expression should be read as "after matching a character or string, continue matching immediately after it". This concept is very important.

2. A caret at the start of a bracket expression means "match one following character that is not in the given set", not "allow the given characters to be absent".
Most of the time the two readings are equivalent, but they differ at the end of a line. For example, Aa[^bcd]$ matches lines ending in Aaa or Aax, but not a line ending in just Aa, because the bracket expression must still consume one character.
This is what "match immediately after" means in a regex.
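
The point can be checked with a minimal Python sketch. It assumes Python's re module behaves like the engines discussed here for this pattern, and the sample lines are made up for illustration:

```python
import re

# A negated bracket expression still has to consume exactly one character.
pattern = re.compile(r"Aa[^bcd]$")

# Hypothetical sample lines, not taken from the original notes.
samples = ["Aaa", "Aax", "Aab", "Aa"]
results = {line: bool(pattern.search(line)) for line in samples}
print(results)
```

"Aab" fails because b is in the excluded set, and "Aa" fails because there is no third character for the bracket expression to consume.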

3. (\.[0-9]+)? matches an optional decimal part. It must not be written as (\.?[0-9]*), because the latter can match the digits after the decimal point even when no decimal point is present.
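
A small Python sketch of the difference between the two spellings. The names strict and loose and the test strings are assumptions chosen for illustration:

```python
import re

strict = re.compile(r"(\.[0-9]+)?")  # an optional ".digits" unit
loose = re.compile(r"(\.?[0-9]*)")   # dot and digits are each optional

# fullmatch requires the whole input to be matched by the pattern.
inputs = [".5", "5", "."]
strict_ok = {s: bool(strict.fullmatch(s)) for s in inputs}
loose_ok = {s: bool(loose.fullmatch(s)) for s in inputs}
print(strict_ok)
print(loose_ok)
```

The loose form accepts bare digits and a bare dot, which is usually not what "decimal part" is meant to cover.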

4. When grouping with parentheses in Perl regex, replacing the opening parenthesis ( with (?: groups without capturing. Capturing means the matched text can be back-referenced or saved to a variable outside the regex.
([-+]?[0-9]+(\.[0-9]+)?) *(cm|mm): (cm|mm) is saved as $3
([-+]?[0-9]+(?:\.[0-9]+)?) *(cm|mm): (cm|mm) is saved as $2
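
The group numbering can be observed directly. This is a sketch in Python rather than Perl, and the input string is an assumed example:

```python
import re

text = "-12.5 cm"  # hypothetical input

capturing = re.search(r"([-+]?[0-9]+(\.[0-9]+)?) *(cm|mm)", text)
non_capturing = re.search(r"([-+]?[0-9]+(?:\.[0-9]+)?) *(cm|mm)", text)

# With a capturing decimal part, the unit is the third group.
capturing_groups = capturing.groups()
# With (?:...), the decimal part gets no number, so the unit is the second group.
non_capturing_groups = non_capturing.groups()
print(capturing_groups, non_capturing_groups)
```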

5. Special anchors. An anchor matches a position, not a character. The same is true of the line-start anchor ^ and the line-end anchor $.

Note that programs differ in how they define words and word boundaries, and some programs do not support all of the special metacharacters below. Generally speaking, a word is made of letters, digits, and underscores, that is, [a-zA-Z0-9_].
For example, GNU grep 2.6 does not support \s or \d, while GNU grep 2.20 supports \s but not \d.
'\b': Match the empty string at the edge of a word.
'\B': Match the empty string provided it's not at the edge of a word.
'\<': Match the empty string at the beginning of a word.
'\>': Match the empty string at the end of a word.
'\w': Match a word constituent; it is a synonym for '[_[:alnum:]]'.
'\W': Match a non-word constituent; it is a synonym for '[^_[:alnum:]]'.
'\s': Match whitespace; it is a synonym for '[[:space:]]'.
'\S': Match non-whitespace; it is a synonym for '[^[:space:]]'.
'\d': Match a digit; it is a synonym for '[0-9]'.
'\D': Match a non-digit; it is a synonym for '[^0-9]'.

For example, '\brat\b' matches the separate word 'rat', '\Brat\B' matches
'crate' but not 'furry rat'.
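
The 'rat' example can be reproduced in Python, whose re module supports \b and \B (but not \< and \>). A minimal sketch:

```python
import re

whole_word = re.compile(r"\brat\b")   # 'rat' with a word boundary on both sides
inside_word = re.compile(r"\Brat\B")  # 'rat' with word characters on both sides

samples = ["furry rat", "crate"]
whole_hits = [bool(whole_word.search(s)) for s in samples]
inside_hits = [bool(inside_word.search(s)) for s in samples]
print(whole_hits, inside_hits)
```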

6. Character classes. Note that some programs do not support all of the following character classes.
'[:alnum:]': same as '[0-9A-Za-z]'.
'[:alpha:]': '[:lower:]' and '[:upper:]', same as '[A-Za-z]'.
'[:lower:]': lowercase letters, same as '[a-z]'.
'[:upper:]': uppercase letters, same as '[A-Z]'.
'[:digit:]': '0 1 2 3 4 5 6 7 8 9'.
'[:xdigit:]': Hex digits: '0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f'.

'[:blank:]': space and tab.
'[:space:]': tab, newline, vertical tab, form feed, carriage return, and space.
'[:punct:]': Punctuation characters: '! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~'.
'[:print:]': '[:alnum:]', '[:punct:]', and space.
'[:graph:]': Graphical characters: '[:alnum:]' and '[:punct:]'.

'[:cntrl:]': Control characters: octal codes 000 through 037, and 177 ('DEL').

7. Within a single match, a character that has already been matched cannot be matched a second time, because a regex always continues immediately after what it has just matched.
For example, the regex "(#.)(.#)" cannot match the string "#c#".
As another example, the regex "(#.)(.*)(.#)" does match the string "#cc#", but the second group can only match the empty string.
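
Both claims are easy to verify. A short Python sketch using exactly the patterns and strings above:

```python
import re

# "#c#" has only three characters, but (#.)(.#) needs four distinct ones.
no_overlap = re.search(r"(#.)(.#)", "#c#")

# With "#cc#" the outer groups use up all four characters,
# so the middle group is left with the empty string.
with_gap = re.search(r"(#.)(.*)(.#)", "#cc#")
with_gap_groups = with_gap.groups()
print(no_overlap, with_gap_groups)
```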

8. Lookaround anchors (also called "zero-width assertions": what they match is a position, not a character).
Replacing the opening parenthesis with (?= gives a left-to-right lookaround (lookahead). For example, (?=\d) is satisfied when the character to the right of the current position is a digit.
Replacing the opening parenthesis with (?<= gives a right-to-left lookaround (lookbehind). For example, (?<=\d) is satisfied when the character to the left of the current position is a digit.

* Lookahead: (?=...) and (?!...). The exclamation mark means negation: the text to the right of the current position must not match the pattern.
* Lookbehind: (?<=...) and (?<!...)

A lookbehind expression must describe a fixed-length string. For example, (?<=word) and (?<=word|word) are allowed, but (?<=words?) is not, because ? matches 0 or 1 occurrence, so the length varies.
In PCRE it can be rewritten as (?<=word|words), but Perl does not allow this, because Perl strictly requires a fixed length.
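
A hedged Python sketch of the four lookaround forms. Python's re module is used here only as a convenient engine. Like the Perl behaviour described above, it rejects lookbehind alternatives of different lengths. The sample text is an assumption:

```python
import re

text = "a1 b2 c"  # hypothetical sample

# Lookahead: a letter only if a digit follows; the digit is not consumed.
letters_before_digit = re.findall(r"[a-z](?=\d)", text)
# Negative lookahead: a letter only if no digit follows.
letters_not_before_digit = re.findall(r"[a-z](?!\d)", text)
# Lookbehind: a digit only if a letter precedes it.
digits_after_letter = re.findall(r"(?<=[a-z])\d", text)
# Negative lookbehind: a letter not preceded by a space (only the first one here).
letters_not_after_space = re.findall(r"(?<! )[a-z]", text)

# Alternatives of different lengths inside a lookbehind are rejected by re.
try:
    re.compile(r"(?<=word|words)x")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(letters_before_digit, digits_after_letter, variable_width_ok)
```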

9. The most important point about lookaround anchors is that the match result does not consume any characters; it only anchors a position.
For example, take the text: your name is longshuai MA and your name is longfei MA
(?=longshuai) anchors the position before the word "longshuai" in the first sentence, but all it matches is the empty string before "longshuai",
so (?=longshuai)long matches the string "long".
Thus, for these two sentences only, long(?=shuai) and (?=longshuai)long are equivalent.
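
The equivalence can be confirmed by comparing match spans. A minimal Python sketch using the sentence above:

```python
import re

text = "your name is longshuai MA and your name is longfei MA"

a = re.search(r"long(?=shuai)", text)
b = re.search(r"(?=longshuai)long", text)

# Both report the same four characters at the same position,
# because the lookahead itself consumes nothing.
spans = (a.span(), b.span())
matched = (a.group(), b.group())
print(spans, matched)
```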

10. Greedy, lazy, and possessive matching
By default, quantifiers are greedy: they match as much as possible.
Some advanced regex engines support lazy matching, which matches as little as possible and stops as soon as the condition is met.

* *, +, ?, {M,N}: greedy
* *?, +?, ??, {M,N}?: lazy (also called reluctant)
* *+, ++, ?+, {M,N}+: possessive
Possessive quantifiers behave like atomic groups: once they have matched text they never give it back, so backtracking is not allowed. For an example, see the atomic group (?>...) described below.
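
A small Python sketch contrasting greedy and lazy quantifiers on an assumed sample string. Possessive quantifiers are left out because Python's re module only accepts them from version 3.11 on:

```python
import re

text = "<a><b>"  # hypothetical sample

# Greedy: .+ grabs as much as possible, then backtracks to the last ">".
greedy = re.search(r"<.+>", text).group()
# Lazy: .+? grabs as little as possible and stops at the first ">".
lazy = re.search(r"<.+?>", text).group()
print(greedy, lazy)
```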

11. Match modes

* (?i): case-insensitive mode. (?-i) turns the mode off again. For example, "(?i)abc(?-i)cdB" matches case-insensitively only for abc.
* Because (?i) stops applying at the closing parenthesis of the enclosing group, you can put the part that needs case-insensitive matching inside a group, for example "((?i)abc)cdB". (?:(?i)abc)cdB is equivalent to (?i:abc)cdB.
* (?x): extended mode. Whitespace in the pattern and comments running from the comment marker (#) to the end of the line are ignored.
* (?m): multiline mode. It changes how ^ and $ match. By default they match the start and the end of the string. In this mode:
  * ^ matches at the start of the string and right after each newline. To match only at the start of the string, use \A.
  * $ matches at the end of the string and right before each newline. To match only at the end of the string or before a final newline, use \Z; to match only at the very end of the string, use \z.
* (?s): single-line (dotall) mode. It changes how "." matches. By default "." cannot match a newline; in dotall mode it can.
* (?U): lazy mode. Quantifiers are greedy by default.
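
Some of these modes can be tried in Python, with the caveat that its inline-flag syntax is narrower than Perl's: a mid-pattern (?-i) is not available, so the sketch uses the scoped form (?i:...). The sample strings are assumptions:

```python
import re

# Scoped case-insensitivity: only "abc" ignores case, "cdB" stays exact.
scoped = re.compile(r"(?i:abc)cdB")
scoped_hits = [bool(scoped.fullmatch(s)) for s in ["ABCcdB", "abccdb"]]

text = "first\nsecond"
# Multiline mode: ^ also matches right after each newline.
line_starts = re.findall(r"(?m)^\w+", text)
# \A matches only at the very start of the string, even in multiline mode.
string_start = re.findall(r"(?m)\A\w+", text)
# Default mode: "." does not match the newline between the two words.
dot_default = re.search(r"first.second", text)
# Dotall mode: "." matches the newline as well.
dot_all = re.search(r"(?s)first.second", text)
print(scoped_hits, line_starts, string_start)
```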

12. Forced literal interpretation: \Q...\E. Every character between \Q and \E is treated as a literal.
Perl and PCRE differ here, however. In Perl, variables inside the sequence are still interpolated, while in PCRE the variable sigils are treated as ordinary characters too.
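
Python's re module has no \Q...\E, but re.escape plays a comparable role. The sketch below is therefore an analogy rather than the Perl or PCRE feature itself, and the literal text is an assumed example:

```python
import re

literal = "1+1=2?"  # hypothetical text full of metacharacters

# Used as a raw pattern, "+" and "?" act as quantifiers, so the text does not match itself.
unescaped_hit = re.search(literal, "1+1=2?")

# re.escape backslash-escapes the metacharacters so they match literally.
escaped = re.escape(literal)
escaped_hit = re.search(escaped, "1+1=2?")
print(unescaped_hit, escaped_hit.group())
```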

13. General grouping and capture

* (), $1, $2, $3, $4, ... Some contexts use \1, \2, \3, \4 instead. sed uses & for the whole match; Perl uses $&.
* \g1, \g2, \g3 or \g{1}, \g{2}, \g{3}.
$1, $2, ... are used outside the regex, while \g1, \g2, ... are used inside it.
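
The inside/outside distinction also exists in Python, though the spelling differs from Perl: \1 is used inside the pattern, and \g<1> in the replacement string. A sketch with assumed sample strings:

```python
import re

# Inside the pattern: \1 refers back to what group 1 matched.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Outside the pattern (in the replacement): \g<1> and \g<2> name the groups.
swapped = re.sub(r"(\w+)=(\w+)", r"\g<2>=\g<1>", "key=value")

# \g<0> is the whole match, comparable to & in sed or $& in Perl.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")
print(doubled, swapped, bracketed)
```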

14. Named groups and captures

* (?:...): non-capturing group. It only groups and cannot be referenced; also called non-capturing parentheses. For example, in "(1|one)(?:2|two)(3|three)", $1 is (1|one) and $2 is (3|three).
* (?<NAME>...): named capture. The captured group gets a name, much like assigning to a variable. It can be referenced with \k<NAME>, \k'NAME', or \g{NAME}.
* (?>...): atomic group. Once the group has matched, it never gives back what it matched (this is easiest to understand in terms of backtracking).
For example, "hello world" is matched by "hel.* world" but not by "hel(?>.*) world".
Normally ".*" matches everything to the end and then backtracks, releasing characters until the space " " can match. Inside an atomic group the matched text is never given back, so backtracking is impossible.
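
A Python sketch of these three group types. Python spells named groups (?P<name>...) and only accepts (?>...) from version 3.11 on. The atomic group is therefore emulated with the lookahead-plus-backreference idiom (?=(...))\1, which is a stand-in and not the (?>...) syntax itself:

```python
import re

# Named capture and its in-pattern backreference (Python spelling).
m = re.search(r"(?P<word>\w+) (?P=word)", "hello hello world")
named = m.group("word")

# Non-capturing groups are skipped when groups are numbered.
numbered = re.search(r"(1|one)(?:2|two)(3|three)", "onetwo3").groups()

# Ordinary greedy .* backtracks until " world" can match.
normal = re.search(r"hel.* world", "hello world")

# Emulated atomic group: the lookahead captures greedily and is never re-entered,
# then \1 consumes exactly that text, so nothing is left for " world".
atomic_like = re.search(r"hel(?=(.*))\1 world", "hello world")
print(named, numbered, normal is not None, atomic_like)
```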

15. Resetting the match: \K resets the start position of the reported match.
For example, foot\Kbar matches "footbar", but the reported match is "bar". However, \K does not affect the contents of capture groups: with
(foot)\Kbar matching "footbar", the first group still holds "foot".
$ echo abc123abcfoo | grep -P -o '(abc)123\K\g1foo'
abcfoo
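
Python's re module does not support \K, so the following sketch only approximates its effect with a lookbehind and with an extra capture group. It is an assumption-laden stand-in, not the same mechanism:

```python
import re

# A lookbehind keeps "foot" out of the reported match, much like foot\Kbar.
kept = re.search(r"(?<=foot)bar", "footbar").group()

# For (abc)123\K\g1foo, capture the part that follows the \K position instead.
m = re.search(r"(abc)123(\1foo)", "abc123abcfoo")
after_reset = m.group(2)
# The first group is unaffected, as described above.
first_group = m.group(1)
print(kept, after_reset, first_group)
```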

16. Negating a string match. This can be done indirectly with lookahead and lookbehind anchors.
For example, from "-a -3 ac c 3 b", extracting the negative numbers, positive numbers, and whitespace is simple: "-?[0-9]+|\s" is enough. Getting "-a ac c b" instead currently requires a negative lookahead (?!...): "((?!-?[0-9]+|\s).)*". The group matches one character at a position where what follows is not a positive number, a negative number, or a whitespace character; the quantifier * then repeats it to join consecutive characters.
For example:
echo "-a -3 ac c 3 b" | grep -P '((?!-?[0-9]+|\s).)*'
...
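
The same negation can be sketched in Python. The pattern is the one above with two adjustments that are assumptions of this sketch: a non-capturing group, and + instead of * so that findall does not report empty matches:

```python
import re

text = "-a -3 ac c 3 b"

# The "simple" direction: numbers (with optional sign) and whitespace.
numbers_and_spaces = re.findall(r"-?[0-9]+|\s", text)

# The negated direction: runs of characters at positions where
# neither a number nor a whitespace character starts.
others = re.findall(r"(?:(?!-?[0-9]+|\s).)+", text)
print(others)
```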