Many programming languages and tools now support regular expressions, and C# is no exception. The C# base class library includes a namespace (System.Text.RegularExpressions) and a set of classes (Regex, Match, Group, and so on) that give full access to the power of regular expressions. So what is a regular expression, and how do you define one?


One. Regular expression basics

           What is a regular expression

    When writing code that processes strings, you often need to find strings that follow some complex rule. Regular expressions are a tool for describing such rules. In other words, a regular expression is code that records a rule about text.

     When searching for files in Windows, we often use the wildcards * and ?. To find all Word documents, you can search for *.doc, where * is interpreted as any string. Like wildcards, regular expressions are a tool for matching text, but they describe what you need much more precisely. The price, of course, is greater complexity.

          A simple example: validating a phone number

The best way to learn regular expressions is to start with examples. Let's begin by validating a phone number and work toward an understanding of regular expressions step by step.

In our country, a phone number (for example 0379-65624150) usually consists of a 3- or 4-digit area code beginning with 0 and a 7- or 8-digit number, separated by a hyphen '-'. For this example we first introduce the metacharacter \d, which matches one digit from 0 to 9. The regular expression can be written as: ^0\d{2,3}-\d{7,8}$

Let's analyze it. 0 matches the digit "0"; \d matches one digit and {2,3} repeats it 2 to 3 times; - matches only "-" itself; the next \d also matches one digit, and {7,8} repeats it 7 to 8 times. ^ and $ match the start and end of the string, so the whole input has to be a phone number. Of course, the phone number can also be written as (0379)65624150; matching that form is left as an exercise for the reader.

      A.  Metacharacters

In the example above we met the metacharacter \d. As you might expect, regular expressions have many more metacharacters like \d. The following table lists some common ones:





Metacharacter    Description
.                Matches any character except a newline
\b               Matches the beginning or end of a word
\d               Matches a digit
\s               Matches any whitespace character
\w               Matches a letter, digit, underscore, or Chinese character
^                Matches the start of the string
$                Matches the end of the string

Table 1. Common metacharacters

       B.  Escape characters

    If you want to find a metacharacter itself, for example . or *, there is a problem: you cannot write it directly, because it would be interpreted with its special meaning. You have to put \ in front of it to remove that meaning, so you write \. and \*. Likewise, to find \ itself you write \\.

For example: unibetter\.com matches unibetter.com, and C:\\Windows matches C:\Windows.
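
The escaping rules above can be sketched in C# as follows. This is a minimal console example, not code from the article: the class name EscapeDemo and the helper Contains are illustrative. Verbatim strings (@"...") are used so that the backslashes in the pattern do not also have to be escaped for the C# compiler, and Regex.Escape is the library method that inserts the backslashes for you.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // True when the input contains text described by the pattern.
    public static bool Contains(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // \. matches only a literal dot, so "unibetterXcom" is rejected.
        Console.WriteLine(Contains("unibetter.com", @"unibetter\.com"));
        Console.WriteLine(Contains("unibetterXcom", @"unibetter\.com"));

        // \\ in the pattern matches a single literal backslash.
        Console.WriteLine(Contains(@"C:\Windows", @"C:\\Windows"));

        // Regex.Escape turns literal text into a pattern that matches it.
        Console.WriteLine(Regex.Escape(@"C:\Windows"));
    }
}
```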

       C.   Quantifiers

A quantifier, also called a repetition specifier, states how many times an element may occur. For example, when matching phone numbers we used {2,3}, which means 2 to 3 repetitions. Common quantifiers are:





Quantifier    Description
*             Repeats zero or more times
+             Repeats one or more times
?             Repeats zero times or once
{n}           Repeats n times
{n,}          Repeats n or more times
{n,m}         Repeats n to m times

Table 2. Common quantifiers

Two. Regular expression support in .NET

    The System.Text.RegularExpressions namespace contains classes that provide access to the .NET Framework regular expression engine. The functionality it provides can be used from any platform or language that runs within the Microsoft .NET Framework.


    A. Using regular expressions in C#

Now that we know which classes support regular expressions in C#, let's put the phone number expression from earlier into C# code and validate phone numbers.

Step 1: create a Windows project named SimpleCheckPhoneNumber.

Step 2: import the System.Text.RegularExpressions namespace.

Step 3: write the regular expression. We start from the phone number expression above. Because that expression only accepts numbers whose area code and number are joined by a hyphen, we extend it: 0\d{2,3}-\d{7,8}|\(0\d{2,3}\)\d{7,8}. The part before | is the one discussed above, and the part after it validates numbers written like (0379)65624150. Because ( and ) are also metacharacters, they must be escaped. | denotes alternation: the expression matches either the part before it or the part after it. Note that this expression has no ^ and $ anchors, so it also matches a phone number that appears inside a longer string; to validate a whole input, write ^(0\d{2,3}-\d{7,8}|\(0\d{2,3}\)\d{7,8})$.

Step 4: construct a Regex object from the regular expression.

Step 5: call the IsMatch method of the Regex class to test for a match. IsMatch() returns a bool: true if there is a match, otherwise false.
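
Steps 2 to 5 can be sketched as below. To stay self-contained, this is a console program rather than the Windows project from Step 1, and the method name IsPhoneNumber and the sample inputs are illustrative assumptions. The pattern is the one from Step 3 wrapped in ^( ... )$, an addition made here so that the whole input has to be a phone number.

```csharp
using System;
using System.Text.RegularExpressions; // Step 2

public static class SimpleCheckPhoneNumber
{
    // Steps 3 and 4: the pattern, compiled once into a Regex object.
    // The ^( ... )$ wrapper is an addition so partial matches are rejected.
    private static readonly Regex PhoneRegex =
        new Regex(@"^(0\d{2,3}-\d{7,8}|\(0\d{2,3}\)\d{7,8})$");

    // Step 5: IsMatch returns true when the input matches the pattern.
    public static bool IsPhoneNumber(string input)
    {
        return PhoneRegex.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples =
        {
            "0379-65624150",      // hyphenated form
            "(0379)65624150",     // parenthesized area code
            "379-65624150",       // area code does not start with 0
            "0379-656241501234"   // too many digits
        };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, IsPhoneNumber(s));
        }
    }
}
```

Keeping the Regex in a static readonly field avoids re-parsing the pattern on every call, which matters if the check runs on each keystroke of an input form.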


Three. Advanced regular expressions

     A.  Grouping

When matching phone numbers, we only repeated a single character. Now let's learn how to use grouping to match an IP address.

As everyone knows, an IP address is written as four dotted decimal segments, so we can match it segment by segment. First, let's match one segment: 2[0-4]\d|25[0-5]|[01]?\d\d? This expression matches one number of an IP address. 2[0-4]\d matches a three-digit segment that starts with 2, has a tens digit from 0 to 4, and any units digit; 25[0-5] matches a three-digit segment that starts with 25 and has a units digit from 0 to 5; [01]?\d\d? matches a segment with an optional leading 0 or 1 followed by one or two digits. ? means zero or one occurrence, so [01] and the last \d may both be absent.

Adding \. after a segment matches the dot that separates segments. Because | has the lowest precedence, the segment must first be wrapped in its own group, otherwise \. would attach only to the last alternative: (2[0-4]\d|25[0-5]|[01]?\d\d?)\. Now we treat the segment plus its dot as a group, ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.), repeat that group three times, and finish with one more grouped segment. The complete regular expression is: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
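
A minimal C# sketch of the grouped IP address expression follows. The class and method names are illustrative, and the ^ and $ anchors are an assumption added here so that the whole input must be an address. Each segment sits in its own parentheses so that | and \. apply to the segment as a whole.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IpAddressCheck
{
    // One segment: 200-249, 250-255, or 0-199 (leading zeros allowed).
    private const string Segment = @"(2[0-4]\d|25[0-5]|[01]?\d\d?)";

    // Three "segment + dot" groups followed by a final segment.
    private static readonly Regex IpRegex =
        new Regex("^(" + Segment + @"\.){3}" + Segment + "$");

    public static bool IsIPv4(string input)
    {
        return IpRegex.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples = { "192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, IsIPv4(s));
        }
    }
}
```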


    B. Backreferences

Once we understand grouping, we can use backreferences. A backreference uses text captured earlier in the match to match later text. It is mostly used to match repeated text, such as go go, which we can match with (go) \1.

By default, each group is automatically given a number. The rule: counting from left to right by each group's opening parenthesis, the first group is 1, the second is 2, and so on. You can also give a subexpression a group name, using the syntax (?<Word>\w+) (or with single quotes instead of angle brackets: (?'Word'\w+)). This names the group \w+ as Word. To refer back to what this group captured, use \k<Word>. An expression that matches any repeated word can therefore be written like this: \b(?<Word>\w+)\b\s+\k<Word>\b.

Named groups have another benefit: in a C# program, when you need the value of a group you can fetch it by the name you defined rather than by index.
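
The following sketch shows a named group and its backreference in C#, and reads the captured value by name. The class name, method name, and sample sentences are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatedWordDemo
{
    // \k<Word> must match the same text that the group Word captured.
    private static readonly Regex Repeated =
        new Regex(@"\b(?<Word>\w+)\b\s+\k<Word>\b");

    // Returns the repeated word, or null when no word is immediately repeated.
    public static string FindRepeatedWord(string text)
    {
        Match m = Repeated.Match(text);
        return m.Success ? m.Groups["Word"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindRepeatedWord("let us go go home"));                 // go
        Console.WriteLine(FindRepeatedWord("nothing repeated here") ?? "(none)"); // (none)
    }
}
```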

When we don't need a backreference, the group doesn't have to remember anything. In that case the (?:nocapture) syntax tells the regular expression engine not to treat the content of the parentheses as a capturing group, which improves efficiency.
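
One way to see the difference is to count the groups a match reports. In the sketch below (names are illustrative), Groups.Count includes group 0, which always holds the whole match, so a capturing group raises the count to 2 while a non-capturing group leaves it at 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Number of groups in the match, including group 0 (the whole match).
    public static int GroupCount(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        Console.WriteLine(GroupCount(@"(go)+", "gogo"));   // 2: group 0 and group 1
        Console.WriteLine(GroupCount(@"(?:go)+", "gogo")); // 1: group 0 only
    }
}
```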

    C. Zero-width assertions

In the metacharacter introduction we saw characters that match the beginning and end of a string (^ and $) or the beginning and end of a word (\b). These metacharacters match only a position, one that must satisfy some condition, rather than any characters, so they are called zero-width assertions. "Zero-width" means that they match a position and consume no characters; "assertion" means a test. Matching continues only when the assertion is true.

Sometimes we need to match a specific position that is not simply the start or end of a string or word. For that we write our own assertions. The syntax is:


Assertion syntax    Description
(?=pattern)         Positive lookahead: matches a position that is followed by pattern
(?!pattern)         Negative lookahead: matches a position that is not followed by pattern
(?<=pattern)        Positive lookbehind: matches a position that is preceded by pattern
(?<!pattern)        Negative lookbehind: matches a position that is not preceded by pattern

Table 3. Assertion syntax and description

Is that hard to follow? Let's look at an example.

Given the tag <book>, suppose we want its tag name (book). We can use assertions for this. Consider the expression (?<=\<)(?<tag>\w*)(?=\>). It matches the characters between < and >, which here is book. Assertions can be used to write more complex expressions, but no further examples are given here.

One more important point: the parentheses used in assertion syntax are not capturing groups, so they cannot be referenced by number or by name.
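
A minimal sketch of the tag example in C# follows; the class and method names are illustrative. Since < and > are not metacharacters on their own, the sketch writes them without the backslashes used in the expression above, which is a simplification made here. The whole match contains only the tag name, because the lookbehind and lookahead test positions without consuming the angle brackets.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagNameDemo
{
    // (?<=<) requires "<" just before the match, (?=>) requires ">" just after.
    private static readonly Regex TagRegex = new Regex(@"(?<=<)(?<tag>\w*)(?=>)");

    // Returns the text between < and >, or null when there is no tag.
    public static string GetTagName(string text)
    {
        Match m = TagRegex.Match(text);
        return m.Success ? m.Groups["tag"].Value : null;
    }

    public static void Main()
    {
        Match m = TagRegex.Match("<book>");
        Console.WriteLine(m.Value);               // book: the brackets are not part of the match
        Console.WriteLine(m.Groups["tag"].Value); // book
        Console.WriteLine(GetTagName("no tags here") ?? "(none)");
    }
}
```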

     D. Greedy and lazy matching

When a regular expression contains a quantifier that accepts repetition, the default behavior is to match as many characters as possible (provided the whole expression can still match). Consider the expression a\w*b: applied to the string aabab, it matches aabab. This is called greedy matching.

Sometimes we want as few repetitions as possible instead, so that the example above yields aab. For this we use lazy matching, which is written by adding ? after the quantifier. The expression above becomes a\w*?b, and applied to the string aabab it matches aab and ab.

You may ask: ab has fewer repetitions than aab, so why isn't ab matched first? The reason is that one rule takes priority over both greedy and lazy matching: the match that begins earliest wins.
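
The two behaviors can be compared side by side with Regex.Matches. This is a small sketch; the helper AllMatches is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    // Collects the text of every match of the pattern in the input.
    public static string[] AllMatches(string input, string pattern)
    {
        MatchCollection found = Regex.Matches(input, pattern);
        string[] values = new string[found.Count];
        for (int i = 0; i < found.Count; i++)
        {
            values[i] = found[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // Greedy: \w* takes as much as it can, giving one match.
        Console.WriteLine(string.Join(", ", AllMatches("aabab", @"a\w*b")));  // aabab

        // Lazy: \w*? takes as little as it can, giving two matches.
        // The first still starts at index 0, because the earliest start wins.
        Console.WriteLine(string.Join(", ", AllMatches("aabab", @"a\w*?b"))); // aab, ab
    }
}
```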

     E. Comments

Syntax: (?#comment)

    For example: 2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199)

    Note: when using comments, be careful not to put characters such as spaces or line breaks in front of a comment, because they become part of the pattern. If you want such characters to be ignored, enable the "ignore whitespace in pattern" option, which in C# is the IgnorePatternWhitespace member of the RegexOptions enumeration (RegexOptions is covered below).
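
Both comment styles are sketched below; the class and field names are illustrative, and the ^(?: ... )$ wrapper is an addition so that the whole input must be one segment. With RegexOptions.IgnorePatternWhitespace, unescaped whitespace in the pattern is ignored and # starts a comment that runs to the end of the line, so the pattern can be laid out over several lines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentDemo
{
    // Inline (?#...) comments need no option.
    public static readonly Regex Inline = new Regex(
        @"^(?:2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199))$");

    // The same pattern, laid out with whitespace and # comments.
    public static readonly Regex Verbose = new Regex(@"
        ^(?:
            2[0-4]\d      # 200-249
          | 25[0-5]       # 250-255
          | [01]?\d\d?    # 0-199
        )$",
        RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        string[] samples = { "7", "249", "255", "256" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: inline={1}, verbose={2}",
                s, Inline.IsMatch(s), Verbose.IsMatch(s));
        }
    }
}
```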

      F. Processing options in C#

In C#, the RegexOptions enumeration selects how a regular expression is processed. Its members are described in the MSDN documentation for RegexOptions.
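
As a small illustration, the sketch below counts matches of the same pattern under different options. The class name, helper, and sample text are assumptions made for the example; IgnoreCase and Multiline are two members of the enumeration, and members can be combined with |.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    public static int CountMatches(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string text = "Go\ngo\nGO"; // three lines

        // Default: ^ and $ refer to the whole string, and case matters.
        Console.WriteLine(CountMatches(text, "^go$", RegexOptions.None));      // 0

        // Multiline: ^ and $ match at the start and end of every line.
        Console.WriteLine(CountMatches(text, "^go$", RegexOptions.Multiline)); // 1

        // Options are flags and can be combined.
        Console.WriteLine(CountMatches(text, "^go$",
            RegexOptions.Multiline | RegexOptions.IgnoreCase));                // 3
    }
}
```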

      The Capture, Group, and Match classes in C#

Capture class: represents the result of a single subexpression capture, that is, one substring from one successful capture. This class has no public constructor; Capture objects are obtained from the Captures collection of a Group or Match. Capture has three commonly used properties: Index, Length, and Value. Index is the position of the first character of the captured substring, Length is the length of the captured substring, and Value is the captured substring itself.

Group class: represents the information for one group in a regular expression and supports group-based matching. This class has no public constructor; a collection of Group objects is obtained from the Groups property of a Match. If a group in the regular expression is named, it can be accessed by name; if it is not named, it is accessed by index. Note: element 0 of every Match's Groups collection (Groups[0]) is the entire string captured by that Match, which is the same as the Match's Value.

Match class: represents the result of a single regular expression match. This class also has no public constructor; an instance is obtained from the Match() method of the Regex class, and a collection of instances from the Matches() method of the Regex class.

All three classes can represent the result of a single regular expression match, but Match provides the most detail, including capture and group information. Match is therefore the most commonly used of the three.
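
The relationship between the three classes can be sketched as follows. The pattern, input, and names are illustrative assumptions. A group that is repeated by a quantifier keeps only its last capture in Group.Value, while its Captures collection keeps every capture.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchGroupCaptureDemo
{
    // Returns the text of every capture made by the named group.
    public static string[] CapturesOf(Match match, string groupName)
    {
        CaptureCollection captures = match.Groups[groupName].Captures;
        string[] values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            values[i] = captures[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // The group "digit" captures one digit and is repeated by +.
        Match m = Regex.Match("abc123", @"(?<digit>\d)+");

        // Match: the overall result, with Index, Length, and Value.
        Console.WriteLine("{0} at {1}, length {2}", m.Value, m.Index, m.Length); // 123 at 3, length 3

        // Groups[0] is the whole match; a named group is read by name.
        Console.WriteLine(m.Groups[0].Value);       // 123
        Console.WriteLine(m.Groups["digit"].Value); // 3 (the last capture)

        // Captures: every substring the group captured, in order.
        Console.WriteLine(string.Join(",", CapturesOf(m, "digit"))); // 1,2,3
    }
}
```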