Regular Expressions

 

Regular expressions are derived from the UNIX utility GREP and enable powerful text searches to be carried out using the special characters ^, $, *, +, ?, ., |, ( ), [ ], [^], [-] and \, together with the backslash sequences \w, \W, \s, \S, \d and \D. These characters have the following meanings:

 

^

A circumflex at the beginning of an expression matches the start of a line. For instance, ^while will find all lines starting with while.

 

$

A dollar sign at the end of an expression matches the end of a line. For instance, tomorrow$ will find all lines ending with tomorrow.
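
The two anchors can be tried out with a short sketch. It assumes Python's built-in re module as a stand-in for the search engine described here, so details may differ from the application itself; the sample lines are made up for illustration.

```python
import re

# Hypothetical sample lines, invented for this illustration
lines = [
    "while x > 0:",
    "do this while waiting",
    "see you tomorrow",
    "tomorrow never comes",
]

# ^while matches only when "while" sits at the start of the line
starts_with_while = [s for s in lines if re.search(r"^while", s)]

# tomorrow$ matches only when "tomorrow" sits at the end of the line
ends_with_tomorrow = [s for s in lines if re.search(r"tomorrow$", s)]

print(starts_with_while)
print(ends_with_tomorrow)
```

Without the anchors, both while lines and both tomorrow lines would be found.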

 

*

An asterisk after a character will match zero or more occurrences of that character. For instance to* will match t, to, and too.

 

+

A plus sign after a character will match one or more occurrences of that character. For instance to+ will match to and too.

 

?

A question mark after a character will match zero or one occurrence of that character. For instance to? will match t and to.
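
The difference between *, + and ? is easiest to see side by side. The following is a minimal sketch, assuming Python's re module behaves like the search described here for these three characters; the helper name whole_matches is hypothetical.

```python
import re

words = ["t", "to", "too"]

def whole_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]

star = whole_matches(r"to*", words)      # zero or more o
plus = whole_matches(r"to+", words)      # one or more o
optional = whole_matches(r"to?", words)  # zero or one o

print(star, plus, optional)
```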

 

.

A period matches any character. For instance p.n will match pan, pen, pin and pun.

 

|

The vertical line character matches either of the expressions it separates. For example, pan|pen will match pan and pen.
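
As a rough illustration of the period and the vertical line, the sketch below again uses Python's re module as an assumed stand-in. The word pn is included to show that a period must match exactly one character.

```python
import re

words = ["pan", "pen", "pin", "pun", "pn"]

# A period stands for exactly one character, so "pn" is not matched
any_vowel = [w for w in words if re.fullmatch(r"p.n", w)]

# The vertical line accepts either alternative
pan_or_pen = [w for w in words if re.fullmatch(r"pan|pen", w)]

print(any_vowel)
print(pan_or_pen)
```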

 

( )

Characters can be grouped within parentheses. This allows certain expressions to act on more than one character. For instance find(ing)?s will match finds and findings.
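
The effect of the parentheses shows up when the same expression is tried with and without them. This is a sketch under the assumption that Python's re module groups in the same way; the candidate words are invented.

```python
import re

words = ["finds", "findings", "finding", "findins"]

# With the group, ? makes the whole of "ing" optional
grouped = [w for w in words if re.fullmatch(r"find(ing)?s", w)]

# Without the group, ? applies only to the single character g
ungrouped = [w for w in words if re.fullmatch(r"finding?s", w)]

print(grouped)
print(ungrouped)
```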

[ ]

Characters in square brackets will match any one of the enclosed characters. For instance p[aei]n will match pan, pen, pin but not pun.

 

[^]

A circumflex at the start of an expression within brackets will match any character except one of the enclosed characters. For instance p[^aei]n will match pun but not pan, pen or pin.

 

[-]

A hyphen within brackets indicates a range of characters. For instance p[a-h]n will match pan and pen but not pin or pun.
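
The three bracket forms can be compared on the same four words. A minimal sketch, assuming Python's re module as a stand-in for the search described in this topic:

```python
import re

words = ["pan", "pen", "pin", "pun"]

def whole_matches(pattern):
    """Return the sample words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

listed = whole_matches(r"p[aei]n")    # any one of a, e, i
negated = whole_matches(r"p[^aei]n")  # any character except a, e, i
ranged = whole_matches(r"p[a-h]n")    # any character from a to h

print(listed, negated, ranged)
```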

 

\

A backslash before any of the above special characters treats that character literally. For instance \. will be treated as a period rather than as any character.
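
The following sketch shows what the backslash changes, using Python's re module as an assumed stand-in. The re.escape call is a convenience of Python, not something this topic says the application offers.

```python
import re

# Unescaped, the period matches any character, so "3x14" is accepted
loose = bool(re.fullmatch(r"3.14", "3x14"))

# Escaped, the period matches only a literal period
strict_wrong = bool(re.fullmatch(r"3\.14", "3x14"))
strict_right = bool(re.fullmatch(r"3\.14", "3.14"))

# Python-specific helper: escape every special character in a literal string
escaped = re.escape("3.14")
escaped_ok = bool(re.fullmatch(escaped, "3.14"))
escaped_rejects = not re.fullmatch(escaped, "3x14")

print(loose, strict_wrong, strict_right, escaped_ok, escaped_rejects)
```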

 

\w

Matches any word character. Word characters are the characters a-z, A-Z, 0-9, _ and any other character recognised by your system such as é and ä. For instance resum\w will match resume and resumé.

 

\W

Matches any non-word character.

 

\s

Matches any white space character including line endings. For instance text\ssearch will match text search even if it spans two lines.

 

\S

Matches any character that is not a white space character.

 

\d

Matches any digit character. For instance \d\d\d will match 999 and 101.

 

\D

Matches any character that is not a digit character.
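
The six backslash classes can be explored with a short sketch. It assumes Python's re module, whose \w also accepts accented letters in ordinary text strings; the application's notion of a word character depends on your system, as noted above, so results may differ. The sample strings are made up.

```python
import re

# \w: a word character, including the accented e (written as \u00e9)
resume_plain = bool(re.fullmatch(r"resum\w", "resume"))
resume_accent = bool(re.fullmatch(r"resum\w", "resum\u00e9"))

# \W: everything that is not a word character
non_word = re.findall(r"\W", "a-b c!")

# \s: white space, including a line ending
spans_lines = bool(re.search(r"text\ssearch", "text\nsearch"))

# \S: runs of characters that are not white space
chunks = re.findall(r"\S+", "text  search\nagain")

# \d and \D: digits and non-digits
three_digits = re.findall(r"\d\d\d", "call 999 or 101")
non_digits = re.findall(r"\D+", "a1b22c")

print(non_word, chunks, three_digits, non_digits)
```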

 

 

Within square brackets, the special characters $, ., * and + are treated literally, while ^ is treated as a special character only if it immediately follows the [.
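
The behaviour inside square brackets can be checked with a small sketch, assuming Python's re module follows the same rule:

```python
import re

# Inside brackets, $ . * and + are ordinary characters
literals = re.findall(r"[$.*+]", "a$b.c*d+e")

# ^ is special only directly after the opening bracket
caret_as_literal = re.findall(r"[a^]", "a^b")   # a or a literal ^
caret_as_negation = re.findall(r"[^a]", "a^b")  # anything except a

print(literals)
print(caret_as_literal)
print(caret_as_negation)
```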

 

Further Examples

colou?r

will match color and colour.

 

p[a-k]+n

will match pan, pen, pin and pain.

 

th.*y

will match thy, they and theoretically.
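
These examples can be checked in the same way. The sketch below assumes Python's re module as a stand-in and writes the patterns in lower case. The non-matching words coler, pun and the are added here for contrast.

```python
import re

def whole_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]

# u is optional
colours = whole_matches(r"colou?r", ["color", "colour", "coler"])

# one or more letters from a to k between p and n
p_words = whole_matches(r"p[a-k]+n", ["pan", "pen", "pin", "pain", "pun"])

# th, then any run of characters, then y
th_words = whole_matches(r"th.*y", ["thy", "they", "theoretically", "the"])

print(colours, p_words, th_words)
```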