Regular expressions are derived from the UNIX utility GREP and enable powerful text searches to be carried out using the special characters ^, $, *, +, ?, ., |, ( ), [ ], [^], [-], and \. These characters have the following meanings:
^ |
A circumflex at the beginning of an expression matches the start of a line. For instance, ^while will find all lines starting with while.
|
$ |
A dollar sign at the end of an expression matches the end of a line. For instance, tomorrow$ will find all lines ending with tomorrow.
|
* |
An asterisk after a character will match zero or more occurrences of that character. For instance to* will match t, to, and too.
|
+ |
A plus sign after a character will match one or more occurrences of that character. For instance to+ will match to and too.
|
? |
A question mark after a character will match zero or one occurrence of that character. For instance to? will match t and to.
|
. |
A period matches any character. For instance p.n will match pan, pen, pin and pun.
|
| |
A vertical bar matches either of the expressions it separates. For example, pan|pen will match pan and pen.
|
( ) |
Characters can be grouped within parentheses. This allows certain expressions to act on more than one character. For instance find(ing)?s will match finds and findings.
|
[ ] |
Characters in square brackets will match any one of the enclosed characters. For instance p[aei]n will match pan, pen, pin but not pun.
|
[^] |
A circumflex at the start of an expression within brackets will match any character except one of the enclosed characters. For instance p[^aei]n will match pun but not pan, pen or pin.
|
[-] |
A hyphen within brackets indicates a range of characters. For instance p[a-h]n will match pan and pen but not pin or pun.
|
\ |
A backslash before any of the above special characters causes that character to be treated literally. For instance, \. will be treated as a period rather than as any character.
|
\w |
Matches any word character. Word characters are the characters a-z, A-Z, 0-9, _ and any other character recognised by your system such as é and ä. For instance resum\w will match resume and resumé.
|
\W |
Matches any non-word character.
|
\s |
Matches any white space character including line endings. For instance text\ssearch will match text search even if it spans two lines.
|
\S |
Matches any character that is not a white space character.
|
\d |
Matches any digit character. For instance \d\d\d will match 999 and 101.
|
\D |
Matches any character that is not a digit character.
|
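The entries in the table can be tried out with a short script. The sketch below uses Python's standard re module as a stand-in, which is an assumption: it is not the search engine this page describes, although it treats these particular operators the same way. The helper full() and the sample strings are hypothetical, chosen to mirror the examples in the table. Note that in Python ^ and $ match at line boundaries only when re.MULTILINE is set.

```python
import re

def full(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

# Anchors: ^ and $ match at the start and end of each line
# (re.MULTILINE is required for per-line behaviour in Python).
text = "while x:\n    pass\nsee you tomorrow"
starts = re.findall(r"^while", text, flags=re.MULTILINE)    # ['while']
ends = re.findall(r"tomorrow$", text, flags=re.MULTILINE)   # ['tomorrow']

# Quantifiers: *, + and ? act on the preceding character.
star = full(r"to*", ["t", "to", "too"])   # ['t', 'to', 'too']
plus = full(r"to+", ["t", "to", "too"])   # ['to', 'too']
opt = full(r"to?", ["t", "to", "too"])    # ['t', 'to']

# Any character, alternation and grouping.
words = ["pan", "pen", "pin", "pun"]
dot = full(r"p.n", words)       # all four words
alt = full(r"pan|pen", words)   # ['pan', 'pen']
grp = full(r"find(ing)?s", ["finds", "findings", "finding"])  # ['finds', 'findings']

# Escapes and shorthand classes.
period = re.findall(r"\.", "a.b")                 # ['.']
digits = re.findall(r"\d\d\d", "999 and 101")     # ['999', '101']
spans = re.search(r"text\ssearch", "text\nsearch") is not None  # True
accent = full(r"resum\w", ["resume", "resum\u00e9"])  # both words
```

Whole-word matching with re.fullmatch is used here so that each result shows exactly which candidate strings an expression accepts. A search within longer text would use re.search or re.findall instead.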
Within square brackets, the special characters $, ., * and + are treated literally, while ^ is treated as a special character only if it immediately follows the [.
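The bracket rules can be checked in the same way. This is again a minimal sketch in Python's re module, assumed here only as a convenient stand-in for the search engine described on this page. The helper full() and the test strings are hypothetical.

```python
import re

def full(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["pan", "pen", "pin", "pun"]

# A set, a negated set and a range.
in_set = full(r"p[aei]n", words)     # ['pan', 'pen', 'pin']
not_set = full(r"p[^aei]n", words)   # ['pun']
in_range = full(r"p[a-h]n", words)   # ['pan', 'pen']

# Inside brackets $, ., * and + stand for themselves.
literal = re.findall(r"[$.*+]", "a$b.c*d+e")   # ['$', '.', '*', '+']

# ^ negates only when it immediately follows [; elsewhere it is literal.
caret = re.findall(r"[a^]", "a^b")   # ['a', '^']
```

Because [a^] places the circumflex after another character, it matches a literal ^ rather than negating the set.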
Further Examples