# Python: Applying regular expressions with the re module

Regular expressions provide a flexible way to search or match string patterns in text. A single expression, commonly called a regex, is a string formed according to the regular expression language. Python’s built-in re module is responsible for applying regular expressions to strings.

In this post, I’ll first introduce regular expression syntax, and then apply it in some examples.

## Syntax

### Special characters

Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted.

• . (dot) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
• ^ (caret) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
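To make the two special characters above concrete, here is a minimal sketch. The sample string `text` is made up for illustration; `re.DOTALL` and `re.MULTILINE` are the standard flags from the re module.

```python
import re

text = "ab\ncd"  # hypothetical two-line sample

# By default '.' does not match a newline, so "b.c" finds nothing here.
dot_default = re.findall(r"b.c", text)
# With DOTALL, '.' also matches the newline between 'b' and 'c'.
dot_all = re.findall(r"b.c", text, re.DOTALL)

# By default '^' anchors only at the very start of the string.
caret_default = re.findall(r"^\w", text)
# With MULTILINE, '^' also anchors right after each newline.
caret_multi = re.findall(r"^\w", text, re.MULTILINE)

print(dot_default)    # []
print(dot_all)        # ['b\nc']
print(caret_default)  # ['a']
print(caret_multi)    # ['a', 'c']
```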

## Examples

The re module provides regular expression matching operations similar to those found in Perl. Regular expressions use the backslash character to introduce special forms such as \d, which collides with Python’s own use of the backslash in string literals. The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns are expressed in Python code using this raw string notation.
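A quick way to see the difference is to compare string lengths. The sketch below also matches a literal backslash both ways; the `path` value is a made-up example.

```python
import re

raw = r"\n"    # two characters: a backslash and 'n'
cooked = "\n"  # one character: a newline
print(len(raw), len(cooked))  # 2 1

path = "C:\\temp"  # hypothetical text; the actual characters are C:\temp

# Without a raw string, one literal backslash needs four backslashes in source.
normal_hits = re.findall("\\\\", path)
# With a raw string, two are enough.
raw_hits = re.findall(r"\\", path)

print(normal_hits == raw_hits)  # True
```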

### Match digit

\d matches a decimal digit, and {1,3} means the preceding item is repeated 1 to 3 times. In the “value” string, we find 1, 10 and 100.
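The original “value” string is not shown, so the sketch below assumes a hypothetical one that contains 1, 10 and 100.

```python
import re

# Hypothetical input; any text containing these numbers would do.
value = "1 apple, 10 pears and 100 plums"

# \d{1,3} grabs runs of one to three digits.
digits = re.findall(r"\d{1,3}", value)
print(digits)  # ['1', '10', '100']
```

Note that {1,3} is greedy, so a longer number such as 1000 would be split into '100' and '0'.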

### Match word

\w+ matches 1 or more word characters (letters, digits and the underscore), so we find all the words but not the spaces or punctuation.
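A minimal sketch with a made-up sentence. It also shows that \w treats the underscore as a word character.

```python
import re

sentence = "Hello, regex_world! 42"  # hypothetical sample text

# Each run of word characters becomes one item; spaces and punctuation are skipped.
words = re.findall(r"\w+", sentence)
print(words)  # ['Hello', 'regex_world', '42']
```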

### Match 0 or 1 or more

\d* matches 0 or more digits, so we get ‘1’ where there is a digit and an empty string '' at every other position.

\d+ matches at least 1 digit, so we find only ‘1’.

\d? matches 0 or 1 digit, so we get ‘’, ‘1’, ‘’.
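The three quantifiers can be compared side by side. The input "a1" is an assumption, chosen because it reproduces the results described above.

```python
import re

text = "a1"  # hypothetical input: one non-digit followed by one digit

star = re.findall(r"\d*", text)   # zero or more digits
plus = re.findall(r"\d+", text)   # one or more digits
maybe = re.findall(r"\d?", text)  # zero or one digit

print(star)   # ['', '1', '']
print(plus)   # ['1']
print(maybe)  # ['', '1', '']
```

The empty strings come from positions where the pattern is allowed to match nothing: before 'a' and at the end of the string.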

### Match starting and ending

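Inside square brackets, a leading ^ negates the character class, so [^1] means any character other than 1. The pattern ^[^1]{2}[^3]$ therefore matches a string of exactly 3 characters whose first two characters are not 1 and whose last character is not 3.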
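Here is a small sketch that runs this pattern against a few made-up candidates.

```python
import re

pattern = re.compile(r"^[^1]{2}[^3]$")

# Hypothetical test strings.
candidates = ["234", "123", "233", "2345", "214"]
matches = [s for s in candidates if pattern.search(s)]

print(matches)  # ['234']
```

"123" and "214" fail because one of the first two characters is 1, "233" fails because it ends with 3, and "2345" fails because it is 4 characters long.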

### Match character class

This pattern matches at least 2 characters: 1 or more upper-case letters, followed by 1 or more lower-case letters.
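The pattern itself is not shown in the text. The sketch below assumes it is [A-Z]+[a-z]+, which fits the description, and uses a made-up sample string.

```python
import re

# Assumed pattern, reconstructed from the description above.
pattern = r"[A-Z]+[a-z]+"
text = "Hello world, HELLo Python, ABC"  # hypothetical sample

found = re.findall(pattern, text)
print(found)  # ['Hello', 'HELLo', 'Python']
```

"world" is skipped because it has no upper-case letter, and "ABC" is skipped because no lower-case letter follows.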

### Match group

In this example, I want to separate an e-mail address into 3 parts: the first part is the account name (before “@”), the second is the second-level domain (after “@”), and the third is the top-level domain. Since I specify that the first part contains only lower-case letters or digits, in the second example the regular expression doesn’t match the first 3 upper-case letters.
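A sketch of how such a pattern might look, with one pair of parentheses per part. The exact pattern and both e-mail addresses are assumptions made for illustration.

```python
import re

# Assumed pattern: account, second-level domain, top-level domain.
pattern = re.compile(r"([a-z0-9]+)@([a-z0-9]+)\.([a-z]+)")

# Hypothetical addresses.
first = pattern.search("user01@example.com").groups()
second = pattern.search("ABCuser@example.org").groups()

print(first)   # ('user01', 'example', 'com')
print(second)  # ('user', 'example', 'org')
```

In the second address, search() skips the leading "ABC" and starts matching at the first lower-case letter.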

### Find in dataframe

We can also filter the rows of a dataframe with a regex pattern. In this example, I created a dataframe with 3 columns: “Name”, “Birthday” and “Email”. Now I want to find the e-mail addresses whose account and second-level domain contain only lower-case letters, and whose top-level domain contains 2 or 3 lower-case letters. Among the 4 e-mail addresses, only the second one satisfies the pattern.
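The sketch below assumes pandas is installed. All names, dates and addresses are invented, and using `Series.str.contains` with an anchored pattern is one possible way to do the filtering, not necessarily the one used originally.

```python
import pandas as pd

# Hypothetical data; only the second address fits the pattern.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Birthday": ["02/01/1990", "23/07/1985", "30/11/1979", "15/03/2001"],
    "Email": [
        "Ann1@example.com",   # upper-case letter and digit in the account
        "bob@example.org",    # fits
        "cid@example.info",   # top-level domain has 4 letters
        "dee@ex4mple.com",    # digit in the second-level domain
    ],
})

# ^ and $ force the whole address to fit the pattern.
pattern = r"^[a-z]+@[a-z]+\.[a-z]{2,3}$"
mask = df["Email"].str.contains(pattern, regex=True)
matched = df[mask]

print(matched["Email"].tolist())  # ['bob@example.org']
```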

### Use regex to replace character in dataframe

E-mail addresses use “@” to separate the account from the domain. Suppose we want to replace it with “[at]” in the dataframe. We can use re.sub() to do this.
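A minimal sketch, assuming pandas is installed and using a made-up “Email” column. re.sub() is applied to each value through `Series.apply`.

```python
import re
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"Email": ["bob@example.org", "ann@example.com"]})

# Replace every "@" in each address with the literal text "[at]".
df["Email"] = df["Email"].apply(lambda s: re.sub(r"@", "[at]", s))

print(df["Email"].tolist())  # ['bob[at]example.org', 'ann[at]example.com']
```

For a plain substitution like this one, pandas' own `Series.str.replace` would also work without going through re.sub() row by row.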

### Use regex to modify values in some columns

When we need to change the order of the parts of each value in a column, we can first use re.match() and Match.groups() to split the value into groups, then put the groups back together in the order we want.
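As an illustration, the sketch below reorders dates from day/month/year to year-month-day. It assumes pandas is installed; the “Birthday” values, the date format and the helper name `to_iso` are all hypothetical.

```python
import re
import pandas as pd

# Hypothetical data in DD/MM/YYYY form.
df = pd.DataFrame({"Birthday": ["23/07/1985", "02/01/1990"]})

def to_iso(value):
    """Reorder DD/MM/YYYY into YYYY-MM-DD; leave other values unchanged."""
    m = re.match(r"(\d{2})/(\d{2})/(\d{4})", value)
    if m is None:
        return value  # value does not fit the assumed format
    day, month, year = m.groups()
    return f"{year}-{month}-{day}"

df["Birthday"] = df["Birthday"].apply(to_iso)

print(df["Birthday"].tolist())  # ['1985-07-23', '1990-01-02']
```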

There is much more to regular expressions in Python; most of it is covered in the documentation of the re module.