When I started using Python regex, the constructs I used most were `.`, `*`, `?`, `()`, `[]`, `\d`, `\s`, and so on. However, I sometimes ran into problems, such as not being able to partially match strings or control the groups as I needed.
So here are some techniques I found helpful.
Expressions
- `(?:A)`: Non-capturing group. This disables the capturing behavior of `()`, so the content matched by `A` does not form a group.
- `(?<=B)A`: Positive lookbehind assertion. This matches the expression `A` only if `B` is immediately to its left. The lookbehind `B` must be a fixed-length expression.
- `A(?=B)`: Lookahead assertion. This matches the expression `A` only if it is followed by `B`. `B` itself is not part of the match.
- `A(?!B)`: Negative lookahead assertion. This matches the expression `A` only if it is not followed by `B`.
- `(...)\1`: Backreference. The number `1` refers to the first group in the pattern, and `\1` matches the same text that group matched. If we want to match a repeat of what a group captured, we can use its number instead of writing out the whole expression again. Group numbers run from `1` up to `99`.
- `(?aiLmsux)`: Inline flags. Here, `a`, `i`, `L`, `m`, `s`, `u`, and `x` are flags:
  - `a`: match ASCII only
  - `i`: ignore case
  - `L`: locale dependent
  - `m`: multi-line
  - `s`: dot matches all characters, including newline
  - `u`: Unicode matching
  - `x`: verbose
- `(?...)`: Inside parentheses like this, `?` acts as an extension notation. Its meaning depends on the character immediately to its right.
- `(?P<name>AB)`: Named group. This matches the expression `AB`, and the match can be accessed with the group name `name`.
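To see how these expressions behave in practice, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not from any real dataset.

```python
import re

text = "price: 100USD, cost: 200EUR"  # hypothetical sample data

# (?:A): the alternation is grouped but not captured, so findall
# returns only the one capturing group (the digits).
non_capturing = re.findall(r"(?:price|cost): (\d+)", text)
capturing = re.findall(r"(price|cost): (\d+)", text)

# (?<=B)A: digits only when "cost: " is immediately to their left.
after_cost = re.findall(r"(?<=cost: )\d+", text)

# A(?=B): digits only when followed by "EUR"; "EUR" is not consumed.
before_eur = re.findall(r"\d+(?=EUR)", text)

# A(?!B): "foo" plus the rest of the word, unless "bar" comes next.
not_bar = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

# (...)\1: the backreference matches the same text group 1 matched,
# so "12" does not qualify but "33" does.
doubled = re.search(r"(\d)\1", "12 33").group(0)

# (?i) and (?s): inline flags placed at the start of the pattern.
any_case = re.findall(r"(?i)python", "Python PYTHON python")
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

# (?P<name>...): groups accessed by name instead of by number.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "date: 2024-05")
year, month = m.group("year"), m.group("month")

print(non_capturing, capturing, after_cost, before_eur)
print(not_bar, doubled, any_case, dot_all, year, month)
```

Note that with a plain `()` group, `re.findall` returns tuples of all groups, which is often why the non-capturing form is the more convenient one.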
Functions
- `re.findall(A, B)`: Matches all non-overlapping instances of an expression `A` in a string `B` and returns them in a list.
- `re.search(A, B)`: Matches the first instance of an expression `A` in a string `B` and returns it as a match object, or `None` if there is no match.
- `re.match(A, B)`: Similar to `re.search`, but `re.match` only matches at the beginning of the string and returns `None` if the pattern does not match there.
- `re.split(A, B)`: Splits a string `B` into a list using the delimiter `A`.
- `re.sub(A, B, C)`: Replaces `A` with `B` in the string `C`.
- `re.subn(A, B, C, D)`: Same as `re.sub`, but it returns a tuple of the new string and the number of replacements made. `D` is the maximum number of replacements; if it is not specified, all matched patterns are replaced.
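The functions above can be compared side by side in a short sketch. The input string `s` is a made-up example, and the count is passed as the keyword argument `count`, which both `re.sub` and `re.subn` accept.

```python
import re

s = "a1b22c333"  # hypothetical sample string

# findall: every non-overlapping match, as a list of strings.
all_nums = re.findall(r"\d+", s)

# search: the first match anywhere in the string.
first = re.search(r"\d+", s).group(0)

# match: only succeeds at the start of the string; s starts with "a".
at_start = re.match(r"\d+", s)

# split: the digit runs act as delimiters; the string ends with a
# delimiter, so the last element is an empty string.
parts = re.split(r"\d+", s)

# sub: replace every match and return the new string.
replaced = re.sub(r"\d+", "#", s)

# subn: same replacement, but returns (new_string, number_of_replacements).
replaced_all = re.subn(r"\d+", "#", s)
replaced_two = re.subn(r"\d+", "#", s, count=2)

print(all_nums, first, at_start, parts)
print(replaced, replaced_all, replaced_two)
```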