import re
# the pattern here is `Mohammed`
print( re.findall("Mohammed", "My name is Mohammed"))
[ ] matches a single character out of the set of characters it encloses. One way to think about [ ] is to see it as a blank _____ that must be filled with a single character from the characters inside the [ ].
# the pattern here is `gr[ea]y`
print( re.findall("gr[ea]y", "In American English, it is grey, but in British English it is gray."))
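[ ] matches ONLY one character. Listing a character twice inside the brackets does not change that, so neither pattern below finds a match.
print(re.findall("gr[aa]y", "graay"))   # [] because "aa" is two characters
print(re.findall("gr[aae]y", "graey"))  # [] because "ae" is two characters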
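To allow two characters between "gr" and "y", the set itself has to appear twice. The following is a minimal sketch of that idea; the sample string is made up for illustration.

```python
import re

text = "grey gray graey"  # illustrative input, not from the original examples

# One set consumes exactly one character.
one_char = re.findall(r"gr[ae]y", text)
# Two sets consume two characters, each drawn from the same choices.
two_chars = re.findall(r"gr[ae][ae]y", text)

print(one_char)   # ['grey', 'gray']
print(two_chars)  # ['graey']
```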
range
[0-9] matches a single digit from 0 to 9. [a-f] is equivalent to [abcdef]. Ranges can be combined in one set, so [0-9a-f] matches a single lowercase hexadecimal digit.
print(re.findall("0x[0-9a-f]", "0x1f"))
print(re.findall("0x[0-9][a-f]", "0x1f"))
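Ranges are case-sensitive, so [a-f] does not cover A to F. As a small sketch (the input string is an assumption made for this example), adding a second range to the same set accepts both cases:

```python
import re

text = "colors: 0x1f 0xA0 0xzz"  # illustrative input

# Two hex digits, lowercase letters only.
lower_only = re.findall(r"0x[0-9a-f][0-9a-f]", text)
# Same pattern with an extra range for uppercase letters.
any_case = re.findall(r"0x[0-9a-fA-F][0-9a-fA-F]", text)

print(lower_only)  # ['0x1f']
print(any_case)    # ['0x1f', '0xA0']
```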
negation
[^a] matches any single character except a.
print(re.findall("gr[^a]y", "grey"))
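A negated set still has to consume one character, and that character does not need to be a letter. The sketch below illustrates both points on a made-up string:

```python
import re

text = "grey gray gr-y gry"  # illustrative input

# "gray" is rejected because of the a; "gry" is rejected because
# there is no character at all between "gr" and "y".
matches = re.findall(r"gr[^a]y", text)

print(matches)  # ['grey', 'gr-y']
```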
\d matches a single digit, like [0-9].
\w matches a single word character (letter, digit, or underscore), like [a-zA-Z0-9_].
\s matches a single whitespace character.
. matches any single character except a line break.
# raw strings (r"...") keep backslashes such as \d intact
print(re.findall(r"class_\d", "class_1"))
print(re.findall(r"class\s\d", "class 10"))
print(re.findall(r"\w\w\w\w\w\s\d\d", "class 10"))
print(re.findall(".......", "1%0=NaN"))
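The shorthand classes combine with literal characters, and the dot stops at line breaks unless told otherwise. Here is a short sketch with invented sample strings; re.DOTALL is the standard flag that lets the dot match a line break as well.

```python
import re

# Two digits, a colon, two digits.
times = re.findall(r"\d\d:\d\d", "Meet at 09:30 or 14:45")

text = "a b a\nb a-b"  # the middle candidate spans a line break
# By default the dot refuses to match "\n".
dot_default = re.findall(r"a.b", text)
# With re.DOTALL the dot matches any character, including "\n".
dot_all = re.findall(r"a.b", text, re.DOTALL)

print(times)        # ['09:30', '14:45']
print(dot_default)  # ['a b', 'a-b']
print(dot_all)      # ['a b', 'a\nb', 'a-b']
```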
Anchors match positions and not characters.
^ matches at the beginning of a string.
$ matches at the end of a string.
\b matches at a word boundary.
print(re.findall("^Ali", "My name is Ali"))
print(re.findall("Ali$", "My name is Ali"))
print(re.findall("^Ali", "Ali is my name"))
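The examples above cover ^ and $ but not \b. As a minimal sketch on an invented sentence, wrapping a name in \b keeps it from matching inside a longer word:

```python
import re

text = "Ali met Alice and Khalid; Ali won"  # illustrative input

# Without boundaries, the "Ali" inside "Alice" also matches.
anywhere = re.findall(r"Ali", text)
# With \b on both sides, only the standalone word matches.
whole_word = re.findall(r"\bAli\b", text)

print(len(anywhere))  # 3
print(whole_word)     # ['Ali', 'Ali']
```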
| is similar to or. It allows choosing between several patterns.
pattern = 'Mohammed|Ali|boxer'
string = "Mohammed is my name. Mohammed Ali is my full name, and I'm a boxer"
print( re.search(pattern, string).group() )
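re.search stops at the first match, so only one alternative is reported. To see every alternative that occurs, re.findall can be used with the same pattern, as in this sketch:

```python
import re

pattern = "Mohammed|Ali|boxer"
string = "Mohammed is my name. Mohammed Ali is my full name, and I'm a boxer"

# search returns only the leftmost match.
first = re.search(pattern, string).group()
# findall returns every non-overlapping match, left to right.
every = re.findall(pattern, string)

print(first)  # Mohammed
print(every)  # ['Mohammed', 'Mohammed', 'Ali', 'boxer']
```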
To limit the reach of |, group the alternatives with ( ). For example, (pattern1|pattern2|...).
pattern = '(grey|gray) is a nice color'
string = 'Gray is a nice color'
print(re.search(pattern, string, re.I).group()) # re.I ignores letter case
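The effect of the parentheses is easiest to see by removing them. In the sketch below (variable names are chosen for illustration), the ungrouped pattern reads as "grey" or "gray is a nice color", so it stops after the first word:

```python
import re

string = "grey is a nice color"

# Without ( ), the | splits the whole pattern into two alternatives.
ungrouped = re.search(r"grey|gray is a nice color", string).group()
# With ( ), the | only chooses between the two spellings.
grouped = re.search(r"(grey|gray) is a nice color", string).group()

print(ungrouped)  # grey
print(grouped)    # grey is a nice color
```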
+ matches the preceding token 1 or more times.
? makes the preceding token optional (0 or 1 times).
* matches the preceding token 0 or more times.
{n,m} matches the preceding token between n and m times, where n is the minimum and m the maximum number of repetitions.
print(re.findall(r"\w+\s\d", "class 10"))
print(re.findall("colou?r", "color or colour"))
print(re.findall(r"\d*\stimes", "2 times 33 times 9999 times"))
print(re.findall(r"\w{5}\s?\d+", "class 10 class_10 class_123"))
print(re.search("data(set)?", "data" ).group())
print(re.search("data(set)?", "dataset" ).group())
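The examples above use {5} but not the two-number form {n,m}. A minimal sketch on invented input, with \b added so that longer numbers are not matched in part:

```python
import re

text = "7 42 123 2024 99999"  # illustrative input

# Whole numbers that are 2 to 4 digits long.
two_to_four = re.findall(r"\b\d{2,4}\b", text)
# Whole numbers that are exactly 3 digits long.
exactly_three = re.findall(r"\b\d{3}\b", text)

print(two_to_four)    # ['42', '123', '2024']
print(exactly_three)  # ['123']
```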
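Repetition operators are greedy. With the pattern <.+>, several substrings of the string below fit the pattern: <em> and </em> and <em>website</em>. The default behavior is to capture the longest substring.
print(re.search("<.+>", "This is my first <em>website</em>").group())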
To make the repetition lazy, so that it captures the shortest substring instead, add ? after the repetition operator, in this case after the +.
print(re.search("<.+?>", "This is my first <em>website</em>").group())
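The difference between the two modes shows up clearly with re.findall, which lists every match. This sketch reuses the same string:

```python
import re

string = "This is my first <em>website</em>"

# Greedy: .+ runs to the last ">" it can reach.
greedy = re.findall(r"<.+>", string)
# Lazy: .+? stops at the first ">" it can reach.
lazy = re.findall(r"<.+?>", string)

print(greedy)  # ['<em>website</em>']
print(lazy)    # ['<em>', '</em>']
```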