import re
# the pattern here is `Mohammed`
print( re.findall("Mohammed", "My name is Mohammed"))
[ ] allows you to choose a single character out of the included set of characters. One way to see [ ] is as a blank _____ that must be filled with a single character from the characters inside the [ ].
# the pattern here is `gr[ea]y`
print( re.findall("gr[ea]y", "In American English, it is grey, but in British English it is gray."))
[ ] allows you to choose ONLY one character, so neither of the next two patterns matches.
print(re.findall("gr[aa]y", "graay"))
print(re.findall("gr[aae]y", "graey"))
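The one-character rule can be checked with a minimal sketch. The test strings and variable names here are made up for illustration; they are not from the examples above.

```python
import re

# Repeating a character inside [ ] does not change the set: [aa] is just [a].
same_set = re.findall("gr[aa]y", "gray")

# One [ ] fills exactly one position, so two letters between "r" and "y" fail.
no_match = re.findall("gr[ae]y", "graey")

# Two [ ] in a row fill two positions.
two_slots = re.findall("gr[ae][ae]y", "graey")

print(same_set)   # ['gray']
print(no_match)   # []
print(two_slots)  # ['graey']
```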
range
[0-9] matches a single digit from 0 to 9
[a-f] is equivalent to [abcdef]
print(re.findall("0x[0-9a-f]", "0x1f"))
print(re.findall("0x[0-9][a-f]", "0x1f"))
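The two calls above differ because a combined range such as [0-9a-f] still fills only one position, so the first call stops after a single hex digit. As a rough sketch, with sample input invented for illustration, two combined ranges in a row match two hex digits of either kind:

```python
import re

# Made-up sample text: two valid hex bytes and one invalid one.
text = "0x1f 0xa3 0xzz"

# One class -> one hex digit after "0x".
one_digit = re.findall("0x[0-9a-f]", text)

# Two classes -> two hex digits, each either a digit or a letter a-f.
two_digits = re.findall("0x[0-9a-f][0-9a-f]", text)

print(one_digit)   # ['0x1', '0xa']
print(two_digits)  # ['0x1f', '0xa3']
```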
negation
[^a] matches any single character except a
print(re.findall("gr[^a]y", "grey"))
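A negated set still has to consume exactly one character; it does not mean "nothing here". The sketch below uses a made-up sample string to show both sides of that rule.

```python
import re

# Made-up sample: "gray" is excluded, "grey" and "gr y" fill the slot
# with a non-"a" character, and "gry" has no character left for the slot.
text = "gray grey gr y gry"

negated = re.findall("gr[^a]y", text)

print(negated)  # ['grey', 'gr y']
```

Note that the space in "gr y" counts as a character that is not a, so it matches.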
\d matches a single digit [0-9]
\w matches a single word character (letter, digit, underscore) [a-zA-Z0-9_]
\s matches a single whitespace character
. matches any single character except a line break
# raw strings (r"...") keep Python from treating the backslash as its own escape
print(re.findall(r"class_\d", "class_1"))
print(re.findall(r"class\s\d", "class 10"))
print(re.findall(r"\w\w\w\w\w\s\d\d", "class 10"))
print(re.findall(".......", "1%0=NaN"))
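To see each shorthand on its own, here is a small sketch on an ASCII-only sample string (the string and variable names are assumptions for illustration). It also shows that . skips the line break.

```python
import re

# Made-up ASCII sample containing a letter run, an underscore,
# a digit, a space, and a trailing newline.
text = "room_7 b\n"

digits = re.findall(r"\d", text)          # shorthand for a digit
digits_set = re.findall(r"[0-9]", text)   # the equivalent explicit set
word_chars = re.findall(r"\w", text)      # letters, digits, underscore
spaces = re.findall(r"\s", text)          # space and newline are both whitespace
dots = re.findall(r".", "a\nb")           # "." does not match the line break

print(digits, digits_set)  # ['7'] ['7']
print(word_chars)          # ['r', 'o', 'o', 'm', '_', '7', 'b']
print(spaces)              # [' ', '\n']
print(dots)                # ['a', 'b']
```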
Anchors match positions and not characters.
^ matches at the beginning of a string
$ matches at the end of a string
\b matches a word boundary
print(re.findall("^Ali", "My name is Ali"))
print(re.findall("Ali$", "My name is Ali"))
print(re.findall("^Ali", "Ali is my name"))
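The examples above cover ^ and $ only. A minimal sketch of \b follows, using a made-up sentence in which "Ali" appears as a whole word, at the start of a longer word, and at the end of a longer word.

```python
import re

# Made-up sample: "Ali" alone, inside "Alice", and at the end of "KhalidAli".
text = "Ali met Alice and KhalidAli"

anywhere = re.findall(r"Ali", text)        # no anchors: every occurrence
word_start = re.findall(r"\bAli", text)    # boundary required before "Ali"
word_end = re.findall(r"Ali\b", text)      # boundary required after "Ali"
whole_word = re.findall(r"\bAli\b", text)  # boundary on both sides

print(len(anywhere))    # 3
print(len(word_start))  # 2  ("Ali" and the start of "Alice")
print(len(word_end))    # 2  ("Ali" and the end of "KhalidAli")
print(whole_word)       # ['Ali']
```

The raw-string prefix matters here: in a normal Python string, "\b" is the backspace character rather than a regex word boundary.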
| is similar to a logical OR. It allows choosing between several patterns.
pattern = 'Mohammed|Ali|boxer'
string = "Mohammed is my name. Mohammed Ali is my full name, and I'm a boxer"
print( re.search(pattern, string).group() )
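re.search stops at the first alternative that matches anywhere in the string, so the call above prints only one name. As a short sketch with the same pattern and string, re.findall lists every match of every alternative:

```python
import re

pattern = 'Mohammed|Ali|boxer'
string = "Mohammed is my name. Mohammed Ali is my full name, and I'm a boxer"

# search returns only the first match, scanning left to right.
first = re.search(pattern, string).group()

# findall returns all non-overlapping matches in order of appearance.
all_matches = re.findall(pattern, string)

print(first)        # Mohammed
print(all_matches)  # ['Mohammed', 'Mohammed', 'Ali', 'boxer']
```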
Alternatives can be grouped with ( ). For example, (pattern1|pattern2|...).
pattern = '(grey|gray) is a nice color'
string = 'Gray is a nice color'
print(re.search(pattern, string, re.I).group()) # re.I ignores letter case
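The parentheses matter because | otherwise splits the whole pattern in two. The sketch below compares the grouped pattern with an ungrouped variant, written here only for illustration, on a lowercase test string.

```python
import re

string = 'grey is a nice color'

# Without ( ), the alternatives are "grey" and "gray is a nice color".
ungrouped = re.search('grey|gray is a nice color', string).group()

# With ( ), the choice is limited to "grey" or "gray",
# and the rest of the pattern applies to both.
grouped = re.search('(grey|gray) is a nice color', string).group()

print(ungrouped)  # grey
print(grouped)    # grey is a nice color
```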
+ matches the preceding token 1 or more times
? makes the preceding token in the regular expression optional (0 or 1 times)
* matches the preceding token 0 or more times
{n,m} matches the preceding token at least n and at most m times (n is the minimum, m the maximum number of repetitions)
print(re.findall(r"\w+\s\d", "class 10"))
print(re.findall("colou?r", "color or colour"))
print(re.findall(r"\d*\stimes", "2 times 33 times 9999 times"))
print(re.findall(r"\w{5}\s?\d+", "class 10 class_10 class_123"))
print(re.search("data(set)?", "data" ).group())
print(re.search("data(set)?", "dataset" ).group())
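None of the calls above uses the two-number form {n,m}, so here is a minimal sketch of it on a made-up list of numbers. The second pattern adds \b on both sides, which is a choice made for this illustration to keep long digit runs from being split into pieces.

```python
import re

# Made-up sample: numbers with 1, 2, 3, and 5 digits.
text = "7 42 123 12345"

# 2 to 3 digits anywhere: the 5-digit run is cut into "123" and "45".
pieces = re.findall(r"\d{2,3}", text)

# 2 to 3 digits forming a whole number: word boundaries on both sides.
whole = re.findall(r"\b\d{2,3}\b", text)

print(pieces)  # ['42', '123', '123', '45']
print(whole)   # ['42', '123']
```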
In the string below, <.+> could match <em>, </em>, or <em>website</em>. The default behavior is greedy: it captures the longest substring.
print(re.search("<.+>", "This is my first <em>website</em>").group())
To capture the shortest substring instead, add ? after the repetition operator, in this case after the +.
print(re.search("<.+?>", "This is my first <em>website</em>").group())
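The difference is easier to see with re.findall, which lists every match rather than only the first. This sketch reuses the same string with both forms of the pattern.

```python
import re

string = "This is my first <em>website</em>"

# Greedy: .+ runs as far as it can, so one match spans both tags.
greedy = re.findall("<.+>", string)

# Lazy: .+? stops at the first ">", so each tag is matched separately.
lazy = re.findall("<.+?>", string)

print(greedy)  # ['<em>website</em>']
print(lazy)    # ['<em>', '</em>']
```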