Regular expressions in Python

  • re is the Python module for working with regular expressions
  • In this tutorial we adopt the following conventions:
    • pattern refers to a regular expression pattern
    • string refers to the sequence of characters being searched
    • flags refers to the optional control flags that the re module uses to customize its processing
  • re.search(pattern, string, flags)
    • The search method looks for the first match of the pattern anywhere in the string and returns a match object (called the group object in this tutorial), or None if there is no match.
    • The printed group object shows a span, which holds the start index and the exclusive end index of the found substring, and a match, which is the found substring itself. Both values are available through the object's span() and group() methods.
In [2]:
import re # import re module to handle regular expressions 

pattern = "Ali"
string = "My name is Ali"
print( re.search(pattern, string))
<_sre.SRE_Match object; span=(11, 14), match='Ali'>
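  • The span and the found substring can also be read from the group object directly. A minimal sketch, reusing the pattern and string from the cell above; the variable names m, found, begin, and end are only illustrative:

```python
import re

pattern = "Ali"
string = "My name is Ali"

m = re.search(pattern, string)   # a match object, or None if nothing matched
if m:
    found = m.group()            # the matched substring
    begin, end = m.span()        # start index and exclusive end index
    print(found, begin, end)
    # Slicing the original string with the span gives back the same substring
    print(string[begin:end] == found)
```

  • Because the end index is exclusive, string[begin:end] is exactly the matched text.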

match

  • re.match(pattern, string, flags)
    • The match method is similar to the search method, except that it matches the pattern only at the beginning of the string.
In [86]:
pattern = "Ali"
string = "My name is Ali"
print( re.match(pattern, string))

string = "Ali is my name"
print( re.match(pattern, string))
None
<_sre.SRE_Match object; span=(0, 3), match='Ali'>
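  • Note that match only anchors the pattern at the beginning; it does not require the whole string to match. The sketch below illustrates this and, as an addition not covered in this tutorial, contrasts it with re.fullmatch, which does require the whole string to match. The variable names are illustrative:

```python
import re

pattern = "Ali"

at_start = re.match(pattern, "Ali is my name")       # pattern sits at index 0
not_at_start = re.match(pattern, "My name is Ali")   # pattern occurs only later
prefix_only = re.match(pattern, "Alice")             # a matching prefix is enough for match()
whole = re.fullmatch(pattern, "Alice")               # fullmatch() needs the entire string to match

print(at_start is not None)      # match found at the beginning
print(not_at_start is None)      # no match at the beginning
print(prefix_only.group())       # the matched prefix
print(whole is None)             # "Alice" is longer than the pattern
```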

The group object

  • The search and match methods return a group object that contains the found substring, or None if nothing matched. Because None is falsy and a group object is truthy, we can test the returned value before processing it.
In [87]:
pattern = "Ali"
string = "My name is Noor"
groupObj = re.search(pattern, string)
print(bool(groupObj))

pattern = "Noor"
groupObj = re.search(pattern, string)
print(bool(groupObj))
False
True
  • To access the found substring in the returned group object, call its group() method.
In [88]:
pattern = "Noor"
string = "My name is Noor"

groupObj = re.search(pattern, string)
if groupObj:
    print(groupObj.group())
Noor
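  • Calling group() on None raises an AttributeError, which is why the returned value is tested first. One way to package that test is sketched below; first_match is a hypothetical helper written for this tutorial, not part of the re module:

```python
import re

def first_match(pattern, string, default=""):
    """Hypothetical helper: return the first matched substring, or `default`."""
    group_obj = re.search(pattern, string)
    # Only call group() when search actually found something
    return group_obj.group() if group_obj else default

print(first_match("Noor", "My name is Noor"))
print(first_match("Ali", "My name is Noor", default="<no match>"))
```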

start/end

  • To access the starting (or ending) index of the found substring, call the group object's start() (or end()) method.
In [89]:
pattern = "Noor"
string = "My name is Noor"

groupObj = re.search(pattern, string)
if groupObj:
    print(groupObj.start())
11
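  • The cell above prints only the starting index. A short sketch that also uses end(), with the same pattern and string; begin and end are illustrative variable names:

```python
import re

string = "My name is Noor"
group_obj = re.search("Noor", string)

if group_obj:
    begin = group_obj.start()    # index of the first matched character
    end = group_obj.end()        # index one past the last matched character
    print(begin, end)
    print(string[begin:end])     # the slice is the matched substring
```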

findall

  • re.findall(pattern, string, flags)
    • findall returns a list of all non-overlapping matches of the pattern in the string, in the order they are found
In [85]:
pattern = 'Mohammed|Ali' # regular expression to look for the word Mohammed or Ali
string = "Mohammed is my name. Mohammed Ali is my full name"
print( re.findall(pattern, string))
['Mohammed', 'Mohammed', 'Ali']
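  • Two details of findall are worth seeing in code: when nothing matches it returns an empty list rather than None, and matches never overlap. A small sketch; the second and third inputs are made up for illustration:

```python
import re

string = "Mohammed is my name. Mohammed Ali is my full name"

names = re.findall("Mohammed|Ali", string)   # every match, scanned left to right
missing = re.findall("Noor", string)         # no match gives an empty list, not None
pairs = re.findall("aa", "aaaaa")            # matches do not overlap

print(names)
print(missing)
print(len(pairs))
```

  • Since the result is always a list, it can be looped over or passed to len() without a None check.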

Literal Matching

  • search, match, and findall work at the substring level: a pattern can match part of a word, not only a whole word
  • Matching is case sensitive unless the re.I flag is set.
In [5]:
print(re.search('a', 'ali').group())

groupObj = re.search('a', 'Ali')
if groupObj:
    print('`a` is in `Ali`')
else:
    print('`a` is NOT in `Ali`')
        
groupObj =  re.search('a', 'Ali', re.I)
if groupObj:
    print('`re.I`  ignores letter case. So `a` is in `Ali`')
a
`a` is NOT in `Ali`
`re.I`  ignores letter case. So `a` is in `Ali`
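  • The same flag works with findall. A brief sketch on a made-up string, also showing that re.I is a short alias for re.IGNORECASE:

```python
import re

string = "Ali met ALI and ali"

exact = re.findall("ali", string)                         # case sensitive by default
any_case = re.findall("ali", string, flags=re.I)          # ignore letter case
long_name = re.findall("ali", string, flags=re.IGNORECASE)  # re.I is an alias of re.IGNORECASE

print(exact)
print(any_case)
print(any_case == long_name)
```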
In [11]:
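# "x+" matches one or more x. "x*" would also match empty strings,
# which Python 3.7+ replaces as well, giving 'XYXY XYXY XYXY'.
p = re.compile("x+")
p.sub('XY', 'xxx xx x')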
Out[11]:
'XY XY XY'
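  • The last cell uses two calls not described earlier: re.compile(pattern) builds a reusable pattern object, and its sub(repl, string) method returns a copy of the string with each match replaced by repl. A minimal sketch, using the pattern x+ (one or more x) so that empty matches play no role; the variable names are illustrative:

```python
import re

p = re.compile("x+")                             # compile once, reuse many times

all_replaced = p.sub("XY", "xxx xx x")           # replace every run of x
first_only = p.sub("XY", "xxx xx x", count=1)    # replace only the first run
module_level = re.sub("x+", "XY", "xxx xx x")    # same result without compiling first

print(all_replaced)
print(first_only)
print(module_level == all_replaced)
```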