12.14. Group Work: Regular Expressions (Regex)¶
It is best to use a POGIL approach with the following. In POGIL students work in groups on activities and each member has an assigned role. For more information see https://cspogil.org/Home.
Note
If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.
Learning Objectives
Students will know and be able to do the following.
Content Objectives:
Learn about
search
andfindall
and what they returnLearn about some common quantifiers (*, +, ?, {n})
Learn about character classes (\d, \w, \s, .)
Learn about character sets ([an]) and ranges ([0-9])
Learn how to negate a character set using [^0-9]
Learn how to escape a special character to match it
Learn about greedy matching and how to make it not greedy
Learn how to return just part of a match using parentheses
12.14.1. Regex Methods¶
Two of the methods that you can use with regular expressions are search
and
findall
. Note that you must import re
to use these.
Run the code below to see what it prints.
12.14.2. Quantifiers¶
You can specify how many items to match using quantifiers. They refer to the
item to their left. The quantifiers are ?
, +
, *
, {n}
, and {n,m}
.
Run the code below to see what it prints.
- 0 to many
- No, this would be 'c*'
- 0 to 2
- No, this would be just 'c'
- exactly 2
- No, it will match strings that have more than 2 c's in a row.
- 2 or more
- This will match 2 c's but there can be more in the string.
11-9-6: How many c’s must there be in a row for c{2} to match at least part of the string?
-
11-9-8: Drag each symbol to the number of items it matches.
Look at the code above.
- ?
- Zero to one
- *
- Zero to many
- +
- One to many
- {2}
- Two
- {1,3}
- One, two, or three
12.14.3. Character Sets¶
You can use []
to specify that you need to match any one item in the []
.
Run the code below to see what it prints.
- Match either an 'e' or 'a' one time
- It will match one of the items listed in []
- Match 'ae' one time
- It will match one of the items listed in []
- Match either an 'e' or 'a' one to many times
- This would be true if it was [ae]+
- Match 'ae' one to many times
- This would be true if it was (ae)+
11-9-10: What does [ea]
mean?
12.14.4. Character Ranges¶
You can specify a range of items to match.
Run the code below to see what it prints.
- Match any digit or period one or more times. Special characters like '.' just match themselves in [].
- Items in the [] match themselves and are not treated as special characters other than '-'
- Match any digit or anything that isn't a new line one or more times
- The period in a [] just means match a period
- Match any digit or period zero to many times
- The + outside of the [] means match one or more
- Match any digit or anything that isn't a new line zero to many times
- The period in a [] just means match a period and the + means match one or more times
11-9-12: What does [0-9.]+
mean?
- Match anything other than 0-9 and a period zero to one times
- The + means one to many times
- Match anything other than 0-9 and a period one to many times
- Correct!
- Match ^ or 0-9 or a period zero to one times
- The ^ negates the items
- Match ^ or 0-9 or a period one to many times
- The ^ negates the items
11-9-13: What does [^0-9.]+
mean?
12.14.5. Character Classes¶
Run the code below to see what it prints.
-
11-9-15: Drag each item to what it matches
Look at the code above.
- .
- Any single character other than a newline
- \d
- A digit (0-9)
- \w
- A word character which is alphanumeric plus underscore
- \s
- A whitespace character (including space, tab, and newline)
Run the code below to see what it prints.
-
11-9-17: Drag each item to what it matches
Look at the code above.
- \W
- Any non-word character (not alphanumeric or underscore)
- \S
- Any non-whitespace character (not space, tab, or newline)
- \D
- Any non-digit character (not 0-9).
12.14.6. Escaping Special Characters¶
If you want to match something that is normally a special character in regex you must escape it by adding a \ in front of it.
Run the code below to see what it prints.
- 1
- It will match three digits followed by a period and then 2 digits
- 2
- It will match three digits followed by a period and then 2 digits
- 3
- It will match three digits followed by a period and then 2 digits
- 4
- It will match three digits followed by a period and then 2 digits
11-9-19: How many items will be in the list that the following code prints?
import re
str = "302.33 64.52 204.24 532.2 1.23 323.320"
res = re.findall("\d{3}\.\d{2}",str)
print(res)
12.14.7. Greedy and Non-Greedy Matching¶
Matching is usually greedy.
Run the code below to see what it prints.
If you worked in a group, you can copy the answers from this page to the other group members. Select the group members below and click the button to share answers.
The Submit Group button will submit the answer for each each question on this page for each member of your group. It also logs you as the official group submitter.