12.15. Group Work: More Regular Expressions (Regex)
It is best to use a POGIL approach with the following. In POGIL, students work in groups on activities, and each member has an assigned role. For more information, see https://cspogil.org/Home.
Note
If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.
Learning Objectives
Students will know and be able to do the following.
Content Objectives:
Learn about using | as a logical or
Learn about matching groups and non-matching (non-capturing) groups
Learn about anchor characters (^, $, and \b)
Learn about raw strings
Learn how to negate a character set
12.15.1. Using a logical “or”
What if you want to match a month from 1 to 12 in MM/DD/YYYY? You can’t use [1-12] because a character set matches one character at a time, so [1-12] matches only a single 1 or 2. You have to match either a digit from 1 to 9 or a 1 followed by 0, 1, or 2.
To use a logical or to match one of two expressions, use (left|right). This will match either the expression on the left or the one on the right.
Run the code below to see what it prints.
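A minimal sketch of the logical or, assuming a hypothetical helper named find_date and made-up sample strings. The r prefix marks a raw string, which is explained later in this section.

```python
import re

def find_date(text):
    """Return the first date like 9/11/2022 or 12/01/2022 in text, or None."""
    # The month is either one digit from 1 to 9, or a 1 followed by 0, 1, or 2.
    match = re.search(r"([1-9]|1[0-2])/\d{2}/\d{4}", text)
    return match.group() if match else None

print(find_date("Due on 9/11/2022"))    # 9/11/2022
print(find_date("Due on 12/01/2022"))   # 12/01/2022
print(find_date("No date here"))        # None
```

Here match.group() returns the entire matched text, not just the part inside the parentheses.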
- "0([1-9]|1[0-2])/\d{2}/\d{4}"
- This would require a 0 before every month
- "0*([1-9]|1[0-2])/\d{2}/\d{4}"
- This would match 0 to many 0's
- "0+([1-9]|1[0-2])/\d{2}/\d{4}"
- This would require at least one 0
- "0?([1-9]|1[0-2])/\d{2}/\d{4}"
- This matches 0 to 1 0's
11-9-2: Sometimes dates have a leading zero if the month is from 1 to 9. Which of the following would match that case as well but still match if there isn’t a 0?
12.15.2. Specifying What to Extract - Matching Groups
There are times when you want to return just part of what was matched.
Run the code below to see what it prints.
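As a small sketch of returning only part of a match, assume a made-up sentence with two dates. The pattern matches a whole date, but only the year is inside parentheses.

```python
import re

# Hypothetical sample text for illustration.
text = "The dates were 9/11/2022 and 10/15/2023"

# The whole date must match, but findall returns only the captured year.
years = re.findall(r"\d+/\d{2}/(\d{4})", text)
print(years)   # ['2022', '2023']
```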
Note
Parentheses are used to define a capture group. When a pattern contains a capture group, functions such as re.findall return only what matched inside the parentheses.
12.15.3. Specifying What to Extract - Non-Matching Groups
What if we need the parentheses because we are using a logical or but want the whole match to be returned? We can add a “?:” right after the opening parenthesis, as in (?:left|right), to group items for the logical or without capturing them, so the entire match is returned. This is usually called a non-capturing group.
Run the code below to see what it prints.
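The following sketch contrasts a capturing group with a “?:” group on the same sample string (the variable names are chosen here for illustration).

```python
import re

text = "The dates were 9/11/2022, 10/15/2022, 11/20/2022, and 12/01/2022"

# Capturing group: findall returns only the month inside the parentheses.
months = re.findall(r"([1-9]|1[0-2])/\d{2}/\d{4}", text)
print(months)   # ['9', '10', '11', '12']

# Group with ?: the parentheses still group the or, but the whole match is returned.
dates = re.findall(r"(?:[1-9]|1[0-2])/\d{2}/\d{4}", text)
print(dates)    # ['9/11/2022', '10/15/2022', '11/20/2022', '12/01/2022']
```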
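Another approach is to enclose everything in a set of outer parentheses if you have any inner parentheses. With more than one group, re.findall returns a tuple for each match, with one item per group in the order the opening parentheses appear, so the first item is the whole match.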
Run the code below to see what it prints.
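A short sketch of the outer-parentheses approach, assuming a made-up string with two dates. Each result is a tuple: the outer group first, then the inner group.

```python
import re

text = "The dates were 9/11/2022 and 12/01/2022"

# Group 1 is the outer parentheses (whole date); group 2 is the inner or (month).
pairs = re.findall(r"(([1-9]|1[0-2])/\d{2}/\d{4})", text)
print(pairs)   # [('9/11/2022', '9'), ('12/01/2022', '12')]
```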
- l.append(match)
- This would add the tuple not the date
- l.extend(match)
- Use extend to add two lists together
- l.append(match[0])
- This will add the date to the list (the first element in the tuple)
- l.extend(match[0])
- Use extend to add two lists together
11-9-7: Given the following code, which of the following would you use to get the current date and add it to the list?
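import re
# "text" is used instead of "str" so the built-in str type is not shadowed
text = "The dates were 9/11/2022, 10/15/2022, 11/20/2022, and 12/01/2022"
# get the dates
l = []
# raw string so that \d reaches the regex engine unchanged
matches = re.findall(r"(([1-9]|1[0-2])/\d{2}/\d{4})", text)
for match in matches:
    # line to get current date and add to the list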
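To see why the choice between append and extend matters here, the sketch below runs both on a shorter made-up string. The list names appended, extended, and chars are illustrative only.

```python
import re

text = "The dates were 9/11/2022 and 10/15/2022"
matches = re.findall(r"(([1-9]|1[0-2])/\d{2}/\d{4})", text)

appended = []
extended = []
for match in matches:
    appended.append(match[0])   # adds one item: the whole date
    extended.extend(match)      # adds every item in the tuple

print(appended)   # ['9/11/2022', '10/15/2022']
print(extended)   # ['9/11/2022', '9', '10/15/2022', '10']

# extend on a string adds one item per character.
chars = []
chars.extend("9/11")
print(chars)      # ['9', '/', '1', '1']
```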
12.15.4. Boundary or Anchor Characters
Run the code below to see what it prints.
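One possible experiment with the ‘^’ character, using made-up strings: the same pattern is tried on a string that starts with digits and on one that does not.

```python
import re

# ^ ties the pattern to the start of the string.
at_start = re.findall(r"^\d+", "42 apples and 7 pears")
not_at_start = re.findall(r"^\d+", "apples 42")

print(at_start)       # ['42']
print(not_at_start)   # []
```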
- Return the first match that it finds.
- It does not do this.
- Return a match if it is at the beginning of the string.
- Correct. It returns a match only if it is at the beginning of a string.
- Return a match if it is at the end of the string.
- It does not do this, but the '$' does.
- Return a match if it is a whole word, not just part of a word.
- It does not do this.
11-9-9: What does the ‘^’ do?
Run the code below to see what it prints.
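A similar sketch for the ‘$’ character, again with made-up strings: the digits are found only when they are the last thing in the string.

```python
import re

# $ ties the pattern to the end of the string.
at_end = re.findall(r"\d+$", "Room 12 seats 30")
not_at_end = re.findall(r"\d+$", "30 seats in the room")

print(at_end)       # ['30']
print(not_at_end)   # []
```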
- Return the first match that it finds.
- It does not do this.
- Return a match if it is at the beginning of the string.
- It does not do this, but the '^' does.
- Return a match if it is at the end of the string.
- Correct! It matches only at the end of the string.
- Return a match if it is a whole word, not just part of a word.
- It does not do this.
11-9-11: What does the ‘$’ do?
Note
Since ‘$’ is an anchor character, if you want to match a literal ‘$’, use ‘\$’.
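For instance, a sketch that finds dollar amounts in a made-up sentence by escaping the ‘$’:

```python
import re

text = "Lunch was $12 and dinner was $30"

# \$ matches a literal dollar sign instead of acting as the end anchor.
prices = re.findall(r"\$\d+", text)
print(prices)   # ['$12', '$30']
```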
Run the code below to see what it prints.
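A sketch of ‘\b’ on a made-up sentence: the word "cat" appears four times as text but only twice as a whole word.

```python
import re

text = "The cat sat by the catalog; concatenate cat"

# Without \b, "cat" is also found inside "catalog" and "concatenate".
anywhere = re.findall(r"cat", text)
# With \b on both sides, only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)

print(len(anywhere))   # 4
print(whole_words)     # ['cat', 'cat']
```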
- Return the first match that it finds.
- It does not do this.
- Return a match if it is at the beginning of the string.
- It does not do this, but the '^' does.
- Return a match if it is at the end of the string.
- It does not do this, but the '$' does.
- Return a match if it is a whole word, not just part of a word.
- Correct! It matches if it is a whole word, not just part of a word.
11-9-13: What does the ‘\b’ do?
Note
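Since ‘\b’ represents a backspace in an ordinary Python string, you must put ‘r’ before the string to treat it as a raw string. The r is strictly required only when the expression contains an escape that Python itself interprets, such as ‘\b’. Using a raw string for every regular expression is still a safe habit, because recent Python versions warn about unrecognized escapes like ‘\d’ in ordinary strings.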
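The difference can be checked directly. This sketch compares an ordinary string with a raw string, using a short made-up sentence.

```python
import re

# In an ordinary string "\b" is one character (backspace);
# in a raw string it stays as a backslash followed by b.
print(len("\b"))    # 1
print(len(r"\b"))   # 2

plain = re.findall("\bcat\b", "the cat")   # looks for backspace characters
raw = re.findall(r"\bcat\b", "the cat")    # looks for word boundaries

print(plain)   # []
print(raw)     # ['cat']
```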
12.15.5. Negating a Character Set
You can negate a character set by placing ‘^’ immediately after the ‘[’. For example, [^0-9] matches any single character that is not a digit.
Run the code below to see what it prints.
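The sketch below is one way a function like passwordChecker could be written. It is an assumption based on the question that follows, not necessarily the original code: it looks for any character outside the allowed set and rejects the password if one is found.

```python
import re

def passwordChecker(password):
    """Assumed behavior: True only if no character falls outside a-z, A-Z, 0-9."""
    # [^a-zA-Z0-9] matches any single character that is NOT a letter or digit.
    if re.search(r"[^a-zA-Z0-9]", password):
        return False
    return True

print(passwordChecker("abc123"))    # True
print(passwordChecker("abc123!"))   # False
```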
- If the string has only uppercase and lowercase alphabetic characters.
- It also allows digits.
- If the string has only uppercase and lowercase alphabetic characters or numeric digits.
- Correct! It returns true if the string only has alphabetic characters or numeric digits.
- If the string has only numeric digits.
- It also allows alphabetic characters.
- If the string has only uppercase and lowercase alphabetic characters, numeric digits, or special characters like '!{}[]'.
- It does not do this.
11-9-15: Which of the following best describes when passwordChecker returns true?
11-9-16: Drag each symbol to what it matches.
Look at the code above.
- $
- Match only at the end of the string
- ^
- Match only at the beginning of the string
- \b
- Match if a whole word (not part of a word)
- [^]
- Match the opposite of the character set
If you worked in a group, you can copy the answers from this page to the other group members. Select the group members below and click the button to share answers.
The Submit Group button will submit the answer for each question on this page for each member of your group. It also logs you as the official group submitter.