Regular expressions (regexes) are a powerful tool for matching patterns in text. They are widely used in programming, text processing, and data validation.
Now picture trying to write code that can identify each of these as a phone number and extract the digits. Writing code using selection logic to handle each possible format would be tedious and error-prone.
Bracket Groups: Square brackets can be used to indicate a set of characters to match, e.g., [abc] matches a single character that is either "a", "b", or "c".
Ranges: Inside brackets you can use - to indicate a range of characters. For example, [A-F] matches any letter in the range A to F. You can specify multiple ranges like [A-Fa-f] to match both uppercase and lowercase letters.
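As a rough illustration of bracket groups and ranges, the sketch below uses Python's built-in re module. The sample strings ("cabbage" and "Code 7F3a") are made up for this example and do not come from the text.

```python
import re

# [abc] matches one character at a time, as long as it is a, b, or c.
set_matches = re.findall(r"[abc]", "cabbage")

# [A-Fa-f] combines two ranges: uppercase A-F and lowercase a-f.
range_matches = re.findall(r"[A-Fa-f]", "Code 7F3a")

print(set_matches)    # ['c', 'a', 'b', 'b', 'a']
print(range_matches)  # ['C', 'd', 'e', 'F', 'a']
```

Note that each bracket group consumes exactly one character, which is why findall returns a list of single letters rather than longer runs.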
For example, a{3} matches exactly three "a"s, while a* matches zero or more "a"s. In ab?, the ? makes the preceding element ("b" in this case) optional, so the pattern matches either "a" or "ab".
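These quantifiers can be checked quickly in Python. The sketch below uses re.fullmatch, which succeeds only when the whole string fits the pattern; the test strings are chosen purely for illustration.

```python
import re

# a{3} requires exactly three "a"s.
three_ok = re.fullmatch(r"a{3}", "aaa") is not None
three_short = re.fullmatch(r"a{3}", "aa") is not None

# a* accepts zero or more "a"s, so even the empty string matches.
star_empty = re.fullmatch(r"a*", "") is not None
star_many = re.fullmatch(r"a*", "aaaa") is not None

# ab? makes the "b" optional: "a" and "ab" match, "abb" does not.
optional_b = [s for s in ["a", "ab", "abb"] if re.fullmatch(r"ab?", s)]

print(three_ok, three_short)  # True False
print(star_empty, star_many)  # True True
print(optional_b)             # ['a', 'ab']
```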
We will not attempt to cover every aspect of regular expression syntax here. Instead, you are encouraged to explore a resource like RegexOne if you want to learn more.
We can use ( ) to form groups in our patterns. This allows us to apply quantifiers to entire groups. For instance, to say "there may be a group of 3 digits", you could use (\d{3})? .
Groups also allow you to capture parts of the match for further processing. For example, the regex (\d{3})-(\d{4}) matches a pattern like "456-7890", capturing the prefix (456) and the local number (7890) as separate groups.
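The capturing behavior described above can be sketched in Python as follows. The surrounding sentence "Call 456-7890 today" and the combined pattern (\d{3})?\d{4} are assumptions made for this example, not patterns taken from the text.

```python
import re

# Two capturing groups: the 3-digit prefix and the 4-digit local number.
match = re.search(r"(\d{3})-(\d{4})", "Call 456-7890 today")
whole = match.group(0)   # the entire matched text
prefix = match.group(1)  # first parenthesized group
local = match.group(2)   # second parenthesized group
print(whole, prefix, local)  # 456-7890 456 7890

# A quantifier applied to a whole group: the 3-digit group may be absent.
with_group = re.fullmatch(r"(\d{3})?\d{4}", "4567890")
without_group = re.fullmatch(r"(\d{3})?\d{4}", "7890")
print(with_group.group(1))     # 456
print(without_group.group(1))  # None (the optional group did not take part)
```

When an optional group does not participate in the match, Python reports it as None, so code that reads captured groups usually needs to handle that case.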
Next, there may or may not be a separator symbol such as a dash, space, or period. We will allow it to be any single character by using the . symbol, which matches any character (other than a newline by default), and make it optional with a question mark: .?
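To see what the optional separator accepts, here is a small Python sketch. The pattern \d{3}.?\d{4} is a simplified stand-in assumed for this example (only the separator step comes from the text), and the stricter [-. ]? variant is an alternative suggested here rather than something the text prescribes.

```python
import re

# Hypothetical simplified pattern: 3 digits, optional separator, 4 digits.
loose = re.compile(r"\d{3}.?\d{4}")

candidates = ["456-7890", "456 7890", "456.7890", "4567890", "456x7890"]
loose_hits = [s for s in candidates if loose.fullmatch(s)]
print(loose_hits)
# ['456-7890', '456 7890', '456.7890', '4567890', '456x7890']

# Because . accepts any character, "456x7890" slips through.
# A bracket group limits the separator to a dash, period, or space.
strict = re.compile(r"\d{3}[-. ]?\d{4}")
strict_hits = [s for s in candidates if strict.fullmatch(s)]
print(strict_hits)
# ['456-7890', '456 7890', '456.7890', '4567890']
```

The looser pattern is shorter and easier to read, while the bracket-group version rejects unexpected separators; which trade-off is right depends on how messy the input is expected to be.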
This isn't a perfect regex for phone numbers, but it would match the examples above. It also illustrates the fact that regexes can get complex quickly and end up being quite difficult to read.