The Concept of Regular Expressions

A regular expression specifies a pattern of characters. You can, for example, specify a pattern like the one we saw for a North American phone number: three digits, a dash, three more digits, another dash, and four final digits. Because regular expressions are designed to be very flexible, you can also specify patterns in which a character or group of characters is repeated a certain number of times, or patterns in which a certain sequence of characters appears in a specific place in a string (e.g., at the beginning or end).
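As a minimal sketch of the phone-number pattern described above, here is one way it might be written using Python's standard `re` module. The choice of Python and the names `CONTAINS_PHONE`, `IS_PHONE`, and `looks_like_phone` are assumptions made for illustration only; the same pattern syntax carries over to most regex flavors with little change.

```python
import re

# [0-9]{3} means "a digit, repeated exactly three times".
# Without anchors, the pattern may appear anywhere inside the string.
CONTAINS_PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")

# ^ ties the pattern to the beginning of the string and $ to the end,
# so nothing may come before or after the phone number.
# (In Python, $ also matches just before a single trailing newline.)
IS_PHONE = re.compile(r"^[0-9]{3}-[0-9]{3}-[0-9]{4}$")


def looks_like_phone(text):
    """Return True if the whole string has the form ddd-ddd-dddd."""
    return IS_PHONE.search(text) is not None
```

With these definitions, `looks_like_phone("555-123-4567")` succeeds, while `looks_like_phone("call 555-123-4567 now")` fails because of the anchors. The unanchored `CONTAINS_PHONE` pattern would still find the number inside the longer string.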

Once you’ve written a regular expression (pattern), you can match strings against it. That is, applying a regex to a string tells you whether the string contains the pattern in question. For example, you might write one regex that specifies the pattern that phone numbers must have to be valid, and another for valid e-mail addresses. When the user enters this data into a form, you can match the input against your regular expressions. If the strings the user enters match your patterns, you might let form submission proceed; if not, you might cancel the submission and alert the user that some of the data entered is invalid. We’ll see this exact usage of regular expressions in Chapter 14, which discusses Web forms and validation. For now we’ll focus on how to specify patterns against which you can check strings. Later, we’ll discuss how to do more advanced tasks, such as replacing a portion of a string that matches a particular pattern.
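The validation flow described above might be sketched as follows, again in Python as an assumed illustration language. The field names `phone` and `email`, the helper `invalid_fields`, and especially the e-mail pattern are hypothetical: the e-mail regex is deliberately simplified ("something@something.something") and is not a complete check for valid addresses.

```python
import re

# Phone numbers of the form ddd-ddd-dddd, anchored to the whole string.
PHONE_PATTERN = re.compile(r"^[0-9]{3}-[0-9]{3}-[0-9]{4}$")

# Simplified e-mail check (an assumption for illustration): one or more
# non-space, non-@ characters, an @, then a domain containing a dot.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def invalid_fields(form):
    """Return the names of the form fields that fail their pattern."""
    patterns = {"phone": PHONE_PATTERN, "email": EMAIL_PATTERN}
    return [
        name
        for name, pattern in patterns.items()
        if pattern.search(form.get(name, "")) is None
    ]


def can_submit(form):
    """Let submission proceed only when every field matches."""
    return not invalid_fields(form)
```

A caller could use `invalid_fields` to decide which fields to flag for the user, and `can_submit` to decide whether to let the submission go ahead.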

Note

The term “regular expression” comes from a branch of computer science that deals with the recognition of languages. The kinds of strings you can match using regular expressions are called “regular” because they’re very simple for computers to recognize. It’s straightforward but extremely tedious to write such pattern-matching code ourselves, so instead we specify the pattern and let the computer generate and run the recognition code.