In Chapter 5, "Basic SQL," I discuss and demonstrate how you can use the terms LIKE and NOT LIKE to match a string against the values in a column. Furthermore, you can use the single (_) and multiple (%) wildcard characters to add flexibility to what is matched. For example,
SELECT * FROM users WHERE first_name LIKE 'John%'
In the preceding section of the chapter, I show you can do different types of searches, using FULLTEXT indexes. Thanks to regular expressions, which MySQL also supports, you can perform a third kind of search. With regular expressions you can define more elaborate patterns to match, rather than variations on literal values. You then use REGEXP and NOT REGEXP (or RLIKE and NOT RLIKE):
SELECT * FROM users WHERE first_name REGEXP'^(Jo)h?n+.*'
This query differs from the LIKE query that precedes it in that it will match John, Johnathon, Jon, Jonathon, etc., whereas the earlier one would match only John or Johnathon. Since this may not make much sense to you if you are unfamiliar with regular expressions, I'll go over how patterns are written in a little more detail.
To define a pattern, you use a combination of literals (e.g., a matches only the letter a) and special characters (Table 10.2).
Table 10.2. Here are the different special characters you can use when writing regular expressions.
Special Regular Expression Characters
any single character
zero or one q
zero or more q's
at least one q
x instances of q
at least x instances of q
up to x instances of q
between x and y instances of q
starts with q
ends with q
grouping (matches pqr)
either q or z
character classes (e.g., [a-z], [0-9])
escapes a special character (\., \*, etc.)
To match a string beginning with ab, you would use ^ab.*: the literal ab plus zero or more of anything (represented by the period). To match a string that contains the word color or colour, you would write col(or)|(our): the literal col plus either or or our.
Obviously, the hardest part of using regular expressions (in MySQL or anywhere else) is coming up with accurate and useful patterns. Defining this pattern is frequently a matter of finding a balance between one that is too lenient and another that is too strict.
To use REGEX and NOT REGEX:
Log in to the mysql monitor and select the accounting database.
Find which clients have a toll-free number (Figure 10.16
Figure 10.16. Regular expressions can add flexibility to your queries...
SELECT client_name, client_phone
FROM clients WHERE client_phone
Toll-free numbers (in the United States and Canada) have an 800, 888, or 877 area code. The pattern in this regular expression will match any column whose value begins with one of those three.
The pattern starts by saying that it must match the beginning of a string (^
). Then it looks for zero or one (i.e., optional) opening parenthesis. Since this character already has special meaning within patterns, it's put within brackets to treat it literally. Next, exactly one 8 is required. After that, the string must have 00, 88, or 77. So this pattern matches anything that starts with (800, (888, (877, 800, 888, or 877
Retrieve all of the invalid contact email addresses (Figure 10.17
Figure 10.17. ...or assist in validating column values.
SELECT client_name, contact_first_name, contact_email FROM clients
WHERE contact_email NOT REGEXP
The pattern defined here for recognizing usable email addresses is very basic and perhaps overly lenient but should suffice for testing purposes. It just checks that an email address contains letters, numbers, the underscore or a period, followed by the ampersand, followed by anything, followed by a period, ending with between two and four letters. You can always make your patterns more (or less) exacting, per your needs.
For more information about regular expressions, do a quick search online. You can come up with more thorough tutorials as well as example patterns.
Unless you specifically indicate that a pattern should match the beginning or end of a value, a match will be made if the pattern is found anywhere within the value. In other words, the query
SELECT client_name, client_phone FROM clients WHERE client_phone
would actually match a phone number like 312-888-1234 as well as anything starting with those three prefixes.
Regular expression searches can never take advantage of indexes, making them noticeably less efficient.