Now that you have looked at how to match sequences of characters, each of which occurs exactly once,
the next section moves on to look at matching characters that can occur a variable number of times.
Matching Optional Characters
Matching literal characters is straightforward, particularly when you are aiming to match exactly one
literal character for each corresponding literal character that you include in a regular expression pattern. The
next step up from that basic situation is where a single literal character may occur zero times or one time.
In other words, a character is optional. Most regular expression dialects use the question mark (?)
metacharacter to indicate that the preceding chunk is optional. I am using the term “chunk” loosely here to mean the
thing that precedes the question mark. That chunk can be a single character or various, more complex
regular expression constructs. For the moment, you will deal with the case of the single, optional character.
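As a rough illustration of how the ? metacharacter applies to the single character before it, here is a minimal Python sketch. The pattern ab?c and the helper name matches_whole are hypothetical, chosen only for this example, and Python's re module stands in for whichever regular expression tool you use.

```python
import re

# Hypothetical pattern for illustration: the "b" may occur zero times or one time.
optional_b = re.compile(r"ab?c")

def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

# "ac" has no b, "abc" has one b, "abbc" has two (one too many).
for candidate in ["ac", "abc", "abbc"]:
    print(candidate, matches_whole(optional_b, candidate))
```

The first two candidates match and the third does not, because ? allows the preceding character at most once.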
For example, suppose you are dealing with a group of documents that contain both U.S. English and
British English. You may find that words such as color (in U.S. English) appear as colour (British English) in some
documents. You can express a pattern to match both words like this:
colou?r
You may want to standardize the documents so that all the spellings are U.S. English spellings.
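One way such a standardization might be sketched in Python, assuming the pattern is colou?r and that a plain search-and-replace is acceptable. The names COLOR_PATTERN and to_us_spelling are illustrative only.

```python
import re

# Assumed pattern: matches "color" and "colour" (the "u" is optional).
COLOR_PATTERN = re.compile(r"colou?r")

def to_us_spelling(text):
    """Replace every match of the pattern with the U.S. spelling."""
    return COLOR_PATTERN.sub("color", text)

print(to_us_spelling("These are bright colours."))
print(to_us_spelling("These are bright colors."))
```

Both calls print the same U.S. English sentence. The pattern is case sensitive, so a capitalized Colour would be left unchanged by this sketch.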
Matching an Optional Character
Try this out using the Komodo Regular Expression Toolkit:
Open the Komodo Regular Expression Toolkit and clear any regular expression pattern or text that may have been retained.
Insert the text colour into the area for the text to be matched.
Enter the regular expression pattern colou?r into the area for the regular expression pattern.
The text colour is matched, as shown in Figure A-14.
Try this regular expression pattern with text such as that shown in the sample file:
Red is a color.
His collar is too tight or too colouuuurful.
These are bright colours.
These are bright colors.
Calorific is a scientific term.
“Your life is very colorful,” she said.
The word color in the line
Red is a color.
will match the pattern colou?r.
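To check the sample lines outside the Komodo toolkit, a short Python sketch along the following lines could be used. It assumes the pattern colou?r and hard-codes the six sample lines. The names SAMPLE_LINES and results are illustrative, and the toolkit may present its matches differently.

```python
import re

# Assumed pattern: "colo", an optional "u", then "r".
COLOR_PATTERN = re.compile(r"colou?r")

SAMPLE_LINES = [
    "Red is a color.",
    "His collar is too tight or too colouuuurful.",
    "These are bright colours.",
    "These are bright colors.",
    "Calorific is a scientific term.",
    "“Your life is very colorful,” she said.",
]

# Collect every match found in each line.
results = [COLOR_PATTERN.findall(line) for line in SAMPLE_LINES]

for line, found in zip(SAMPLE_LINES, results):
    print(found, "<-", line)
```

The lines containing color, colours, colors, and colorful each produce one match. The lines containing collar, colouuuurful, and Calorific produce none, because ? permits at most one u before the r.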
Appendix A: Simple Regular Expressions