Character Sets and Collations
Along with the storage engine, another decision you can make when creating tables involves the character set and the collation. Both affect how text is handled by MySQL. These are relatively new features, having become prominent as of MySQL 4.1 (they existed in earlier versions of MySQL but not in this distinct manner).
The character set is the body of letters, numbers, and symbols that can be stored in a text column (it only applies to the text types). To specify the character set for an individual column, add the clause CHARACTER SET setname to your column definition, where setname is a specific set. You can find out what sets are available to you by running SHOW CHARACTER SET within the mysql client. The default character set is latin1, which covers English and other Western European languages, but there are also sets for Greek, Russian, Chinese, Korean, and more. At the time of this writing, MySQL supports over 70 character sets!
Related to this is the collation, which dictates how characters are compared. For example, a case-insensitive collation treats upper- and lowercase letters as equal. Another collation might address how accented characters in French are ordered. Each character set has a handful of available collations, from the over 30 that MySQL supports. For the default latin1 character set, latin1_swedish_ci is the default collation. Although it may not seem that way, this is a fine combination for handling English and most Western languages.
To choose a character set and collation:
Log in to the mysql client.
As with the storage engine, you should first know what options are available. To do so, you can ask your MySQL server from within the mysql client (see Chapter 2 for platform-specific instructions).
Determine what character sets are supported by your version of MySQL by running this query (Figure 4.3
): SHOW CHARACTER SET;
These results will differ from one installation of MySQL to the next. On my installation of MySQL 5.0.18 on Windows, there were 36 total.
Figure 4.3. Part of the list of available character sets.
Decide which character set to use.
The character set will correspond to the languages and types of characters being stored in your database. You can pick a different character set for each column, if necessary, so one column could have some text in English while another has a translation of that same text in Thai.
See which collations are available for that character set by running the following query (Figure 4.4
SHOW COLLATION LIKE 'latin1%'
To find the possible collations for the chosen character set, run this query. Change latin1
to whatever character set you are using.
Figure 4.4. The list of available collations for one particular character set.
Pick your collation.
From the list provided by MySQL (Figure 4.4), first note the default collation and which ones are compiled (meaning they are supported). Then, if you think you might need other than the default, read up on the other collations in the MySQL manual and choose accordingly.
Both the character set and the collation can be established on a server, database, table, or column level. If you don't specify the character set or collation for a column, the table's default set and collation will be used. The same applies to a table (which would use the database's defaults) and a database (which would use the server's defaults).
You can also set the character set for interactions when connecting to a MySQL database (e.g., when using the mysql client). See the MySQL manual for instructions.
The collation can be adjusted within a single SQL query. This will affect the ordering and grouping of your results for just that query.