The following sections discuss various aspects of character set collations.
With the COLLATE clause, you can override whatever the default collation is for a comparison.
COLLATE may be used in various parts of SQL statements. Here are some examples:
With ORDER BY:
SELECT k FROM t1 ORDER BY k COLLATE latin1_german2_ci;
With AS:
SELECT k COLLATE latin1_german2_ci AS k1 FROM t1 ORDER BY k1;
With GROUP BY:
SELECT k FROM t1 GROUP BY k COLLATE latin1_german2_ci;
With aggregate functions:
SELECT MAX(k COLLATE latin1_german2_ci) FROM t1;
With DISTINCT:
SELECT DISTINCT k COLLATE latin1_german2_ci FROM t1;
With WHERE:
SELECT * FROM t1 WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k;
SELECT * FROM t1 WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci;
With HAVING:
SELECT k FROM t1 GROUP BY k HAVING k = _latin1 'Müller' COLLATE latin1_german2_ci;
The COLLATE clause has high precedence (higher than ||), so the following two expressions are equivalent:
x || y COLLATE z
x || (y COLLATE z)
The BINARY operator casts the string following it to a binary string. This is an easy way to force a comparison to be done byte by byte rather than character by character.
BINARY also causes trailing spaces to be significant.
mysql> SELECT 'a' = 'A';
        -> 1
mysql> SELECT BINARY 'a' = 'A';
        -> 0
mysql> SELECT 'a' = 'a ';
        -> 1
mysql> SELECT BINARY 'a' = 'a ';
        -> 0
BINARY str is shorthand for CAST(str AS BINARY).
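The difference between the two kinds of comparison shown above can be modeled outside MySQL. The following Python sketch is only an illustration, not MySQL's implementation: it assumes a case-insensitive collation that ignores trailing spaces, approximated with str.casefold() and rstrip(), and the helper names nonbinary_equal and binary_equal are hypothetical.

```python
def nonbinary_equal(a: str, b: str) -> bool:
    """Approximate a case-insensitive comparison that ignores trailing spaces.

    Assumption: this stands in for a collation such as latin1_swedish_ci;
    real collations use weight tables, not casefold().
    """
    return a.rstrip(" ").casefold() == b.rstrip(" ").casefold()


def binary_equal(a: str, b: str, encoding: str = "latin-1") -> bool:
    """Compare the encoded byte sequences exactly, byte by byte."""
    return a.encode(encoding) == b.encode(encoding)


if __name__ == "__main__":
    # Same operand pairs as the SQL examples above.
    for left, right in [("a", "A"), ("a", "a ")]:
        print(repr(left), repr(right),
              int(nonbinary_equal(left, right)),
              int(binary_equal(left, right)))
```

In this model, both pairs compare equal character by character but differ byte by byte, which mirrors the 1 and 0 results of the statements above.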
The BINARY attribute in character column definitions has a different effect. A character column defined with the BINARY attribute is assigned the binary collation of the column's character set. Every character set has a binary collation. For example, the binary collation for the latin1 character set is latin1_bin, so if the table default character set is latin1, these two column definitions are equivalent:
CHAR(10) BINARY
CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin
The effect of
BINARY as a column attribute differs from its effect prior to MySQL 4.1. Formerly,
BINARY resulted in a column that was treated as a binary string. A binary string is a string of bytes that has no character set or collation, which differs from a non-binary character string that has a binary collation. For both types of strings, comparisons are based on the numeric values of the string unit, but for non-binary strings the unit is the character and some character sets allow multi-byte characters. See Section 11.4.2.
The use of CHARACTER SET binary in the definition of a CHAR, VARCHAR, or TEXT column causes the column to be treated as a binary data type. For example, the following pairs of definitions are equivalent:
CHAR(10) CHARACTER SET binary
BINARY(10)

VARCHAR(10) CHARACTER SET binary
VARBINARY(10)

TEXT CHARACTER SET binary
BLOB
In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column x:
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, when multiple operands are involved, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should this query use the collation of the column x, or of the string literal 'Y'?
Standard SQL resolves such questions using what used to be called “coercibility” rules. Basically, this means: Both x and 'Y' have collations, so which collation takes precedence? This can be difficult to resolve, but the following rules cover most situations:
An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A “system constant” (the string returned by functions such as
VERSION()) has a coercibility of 3.
A literal's collation has a coercibility of 4.
NULL or an expression that is derived from
NULL has a coercibility of 5.
The preceding coercibility values are current as of MySQL 5.0.3. In MySQL 5.0 prior to 5.0.3, there is no system constant or ignorable coercibility. Functions such as
USER() have a coercibility of 2 rather than 3, and literals have a coercibility of 3 rather than 4.
Those rules resolve ambiguities in the following manner:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement will not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It will return a result, and the character set of the result will be
utf8. The collation of the result will be the collation of
utf8_column. Values of
latin1_column will be automatically converted to
utf8 before concatenating.
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.
Examples:
|column1 = 'A'|Use collation of column1|
|column1 = 'A' COLLATE x|Use collation of 'A' COLLATE x|
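The resolution rules above can be written out as a small decision procedure. The Python sketch below is a simplified model under stated assumptions, not MySQL's actual code: the Operand class and the resolve_collation function are hypothetical names, the collation names in the usage lines are only sample inputs, and the shortcut for two operands that already share a collation is an added assumption, since nothing needs resolving in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    collation: str             # for example "latin1_swedish_ci"
    coercibility: int          # 0 (explicit COLLATE) through 5 (NULL)
    is_unicode: bool = False   # True if the character set is Unicode


def resolve_collation(left: Operand, right: Operand) -> str:
    """Return the collation chosen by the rules above, or raise ValueError."""
    if left.collation == right.collation:
        # Assumption: identical collations leave nothing to resolve.
        return left.collation
    if left.coercibility != right.coercibility:
        # The operand with the lowest coercibility value wins.
        return min(left, right, key=lambda op: op.coercibility).collation
    if left.is_unicode != right.is_unicode:
        # Same coercibility, exactly one Unicode side: the Unicode side wins.
        return (left if left.is_unicode else right).collation
    # Same coercibility, both Unicode or both non-Unicode: an error.
    raise ValueError("collations cannot be reconciled")


if __name__ == "__main__":
    column = Operand("latin1_german2_ci", coercibility=2)
    literal = Operand("latin1_swedish_ci", coercibility=4)
    explicit = Operand("latin1_bin", coercibility=0)
    utf8_column = Operand("utf8_general_ci", coercibility=2, is_unicode=True)

    print(resolve_collation(column, literal))      # column beats literal
    print(resolve_collation(column, explicit))     # explicit COLLATE beats column
    print(resolve_collation(utf8_column, column))  # Unicode side wins a tie
```

Keeping the coercibility value and the Unicode flag on each operand makes the tie-breaking step explicit: only when the coercibility values are equal does the character set matter.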
The COERCIBILITY() function can be used to determine the coercibility of a string expression:
mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
        -> 0
mysql> SELECT COERCIBILITY(VERSION());
        -> 3
mysql> SELECT COERCIBILITY('A');
        -> 4
Each character set has one or more collations, but each collation is associated with one and only one character set. Therefore, the following statement causes an error message because the
latin2_bin collation is not legal with the
latin1 character set:
mysql> SELECT _latin1 'x' COLLATE latin2_bin;
ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid for CHARACTER SET 'latin1'
Suppose that column
X in table
T has these
latin1 column values:
Muffler
Müller
MX Systems
MySQL
Suppose also that the column values are retrieved using the following statement:
SELECT X FROM T ORDER BY X COLLATE collation_name;
The following table shows the resulting order of the values if we use
ORDER BY with different collations:
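|latin1_swedish_ci|latin1_german1_ci|latin1_german2_ci|
|Muffler|Muffler|Müller|
|MX Systems|Müller|Muffler|
|Müller|MX Systems|MX Systems|
|MySQL|MySQL|MySQL|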
The character that causes the different sort orders in this example is the U with two dots over it (ü), which the Germans call “U-umlaut.”
The first column shows the result of the
SELECT using the Swedish/Finnish collating rule, which says that U-umlaut sorts with Y.
The second column shows the result of the
SELECT using the German DIN-1 rule, which says that U-umlaut sorts with U.
The third column shows the result of the
SELECT using the German DIN-2 rule, which says that U-umlaut sorts with UE.
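The three sort orders can be reproduced with a toy model of the rules just described. The Python sketch below is an approximation for illustration only: it handles nothing but the U-umlaut, ignores letter case by lowercasing, and the names UMLAUT_RULES, sort_key, and ordered are hypothetical. Real collations use full weight tables.

```python
VALUES = ["Muffler", "Müller", "MX Systems", "MySQL"]

# How each rule treats U-umlaut, following the descriptions above.
UMLAUT_RULES = {
    "swedish_finnish": "y",   # U-umlaut sorts with Y
    "german_din1": "u",       # U-umlaut sorts with U
    "german_din2": "ue",      # U-umlaut sorts with UE
}


def sort_key(value: str, rule: str) -> str:
    """Lowercase the value and substitute the umlaut according to the rule."""
    return value.lower().replace("ü", UMLAUT_RULES[rule])


def ordered(values, rule):
    """Return the values sorted under the given rule."""
    return sorted(values, key=lambda v: sort_key(v, rule))


if __name__ == "__main__":
    for rule in UMLAUT_RULES:
        print(rule, ordered(VALUES, rule))
```

Under this model the Swedish/Finnish rule places Müller after MX Systems, DIN-1 places it between Muffler and MX Systems, and DIN-2 places it first.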