A Brief Look at MySQL Character Sets

 
Preface
 
    MySQL uses character sets and collations to organize characters. It is flexible about where they apply: a character set can be specified for a database, a table, or even a single column of a table. Each character set has one or more collations, one of which is its default. Roughly speaking, a character set in MySQL can be seen as the combination of a code page and a character encoding.
    In MySQL 5.7 and earlier (e.g. version 5.5), the default character set is latin1, which does not support Chinese characters. Other commonly used character sets include GBK and UTF-8. MySQL 8.0 changed the default to utf8mb4 in order to support characters from more languages.
    Garbled text ("messy code") is a common character set problem. It tends to appear in MySQL databases whenever the character set rules are not applied appropriately. Worse, in some cases it leads to data loss, which is serious trouble we should avoid.
 
Introduction
 
    MySQL provides many parameters that specify the character set at various levels. To avoid latent garbled-text problems, we should clearly understand the principle and function of each parameter. Let's look at the details.
 
Procedure
 
Check the character sets supported by MySQL (version 5.7).
 1 ([email protected] mysql3306.sock)[(none)]>show character set;
 2 +----------+---------------------------------+---------------------+--------+
 3 | Charset  | Description                     | Default collation   | Maxlen |
 4 +----------+---------------------------------+---------------------+--------+
 5 | big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
 6 | dec8     | DEC West European               | dec8_swedish_ci     |      1 |
 7 | cp850    | DOS West European               | cp850_general_ci    |      1 |
 8 | hp8      | HP West European                | hp8_english_ci      |      1 |
 9 | koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
10 | latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
11 | latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
12 | swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
13 | ascii    | US ASCII                        | ascii_general_ci    |      1 |
14 | ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
15 | sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
16 | hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
17 | tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
18 | euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
19 | koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
20 | gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
21 | greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
22 | cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
23 | gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
24 | latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
25 | armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
26 | utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
27 | ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
28 | cp866    | DOS Russian                     | cp866_general_ci    |      1 |
29 | keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
30 | macce    | Mac Central European            | macce_general_ci    |      1 |
31 | macroman | Mac West European               | macroman_general_ci |      1 |
32 | cp852    | DOS Central European            | cp852_general_ci    |      1 |
33 | latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
34 | utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
35 | cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
36 | utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
37 | utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
38 | cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
39 | cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
40 | utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
41 | binary   | Binary pseudo charset           | binary              |      1 |
42 | geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
43 | cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
44 | eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
45 | gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
46 +----------+---------------------------------+---------------------+--------+
47 41 rows in set (0.00 sec)
48 
// The query returns 41 supported character sets.
// Each character set has a default collation.
// Maxlen is the maximum number of bytes per character in that character set (e.g. a utf8mb4 character takes up to 4 bytes).
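
To make the Maxlen column concrete, the following Python sketch measures how many bytes a single character occupies in a few encodings. Python's codec names (cp1252, gbk, utf-8) are used as stand-ins for MySQL's latin1, gbk and utf8mb4; that mapping and the helper name byte_length are assumptions made for illustration only.

```python
# Bytes needed to encode one character in several encodings.
# Python codecs stand in for MySQL character sets here (an assumption):
# cp1252 ~ latin1, gbk ~ gbk, utf-8 ~ utf8mb4.

def byte_length(ch: str, codec: str) -> int:
    """Return the number of bytes `ch` occupies when encoded with `codec`."""
    return len(ch.encode(codec))

samples = {
    "A": ["cp1252", "gbk", "utf-8"],   # ASCII letter
    "国": ["gbk", "utf-8"],            # CJK character
    "😀": ["utf-8"],                   # emoji outside the Basic Multilingual Plane
}

for ch, codecs in samples.items():
    for codec in codecs:
        print(f"{ch!r} in {codec}: {byte_length(ch, codec)} byte(s)")
```

The emoji needs 4 bytes in UTF-8, which is why it fits utf8mb4 (Maxlen 4) but not utf8 (Maxlen 3).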

 

Check the character set parameters of the current MySQL server.
 1 ([email protected] mysql3306.sock)[(none)]>show variables like 'character%';
 2 +--------------------------+----------------------------------------------------------------+
 3 | Variable_name            | Value                                                          |
 4 +--------------------------+----------------------------------------------------------------+
 5 | character_set_client     | utf8                                                           |
 6 | character_set_connection | utf8                                                           |
 7 | character_set_database   | utf8                                                           |
 8 | character_set_filesystem | binary                                                         |
 9 | character_set_results    | utf8                                                           |
10 | character_set_server     | utf8                                                           |
11 | character_set_system     | utf8                                                           |
12 | character_sets_dir       | /usr/local/mysql-5.7.21-linux-glibc2.12-x86_64/share/charsets/ |
13 +--------------------------+----------------------------------------------------------------+
14 8 rows in set (0.00 sec)
15 
// character_set_client: the character set of the statements the client sends to the server.
// character_set_connection: the character set used for literals that have no character set introducer, and for number-to-string conversion.
// character_set_database: the character set of the default database. A database inherits the value of "character_set_server" if no character set is specified for it.
// character_set_filesystem: the character set used to interpret string literals that refer to file names.
// character_set_results: the character set used to return query results to the client.
// character_set_server: the default character set of the server.
// character_set_system: the character set the server uses to store identifiers; its value is always utf8.
// character_sets_dir: the directory that contains the XML files of the installed character sets.
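
The way these parameters interact can be sketched as a chain of conversions. The Python model below is only an approximation: Python codecs stand in for MySQL character sets, and the CODECS mapping and the functions convert and write_then_read are hypothetical names introduced for this illustration, not MySQL APIs.

```python
# A simplified model of the conversions applied to a string literal on its
# way into a column and back to the client. Python codecs stand in for
# MySQL character sets; the mapping below is an assumption of this sketch.

CODECS = {"latin1": "cp1252", "gbk": "gbk", "utf8": "utf-8", "utf8mb4": "utf-8"}

def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode `data` from `src` to `dst`; unconvertible characters become '?'."""
    text = data.decode(CODECS[src], errors="replace")
    return text.encode(CODECS[dst], errors="replace")

def write_then_read(text, client, connection, column, results):
    """Follow a literal from the client to a column and back to the client."""
    sent = text.encode(CODECS[client])              # bytes the client sends
    in_conn = convert(sent, client, connection)     # character_set_client -> character_set_connection
    stored = convert(in_conn, connection, column)   # connection -> column character set
    returned = convert(stored, column, results)     # column -> character_set_results
    return returned.decode(CODECS[results], errors="replace")

# Every step can represent the text, so it survives the round trip.
print(write_then_read("国标", "utf8", "utf8", "gbk", "utf8"))
# A latin1 connection cannot represent the text, so '?' is what gets stored.
print(write_then_read("国标", "utf8", "latin1", "gbk", "utf8"))
```

In this model a single step that cannot represent a character is enough to turn it into "?", no matter how capable the other steps are.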

 

The relationship among the character set parameters above is shown below.
[Figure: relationship among the character set parameters (image not available)]

 

Check the collation parameters of the current MySQL server.
 1 ([email protected] mysql3306.sock)[(none)]>show variables like 'collation%';
 2 +----------------------+-----------------+
 3 | Variable_name        | Value           |
 4 +----------------------+-----------------+
 5 | collation_connection | utf8_general_ci |
 6 | collation_database   | utf8_general_ci |
 7 | collation_server     | utf8_general_ci |
 8 +----------------------+-----------------+
 9 3 rows in set (0.01 sec)
10 
// collation_connection: the collation of the connection character set.
// collation_database: the collation of the default database. It inherits the value of "collation_server" if not specified.
// collation_server: the default collation of the server.
// The suffix "ci" means case-insensitive.
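
What "case-insensitive" means for comparison and ordering can be illustrated with a rough Python sketch. It uses str.casefold() as an approximation of a "_ci" collation; real MySQL collations also define accent handling and sort weights that this example ignores, and the helper names are hypothetical.

```python
# Rough illustration of case-insensitive vs. binary comparison.
# str.casefold() only approximates a "_ci" collation (an assumption).

def equal_ci(a: str, b: str) -> bool:
    """Compare the way a case-insensitive collation would (approximation)."""
    return a.casefold() == b.casefold()

def equal_bin(a: str, b: str) -> bool:
    """Compare code point by code point, as a binary collation would."""
    return a == b

words = ["banana", "Cherry", "apple"]

print(equal_ci("MySQL", "mysql"))        # case is ignored
print(equal_bin("MySQL", "mysql"))       # case matters
print(sorted(words))                     # binary order: uppercase sorts first
print(sorted(words, key=str.casefold))   # case-insensitive order
```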

 

Changing the character set from utf8 to latin1 with "set names ...;".
 1 ([email protected] mysql3306.sock)[(none)]>\s
 2 --------------
 3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 4 
 5 Connection id:        7
 6 Current database:    
 7 Current user:        [email protected]
 8 SSL:            Not in use
 9 Current pager:        stdout
10 Using outfile:        ''
11 Using delimiter:    ;
12 Server version:        5.7.21-log MySQL Community Server (GPL)
13 Protocol version:    10
14 Connection:        Localhost via UNIX socket
15 Server characterset:    utf8
16 Db     characterset:    utf8
17 Client characterset:    utf8
18 Conn.  characterset:    utf8
19 UNIX socket:        /tmp/mysql3306.sock
20 Uptime:            4 hours 33 min 11 sec
21 
22 Threads: 1  Questions: 52  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.003
23 --------------
24 
25 ([email protected] mysql3306.sock)[(none)]>set names latin1;
26 Query OK, 0 rows affected (0.00 sec)
27 
28 ([email protected] mysql3306.sock)[(none)]>\s
29 --------------
30 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
31 
32 Connection id:        7
33 Current database:    
34 Current user:        [email protected]
35 SSL:            Not in use
36 Current pager:        stdout
37 Using outfile:        ''
38 Using delimiter:    ;
39 Server version:        5.7.21-log MySQL Community Server (GPL)
40 Protocol version:    10
41 Connection:        Localhost via UNIX socket
42 Server characterset:    utf8
43 Db     characterset:    utf8
44 Client characterset:    latin1
45 Conn.  characterset:    latin1
46 UNIX socket:        /tmp/mysql3306.sock
47 Uptime:            4 hours 33 min 18 sec
48 
49 Threads: 1  Questions: 56  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.003
50 --------------
51 
52 ([email protected] mysql3306.sock)[(none)]>select @@character_set_client;
53 +------------------------+
54 | @@character_set_client |
55 +------------------------+
56 | latin1                 |
57 +------------------------+
58 1 row in set (0.00 sec)
59 
60 ([email protected] mysql3306.sock)[(none)]>select @@character_set_connection;
61 +----------------------------+
62 | @@character_set_connection |
63 +----------------------------+
64 | latin1                     |
65 +----------------------------+
66 1 row in set (0.00 sec)
67 
68 ([email protected] mysql3306.sock)[(none)]>select @@character_set_results;
69 +-------------------------+
70 | @@character_set_results |
71 +-------------------------+
72 | latin1                  |
73 +-------------------------+
74 1 row in set (0.00 sec)
75 
76 ([email protected] mysql3306.sock)[(none)]>show variables like '%collation%';
77 +----------------------+-------------------+
78 | Variable_name        | Value             |
79 +----------------------+-------------------+
80 | collation_connection | latin1_swedish_ci |
81 | collation_database   | utf8_general_ci   |
82 | collation_server     | utf8_general_ci   |
83 +----------------------+-------------------+
84 3 rows in set (0.00 sec)
85 
// "set names latin1" changes "character_set_client", "character_set_connection" and "character_set_results" to latin1.
// It also changes "collation_connection" to latin1_swedish_ci, the default collation of latin1.
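
The effect observed above can be summarized in a toy Python model of the session variables. The dictionary layout and the function name set_names are hypothetical; the default collations are copied from the "show character set" output earlier in this article.

```python
# Toy model of the session variables that "SET NAMES" touches.
# The data structure and function name are assumptions of this sketch.

DEFAULT_COLLATION = {
    "latin1": "latin1_swedish_ci",
    "gbk": "gbk_chinese_ci",
    "utf8": "utf8_general_ci",
    "utf8mb4": "utf8mb4_general_ci",
}

def set_names(session: dict, charset: str) -> dict:
    """Return a copy of `session` after an equivalent of SET NAMES charset."""
    updated = dict(session)
    for name in ("character_set_client",
                 "character_set_connection",
                 "character_set_results"):
        updated[name] = charset
    updated["collation_connection"] = DEFAULT_COLLATION[charset]
    return updated

session = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_results": "utf8",
    "character_set_database": "utf8",
    "character_set_server": "utf8",
    "collation_connection": "utf8_general_ci",
    "collation_database": "utf8_general_ci",
}

after = set_names(session, "latin1")
changed = sorted(k for k in session if session[k] != after[k])
print(changed)
```

The database and server variables are left alone in this model, matching the "\s" output above.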

 

Changing the character set from utf8 to latin1 with "set character set ...;".
 1 ([email protected] mysql3306.sock)[(none)]>\s
 2 --------------
 3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 4 
 5 Connection id:        7
 6 Current database:    
 7 Current user:        [email protected]
 8 SSL:            Not in use
 9 Current pager:        stdout
10 Using outfile:        ''
11 Using delimiter:    ;
12 Server version:        5.7.21-log MySQL Community Server (GPL)
13 Protocol version:    10
14 Connection:        Localhost via UNIX socket
15 Server characterset:    utf8
16 Db     characterset:    utf8
17 Client characterset:    latin1
18 Conn.  characterset:    utf8
19 UNIX socket:        /tmp/mysql3306.sock
20 Uptime:            4 hours 44 min 8 sec
21 
22 Threads: 1  Questions: 72  Slow queries: 0  Opens: 111  Flush tables: 1  Open tables: 104  Queries per second avg: 0.004
23 --------------
24 
25 ([email protected] mysql3306.sock)[(none)]>select @@character_set_client;
26 +------------------------+
27 | @@character_set_client |
28 +------------------------+
29 | latin1                 |
30 +------------------------+
31 1 row in set (0.00 sec)
32 
33 ([email protected] mysql3306.sock)[(none)]>select @@character_set_connection;
34 +----------------------------+
35 | @@character_set_connection |
36 +----------------------------+
37 | utf8                       |
38 +----------------------------+
39 1 row in set (0.00 sec)
40 
41 ([email protected] mysql3306.sock)[(none)]>select @@character_set_results;
42 +-------------------------+
43 | @@character_set_results |
44 +-------------------------+
45 | latin1                  |
46 +-------------------------+
47 1 row in set (0.00 sec)
48 
49 ([email protected] mysql3306.sock)[(none)]>show variables like '%collation%';
50 +----------------------+-----------------+
51 | Variable_name        | Value           |
52 +----------------------+-----------------+
53 | collation_connection | utf8_general_ci |
54 | collation_database   | utf8_general_ci |
55 | collation_server     | utf8_general_ci |
56 +----------------------+-----------------+
57 3 rows in set (0.00 sec)
58 
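// This time the two connection-related variables, "character_set_connection" and "collation_connection", are still utf8-based. "set character set ..." changes only "character_set_client" and "character_set_results" to the given character set, and sets the connection character set and collation to those of the default database.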
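
For comparison with "set names", here is a toy Python model of "set character set". It follows the MySQL manual's description of the statement as far as I understand it (client and results take the given character set, the connection takes the database's values); the dictionary layout and the function name set_character_set are hypothetical.

```python
# Toy model of the session variables that "SET CHARACTER SET" touches.
# The data structure and function name are assumptions of this sketch.

def set_character_set(session: dict, charset: str) -> dict:
    """Return a copy of `session` after an equivalent of SET CHARACTER SET charset."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_results"] = charset
    # The connection settings are taken from the default database, not from `charset`.
    updated["character_set_connection"] = session["character_set_database"]
    updated["collation_connection"] = session["collation_database"]
    return updated

# State left behind by a previous "set names latin1" in a utf8 database.
session = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_results": "latin1",
    "character_set_database": "utf8",
    "collation_connection": "latin1_swedish_ci",
    "collation_database": "utf8_general_ci",
}

after = set_character_set(session, "latin1")
for name in sorted(after):
    print(f"{name}: {after[name]}")
```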

 

Changing the character set of the server and database from utf8 to latin1.
 1 ([email protected] mysql3306.sock)[(none)]>\s
 2 --------------
 3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 4 
 5 Connection id:        7
 6 Current database:    
 7 Current user:        [email protected]
 8 SSL:            Not in use
 9 Current pager:        stdout
10 Using outfile:        ''
11 Using delimiter:    ;
12 Server version:        5.7.21-log MySQL Community Server (GPL)
13 Protocol version:    10
14 Connection:        Localhost via UNIX socket
15 Server characterset:    utf8
16 Db     characterset:    utf8
17 Client characterset:    utf8
18 Conn.  characterset:    utf8
19 UNIX socket:        /tmp/mysql3306.sock
20 Uptime:            4 hours 50 min 33 sec
21 
22 Threads: 1  Questions: 92  Slow queries: 0  Opens: 111  Flush tables: 1  Open tables: 104  Queries per second avg: 0.005
23 --------------
24 
25 ([email protected] mysql3306.sock)[(none)]>set character_set_server=latin1;
26 Query OK, 0 rows affected (0.00 sec)
27 
28 ([email protected] mysql3306.sock)[(none)]>set character_set_database=latin1;
29 Query OK, 0 rows affected, 1 warning (0.00 sec)
30 
31 ([email protected] mysql3306.sock)[(none)]>show warnings;
32 +---------+------+-------------------------------------------------------------------------------------------------+
33 | Level   | Code | Message                                                                                         |
34 +---------+------+-------------------------------------------------------------------------------------------------+
35 | Warning | 1681 | Updating 'character_set_database' is deprecated. It will be made read-only in a future release. |
36 +---------+------+-------------------------------------------------------------------------------------------------+
37 1 row in set (0.00 sec)
38 
39 ([email protected] mysql3306.sock)[(none)]>\s
40 --------------
41 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
42 
43 Connection id:        7
44 Current database:    
45 Current user:        [email protected]
46 SSL:            Not in use
47 Current pager:        stdout
48 Using outfile:        ''
49 Using delimiter:    ;
50 Server version:        5.7.21-log MySQL Community Server (GPL)
51 Protocol version:    10
52 Connection:        Localhost via UNIX socket
53 Server characterset:    latin1
54 Db     characterset:    latin1
55 Client characterset:    utf8
56 Conn.  characterset:    utf8
57 UNIX socket:        /tmp/mysql3306.sock
58 Uptime:            4 hours 51 min 0 sec
59 
60 Threads: 1  Questions: 98  Slow queries: 0  Opens: 111  Flush tables: 1  Open tables: 104  Queries per second avg: 0.005
61 --------------
62 
// The warning shows that setting "character_set_database" at runtime is deprecated and will become read-only in a future release.
// Changing the character set of a database risks data loss if the client program uses a superset while the database uses a subset (e.g. with client->utf8mb4 and database->utf8, emoji data will be lost).
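
The superset/subset risk can be checked ahead of time. The Python sketch below flags characters that would not fit MySQL's 3-byte utf8, under the assumption that a character fits when its UTF-8 encoding is at most 3 bytes long (which is how the Maxlen column distinguishes utf8 from utf8mb4). The function name is hypothetical.

```python
# Find the characters in a string that need utf8mb4 rather than utf8.
# Assumption: "fits in utf8" means the UTF-8 encoding is at most 3 bytes.

def chars_needing_utf8mb4(text: str) -> list:
    """Return the characters of `text` whose UTF-8 form is 4 bytes long."""
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]

for sample in ("表情", "表情😀"):
    too_wide = chars_needing_utf8mb4(sample)
    verdict = "fits in utf8" if not too_wide else f"needs utf8mb4 for {too_wide}"
    print(f"{sample!r}: {verdict}")
```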

 

Example of garbled text ("messy code").
  1 ([email protected] mysql3306.sock)[zlm]>create table test_charset(
  2     -> s1 char(10) character set latin1 not null,
  3     -> s2 char(10) char set gbk,
  4     -> s3 varchar(10) charset utf8,
  5     -> s4 varchar(10)) character set=utf8mb4 engine=innodb;
  6 Query OK, 0 rows affected (0.01 sec)
  7 
  8 ([email protected] mysql3306.sock)[zlm]>show create table test_charset;
  9 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 10 | Table        | Create Table                                                                                                                                                                                                                                             |
 11 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 12 | test_charset | CREATE TABLE `test_charset` (
 13   `s1` char(10) CHARACTER SET latin1 NOT NULL,
 14   `s2` char(10) CHARACTER SET gbk DEFAULT NULL,
 15   `s3` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
 16   `s4` varchar(10) DEFAULT NULL
 17 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
 18 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 19 1 row in set (0.00 sec)
 20 
 21 ([email protected] mysql3306.sock)[zlm]>insert into test_charset values('ASCII','国标','万国','表情');
 22 Query OK, 1 row affected (0.00 sec)
 23 
 24 ([email protected] mysql3306.sock)[zlm]>select * from test_charset;
 25 +-------+--------+--------+--------+
 26 | s1    | s2     | s3     | s4     |
 27 +-------+--------+--------+--------+
 28 | ASCII | 国标   | 万国   | 表情   |
 29 +-------+--------+--------+--------+
 30 1 row in set (0.00 sec)
 31 
 32 ([email protected] mysql3306.sock)[zlm]>\s
 33 --------------
 34 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 35 
 36 Connection id:        9
 37 Current database:    zlm
 38 Current user:        [email protected]
 39 SSL:            Not in use
 40 Current pager:        stdout
 41 Using outfile:        ''
 42 Using delimiter:    ;
 43 Server version:        5.7.21-log MySQL Community Server (GPL)
 44 Protocol version:    10
 45 Connection:        Localhost via UNIX socket
 46 Server characterset:    utf8
 47 Db     characterset:    utf8
 48 Client characterset:    utf8
 49 Conn.  characterset:    utf8
 50 UNIX socket:        /tmp/mysql3306.sock
 51 Uptime:            5 hours 26 min 54 sec
 52 
 53 Threads: 1  Questions: 123  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.006
 54 --------------
 55 
 56 ([email protected] mysql3306.sock)[zlm]>set names latin1;
 57 Query OK, 0 rows affected (0.00 sec)
 58 
 59 ([email protected] mysql3306.sock)[zlm]>\s
 60 --------------
 61 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 62 
 63 Connection id:        9
 64 Current database:    zlm
 65 Current user:        [email protected]
 66 SSL:            Not in use
 67 Current pager:        stdout
 68 Using outfile:        ''
 69 Using delimiter:    ;
 70 Server version:        5.7.21-log MySQL Community Server (GPL)
 71 Protocol version:    10
 72 Connection:        Localhost via UNIX socket
 73 Server characterset:    utf8
 74 Db     characterset:    utf8
 75 Client characterset:    latin1
 76 Conn.  characterset:    latin1
 77 UNIX socket:        /tmp/mysql3306.sock
 78 Uptime:            5 hours 18 min 0 sec
 79 
 80 Threads: 1  Questions: 114  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.008
 81 --------------
 82 
 83 ([email protected] mysql3306.sock)[zlm]>
 84     
 85 ([email protected] mysql3306.sock)[zlm]>select * from test_charset;
 86 +-------+------+------+------+
 87 | s1    | s2   | s3   | s4   |
 88 +-------+------+------+------+
 89 | ASCII | ??   | ??   | ??   |
 90 +-------+------+------+------+
 91 1 row in set (0.00 sec)
 92 
 93 ([email protected] mysql3306.sock)[zlm]>set names gbk;
 94 Query OK, 0 rows affected (0.00 sec)
 95 
 96 ([email protected] mysql3306.sock)[zlm]>\s
 97 --------------
 98 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 99 
100 Connection id:        9
101 Current database:    zlm
102 Current user:        [email protected]
103 SSL:            Not in use
104 Current pager:        stdout
105 Using outfile:        ''
106 Using delimiter:    ;
107 Server version:        5.7.21-log MySQL Community Server (GPL)
108 Protocol version:    10
109 Connection:        Localhost via UNIX socket
110 Server characterset:    utf8
111 Db     characterset:    utf8
112 Client characterset:    gbk
113 Conn.  characterset:    gbk
114 UNIX socket:        /tmp/mysql3306.sock
115 Uptime:            5 hours 29 min 22 sec
116 
117 Threads: 1  Questions: 129  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.006
118 --------------
119 
120 ([email protected] mysql3306.sock)[zlm]>select * from test_charset;
121 +-------+------+------+------+
122 | s1    | s2   | s3   | s4   |
123 +-------+------+------+------+
124 | ASCII | ¹螠   | β¹ | ±     |
125 +-------+------+------+------+
126 1 row in set (0.00 sec)
127 
128 ([email protected] mysql3306.sock)[zlm]>set names utf8mb4;
129 Query OK, 0 rows affected (0.00 sec)
130 
131 ([email protected] mysql3306.sock)[zlm]>\s
132 --------------
133 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
134 
135 Connection id:        9
136 Current database:    zlm
137 Current user:        [email protected]
138 SSL:            Not in use
139 Current pager:        stdout
140 Using outfile:        ''
141 Using delimiter:    ;
142 Server version:        5.7.21-log MySQL Community Server (GPL)
143 Protocol version:    10
144 Connection:        Localhost via UNIX socket
145 Server characterset:    utf8
146 Db     characterset:    utf8
147 Client characterset:    utf8mb4
148 Conn.  characterset:    utf8mb4
149 UNIX socket:        /tmp/mysql3306.sock
150 Uptime:            5 hours 30 min 15 sec
151 
152 Threads: 1  Questions: 134  Slow queries: 0  Opens: 118  Flush tables: 1  Open tables: 111  Queries per second avg: 0.006
153 --------------
154 
155 ([email protected] mysql3306.sock)[zlm]>select * from test_charset;
156 +-------+--------+--------+--------+
157 | s1    | s2     | s3     | s4     |
158 +-------+--------+--------+--------+
159 | ASCII | 国标   | 万国   | 表情   |
160 +-------+--------+--------+--------+
161 1 row in set (0.00 sec)
162 
// MySQL supports defining the character set on a database, a table, or even a single column.
// Garbled output appears when "character_set_results" is a subset of the character set in which the data is stored. The output becomes readable again once "character_set_results" is set back to a character set equal to, or a superset of, the stored one.
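
The three outcomes above ("??", garbage, correct text) can be reproduced with a small Python sketch in which the stored bytes never change and only the conversion to the results character set differs. Python codecs stand in for MySQL character sets (cp1252 for latin1), the terminal is assumed to decode UTF-8, and the function name as_results is hypothetical.

```python
# Display-only garbling: the stored value stays intact, only the view changes.
# Python codecs stand in for MySQL character sets (an assumption).

stored = "国标".encode("gbk")   # bytes held in the gbk column s2

def as_results(stored_bytes: bytes, column_codec: str, results_codec: str) -> bytes:
    """Convert stored bytes to the results charset; unconvertible characters become '?'."""
    return stored_bytes.decode(column_codec).encode(results_codec, errors="replace")

# character_set_results=latin1: the characters cannot be represented at all.
latin1_view = as_results(stored, "gbk", "cp1252")
print(latin1_view)

# character_set_results=gbk on a terminal that assumes UTF-8: the bytes are
# valid gbk, but the terminal misreads them.
gbk_on_utf8_terminal = as_results(stored, "gbk", "gbk").decode("utf-8", errors="replace")
print(gbk_on_utf8_terminal)

# character_set_results=utf8mb4: readable again, nothing was lost.
utf8_view = as_results(stored, "gbk", "utf-8").decode("utf-8")
print(utf8_view)
```

Because the stored bytes are untouched, switching the results character set back is enough to get readable output.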

 

Example of data loss.
  1 ([email protected] mysql3306.sock)[(none)]>\s
  2 --------------
  3 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
  4 
  5 Connection id:        4
  6 Current database:    
  7 Current user:        [email protected]
  8 SSL:            Not in use
  9 Current pager:        stdout
 10 Using outfile:        ''
 11 Using delimiter:    ;
 12 Server version:        5.7.21-log MySQL Community Server (GPL)
 13 Protocol version:    10
 14 Connection:        Localhost via UNIX socket
 15 Server characterset:    utf8
 16 Db     characterset:    utf8
 17 Client characterset:    utf8
 18 Conn.  characterset:    utf8
 19 UNIX socket:        /tmp/mysql3306.sock
 20 Uptime:            1 min 45 sec
 21 
 22 Threads: 2  Questions: 23  Slow queries: 0  Opens: 108  Flush tables: 1  Open tables: 101  Queries per second avg: 0.219
 23 --------------
 24 
 25 ([email protected] mysql3306.sock)[(none)]>set @@character_set_server=latin1;
 26 Query OK, 0 rows affected (0.00 sec)
 27 
 28 ([email protected] mysql3306.sock)[(none)]>set @@character_set_database=latin1;
 29 Query OK, 0 rows affected, 1 warning (0.01 sec)
 30 
 31 ([email protected] mysql3306.sock)[(none)]>set @@character_set_connection=latin1;
 32 Query OK, 0 rows affected (0.00 sec)
 33 
 34 ([email protected] mysql3306.sock)[(none)]>\s
 35 --------------
 36 mysql  Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using  EditLine wrapper
 37 
 38 Connection id:        4
 39 Current database:    
 40 Current user:        [email protected]
 41 SSL:            Not in use
 42 Current pager:        stdout
 43 Using outfile:        ''
 44 Using delimiter:    ;
 45 Server version:        5.7.21-log MySQL Community Server (GPL)
 46 Protocol version:    10
 47 Connection:        Localhost via UNIX socket
 48 Server characterset:    latin1
 49 Db     characterset:    latin1
 50 Client characterset:    utf8
 51 Conn.  characterset:    latin1
 52 UNIX socket:        /tmp/mysql3306.sock
 53 Uptime:            2 min 16 sec
 54 
 55 Threads: 2  Questions: 29  Slow queries: 0  Opens: 108  Flush tables: 1  Open tables: 101  Queries per second avg: 0.213
 56 --------------
 57 
 58 ([email protected] mysql3306.sock)[(none)]>insert into test_charset values('ASCII','国标','万国','表情');
 59 ERROR 1046 (3D000): No database selected
 60 ([email protected] mysql3306.sock)[(none)]>use zlm
 61 Reading table information for completion of table and column names
 62 You can turn off this feature to get a quicker startup with -A
 63 
 64 Database changed
 65 ([email protected] mysql3306.sock)[zlm]>insert into test_charset values('ASCII','国标','万国','表情');
 66 Query OK, 1 row affected, 3 warnings (0.01 sec)
 67 
 68 ([email protected] mysql3306.sock)[zlm]>show warnings;
 69 +---------+------+-----------------------------------------------------------+
 70 | Level   | Code | Message                                                   |
 71 +---------+------+-----------------------------------------------------------+
 72 | Warning | 1300 | Invalid utf8 character string: '\xE5\x9B\xBD\xE6\xA0\x87' |
 73 | Warning | 1300 | Invalid utf8 character string: '\xE4\xB8\x87\xE5\x9B\xBD' |
 74 | Warning | 1300 | Invalid utf8 character string: '\xE8\xA1\xA8\xE6\x83\x85' |
 75 +---------+------+-----------------------------------------------------------+
 76 3 rows in set (0.00 sec)
 77 
 78 ([email protected] mysql3306.sock)[zlm]>select @@character_set_results;
 79 +-------------------------+
 80 | @@character_set_results |
 81 +-------------------------+
 82 | utf8                    |
 83 +-------------------------+
 84 1 row in set (0.00 sec)
 85 
 86 ([email protected] mysql3306.sock)[zlm]>select * from test_charset;
 87 +-------+--------+--------+--------+
 88 | s1    | s2     | s3     | s4     |
 89 +-------+--------+--------+--------+
 90 | ASCII | 国标   | 万国   | 表情   |
 91 | ASCII | ??     | ??     | ??     |
 92 +-------+--------+--------+--------+
 93 2 rows in set (0.00 sec)
 94 
 95 ([email protected] mysql3306.sock)[zlm]>set @@character_set_results=latin1;
 96 Query OK, 0 rows affected (0.00 sec)
 97 
 98 ([email protected] mysql3306.sock)[zlm]>select @@character_set_results;
 99 +-------------------------+
100 | @@character_set_results |
101 +-------------------------+
102 | latin1                  |
103 +-------------------------+
104 1 row in set (0.00 sec)
105 
106 ([email protected] mysql3306.sock)[zlm]>select * from test_charset;
107 +-------+------+------+------+
108 | s1    | s2   | s3   | s4   |
109 +-------+------+------+------+
110 | ASCII | ??   | ??   | ??   |
111 | ASCII | ??   | ??   | ??   |
112 +-------+------+------+------+
113 2 rows in set (0.00 sec)
114 
// The first row was stored correctly in the database.
// The second row has lost its Chinese characters.
// The value of "character_set_results" only influences the output on screen.
// Data can be lost when the client character set is a superset of the database character set. In this example the utf8 literals were converted to the latin1 connection character set before they were stored.
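
Unlike the display-only case, the loss here happens on the write path and cannot be undone by changing "character_set_results". The Python sketch below models that, with Python codecs standing in for MySQL character sets (cp1252 for latin1) and a hypothetical function name store_via_connection.

```python
# Lossy write: the literal is converted to the connection character set
# before it reaches the column, so '?' is what actually gets stored.
# Python codecs stand in for MySQL character sets (an assumption).

def store_via_connection(text: str, connection_codec: str, column_codec: str) -> bytes:
    """Convert text to the connection charset first, then to the column charset."""
    in_connection = text.encode(connection_codec, errors="replace")
    return in_connection.decode(connection_codec).encode(column_codec)

stored = store_via_connection("万国", "cp1252", "utf-8")
print(stored)

# No choice of results charset brings the original characters back.
for results_codec in ("cp1252", "gbk", "utf-8"):
    shown = stored.decode("utf-8").encode(results_codec, errors="replace").decode(results_codec)
    print(results_codec, "->", shown)
```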

 

Summary
  • MySQL character sets are flexible and varied, so be careful when modifying data.
  • The parameter "default_character_set" affects only the official mysql client, not other client tools.
  • When modifying data, make sure the character set of the client is a subset, not a superset, of the character set of the database. It should also be no larger than the character set of the connection, to avoid data loss.
  • It is recommended to set the character set of the database to a large set such as utf8, or even utf8mb4, to be compatible with most characters of various languages.