MySQL's utf8 character set is a misnomer — it only stores up to 3 bytes per character (BMP only), which excludes emoji, some CJK extension characters, and several mathematical symbols. utf8mb4 is the correct 4-byte UTF-8 implementation and is the right choice for any table that will hold user-generated content. Failing to use it causes silent data truncation on emoji and error ERROR 1366: Incorrect string value for 4-byte characters.
Impact
Inserting a 4-byte Unicode character (any emoji, many special symbols) into a utf8 column silently truncates or errors depending on sql_mode. Columns declared as utf8 but receiving utf8mb4 data produce different behavior in different client drivers — some truncate silently, others raise an exception. Collation mismatches between columns in a JOIN produce ERROR 1267: Illegal mix of collations and prevent index use.
Common Causes
- Database or table created with the default character set when
character_set_serverwaslatin1orutf8 - ORM that creates tables without specifying charset, inheriting the server default
- Migrating from MySQL 5.7 (where
utf8was the historical default for many installations) to 8.0 without converting data - JOINing a
utf8column against autf8mb4column (different charsets prevent index use) utf8mb4_unicode_civs.utf8mb4_general_civs.utf8mb4_0900_ai_cimismatches across tables that are JOINed
Checking Current Character Sets
-- Server defaults
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
-- Database defaults
SELECT schema_name, default_character_set_name, default_collation_name
FROM information_schema.schemata
WHERE schema_name = 'mydb';
-- Table and column character sets
SELECT
table_name,
table_collation
FROM information_schema.tables
WHERE table_schema = 'mydb';
SELECT
column_name, character_set_name, collation_name, column_type
FROM information_schema.columns
WHERE table_schema = 'mydb'
AND table_name = 'users'
AND character_set_name IS NOT NULL;
Converting to utf8mb4
Set the server default for new databases and tables:
[mysqld] character-set-server = utf8mb4 collation-server = utf8mb4_0900_ai_ci # MySQL 8.0 default — fast, Unicode 9.0 awareOr at runtime (for new connections and new objects):
SET GLOBAL character_set_server = 'utf8mb4'; SET GLOBAL collation_server = 'utf8mb4_0900_ai_ci';Convert an existing database:
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_ci;This changes the default for new tables created in this database but does not convert existing tables or columns.
Convert existing tables (non-blocking for InnoDB with ALGORITHM=INPLACE in many cases):
-- Convert the table and all text columns ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;CONVERT TOrewrites allCHAR,VARCHAR,TEXT,ENUM, andSETcolumns. This is a full table rebuild (ALGORITHM=COPY) if the column byte widths change (which they do forutf8→utf8mb4onVARCHAR). For large tables, usept-online-schema-changeorgh-ost.For large tables, convert column by column with online DDL:
-- ALGORITHM=INPLACE may work for column charset changes in MySQL 8.0 ALTER TABLE posts MODIFY COLUMN body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci, ALGORITHM=INPLACE, LOCK=NONE;If INPLACE is not supported for this operation, the statement will error (safe — nothing changes).
Generate conversion statements for all tables in a database:
SELECT CONCAT( 'ALTER TABLE `', table_schema, '`.`', table_name, '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;' ) FROM information_schema.tables WHERE table_schema = 'mydb' AND table_collation NOT LIKE 'utf8mb4%';
Choosing the Right Collation
Collation comparison:
Collation Speed Case Accent Unicode version Notes utf8mb4_general_ciFastest insensitive insensitive Basic Legacy; avoid for new tables utf8mb4_unicode_ciFast insensitive insensitive Unicode 4.0 Good compatibility utf8mb4_0900_ai_ciFastest insensitive insensitive Unicode 9.0 MySQL 8.0 default; best choice for new tables utf8mb4_0900_as_csFast sensitive sensitive Unicode 9.0 For case/accent-sensitive lookups utf8mb4_binFast sensitive sensitive Binary Byte-exact comparison For most applications:
utf8mb4_unicode_ci(MySQL 5.7 compat) orutf8mb4_0900_ai_ci(MySQL 8.0 only).
Connection Character Set
- Ensure the client connection uses utf8mb4:
In application drivers:SET NAMES utf8mb4; -- or equivalently: SET character_set_client = utf8mb4; SET character_set_results = utf8mb4; SET character_set_connection = utf8mb4;# JDBC jdbc:mysql://host/db?useUnicode=true&characterEncoding=UTF-8 # Python (mysql-connector-python) charset='utf8mb4' # SQLAlchemy create_engine("mysql+pymysql://user:pass@host/db?charset=utf8mb4") # Node (mysql2) charset: 'utf8mb4'
Index Length Considerations
utf8mb4uses up to 4 bytes per character. AVARCHAR(191)inutf8mb4is 191 × 4 = 764 bytes — within InnoDB's 767-byte index key limit on older row formats. For MySQL 5.7 withCOMPACTrow format:-- If you hit "Specified key was too long; max key length is 767 bytes": SHOW VARIABLES LIKE 'innodb_large_prefix'; -- MySQL 5.7 -- Enable large prefix if not on SET GLOBAL innodb_large_prefix = ON; SHOW VARIABLES LIKE 'innodb_file_format'; SET GLOBAL innodb_file_format = Barracuda;In MySQL 8.0 with
DYNAMICrow format (the default), the index key limit is 3072 bytes —VARCHAR(767)in utf8mb4 is fine.For
TEXTandBLOBcolumns, prefix indexes are required:ALTER TABLE articles ADD INDEX idx_title (title(191));
Fixing Collation Mismatch Errors
- Error 1267: Illegal mix of collations appears when JOINing or comparing columns with different charsets or collations:
-- Example: users.email is utf8mb4_unicode_ci but logins.email is utf8_general_ci SELECT u.* FROM users u JOIN logins l ON u.email = l.email; -- ERROR 1267: Illegal mix of collations -- Fix: COLLATE cast in the query (temporary) ON u.email = l.email COLLATE utf8mb4_unicode_ci -- Permanent fix: convert the column ALTER TABLE logins MODIFY COLUMN email VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Additional Information
- MySQL 8.0 sets
utf8mb4as the server default. New installations on 8.0 are correct out of the box; only upgrades from 5.7 or older need conversion. - The
utf8alias in MySQL isutf8mb3internally (confirmed in MySQL 8.0.28). Usingutf8in new code works but issues a deprecation warning in MySQL 8.0.28+ and will be removed in a future release. Always useutf8mb4explicitly. ENUMandSETcolumns also have character sets and must be converted explicitly.- Converting tables with
CONVERT TOis safe for data that was correctly stored as valid Unicode. Data stored aslatin1that represents actual binary data (image bytes, etc.) may be corrupted by charset conversion — inspect before converting. - Amazon Aurora and RDS respect the
character_set_serverparameter group setting, which defaults toutf8mb4on newer parameter groups.
Frequently Asked Questions
Q: After converting my table, emoji are still truncated. Why?
A: Check the connection charset. Even if the column is utf8mb4, if the client connection is negotiated as utf8 (3-byte), the data is truncated before reaching the server. Verify with SHOW SESSION VARIABLES LIKE 'character_set%'.
Q: Will CONVERT TO CHARACTER SET utf8mb4 corrupt existing data?
A: No, if the existing data is valid UTF-8. utf8mb4 is a superset of MySQL's utf8 — all characters that fit in 3 bytes are identical in both charsets. Only 4-byte characters are new to utf8mb4.
Q: My VARCHAR(255) index fails after converting to utf8mb4. What happened?
A: On MySQL 5.7 with the COMPACT row format, utf8mb4 × 255 characters × 4 bytes = 1020 bytes — over the 767-byte key limit. Solutions: enable innodb_large_prefix (5.7), switch to DYNAMIC row format, or reduce the column length to VARCHAR(191).
Q: Is utf8mb4_0900_ai_ci backward compatible with utf8mb4_unicode_ci?
A: For most strings, yes — they sort and compare identically for common characters. However, Unicode 9.0 (0900) makes different ordering decisions for some edge cases and handles more Unicode codepoints. Do not mix them in a JOIN predicate without a COLLATE cast.
Q: I changed the server collation but existing rows still compare with the old collation. Why?
A: The server/database collation only affects new objects. Existing columns retain their declared collation. You must ALTER TABLE ... CONVERT TO or MODIFY COLUMN to change the collation on existing data.