How to Fix MySQL Error 1300: Invalid Character String (UTF-8)

ERROR 1300 (HY000): Invalid utf8 character string: '<hex bytes>' is raised when MySQL encounters a string value whose byte sequence is not valid for the column's or connection's declared character set. The error symbol is ER_INVALID_CHARACTER_STRING.

Impact

In strict SQL mode (STRICT_TRANS_TABLES or STRICT_ALL_TABLES), MySQL raises this as a hard error: the offending statement is aborted and rolled back, so no rows are written or modified. The surrounding transaction stays open, and the application must still commit or roll it back. Outside strict mode, MySQL may instead issue a warning and store a truncated or altered value, for example with the invalid sequence cut off or replaced by ?. That behavior varies between versions and should not be relied upon.

Developers most often encounter this error when inserting user-supplied text that was encoded in Latin-1 or Windows-1252 but declared as utf8 or utf8mb4, when loading data from files with mixed encodings, or when a client application does not set the connection character set before sending queries. ORMs such as SQLAlchemy, ActiveRecord, and Hibernate surface this as a database-level exception because the rejection happens inside MySQL, not in the driver.

Common Causes

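  1. Mismatched connection encoding. The connection character set the client declares does not match the bytes it actually sends. For example, a client sends Latin-1 or Windows-1252 text over a connection declared as utf8mb4, or a connection left at latin1 (MySQL's default in older versions) feeds a utf8mb4 column. Characters above the ASCII range are encoded differently in each charset, so bytes produced under one encoding can be invalid under the other.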

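  2. Legacy utf8 instead of utf8mb4. MySQL's utf8 charset (an alias for utf8mb3) only supports the BMP (Basic Multilingual Plane, up to U+FFFF). Characters above U+FFFF, including most emoji, require 4-byte sequences that are only valid in utf8mb4. Inserting them into a utf8 column fails; depending on where the check happens, MySQL reports error 1300 or the closely related error 1366 (Incorrect string value).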

  3. Importing a file with a different encoding. A CSV file or SQL dump exported as latin1 or cp1252 contains bytes (e.g., 0xE9 for é) that encode a complete character in Latin-1 but form invalid or incomplete sequences in UTF-8.

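  4. Charset introducers on raw byte literals. A _utf8 or _utf8mb4 introducer on a string or hexadecimal literal (for example, _utf8mb4 X'E9') tells MySQL to treat the literal's bytes as UTF-8. If those bytes are not valid UTF-8, MySQL reports this error as soon as it parses the literal.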

  5. Application-level string truncation or corruption. Middleware, proxies, or application code that cuts a string in the middle of a multi-byte UTF-8 character (for example, by limiting it to a byte count rather than a character count) before writing it to the database produces invalid byte sequences.
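Causes 3 and 5 both come down to byte sequences that are not valid UTF-8. The following Python sketch illustrates this with two hypothetical helpers, is_valid_utf8 and truncate_utf8 (not part of MySQL or any driver): a Latin-1 é and a naively byte-sliced string both fail UTF-8 validation, while truncating on a character boundary does not. It assumes the limit being enforced is a byte count, as in a fixed-size buffer or protocol field.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode `text` as UTF-8 and cut it to at most `max_bytes` bytes
    without splitting a multi-byte character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    end = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. Back up until
    # `end` points at the first byte of a character, so that character is
    # dropped whole instead of being cut in half.
    while end > 0 and (encoded[end] & 0xC0) == 0x80:
        end -= 1
    return encoded[:end]


if __name__ == "__main__":
    print(is_valid_utf8(b"caf\xe9"))                  # Latin-1 bytes for "café": False
    print(is_valid_utf8("café".encode("utf-8")[:4]))  # naive slice splits é: False
    print(truncate_utf8("café", 4))                   # b'caf'
```

Byte-aware truncation like this belongs in whichever layer imposes the byte limit; MySQL's own VARCHAR(n) length is counted in characters, not bytes.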

Troubleshooting and Resolution Steps

  1. Check the current connection character set.

    SHOW VARIABLES LIKE 'character_set%';
    SHOW VARIABLES LIKE 'collation%';
    

    If character_set_client, character_set_connection, or character_set_results is latin1 (or utf8/utf8mb3 while your schema uses utf8mb4), the connection encoding does not match your schema. Fix it by running SET NAMES utf8mb4 at the start of every session, or by configuring the charset in your DSN.

  2. Set the connection charset in your application or DSN.

    For MySQL command-line:

    SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;
    

    For a JDBC URL:

    jdbc:mysql://host/db?useUnicode=true&characterEncoding=UTF-8
    

    For Python (mysql-connector):

    import mysql.connector
    cnx = mysql.connector.connect(..., charset='utf8mb4')  # '...' stands for your host, user, password and database arguments
    
  3. Check whether the target column uses utf8 instead of utf8mb4.

    SHOW CREATE TABLE your_table\G
    

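    If the column or table uses CHARACTER SET utf8 (shown as utf8mb3 in recent MySQL versions), characters above U+FFFF, such as most emoji, will fail. Migrate it, restating the column's full definition (type, NULL or NOT NULL, default), because MODIFY replaces the definition entirely: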

    ALTER TABLE your_table
      MODIFY your_column TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    

    Or convert the whole table at once:

    ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    
  4. Convert your import file to UTF-8 before loading.

    On Linux/macOS (use -f cp1252 instead of -f latin1 if the file came from a Windows-1252 source):

    iconv -f latin1 -t utf-8 input.csv > input_utf8.csv
    

    Then load the converted file:

    LOAD DATA INFILE '/path/to/input_utf8.csv'
      INTO TABLE your_table
      CHARACTER SET utf8mb4
      FIELDS TERMINATED BY ',';
    
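  5. Identify the invalid bytes in the offending string. The error message includes a snippet of the hex bytes that failed (e.g., Invalid utf8 character string: 'E9636F'). Decoding them shows where the encoding breaks and helps you trace the data back to its source system. Here, 0xE9 is é in Latin-1, but in UTF-8 it begins a 3-byte sequence that the next byte, 0x63 (c), cannot continue, so the input was most likely Latin-1 or Windows-1252 text.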

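  6. Validate data already in the database. Columns that were written through a mismatched connection charset may already hold mojibake (for example, Ã© in place of é) or ? characters substituted during a lossy conversion. Use HEX() to inspect suspicious rows; this query starts with rows that contain a literal ?: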

    SELECT id, HEX(your_column) FROM your_table WHERE your_column LIKE '%?%';
    
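Steps 5 and 6 can be partly scripted. This Python sketch assumes the error message quotes the failing bytes as a bare hex string, as in the 'E9636F' example above, and that the likely source encoding is Windows-1252; describe_error_hex and undo_double_encoding are hypothetical helper names. undo_double_encoding is only a heuristic for mojibake (UTF-8 bytes that were read as Latin-1 and then stored again as UTF-8): it can miss strings containing the few bytes that Python's cp1252 codec leaves undefined (MySQL's latin1 maps them, Python does not), and it can occasionally flag legitimate text, so review its suggestions before updating any rows.

```python
from typing import Optional


def describe_error_hex(hex_snippet: str) -> dict:
    """Decode the hex bytes quoted in an error 1300 message.

    Assumes a bare hex string such as 'E9636F'.
    """
    raw = bytes.fromhex(hex_snippet)
    try:
        raw.decode("utf-8")
        first_invalid = None  # the snippet is valid UTF-8 on its own
    except UnicodeDecodeError as exc:
        first_invalid = exc.start  # offset of the first byte that breaks UTF-8
    return {
        "bytes": raw,
        "first_invalid_byte": first_invalid,
        # How the same bytes read if the source was Windows-1252 text.
        "as_cp1252": raw.decode("cp1252", errors="replace"),
    }


def undo_double_encoding(text: str) -> Optional[str]:
    """Return the repaired string if `text` looks like UTF-8 that was
    mis-read as Windows-1252 (e.g. 'cafÃ©' for 'café'), else None."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # not reversible, so probably not this kind of mojibake
    return repaired if repaired != text else None


if __name__ == "__main__":
    print(describe_error_hex("E9636F"))
    # {'bytes': b'\xe9co', 'first_invalid_byte': 0, 'as_cp1252': 'éco'}
    print(undo_double_encoding("cafÃ©"))  # café
    print(undo_double_encoding("café"))   # None
```

One way to use it: run undo_double_encoding over the values returned by SELECT id, your_column and collect the non-None results for manual review before writing any fixes back.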

Additional Information

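  • Related error: 1366 ER_TRUNCATED_WRONG_VALUE_FOR_FIELD. Its message covers both numeric coercion (Incorrect integer value) and strings the target column cannot store (Incorrect string value), so it often accompanies or replaces error 1300 for the same encoding problem. Error 1300 itself is specific to validating a string's bytes against its declared character set.
  • utf8 vs utf8mb4: MySQL's utf8 (utf8mb3) is a subset of real UTF-8 limited to 3 bytes per character. Use utf8mb4 for all new schemas. As of MySQL 8.0, utf8mb4 is the default character set, which removes the utf8/utf8mb4 mismatch behind many occurrences of this error in fresh deployments.
  • Strict mode: STRICT_TRANS_TABLES is enabled by default from MySQL 5.7.5 onward. Without strict mode (the server's built-in default before 5.7.5), MySQL may silently drop or mangle the invalid character instead of raising 1300, causing data loss that is harder to detect.
  • my.cnf server default: Set character-set-server = utf8mb4 and collation-server = utf8mb4_unicode_ci in [mysqld] so that new databases, and the tables created in them, default to utf8mb4 without explicit per-table declarations. Existing databases and tables keep their current character set until you convert them.
  • ORM configuration: for Hibernate, set the encoding in the JDBC URL as in step 2 (properties prefixed with hibernate.connection. are passed through to the driver, so hibernate.connection.characterEncoding=UTF-8 can also be used); SQLAlchemy needs create_engine(..., connect_args={"charset": "utf8mb4"}) or ?charset=utf8mb4 in the connection URL; ActiveRecord uses encoding: utf8mb4 in database.yml.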
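The connection check from step 1 and the strict-mode note above can be combined into a small audit. This Python sketch is driver-agnostic: it assumes you have already fetched the (name, value) rows returned by SHOW VARIABLES LIKE 'character_set%' and the string returned by SELECT @@SESSION.sql_mode. audit_session is a hypothetical helper, and it treats a NULL character_set_results as a problem too, which you may want to relax.

```python
from typing import Iterable, List, Optional, Tuple

# Session variables that should read utf8mb4 once SET NAMES utf8mb4 has run.
CHARSET_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)
STRICT_MODES = {"STRICT_TRANS_TABLES", "STRICT_ALL_TABLES"}


def audit_session(rows: Iterable[Tuple[str, Optional[str]]], sql_mode: str) -> List[str]:
    """List encoding-related problems in SHOW VARIABLES rows and sql_mode."""
    settings = dict(rows)
    problems = []
    for name in CHARSET_VARS:
        value = settings.get(name)
        if value != "utf8mb4":
            problems.append(f"{name} is {value!r}, expected 'utf8mb4'")
    # sql_mode is a comma-separated list, e.g. "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES".
    modes = {mode.strip().upper() for mode in sql_mode.split(",") if mode.strip()}
    if not modes & STRICT_MODES:
        problems.append("strict mode is off: invalid strings may be stored silently")
    return problems


if __name__ == "__main__":
    rows = [
        ("character_set_client", "latin1"),
        ("character_set_connection", "utf8mb4"),
        ("character_set_results", "utf8mb4"),
        ("character_set_server", "utf8mb4"),
    ]
    for problem in audit_session(rows, "NO_ENGINE_SUBSTITUTION"):
        print(problem)
```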

Frequently Asked Questions

Why does the error say utf8 even when my column is utf8mb4? The character set named in the message is the one MySQL was validating the bytes against, usually the connection's character_set_client or a literal's introducer, not the column definition. If the session is still using utf8 (utf8mb3), for example because the driver negotiated it by default, any 4-byte sequence the client sends is rejected at the connection/session layer before MySQL even looks at the column's utf8mb4 definition. Running SET NAMES utf8mb4 on connect tells MySQL that the bytes arriving from the client are UTF-8 and may use the full 4-byte range.

What is the difference between MySQL's utf8 and utf8mb4? MySQL's utf8 charset (currently an alias for utf8mb3) was originally implemented as a 3-byte variant and cannot store Unicode code points above U+FFFF, which include most emoji, the CJK Extension B and later ideographs, and the Mathematical Alphanumeric Symbols block. utf8mb4 is the full UTF-8 implementation, using up to 4 bytes per character, and covers the entire Unicode range. For any production schema you should use utf8mb4.
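To find which characters in a string would be rejected by a utf8 (utf8mb3) column, you can scan for code points above U+FFFF. A minimal Python sketch follows; chars_needing_utf8mb4 is a hypothetical helper, and it assumes the text has already been decoded correctly (it does not detect invalid bytes).

```python
from typing import List, Tuple


def chars_needing_utf8mb4(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, character, code point) for each character above U+FFFF.

    Such characters take 4 bytes in UTF-8, so a utf8 (utf8mb3) column
    cannot store them, while a utf8mb4 column can.
    """
    return [
        (index, char, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if ord(char) > 0xFFFF
    ]


if __name__ == "__main__":
    sample = "Thanks \u2764 \U0001F600"  # U+2764 (heavy black heart) is in the BMP; U+1F600 is not
    for index, _, code_point in chars_needing_utf8mb4(sample):
        print(index, code_point)  # prints: 9 U+1F600
```

An empty result means the text fits in utf8mb3; a non-empty one lists exactly which characters would fail before a migration to utf8mb4.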

Can I convert an existing database from utf8 to utf8mb4 without downtime? Not with a plain ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4. You can run it on a live table, but it copies and rebuilds the whole table and blocks concurrent writes until it finishes. For large tables, use an online schema change tool such as pt-online-schema-change or gh-ost to minimize locking impact.

Why did this start happening after I upgraded from MySQL 5.6 to 5.7? MySQL 5.7 enables STRICT_TRANS_TABLES in the default sql_mode, which promotes previously silent encoding mismatches into hard errors. The data or encoding mismatch existed before the upgrade; the upgrade only surfaces writes that were previously being silently truncated or corrupted.
