How to set the default charset to utf-8 for creating table when using hibernate with java persistence annotations

4 min read >

Setting default charset to utf-8 for tables using hibernate with JPA annotations

Engineering Insights & Enterprise solutions

Yesterday I encountered a problem when trying to persist a String value into a MySQL column of type ‘text‘ (the problem also occurs for column types ‘tinytext‘, ‘mediumtext‘ etc.)

The first confusing thing was that the error message returned by MySQL was
java.sql.BatchUpdateException: Data truncation: Data too long for column 'my_column'

The confusing thing was that the column was of type ‘text‘ (length=65536), and I then changed it to type ‘mediumtext‘ (length=16777215), but the value that I was trying to persist had only around 6000 characters, so it was not a problem with the data length.
Instead, the one thing that was noticeable about my String value was that it contained non-ASCII characters (particularly Romanian characters with diacritics).

After a little search on the web, I found the following MySQL bug description http://bugs.mysql.com/bug.php?id=17872
A comment on that page ([8 Mar 2006 9:53] [ name withheld ]) also contains some more links to similar bug descriptions.

Now it was a little clearer that the problem was caused by the Romanian characters (and the error message was just stupid), so the next thing to do was to figure out why the database table didn’t store UTF-8 characters.

Although I created the database with
create database mydb character set utf8 collate utf8_general_ci;
still, the tables created inside it had the collation latin1_swedish_ci.

My application uses Hibernate for persistence and uses java persistence annotations to specify the hibernate mappings.

In my case the code is something like this:


import javax.persistence.*;

@Entity
@Table(name = "my_table")
public class MyClass implements Serializable {
    private String myValue;

    @Column(name = "my_column", columnDefinition = "mediumtext", length = 16777215)
    public String getMyValue() {
        return this.myValue;
    }

    public void setMyValue(String value) {
        this.myValue = value;
    }
}

The database schema is automatically generated based on the classes and hibernate mappings, by using the following ant target:


    <taskdef name="hibernatetool"
             classname="org.hibernate.tool.ant.HibernateToolTask"
             classpathref="project.classpath"/>

    <target name="hbm2ddl-schema"
            description="Generates the database schema from hibernate mappings">
        <hibernatetool destdir="">
            <classpath refid="project.classpath"/>
            <annotationconfiguration configurationfile="${classes.dir}/hibernate.cfg.xml"/>
            <hbm2ddl export="true" drop="false" create="true" haltonerror="true"/>
        </hibernatetool>
    </target>

If I had created the database tables by writing the SQL DDL by hand, I would have used the following script:


  create table `my_table` (
      `Id` int(11) NOT NULL auto_increment,
      `my_column` mediumtext NOT NULL default ”,
      ......
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

But in my case, the database schema is automatically generated by HibernateTools, so I have to specify the default charset for the table somewhere in the @Table annotation or hibernate configuration, or hibernatetool parameters.

After trying a few options (and I must admit I did not search thoroughly, so if you know of some better way to specify the default charset, please leave a comment), I chose the following solution:
I implemented my own custom org.hibernate.dialect.Dialect, by subclassing org.hibernate.dialect.MySQL5InnoDBDialect:


import org.hibernate.dialect.MySQL5InnoDBDialect;

/**
 * Extends MySQL5InnoDBDialect and sets the default charset to be UTF-8
 * @author Sorin Postelnicu
 * @since Aug 13, 2007
 */
public class CustomMysqlDialect extends MySQL5InnoDBDialect {

    public String getTableTypeString() {
        return " ENGINE=InnoDB DEFAULT CHARSET=utf8";
    }
}

and used it in my hibernate.cfg.xml:


<hibernate-configuration>
    <session-factory>
        <property name="hibernate.dialect">my.package.CustomMysqlDialect</property>
        .....
    </session-factory>
</hibernate-configuration>