Create and populate your database independent of the db server using ant, dbunit and hibernate

Engineering Insights & Enterprise solutions

Database creation is such a common task that you must have some scripts and tips and tricks at hand, in order to have it done as quickly as possible. What about when there’s more than one type of RDBMS in your project that you need to handle. Things get a little messy.
How about using just one XML file and two or three libraries, regardless of the RDBMS type. Wouldn’t that be cool?
I’m saying, all you need is ant, dbunit, and Hibernate as libraries and a couple of files for data and database connection configuration. With this, you’ll be able to populate with only one file init-dataset.xml different rdbms like Oracle, MySql, MsSql, Postgres. For a complete list of supported rdbms have a look at dbunit SupportedRDBMS
Database creation and population process are in four steps:
1. Create an ant build.xml file and set up a project with a path for used libraries
2. Generate DDL from Hibernate mapping
3. Create database schema running generated DDL
4. With DBUnit populate database created above, providing as input a seed XML file with data you’re database must be populated with.

<project name=”OMS” basedir=”.” default=”deploy-reload-classes”>
<property file=”web/WEB-INF/classes/db.properties“/>
<property name=”ddl.file” value=”ddl.sql”/>
<property name=”lib.dir” value=”lib”/>

<target name=”hbm2ddl-schema” description=”Generate DDL from Hibernate mapping”     depends=”build”>
<java classname=”net.sf.hibernate.tool.hbm2ddl.SchemaExport” fork=”true”>

<!– Hibernate Properties –>
<jvmarg value=”-Dhibernate.dialect=${databaseDialect}”/>

<!– Mapping Files –>
<arg value=”ObjectModel.hbm.xml”/>

…

<classpath refid=”project.classpath”/>

</java>
</target>

<target name=”sql-run-ddl” description=”Run DDL script” depends=”hbm2ddl- schema”>
<sql url=”${databaseURL}” driver=”${databaseDriver}”
userid=”${databaseUser}” password=”${databasePassword}”
onerror=”continue” autocommit=”true” print=”yes”>
<classpath refid=”project.classpath”/>
<transaction src=”${ddl.file}”/>
</sql>
</target>

all database-related variables should be defined into a properties file, I called it db.properties, see step 1.

<target name=”dbunit-init-db” description=”using dbunit for testing” depends=”sql-run-ddl”>
<dbunit driver=”${databaseDriver}”
url=”${databaseURL}”
userid=”${databaseUser}”
password=”${databasePassword}”>
<operation type=”DELETE”
src=”db/defaults/init-dataset.xml”/>
<operation type=”INSERT” format=”flat”
src=”db/defaults/init-dataset.xml”/>
</dbunit>
</target>

after initializing the database we can populate it.

For a simple database setup, the script above should do it.
But sometimes we need to initialize more complex databases, where we have some foreign key circular dependency.

Some Tips & Tricks

Unfortunately, dbunit is not perfect, meaning it has some bugs. One is that it doesn’t handle too well circular references between tables, and the second is when a DB is initialized. At initialization operation, in the seed dataset file each new row for each new table must have all fields populated, otherwise unpopulated fields (from the first row, but populated in subsequent rows) will be set to null;
Here are some workarounds.
1. instead of using a row with the missing column, use one with complete columns.
For example:
using

<dataset>
<user ID=’1′ />
<user ID=’2′ NAME=’user name’/>
</dataset>

will insert null in all NAME columns of the user table. The solution is to use

<dataset>
<user ID=’2′ NAME=’user name’ />
<user ID=’1′ />
</dataset>

1. instead of using INSERT operation, use REFRESH operation. At REFRESH operation, data of existing rows are updated and non-existing rows get inserted. For clean initialization, perform CLEAN before.

<dataset>
<user ID=’1′ col2=’val1’/>
<user ID=’2′ NAME=’user name’ col2=’val2’/>
<!– if met twice id=2, than the second time it means there’s an update –>
<user ID=’2′ NAME=’user name’ />
</dataset>

In the update part, you must update only fields that you know were set up to null.
The second method is also useful when you need to populate a db with data that contains circular reference. Let’s suppose we have users and portfolios. And for some reason we would like to have a reference of an portfolio into user’s table, and a reference of user into portfolio table. The data will look something like this:

<dataset>
<user ID=’1′ name=’default user’ portfolio_id=’1′ />
<portfolio ID=’1′ user_id=’1′ name=”default portfolio”/>
</dataset>

If we’ll use the dataset above with INSERT operation, dbunit will report an error, that portfolio with id=1 doesn’t exist.

Using the REFRESH operation, with an appropriate dataset the problems will disappear.

<dataset>
<user ID=’1′ name=’default user’ />
<portfolio ID=’1′ user_id=’1′ name=”default portfolio”/>
<!– if met twice id=2, than the second time it means there’s an update –>
<user ID=’2′ NAME=’user name’ portfolio_id=’1’/>
</dataset>

As you can see portfolio_id is missing from the first row. So dbunit gets fooled. The next record will be inserted with reference to the user table. And the last record will update the reference of the user record to the portfolio record.

The bottom line: in spite of its imperfections, dbunit has reached the 2.2 version after more than one-year breaking, it can be used very efficiently for other purposes than database testing. A database setup operation independent of the RDBMS type can be done easily enough with DBUnit. I hope you find this interpretation useful.

Good luck!