Sunday, June 15, 2008

Querying Big Datasets with MySQL and Java

I received the error below when calling the executeQuery() method on a PreparedStatement object. Note that the stack trace passes through a method in the MySQL JDBC driver called "readAllResults". It appears that, by default, executeQuery() reads every row of the result set into memory before returning! This simply will not do when dealing with large datasets.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
...
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
...
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1885)


The solution was non-intuitive but an easy find with Google. From MySQL JDBC Memory Usage on Large ResultSet:

If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, you need to create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.
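Since my original error came from a PreparedStatement rather than a plain Statement, here is a minimal sketch of how the same settings might be applied there. It assumes the MySQL JDBC driver treats a PreparedStatement created with these flags the same way as the Statement in the quoted snippet. The class name StreamingQuery, the streamRows method, and the RowHandler callback are hypothetical names made up for this illustration, not part of any driver API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class StreamingQuery {

    /** Hypothetical callback, invoked once for each row as it arrives. */
    public interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /**
     * Runs the query with the forward-only, read-only, Integer.MIN_VALUE
     * fetch-size combination described above and passes each row to the
     * handler. Returns the number of rows processed.
     */
    public static long streamRows(Connection conn, String sql, RowHandler handler)
            throws SQLException {
        long count = 0;
        try (PreparedStatement stmt = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // The signal to the driver to stream rows instead of buffering them all.
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs); // only the current row needs to be held by our code
                    count++;
                }
            }
        }
        return count;
    }
}
```

A call might look like StreamingQuery.streamRows(conn, "SELECT id FROM big_table", row -> System.out.println(row.getLong(1))), where big_table is a placeholder table name. The try-with-resources blocks close the result set and statement even if the handler throws.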

There are a few caveats, which are described in the following links:

1 comment:

nuria said...

cool. happy to hear that you found the fix! n