Archive

Archive for March, 2010

Bulk insert into a MySql Database

March 23, 2010 10 comments

At RezzMap we use Java and MySQL for the RezzMap application and have the need to bulk insert lots of records into a large table.  These inserts have started to become a bottleneck in the processing of the data required to run the service.  So I started to look for a better way.  Up until this point we were doing the classic batch inserts using a PreparedStatment and executeBatch.  But even after adding the parameter rewriteBatchedStatements to our JDBC url the inserts that were too slow.  So after reading all over the place that LOAD DATA INFILE was “up to 20 times faster” than batched inserts I was considering doing that.  In looking at this solution I wasn’t really thrilled about writing the data out to temporary file and then running SQL to pull from that file.  So then did a little digging around and found an old announcement by MySql of a, at the time, new method setLocalInfileInputStream (included in Connector/J 5.1.3 and later):

* New methods on com.mysql.jdbc.Statement: setLocalInfileInputStream() and
getLocalInfileInputStream():

    * setLocalInfileInputStream() sets an InputStream instance that will be used
     to send data to the MySQL server for a "LOAD DATA LOCAL INFILE"
     statement rather than a FileInputStream or URLInputStream that represents the path given as an argument to the statement.

So after reading this the path seemed easy: build a InputStream that has a tab-delimited set of data that I can pass to this method and then call LOAD DATA LOCAL INFILE and get all of the benefits of the bulk loader.  To get this working I took the path of least resistance and built up a String using StringBuilder and then used the method toInputStream(String input) of class org.apache.commons.io.IOUtils to convert my String to an InputStream.  After doing this I compared the two methods of inserting data on my wimpy Dell Laptop and got the following results:

image

Here is what a stripped down version of this method looks like:

import com.mysql.jdbc.Connection;
import com.mysql.jdbc.Statement;
import org.apache.commons.io.IOUtils;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
public class DemoBulkInsert {
    public void bulkInsert(Connection conn, Long personId, HashMap<String, String> hashOfNameValues) throws SQLException {

            // First create a statement off the connection and turn off unique checks and key creation
            Statement statement = (com.mysql.jdbc.Statement)conn.createStatement();
            statement.execute("SET UNIQUE_CHECKS=0; ");
            statement.execute("ALTER TABLE real_big_table DISABLE KEYS");

            // Define the query we are going to execute
            String statementText = "LOAD DATA LOCAL INFILE 'file.txt' " +
                    "INTO TABLE real_big_table " +
                    "(name, value) " +
                    " SET owner_id = " + personId + ", " +
                    " version = 0; ";

            // Create StringBuilder to String that will become stream
            StringBuilder builder = new StringBuilder();

            // Iterate over map and create tab-text string
            for (Map.Entry<String, String> entry : hashOfNameValues.entrySet()) {
                builder.append(entry.getKey());
                builder.append('\t');
                builder.append(entry.getValue());
                builder.append('\n');
            }

            // Create stream from String Builder
            InputStream is = IOUtils.toInputStream(builder.toString());

            // Setup our input stream as the source for the local infile
            statement.setLocalInfileInputStream(is);

            // Execute the load infile
            statement.execute(statementText);

            // Turn the checks back on
            statement.execute("ALTER TABLE affinity ENABLE KEYS");
            statement.execute("SET UNIQUE_CHECKS=1; ");
    }
}

Follow

Get every new post delivered to your Inbox.