Archive
Bulk insert into a MySql Database
At RezzMap we use Java and MySQL for the RezzMap application and have the need to bulk insert lots of records into a large table. These inserts have started to become a bottleneck in the processing of the data required to run the service. So I started to look for a better way. Up until this point we were doing the classic batch inserts using a PreparedStatment and executeBatch. But even after adding the parameter rewriteBatchedStatements to our JDBC url the inserts that were too slow. So after reading all over the place that LOAD DATA INFILE was “up to 20 times faster” than batched inserts I was considering doing that. In looking at this solution I wasn’t really thrilled about writing the data out to temporary file and then running SQL to pull from that file. So then did a little digging around and found an old announcement by MySql of a, at the time, new method setLocalInfileInputStream (included in Connector/J 5.1.3 and later):
* New methods on com.mysql.jdbc.Statement: setLocalInfileInputStream() and
getLocalInfileInputStream():
* setLocalInfileInputStream() sets an InputStream instance that will be used
to send data to the MySQL server for a "LOAD DATA LOCAL INFILE"
statement rather than a FileInputStream or URLInputStream that represents the path given as an argument to the statement.
So after reading this the path seemed easy: build a InputStream that has a tab-delimited set of data that I can pass to this method and then call LOAD DATA LOCAL INFILE and get all of the benefits of the bulk loader. To get this working I took the path of least resistance and built up a String using StringBuilder and then used the method toInputStream(String input) of class org.apache.commons.io.IOUtils to convert my String to an InputStream. After doing this I compared the two methods of inserting data on my wimpy Dell Laptop and got the following results:
Here is what a stripped down version of this method looks like:
import com.mysql.jdbc.Connection;
import com.mysql.jdbc.Statement;
import org.apache.commons.io.IOUtils;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
public class DemoBulkInsert {
public void bulkInsert(Connection conn, Long personId, HashMap<String, String> hashOfNameValues) throws SQLException {
// First create a statement off the connection and turn off unique checks and key creation
Statement statement = (com.mysql.jdbc.Statement)conn.createStatement();
statement.execute("SET UNIQUE_CHECKS=0; ");
statement.execute("ALTER TABLE real_big_table DISABLE KEYS");
// Define the query we are going to execute
String statementText = "LOAD DATA LOCAL INFILE 'file.txt' " +
"INTO TABLE real_big_table " +
"(name, value) " +
" SET owner_id = " + personId + ", " +
" version = 0; ";
// Create StringBuilder to String that will become stream
StringBuilder builder = new StringBuilder();
// Iterate over map and create tab-text string
for (Map.Entry<String, String> entry : hashOfNameValues.entrySet()) {
builder.append(entry.getKey());
builder.append('\t');
builder.append(entry.getValue());
builder.append('\n');
}
// Create stream from String Builder
InputStream is = IOUtils.toInputStream(builder.toString());
// Setup our input stream as the source for the local infile
statement.setLocalInfileInputStream(is);
// Execute the load infile
statement.execute(statementText);
// Turn the checks back on
statement.execute("ALTER TABLE affinity ENABLE KEYS");
statement.execute("SET UNIQUE_CHECKS=1; ");
}
}