MySQL Design – Concurrent Inserts or Write to Separate Tables and Consolidate?

apache-stormdesignMySQL

I'm working on Apache Storm (but anyone who knows MySQL well could help too). Topology is like this:

Single spout —-emit—> Multiple instances of a Bolt

Each instance of the Bolt inserts a batch of rows to table ABC. The database table is located on a single server. The Bolt instances could be distributed across multiple servers.

Objective: To be able to speed up and scale the program by increasing the number of Bolt instances (ie: if processing x amount of data and writing to MySQL took 1 hour with 5 Bolts, it should take maybe 35 minutes if I use 10 Bolts).

Problem:
1. No matter how many Bolts I use, the processing time will reduce, but each Bolt will have to wait for the other n-1 Bolts to finish inserting, before it can insert (not because the program logic makes it wait, but because SQL does not allow a process/bolt to write to SQL while another process/bolt is writing to the same table). This waiting time does not allow the program to scale.
2. A long wait time will cause Zookeeper to timeout.
3. Before this topology starts, the table ABC already has existing data, and whatever new inserts are done will eventually have to be added to table ABC.

A solution I thought of:
To let each Bolt write to its own temporary table
ABC_TempBolt1
ABC_TempBolt2
ABC_TempBolt3
ABC_TempBolt4
ABC_TempBolt5
and when the Spout is finished, use SQL's INSERT INTO command to take contents of the Temp tables and insert it into ABC.

Is this the most efficient && fast && scalable way to accomplish this or is there a better design || technique?

ps: If there are other databases (even noSQL DB's) which can help speed up the application by allowing concurrent inserts, I'm open to the idea.

Best Answer

If you have the locking mechanism in place on the table in the database, then the concurrent writing in it is not possible, so there is no other way to parallelize the writing. Simply, the database is the bottleneck. The only gain that you get is that the processing done by bolts before the data is written into database is done in parallel.

Therefore, the approach you used is pretty much optimal. The only way I see that you could speed up the inserts is by using batch inserts and by tweaking the database so that it is optimized for what you are trying to do. Take a look at the following links:

Bulk Data Loading for InnoDB Tables

Insert Speeds for large batches

As far as concurrent inserts, this might help:

MySQL Concurrent Inserts

I believe this would be of interest to you:

If there are multiple INSERT statements, they are queued and performed in sequence

Basically, you cannot insert two records simultaneously. They will always be queued, which means that the locking mechanism is in place implicitly.

Hope this helps.

Related Topic