Good question - well explained.
how can I tune MySQL to increase the speed of the LOAD DATA INFILE call?
You've already got a high(ish) setting for the key buffer - but is it enough? I'm assuming this is a 64-bit installation (if not then the first thing you need to do is upgrade) and not running on MSNT. Have a look at the output of mysqltuner.pl after running a few tests.
In order to use the cache to best effect, you may find benefits in batching/pre-sorting the input data (most recent versions of the 'sort' command have a lot of functionality for sorting large datasets). Also if you generate the ID numbers outside of MySQL, then it may be more efficient.
would using a cluster of computers to load different csv files
Assuming (again) that you want to have the output set behave as a single table, then the only benefits you'll get are by distributing the work of sorting and generating ids - which you don't need more databases for. OTOH using a database cluster, you will get problems with contention (which you shouldn't see other than as performance problems).
If you can shard the data and handle the resulting datasets independently, then yes, you will get performance benefits - but this does not negate the the need to tune each node.
Check you've got at least 4 Gb for the sort_buffer_size.
Beyond that, the limiting factor on performance is all about disk I/O. There's lots of ways to address this - but you should probably be considering a mirrored set of striped datasets on SSDs for optimal performance.
This has probably to do with table_cache
and/or max_connections
.
With MyISAM tables each new thread/connection needs to open the table. In other words, if you have one table and ten clients accessing that table, you have ten open tables according to MySQL status line. The table_cache
helps to reduce that constant open/close table work from happening by keeping tables open for threads.
Your table cache seems to be working quite well, since there has been 43 041 queries, but tables were actually needed to open only 2605 times.
InnoDB has its own connection pool and behaves a bit differently.
For more information, see how MySQL opens and closes tables.
Best Answer
Have you found the MySQL documentation? It's extensive and useful.
When you run
OPTIMIZE TABLE
against InnoDB tables it outputs theTable does not support optimize, doing recreate + analyze instead
message.From the documentation:
MySQL 5.1 OPTIMIZE TABLE syntax