Centos – MySQL Fulltext 3 letter limit

centosfull-text-searchMySQL

We have a site thats using MySQL fulltext searching but has an issue that a number of strings that need to be indexed are 3 characters long. I know that the default for MySQL is not to index strings of this length but I also know that it is possible to override this in my.cnf. I assume the reason for the limit is primarily performance based so i'm not sure if this is something we should consider changing? What options do we have here? There are a number of sites on this server so any performance degredation could cause issues to a number of sites.

mysql version: 5

Best Answer

The reason is performance based, yes. Lowering the default limit will increase the space required to store the indexes, and the increased index size takes longer to search. Impact will depend upon usage (the type of queries performed) and size of current data set. The default minimum is 4, you can lower it like so:

[mysqld]
ft_min_word_len=3

When you rebuild your indexes (as you must) be sure not to repair, but drop and rebuild the indexes. This is considerably quicker than repairing them.

mysql> ALTER TABLE tbl_name DROP INDEX ft_index;
Query OK, 9999 rows affected (0.00 sec)
Records: 9999  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE tbl_name CREATE INDEX ft_index( searchable_text );
Query OK, 9999 rows affected (0.00 sec)
Records: 9999  Duplicates: 0  Warnings: 0

Probably your best solution is to monitor the size of the indexes on a dev server before and after the index length change.

The best option (that I have deferred to) is to ignore MySQL fulltext matching (which has serious limitations including MyISAM only, inability to match prefixed wildcards, a prohibitive default stop words list) and implement a third party solution. The best options available are:

I would personally opt for Lucene, although it requires a local java instance. If this is not possible, Sphinx is very easy to configure for PHP (walkthrough here) and many other languages.

Here are some good benchmarks and sombody else's first impressions on the matter.