Search multiple tables

database-designMySQLsearch-engine

I have developed a web application that is used mainly for archiving all sorts of textual material (documents, references to articles, books, magazines etc.). There can be any given number of archive tables in my system, each with its own schema. The schema can be changed by a moderator through the application (imagine something similar to a really dumbed down version of phpMyAdmin).

Users can search for anything from all of the tables. By using FULLTEXT indexes together with substring searching (fields which do not support FULLTEXT indexing) the script inserts the results of a search to a single table and by ordering these results by the similarity measure I can fairly easily return the paginated results.

However, this approach has a few problems:

  • substring searching can only count exact results
  • the 50% rule applies to all tables separately and thus, mysql may not return important matches or too naively discards common words.
  • is quite expensive in terms of query numbers and execution time (not an issue right now as there's not a lot of data yet in the tables).
  • normalized data is not even searched for (I have different tables for categories, languages and file attatchments).

My planned solution Create a single table having columns similar to

id, table_id, row_id, data

Every time a new row is created/modified/deleted in any of the data tables this central table also gets updated with the data column containing a concatenation of all the fields in a row. I could then create a single index for Sphinx and use it for doing searches instead.

Are there any more efficient solutions or best practises how to approach this? Thanks.

Best Answer

Why not let the text search engine like lucene or sphinx do all the work? I use Lucene and have found it to be very good at searching multiple tables and fields for whatever you set as a target.

You don't say what your web front end is written in but both of these tools can be used with most languages.

You will be adding a lot of code to maintain in the triggers that watch for created/modified/deleted in all the tables. Wouldn't it be easier to add the option for admins to mark a column as "to be searched". Then write code to add this column to indexed fields?

I see that Sphinx has "offline index construction and incremental on-the-fly index updates" which would seem to allow you to add a field, rebuild the index offline and then bring the new index up.

Related Topic