Sql – Is SQL Server’s Full Text Search the right tool for searching phrases, not documents

full-text-searchlucenesql serversql-server-2008

30 million distinct phrases, not documents, ranging from one word to a 10 word sentence and I need to support word/phrase searching. Basically what where contains(phrase, "'book' or 'stack overflow'") offers.

I have an instance of SQL Server 2005 (32 bit, 4 proc, 4gb) going against several full text catalogs and performance is awful for word searches with high cardinality.

Here are my thoughts to speed things up, perhaps someone can offer guidance–

1) Upgrade to 2008 iFTS, 64bit. Sql Server 2005 FTS's windows service is never more than 50mb. From what I have gathered, it uses the file system cache for looking up catalog indexes. My populated catalogs on disk are only around 300mb, so why can't this all be in memory? Might iFTS's new memory architecture, which is part of the sqlserver process help here?

2) Scale out the catalogs to several servers. Will the queries to the linked FTS servers run in parallel?

3) Since I'm searching phrases here and not documents, maybe Sql Server's Full Text Search isn't the answer. Lucene.NET? Put the catalog index on a ram drive?

Best Answer

Lucene.Net can offer very high performance for this kind of application along with a pretty simple API. Release 2.3.2 is nearing completion, which offers additional performance increases over release 2.1. While putting the Lucene index in a RAMDirectory (Lucene's memory-based index structure) will offer even better performance, we see great results even with the FSDirectory (a disk-based index).