How does Lucene work

lucene

I would like to find out how Lucene search manages to be so fast. I can't find any useful docs on the web. If you have anything to read (short of the Lucene source code), let me know.

A text search query in MySQL 5, using an index, takes about 18 minutes in my case. A Lucene search for the same query takes less than a second.

Best Answer

Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index that maps each word to the documents containing it. Because a search is an exact lookup of each query word in that index, rather than a scan through every document's text, it can be extremely fast. Hypothetically, an unordered (hash) SQL index on a varchar field could be just as fast, and in fact I think you'll find the big databases can answer a simple string-equality query very quickly in that case.
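To make the idea concrete, here is a minimal Python sketch of an inverted index, assuming a naive tokenizer (lowercase, split on non-alphanumerics) and an in-memory dict. The names `tokenize`, `build_inverted_index` and `search_all` are hypothetical and not Lucene's API; Lucene's real on-disk format (compressed term dictionaries and postings lists) is far more elaborate. The point is that answering a query only touches the postings lists of the query words, never the full text of every document.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Naive "analyzer": lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    # Map each term to the sorted list of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all(index, query):
    # AND query: intersect the postings lists of every query term.
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

docs = [
    "Lucene is a search library",
    "MySQL has full-text search",
    "Lucene builds an inverted index",
]
index = build_inverted_index(docs)
print(index["lucene"])                     # [0, 2]
print(search_all(index, "lucene search"))  # [0]
```

Each term lookup here is a single dictionary access, so the cost of a query depends on the size of the matching postings lists rather than on the total amount of text indexed.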

Lucene does not have to optimize for transaction processing. When you add a document, it need not ensure that queries see it instantly. And it need not optimize for updates to existing documents.
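A rough illustration of what giving up those guarantees buys: in the minimal Python sketch below, added documents sit in a buffer and only become searchable when `commit()` seals them into a new immutable segment, and an "update" is just a delete (tombstone) of the old version plus an add of the new one. This loosely mirrors Lucene's design of immutable segments and delete-plus-add updates, but the class `ToyIndex` and its methods are hypothetical and not Lucene's API.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Naive analyzer: lowercase, split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())

class ToyIndex:
    """Hypothetical append-only index: nothing is modified in place."""

    def __init__(self):
        self.segments = []       # committed, immutable: term -> doc ids
        self.deleted = set()     # committed tombstones (internal doc ids)
        self.key_to_doc = {}     # external key -> live internal doc id
        self.doc_keys = {}       # internal doc id -> external key
        self._pending = []       # buffered (doc id, key, text), not searchable
        self._next_id = 0

    def add_or_update(self, key, text):
        # An update is buffered as a brand-new document; the old version
        # is only tombstoned when the buffer is committed.
        self._pending.append((self._next_id, key, text))
        self._next_id += 1

    def commit(self):
        # Seal the buffer into a new segment; existing segments are untouched.
        postings = defaultdict(list)
        for doc_id, key, text in self._pending:
            old = self.key_to_doc.get(key)
            if old is not None:
                self.deleted.add(old)
            self.key_to_doc[key] = doc_id
            self.doc_keys[doc_id] = key
            for term in set(tokenize(text)):
                postings[term].append(doc_id)
        if postings:
            self.segments.append(dict(postings))
        self._pending = []

    def search(self, term):
        # Only committed segments are visible; tombstoned docs are skipped.
        hits = []
        for segment in self.segments:
            for doc_id in segment.get(term.lower(), []):
                if doc_id not in self.deleted:
                    hits.append(self.doc_keys[doc_id])
        return sorted(hits)

idx = ToyIndex()
idx.add_or_update("a", "Lucene is fast")
print(idx.search("lucene"))     # [] (not committed yet)
idx.commit()
print(idx.search("lucene"))     # ['a']
idx.add_or_update("a", "Lucene was rewritten")
idx.commit()
print(idx.search("fast"))       # [] (old version is tombstoned)
print(idx.search("rewritten"))  # ['a']
```

Because segments are never rewritten, writes are cheap appends and readers need no locking; the price is that new documents are invisible until a commit, and deleted versions linger until segments are merged.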

However, at the end of the day, if you really want to know, you need to read the source. Both things you reference are open source, after all.
