PHP – How to find “FooBar” when searching “Foo Bar” in Zend Lucene

fuzzy-search, lucene, PHP, zend-framework, zend-search-lucene

I'm building a search function for a PHP website using Zend Lucene and I'm having a problem.
My website is a shop directory (or something like it).

For example, I have a shop named "FooBar", but my visitors search for "Foo Bar" and get zero results. Likewise, if a shop is named "Foo Bar" and a visitor searches for "FooBar", nothing is found.

I tried searching for "foobar~" (fuzzy search), but it did not find articles named "Foo Bar".

Is there a special way to build the index or to write the query?

Best Answer

Option 1: Break the input query string into two parts at various points and search for both parts together. In this case, for the input "foobar", the query would be (+fo +obar) OR (+foo +bar) OR (+foob +ar). The problem is that this approach assumes the input query string consists of exactly two tokens. You may also get extra, possibly irrelevant, results, such as the matches for (+foob +ar).
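As a rough illustration of Option 1, the split queries could be generated along these lines. This is only a sketch: `buildSplitQuery` is a hypothetical helper name, the minimum part length of 2 and the extra unsplit alternative are assumptions not stated in the answer, and the code assumes plain ASCII input with no Lucene query-syntax characters to escape.

```php
<?php
// Hypothetical helper (not part of Zend Lucene): build a boolean query
// string that tries every two-way split of a single-word search term.
function buildSplitQuery(string $term): string
{
    $term = strtolower(trim($term));
    $len = strlen($term);

    // Assumption: also keep the unsplit term as the first alternative.
    $clauses = ['(+' . $term . ')'];

    // Assumption: each part must be at least 2 characters long.
    for ($i = 2; $i <= $len - 2; $i++) {
        $left = substr($term, 0, $i);
        $right = substr($term, $i);
        $clauses[] = '(+' . $left . ' +' . $right . ')';
    }

    return implode(' OR ', $clauses);
}

echo buildSplitQuery('FooBar'), PHP_EOL;
// (+foobar) OR (+fo +obar) OR (+foo +bar) OR (+foob +ar)
```

The resulting string could then be handed to the query parser, for example via `Zend_Search_Lucene_Search_QueryParser::parse()`, before calling `find()` on the index. Note that the number of clauses grows with the length of the term, which is one reason this option can return irrelevant matches.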

Option 2: Use n-gram tokenization both while indexing and while querying. With bigrams (n = 2), the tokens indexed for "foo bar" would be fo, oo, ba, ar. When searching for "foobar", the query tokens would be fo, oo, ob, ba, ar. Searching with OR as the operator will rank the documents with the most n-gram matches at the top. This can be achieved with NGramTokenizer.
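To make the bigram example concrete, here is a minimal standalone sketch of the tokenization and of the overlap an OR query would reward. The function names `bigrams` and `sharedBigrams` are hypothetical, and the code does not touch the index itself; in Zend Lucene this logic would presumably have to live in a custom analyzer, which is not shown here. It assumes ASCII text and splits on whitespace only.

```php
<?php
// Hypothetical helper: split text into unique character bigrams,
// word by word, so that no bigram spans a space.
function bigrams(string $text): array
{
    $grams = [];
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        for ($i = 0; $i + 2 <= strlen($word); $i++) {
            $grams[] = substr($word, $i, 2);
        }
    }
    return array_values(array_unique($grams));
}

// Hypothetical helper: count how many query bigrams also occur in the
// indexed name. A real index would score this through its own ranking.
function sharedBigrams(string $query, string $name): int
{
    return count(array_intersect(bigrams($query), bigrams($name)));
}

echo implode(', ', bigrams('Foo Bar')), PHP_EOL;   // fo, oo, ba, ar
echo implode(', ', bigrams('FooBar')), PHP_EOL;    // fo, oo, ob, ba, ar
echo sharedBigrams('FooBar', 'Foo Bar'), PHP_EOL;  // 4
```

Four of the five query bigrams match, so "Foo Bar" would score well for the query "FooBar" even though neither full token matches; only the bigram "ob", which spans the missing space, is lost.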
