Lucene search and underscores

lucenelucene.net

When I use Luke to search my Lucene index using a standard analyzer, I can see the field I am searchng for contains values of the form MY_VALUE.
When I search for field:"MY_VALUE" however, the query is parsed as field:"my value"

Is there a simple way to escape the underscore (_) character so that it will search for it?

EDIT:

4/1/2010 11:08AM PST

I think there is a bug in the tokenizer for Lucene 2.9.1 and it was probably there before.
Load up Luke and try to search for "BB_HHH_FFFF5_SSSS", when there is a number, the following tokens are returned:

"bb hhh_ffff5_ssss"

After some testing, I've found that this is because of the number. If I input

"BB_HHH_FFFF_SSSS", I get

"bb hhh ffff ssss"

At this point, I'm leaning towards a tokenizer bug unless the presence of the number is supposed to have this behavior but I fail to see why.

Can anyone confirm this?

Best Answer

It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.

Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.

Related Topic