Sorting on multivalued fields does not work fine with Solr.
Documentation
Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either
non-tokenized (ie: has no Analyzer) or uses an Analyzer that only
produces a single Term (ie: uses the KeywordTokenizer)
When you want to sort the products from low to high or high to low, what price will Solr pick ? As from the example you have a Sell price of 0 as well as 195 ?
The function queries also do not allow to use max or min on the multivalued fields.
The option you have to save the highest and lowest sell price as single valued fields, and use those fields for sorting.
highest_sell_price=195
lowest_sell_price=0
and use these fields for sorting.
You can just run your query q=field_name:David
with debugQuery=on
and see what happens.
These are the results (included the score through fl=*,score
) sorted by score desc
:
<doc>
<float name="score">0.4451987</float>
<str name="id">2</str>
<arr name="text_ws">
<str>David Letterman</str>
</arr>
</doc>
<doc>
<float name="score">0.44072422</float>
<str name="id">3</str>
<arr name="text_ws">
<str>David Hasselhoff</str>
<str>David Michael Hasselhoff</str>
</arr>
</doc>
<doc>
<float name="score">0.314803</float>
<str name="id">1</str>
<arr name="text_ws">
<str>David Bowie</str>
<str>David Robert Jones</str>
<str>Ziggy Stardust</str>
<str>Thin White Duke</str>
</arr>
</doc>
And this is the explanation:
<lst name="explain">
<str name="2">
0.4451987 = (MATCH) fieldWeight(text_ws:David in 1), product of: 1.0 = tf(termFreq(text_ws:David)=1) 0.71231794 = idf(docFreq=3, maxDocs=3) 0.625 = fieldNorm(field=text_ws, doc=1)
</str>
<str name="3">
0.44072422 = (MATCH) fieldWeight(text_ws:David in 2), product of: 1.4142135 = tf(termFreq(text_ws:David)=2) 0.71231794 = idf(docFreq=3, maxDocs=3) 0.4375 = fieldNorm(field=text_ws, doc=2)
</str>
<str name="1">
0.314803 = (MATCH) fieldWeight(text_ws:David in 0), product of: 1.4142135 = tf(termFreq(text_ws:David)=2) 0.71231794 = idf(docFreq=3, maxDocs=3) 0.3125 = fieldNorm(field=text_ws, doc=0)
</str>
</lst>
The scoring factors here are:
- termFreq: how often a term appears in the document
- idf: how often the term appears across the index
- fieldNorm: importance of the term, depending on index-time boosting and field length
In your example the fieldNorm
makes the difference. You have one document with lower termFreq
(1 instead of 1.4142135) since the term appears just one time, but that match is more important because of the field length.
The fact that your field is multiValued doesn't change the scoring. I guess it would be the same with a single value field with the same content. Solr works in terms of field length and terms, so, yes, David Bowie is punished for having many more tokens than the others. :)
UPDATE
I actually think David Bowie deserves his opportunity. Like explained above, the fieldNorm
makes the difference. Add the attribute omitNorms=true
to your text_ws
field in the schema.xml
and reindex. The same query will give you the following result:
<doc>
<float name="score">1.0073696</float>
<str name="id">1</str>
<arr name="text">
<str>David Bowie</str>
<str>David Robert Jones</str>
<str>Ziggy Stardust</str>
<str>Thin White Duke</str>
</arr>
</doc>
<doc>
<float name="score">1.0073696</float>
<str name="id">3</str>
<arr name="text">
<str>David Hasselhoff</str>
<str>David Michael Hasselhoff</str>
</arr>
</doc>
<doc>
<float name="score">0.71231794</float>
<str name="id">2</str>
<arr name="text">
<str>David Letterman</str>
</arr>
</doc>
As you can see now the termFreq
wins and the fieldNorm
is not taken into account at all. That's why the two documents with two David occurences are on top and with the same score, despite of their different lengths, and the shorter document with just one match is the last one with the lowest score. Here's the explanation with debugQuery=on
:
<lst name="explain">
<str name="1">
1.0073696 = (MATCH) fieldWeight(text:David in 0), product of: 1.4142135 = tf(termFreq(text:David)=2) 0.71231794 = idf(docFreq=3, maxDocs=3) 1.0 = fieldNorm(field=text, doc=0)
</str>
<str name="3">
1.0073696 = (MATCH) fieldWeight(text:David in 2), product of: 1.4142135 = tf(termFreq(text:David)=2) 0.71231794 = idf(docFreq=3, maxDocs=3) 1.0 = fieldNorm(field=text, doc=2)
</str>
<str name="2">
0.71231794 = (MATCH) fieldWeight(text:David in 1), product of: 1.0 = tf(termFreq(text:David)=1) 0.71231794 = idf(docFreq=3, maxDocs=3) 1.0 = fieldNorm(field=text, doc=1)
</str>
</lst>
Best Answer
A multivalued field is useful when there are more than one value present for the field. An easy example would be tags, there can be multiple tags that need to be indexed. so if we have tags field as multivalued then solr response will return a list instead of a string value. One point to note is that you need to submit multiple lines for each value of the tags like:
Once you have all the values index you can search or filter results by any value, e,g. you can find all documents with tag1 using query like
or use the tags to filter out results like