Is it possible to set a Solr score threshold ‘reasonably’, independent of the results returned? (i.e., is Solr scoring standardized in any way?)

solr

I have a Solr index with many entries. A query returns some subset of them, each with a score. Once the results are returned, I want to keep only those above some score (i.e. only results of a certain quality). Is it possible to do this when the returned subset could be anything?

I ask because on some queries a score of, say, 0.008 corresponds to a decent match, whereas on other queries a higher score corresponds to a poor match.

Ideally I'm just looking for a method to take the top x entries, as long as they are of at least a certain quality.

Best Answer

I think you should not do this. With the TF-IDF scoring model, scores are only meaningful relative to other results of the same query, so there is no way to compute a score above which all results are relevant and below which none are. And even if you manage to find such a threshold, it will very likely stop being valid after a few updates to your index (because document frequencies will change).
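To illustrate why index updates shift scores, here is a minimal sketch. It assumes the inverse document frequency formula used by older Lucene TF-IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)); the function name and the document counts are made up for illustration, and a real score also includes term frequency and normalization factors.

```python
import math

def classic_idf(num_docs, doc_freq):
    """IDF as in older Lucene TF-IDF similarity (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Same term, still matching 10 documents, before and after the index grows.
idf_small_index = classic_idf(1_000, 10)
idf_large_index = classic_idf(100_000, 10)

print(f"idf with   1,000 docs: {idf_small_index:.3f}")
print(f"idf with 100,000 docs: {idf_large_index:.3f}")
```

The same term in the same document gets a larger weight once the index has grown, so a threshold tuned on the smaller index would no longer mean the same thing.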

If you still want to do this, I think it is achievable using function queries: Solr provides an if function (in trunk) and a query function. Filter your results so that you only keep entries whose score is higher than a given threshold.
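One way to express such a filter is a sketch using the frange query parser together with the query function, passed as a filter query. The helper name, the example query, and the threshold value are hypothetical; the snippet only builds the request parameters and does not contact a Solr server.

```python
from urllib.parse import urlencode

def build_thresholded_query(user_query, min_score, rows=10):
    """Build Solr request parameters that keep only documents whose
    score for the main query is at least min_score (hypothetical helper)."""
    return {
        "q": user_query,
        "fl": "*,score",   # return the score alongside the stored fields
        "rows": rows,      # the "top x" limit
        # frange keeps documents whose function value is >= l;
        # query($q) evaluates to the score of the main query.
        "fq": "{!frange l=%s}query($q)" % min_score,
    }

params = build_thresholded_query("title:solr", 0.5, rows=5)
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a select handler URL. The caveat above still applies: the threshold is specific to the query and to the current state of the index.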
