Java – fast, accurate Highlighter for Lucene

javalucene

I've been using the (Java) Highlighter for Lucene (in the Sandbox package) for some time. However, this isn't really very accurate when it comes to matching the correct terms in search results – it works well for simple queries, for example searching for two separate words will highlight both code fragments in the results.

However, it doesn't act well with more complicated queries. In the simplest case, phrase queries such as "Stack Overflow" will match all occurrences of Stack or Overflow in the highlighting, which gives the impression to the user that it isn't working very well.

I tried applying the fix here but that came with a lot of performance caveats, and at the end of the day was just plain unusable. The performance is especially an issue on wildcard queries. This is due to the way that the highlighting works; instead of just working on the querystring and the text it parses it as Lucene would and then looks for all the matches that Lucene has made; unfortunately this means that for certain wildcard queries it can be looking for matches to 2000+ clauses on large documents, and it's simply not fast enough.

Is there any faster implementation of an accurate highlighter?

Best Answer

There is a new faster highlighter (needs to be patched in but will be part of release 2.9)

https://issues.apache.org/jira/browse/LUCENE-1522

and a back-reference to this question

Android R Update:

From Android R, this method always returns false. Google says that this is done "to protect goat privacy":

/**
 * Used to determine whether the user making this call is subject to
 * teleportations.
 *
 * <p>As of {@link android.os.Build.VERSION_CODES#LOLLIPOP}, this method can
 * now automatically identify goats using advanced goat recognition technology.</p>
 *
 * <p>As of {@link android.os.Build.VERSION_CODES#R}, this method always returns
 * {@code false} in order to protect goat privacy.</p>
 *
 * @return Returns whether the user making this call is a goat.
 */
public boolean isUserAGoat() {
    if (mContext.getApplicationInfo().targetSdkVersion >= Build.VERSION_CODES.R) {
        return false;
    }
    return mContext.getPackageManager()
            .isPackageAvailable("com.coffeestainstudios.goatsimulator");
}

Previous answer:

From their source, the method used to return false until it was changed in API 21.

/**
 * Used to determine whether the user making this call is subject to
 * teleportations.
 * @return whether the user making this call is a goat 
 */
public boolean isUserAGoat() {
    return false;
}

It looks like the method has no real use for us as developers. Someone has previously stated that it might be an Easter egg.

In API 21 the implementation was changed to check if there is an installed app with the package com.coffeestainstudios.goatsimulator

/**
 * Used to determine whether the user making this call is subject to
 * teleportations.
 *
 * <p>As of {@link android.os.Build.VERSION_CODES#LOLLIPOP}, this method can
 * now automatically identify goats using advanced goat recognition technology.</p>
 *
 * @return Returns true if the user making this call is a goat.
 */
public boolean isUserAGoat() {
    return mContext.getPackageManager()
            .isPackageAvailable("com.coffeestainstudios.goatsimulator");
}

Here is the source and the change.

Best Answer

Related Solutions

Java – Why is char[] preferred over String for passwords

Java – Proper use cases for Android UserManager.isUserAGoat()

Android R Update:

Previous answer:

Related Topic