Java – How to perform a lucene query containing special character using QueryParser

javalucene

Here is the thing. I have a term stored in the index, which contains special character, such as '-', the simplest code is like this:

Document doc = new Document();
doc.add(new TextField("message", "1111-2222-3333", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);

And then I create a query using QueryParser, like this:

String queryStr = "1111-2222-3333";
QueryParser parser = new QueryParser(Version.LUCENE_36, "message", new StandardAnalyzer(Version.LUCENE_36));
Query q = parser.parse(queryStr);

And then I use a searcher to search the query and get no result. I have also tried this:

Query q = parser.parse(QueryParser.escape(queryStr));

And still no result.

Without using QueryParser and instead using TermQuery directly can do what I want, but this way is not flexible enough for user input texts.

I think maybe the StandardAnalyzer did something to omit the special character in the query string. I tried debug, and I found that the string is splited and the actual query is like this:"message:1111 message:2222 message:3333". I don't know what exactly lucene has done…

So if I want to perform the query with special character, what should I do? Should I rewrite an analyzer or inherit a queryparser from the default one? And how to?…

Update:

1 @The New Idiot @femtoRgon, I've tried QueryParser.escape(queryStr) as stated in the problem but it still doesn't work.

2 I've tried another way to solve the problem. I derived a QueryTokenizer from Tokenizer and cut the word only by space, pack it into a QueryAnalyzer, which derives from Analyzer, and finally pass the QueryAnalyzer into QueryParser.

Now it works. Originally it doesn't work because the default StandardAnalyzer cut the queryStr according to default rules(which recognize some of the special characters as splitters), when the query is passed into QueryParser, the special characters are already deleted by StandardAnalyzer. Now I use my own way to cut the queryStr and it only recognize space as splitter, so the special characters remain into the query waiting for processing and this works.

3 @The New Idiot @femtoRgon, thank you for answering my question.

Best Answer

I am not sure about this , but I guess you need to escape - with \ . As per the Lucene docs.

The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.

Again ,

Lucene supports escaping special characters that are part of the query syntax. The current list special characters are

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

To escape these character use the \ before the character.

Also remember, some characters you'll need to escape twice if they have special meaning in Java.

Related Solutions

Java – How to create an executable JAR with dependencies using Maven

<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>fully.qualified.MainClass</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>

and you run it with

mvn clean compile assembly:single

Compile goal should be added before assembly:single or otherwise the code on your own project is not included.

See more details in comments.

Commonly this goal is tied to a build phase to execute automatically. This ensures the JAR is built when executing mvn install or performing a deployment/release.

<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>fully.qualified.MainClass</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id> <!-- this is used for inheritance merges -->
      <phase>package</phase> <!-- bind to the packaging phase -->
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Java – How to avoid Java code in JSP files, using JSP 2

The use of scriptlets (those <% %> things) in JSP is indeed highly discouraged since the birth of taglibs (like JSTL) and EL (Expression Language, those ${} things) way back in 2001.

The major disadvantages of scriptlets are:

Reusability: you can't reuse scriptlets.
Replaceability: you can't make scriptlets abstract.
OO-ability: you can't make use of inheritance/composition.
Debuggability: if scriptlet throws an exception halfway, all you get is a blank page.
Testability: scriptlets are not unit-testable.
Maintainability: per saldo more time is needed to maintain mingled/cluttered/duplicated code logic.

~~Sun~~ Oracle itself also recommends in the JSP coding conventions to avoid use of scriptlets whenever the same functionality is possible by (tag) classes. Here are several cites of relevance:

From JSP 1.2 Specification, it is highly recommended that the JSP Standard Tag Library (JSTL) be used in your web application to help reduce the need for JSP scriptlets in your pages. Pages that use JSTL are, in general, easier to read and maintain.

...

Where possible, avoid JSP scriptlets whenever tag libraries provide equivalent functionality. This makes pages easier to read and maintain, helps to separate business logic from presentation logic, and will make your pages easier to evolve into JSP 2.0-style pages (JSP 2.0 Specification supports but de-emphasizes the use of scriptlets).

...

In the spirit of adopting the model-view-controller (MVC) design pattern to reduce coupling between the presentation tier from the business logic, JSP scriptlets should not be used for writing business logic. Rather, JSP scriptlets are used if necessary to transform data (also called "value objects") returned from processing the client's requests into a proper client-ready format. Even then, this would be better done with a front controller servlet or a custom tag.

How to replace scriptlets entirely depends on the sole purpose of the code/logic. More than often this code is to be placed in a fullworthy Java class:

If you want to invoke the same Java code on every request, less-or-more regardless of the requested page, e.g. checking if a user is logged in, then implement a filter and write code accordingly in doFilter() method. E.g.:

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws ServletException, IOException {
      if (((HttpServletRequest) request).getSession().getAttribute("user") == null) {
          ((HttpServletResponse) response).sendRedirect("login"); // Not logged in, redirect to login page.
      } else {
          chain.doFilter(request, response); // Logged in, just continue request.
      }
  }

When mapped on an appropriate <url-pattern> covering the JSP pages of interest, then you don't need to copypaste the same piece of code overall JSP pages.

If you want to invoke some Java code to process a GET request, e.g. preloading some list from a database to display in some table, if necessary based on some query parameters, then implement a servlet and write code accordingly in doGet() method. E.g.:
```
  protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
      try {
          List<Product> products = productService.list(); // Obtain all products.
          request.setAttribute("products", products); // Store products in request scope.
          request.getRequestDispatcher("/WEB-INF/products.jsp").forward(request, response); // Forward to JSP page to display them in a HTML table.
      } catch (SQLException e) {
          throw new ServletException("Retrieving products failed!", e);
      }
  }
```
This way dealing with exceptions is easier. The DB is not accessed in the midst of JSP rendering, but far before the JSP is been displayed. You still have the possibility to change the response whenever the DB access throws an exception. In the above example, the default error 500 page will be displayed which you can anyway customize by an <error-page> in web.xml.

If you want to invoke some Java code to process a POST request, such as gathering data from a submitted HTML form and doing some business stuff with it (conversion, validation, saving in DB, etcetera), then implement a servlet and write code accordingly in doPost() method. E.g.:

  protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
      String username = request.getParameter("username");
      String password = request.getParameter("password");
      User user = userService.find(username, password);

      if (user != null) {
          request.getSession().setAttribute("user", user); // Login user.
          response.sendRedirect("home"); // Redirect to home page.
      } else {
          request.setAttribute("message", "Unknown username/password. Please retry."); // Store error message in request scope.
          request.getRequestDispatcher("/WEB-INF/login.jsp").forward(request, response); // Forward to JSP page to redisplay login form with error.
      }
  }

This way dealing with different result page destinations is easier: redisplaying the form with validation errors in case of an error (in this particular example you can redisplay it using ${message} in EL), or just taking to the desired target page in case of success.

If you want to invoke some Java code to control the execution plan and/or the destination of the request and the response, then implement a servlet according to the MVC's Front Controller Pattern. E.g.:

  protected void service(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
      try {
          Action action = ActionFactory.getAction(request);
          String view = action.execute(request, response);

          if (view.equals(request.getPathInfo().substring(1)) {
              request.getRequestDispatcher("/WEB-INF/" + view + ".jsp").forward(request, response);
          } else {
              response.sendRedirect(view);
          }
      } catch (Exception e) {
          throw new ServletException("Executing action failed.", e);
      }
  }

Or just adopt an MVC framework like JSF, Spring MVC, Wicket, etc so that you end up with just a JSP/Facelets page and a JavaBean class without the need for a custom servlet.

If you want to invoke some Java code to control the flow inside a JSP page, then you need to grab an (existing) flow control taglib like JSTL core. E.g. displaying List<Product> in a table:
```
  <%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
  ...
  <table>
      <c:forEach items="${products}" var="product">
          <tr>
              <td>${product.name}</td>
              <td>${product.description}</td>
              <td>${product.price}</td>
          </tr>
      </c:forEach>
  </table>
```
With XML-style tags which fit nicely among all that HTML, the code is better readable (and thus better maintainable) than a bunch of scriptlets with various opening and closing braces ("Where the heck does this closing brace belong to?"). An easy aid is to configure your web application to throw an exception whenever scriptlets are still been used by adding the following piece to web.xml:
```
  <jsp-config>
      <jsp-property-group>
          <url-pattern>*.jsp</url-pattern>
          <scripting-invalid>true</scripting-invalid>
      </jsp-property-group>
  </jsp-config>
```
In Facelets, the successor of JSP, which is part of the Java EE provided MVC framework JSF, it is already not possible to use scriptlets. This way you're automatically forced to do things "the right way".
If you want to invoke some Java code to access and display "backend" data inside a JSP page, then you need to use EL (Expression Language), those ${} things. E.g. redisplaying submitted input values:
```
  <input type="text" name="foo" value="${param.foo}" />
```
The ${param.foo} displays the outcome of request.getParameter("foo").
If you want to invoke some utility Java code directly in the JSP page (typically public static methods), then you need to define them as EL functions. There's a standard functions taglib in JSTL, but you can also easily create functions yourself. Here's an example how JSTL fn:escapeXml is useful to prevent XSS attacks.
```
  <%@ taglib uri="http://java.sun.com/jsp/jstl/functions" prefix="fn" %>
  ...
  <input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />
```
Note that the XSS sensitivity is in no way specifically related to Java/JSP/JSTL/EL/whatever, this problem needs to be taken into account in every web application you develop. The problem of scriptlets is that it provides no way of builtin preventions, at least not using the standard Java API. JSP's successor Facelets has already implicit HTML escaping, so you don't need to worry about XSS holes in Facelets.

Best Answer

Related Solutions

Java – How to create an executable JAR with dependencies using Maven

Java – How to avoid Java code in JSP files, using JSP 2

See also:

Related Topic