Java – Dealing with “Xerces hell” in Java/Maven

classloaderdependency-managementjavamavenxerces

In my office, the mere mention of the word Xerces is enough to incite murderous rage from developers. A cursory glance at the other Xerces questions on SO seem to indicate that almost all Maven users are "touched" by this problem at some point. Unfortunately, understanding the problem requires a bit of knowledge about the history of Xerces…

History

  • Xerces is the most widely used XML parser in the Java ecosystem. Almost every library or framework written in Java uses Xerces in some capacity (transitively, if not directly).

  • The Xerces jars included in the official binaries are, to this day, not versioned. For example, the Xerces 2.11.0 implementation jar is named xercesImpl.jar and not xercesImpl-2.11.0.jar.

  • The Xerces team does not use Maven, which means they do not
    upload an official release to Maven Central.

  • Xerces used to be released as a single jar (xerces.jar), but was split into two jars, one containing the API (xml-apis.jar) and one containing the implementations of those APIs (xercesImpl.jar). Many older Maven POMs still declare a dependency on xerces.jar. At some point in the past, Xerces was also released as xmlParserAPIs.jar, which some older POMs also depend on.

  • The versions assigned to the xml-apis and xercesImpl jars by those who deploy their jars to Maven repositories are often different. For example, xml-apis might be given version 1.3.03 and xercesImpl might be given version 2.8.0, even though both are from Xerces 2.8.0. This is because people often tag the xml-apis jar with the version of the specifications that it implements. There is a very nice, but incomplete breakdown of this here.

  • To complicate matters, Xerces is the XML parser used in the reference implementation of the Java API for XML Processing (JAXP), included in the JRE. The implementation classes are repackaged under the com.sun.* namespace, which makes it dangerous to access them directly, as they may not be available in some JREs. However, not all of the Xerces functionality is exposed via the java.* and javax.* APIs; for example, there is no API that exposes Xerces serialization.

  • Adding to the confusing mess, almost all servlet containers (JBoss, Jetty, Glassfish, Tomcat, etc.), ship with Xerces in one or more of their /lib folders.

Problems

Conflict Resolution

For some — or perhaps all — of the reasons above, many
organizations publish and consume custom builds of Xerces in their
POMs. This is not really a problem if you have a small application and are only using Maven Central, but it quickly becomes an issue for enterprise software where Artifactory or Nexus is proxying multiple repositories (JBoss, Hibernate, etc.):

xml-apis proxied by Artifactory

For example, organization A might publish xml-apis as:

<groupId>org.apache.xerces</groupId>
<artifactId>xml-apis</artifactId>
<version>2.9.1</version>

Meanwhile, organization B might publish the same jar as:

<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.3.04</version>

Although B's jar is a lower version than A's jar, Maven does not know
that they are the same artifact because they have different
groupIds. Thus, it cannot perform conflict resolution and both
jars will be included as resolved dependencies:

resolved dependencies with multiple xml-apis

Classloader Hell

As mentioned above, the JRE ships with Xerces in the JAXP RI. While it would be nice to mark all Xerces Maven dependencies as <exclusion>s or as <provided>, the third-party code you depend on may or may not work with the version provided in JAXP of the JDK you're using. In addition, you have the Xerces jars shipped in your servlet container to contend with. This leaves you with a number of choices: Do you delete the servlet version and hope that your container runs on the JAXP version? Is it better to leave the servlet version, and hope that your application frameworks run on the servlet version? If one or two of the unresolved conflicts outlined above manage to slip into your product (easy to happen in a large organization), you quickly find yourself in classloader hell, wondering which version of Xerces the classloader is picking at runtime and whether or not it will pick the same jar in Windows and Linux (probably not).

Solutions?

We've tried marking all Xerces Maven dependencies as <provided> or as an <exclusion>, but this is difficult to enforce (especially with a large team) given that the artifacts have so many aliases (xml-apis, xerces, xercesImpl, xmlParserAPIs, etc.). Additionally, our third party libs/frameworks may not run on the JAXP version or the version provided by a servlet container.

How can we best address this problem with Maven? Do we have to exercise such fine-grained control over our dependencies, and then rely on tiered classloading? Is there some way to globally exclude all Xerces dependencies, and force all of our frameworks/libs to use the JAXP version?


UPDATE: Joshua Spiewak has uploaded a patched version of the Xerces build scripts to XERCESJ-1454 that allows for upload to Maven Central. Vote/watch/contribute to this issue and let's fix this problem once and for all.

Best Answer

There are 2.11.0 JARs (and source JARs!) of Xerces in Maven Central since 20th February 2013! See Xerces in Maven Central. I wonder why they haven't resolved https://issues.apache.org/jira/browse/XERCESJ-1454...

I've used:

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>

and all dependencies have resolved fine - even proper xml-apis-1.4.01!

And what's most important (and what wasn't obvious in the past) - the JAR in Maven Central is the same JAR as in the official Xerces-J-bin.2.11.0.zip distribution.

I couldn't however find xml-schema-1.1-beta version - it can't be a Maven classifier-ed version because of additional dependencies.