Java – Reading a list of XML elements in Java

java xml

I would like to iterate over an XML document that is essentially a list of identically structured XML elements. The elements will be serialized into Java objects.

<root>
    <element attribute="value" />
    <element attribute="value" />
    <element attribute="value" />
    ...
</root>

There are a lot of elements within the root element, and I would prefer not to load them all into memory. I realize I could use a SAX handler for this, but writing a SAX handler to deserialize everything into Java objects seems rather cumbersome. I find JDOM very easy to use, but as far as I can tell JDOM always parses the entire tree. Is there a way I can use JDOM to parse the subelements one at a time?

Another reason for using JDOM is it makes writing serialization/deserialization code easy for the corresponding Java objects, which are meaningless if not entirely in memory. However, I don't want to load all of the Java objects into memory at the same time. Rather, I want to iterate over them once.

Update: http://docs.codehaus.org/display/GROOVY/Reading+XML+with+Groovy+and+DOM4J shows how to do this in dom4j. Is there any way to do this in JDOM?

Best Answer

Why not use StAX (javax.xml.stream.*, an implementation is included in Java SE 6) to stream in the XML, and convert individual portions to objects?

import java.io.FileReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Element.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();

        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // advance to <root>
        xsr.nextTag(); // advance to the first <element>, or to </root> if there is none
        while (xsr.isStartElement()) {
            // Unmarshal one <element>; afterwards the reader points just past its end tag.
            Element element = (Element) unmarshaller.unmarshal(xsr);
            System.out.println(element.getAttribute());
            // Skip whitespace or comments between siblings. If the next <element>
            // follows immediately, the reader is already on its start tag.
            if (!xsr.isStartElement() && !xsr.isEndElement()) {
                xsr.nextTag();
            }
        }
        xsr.close();
    }

}

In the above example each individual "element" is unmarshalled into a POJO using JAXB (an implementation is included in Java SE 6), but you could process the fragment as you saw fit. JAXB model details below:

import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Element {

    private String attribute;

    @XmlAttribute
    public String getAttribute() {
        return attribute;
    }

    public void setAttribute(String attribute) {
        this.attribute = attribute;
    }

}
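
The answer says the fragment can be processed however you see fit. As a minimal sketch of that idea, the loop below skips JAXB and reads the attribute straight from the StAX reader. It assumes every <element> is empty and carries only the attribute shown in the question; the names StaxOnlyDemo and forEachAttribute are made up for illustration, and Consumer requires Java 8 or later.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original answer.
class StaxOnlyDemo {

    // Calls handler once per <element>, passing the value of its "attribute" attribute.
    static void forEachAttribute(Reader in, Consumer<String> handler) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newFactory().createXMLStreamReader(in);
        try {
            xsr.nextTag(); // advance to <root>
            // nextTag() skips whitespace and comments and stops on the next start or end tag.
            while (xsr.nextTag() == XMLStreamReader.START_ELEMENT) {
                // A null namespace URI means the namespace is not checked.
                handler.accept(xsr.getAttributeValue(null, "attribute"));
                xsr.nextTag(); // consume the end tag of this (assumed empty) <element>
            }
        } finally {
            xsr.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element attribute=\"a\"/> <element attribute=\"b\"/></root>";
        forEachAttribute(new StringReader(xml), System.out::println); // prints a, then b
    }
}
```

Only the current element is held at any time, so memory use does not grow with the number of elements. If the elements had child content, the second nextTag() call would need to be replaced by logic that walks to the matching end tag.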

Note:

StAX and JAXB also work with Java SE 5; you just need to download the implementations separately.
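
Since the question asks to iterate over the elements exactly once, the StAX cursor can also be wrapped in a java.util.Iterator so that callers pull one item at a time. This is only a sketch under stated assumptions: ElementIterator is a hypothetical helper, it yields the attribute value rather than a full object, and it assumes empty <element> children with an attribute named "attribute" as in the question.

```java
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: lazily yields the "attribute" value of each child of the root.
class ElementIterator implements Iterator<String> {

    private final XMLStreamReader xsr;
    private boolean onElement; // true while the reader sits on an <element> start tag

    ElementIterator(Reader in) throws XMLStreamException {
        xsr = XMLInputFactory.newFactory().createXMLStreamReader(in);
        xsr.nextTag(); // advance to <root>
        advance();     // advance to the first <element>, if any
    }

    // Moves to the next start or end tag; closes the reader once the root's end tag is reached.
    private void advance() throws XMLStreamException {
        onElement = xsr.nextTag() == XMLStreamReader.START_ELEMENT;
        if (!onElement) {
            xsr.close();
        }
    }

    @Override
    public boolean hasNext() {
        return onElement;
    }

    @Override
    public String next() {
        if (!onElement) {
            throw new NoSuchElementException();
        }
        try {
            String value = xsr.getAttributeValue(null, "attribute");
            xsr.nextTag(); // consume the end tag of this (assumed empty) <element>
            advance();
            return value;
        } catch (XMLStreamException e) {
            // Iterator.next() cannot throw checked exceptions, so wrap it.
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would write something like `Iterator<String> it = new ElementIterator(new FileReader("input.xml"));` and loop with `while (it.hasNext())`. The iterator is single-pass by design, which matches the requirement of never holding all elements in memory.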