C++ – Fastest C++ XML parsing library

c++, libraries, parsing, text-processing, xml

I have thousands of .xml files ranging in size from 1 MB to 45 MB (no DTDs). I need to parse and further manipulate these XML files, then generate separate .xml files with the results of my regex.

What is the fastest open-source XML parsing library for C++? Aside from the choice of parsing library, what other approaches can I use to speed up XML parsing?

Best Answer

RapidXml is an attempt to create the fastest XML parser possible, while retaining usability, portability and reasonable W3C compatibility. It is an in-situ parser written in modern C++, with parsing speed approaching that of the strlen function executed on the same data.

http://rapidxml.sourceforge.net/
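
To make "in-situ" concrete, here is a minimal sketch of the general idea: the parser keeps pointers into the caller's buffer and writes string terminators into it, instead of copying each value out. This is not RapidXml's code or API. The function name `insitu_leaf_values` is hypothetical, and the toy only handles attribute-free, non-nested elements of one given name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy illustration of in-situ parsing (hypothetical helper, not a real XML parser).
// For every <tag>text</tag> found, the '<' of the closing tag is overwritten with
// '\0' and a pointer into `buffer` is returned. No value is ever copied.
std::vector<const char*> insitu_leaf_values(std::string& buffer, const std::string& tag) {
    assert(!tag.empty());
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<const char*> values;
    std::size_t pos = 0;
    while ((pos = buffer.find(open, pos)) != std::string::npos) {
        const std::size_t start = pos + open.size();
        const std::size_t end = buffer.find(close, start);
        if (end == std::string::npos) {
            break;  // unterminated element: stop scanning
        }
        buffer[end] = '\0';                       // terminate the value in place
        values.push_back(buffer.data() + start);  // pointer into the source buffer
        pos = end + close.size();
    }
    return values;
}
```

The returned pointers stay valid only while `buffer` is alive and is not resized, and the buffer no longer holds the original text afterwards. That is the trade-off an in-situ parser makes: no per-string allocation, in exchange for owning and modifying the input.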

You could start by benchmarking it against Expat, which is known for its speed. RapidXml is also used as the XML backend of some Boost libraries, mainly Boost.PropertyTree.

Also, a more XML-schema-specific approach might be more efficient, because the parser knows the structure of the data in advance. This is only a supposition, but if you're interested, CodeSynthesis provides a C++ code generator that takes an .xsd file as input. The resulting parsing code might be more helpful, provided you take the time to define your format in an XSD. There are other similar tools available, but this is the one I used for my last project. These tools are mostly based on Xerces, but you can generate code that is independent of it. I have no idea about the performance impact.