How to Read Large XML Files Efficiently

Tags: PHP, XML

I have a large XML file (about 75,000 lines) from which I need to build a catalogue of houses. Building the lists works fine, but now I have a problem.

The catalogue should also have a detailed presentation page for each house. One house (<item id="123">) has about 800 to 1,200 lines of data, depending on the house type.

What is the best way to read and present this data, in terms of both making the script faster and keeping the code short?

For example, some houses have a sauna; when sauna data is present in the XML file, the presentation page should contain a sauna section.

Previously I tried to read the whole XML content into arrays with SimpleXML, using a recursive function and a lot of foreach loops (the maximum depth of children is three), but it was painfully slow, and the recursion did not work at all because my computer could not handle that much input.

Is there any way to build this data other than checking every variable with an if statement?

Best Answer

XML is an inefficient format for storing large amounts of data. It uses a lot of disk space (look at an XML file and note how much of it is taken up by tags and structure), and it is slow and memory-intensive to access. The whole tree (or at least a large portion of it) must be parsed just to get a single element, and tree-based XML parsers often use several times the size of the file in memory to do this.
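If the data has to stay in XML for now, a streaming (pull) parser at least avoids holding the whole tree in memory: it reads the file element by element and can stop as soon as it reaches the requested house. The sketch below uses Python's `xml.etree.ElementTree.iterparse` to illustrate the idea (PHP's `XMLReader` class offers the same pull-parsing model). The `<catalogue>`, `<name>` and `<sauna>` element names are hypothetical; only `<item id="...">` comes from the question.

```python
import io
import xml.etree.ElementTree as ET

def find_item(source, item_id):
    """Stream through the XML and return the first <item> whose id matches.

    Only the current <item> subtree is kept intact; items that do not match
    are cleared as soon as their end tag has been parsed.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue  # child elements are handled when their <item> ends
        if elem.get("id") == item_id:
            return elem  # stop early: the rest of the file is never parsed
        elem.clear()  # drop the children of items we do not need
    return None

# Hypothetical sample data shaped like the catalogue in the question.
sample = b"""<catalogue>
  <item id="122"><name>Villa A</name></item>
  <item id="123"><name>Villa B</name><sauna><type>finnish</type></sauna></item>
</catalogue>"""

house = find_item(io.BytesIO(sample), "123")
print(house.findtext("name"))  # prints: Villa B
```

Note that cleared elements stay attached to the root as empty shells, so memory still grows slowly on a huge file. This also still scans the file on every page view, which is why the database route below is usually the better fix.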

If you need to do something performance-sensitive (such as loading information into a web page), parsing 75,000 lines of XML just isn't going to be fast.

If performance matters, you should really move the information into a relational database, as suggested by thorsten müller. Then your task will become trivial. Even if you have no choice but to receive the data as XML, have your program import the XML file into the database once each time the file is updated, and then use the database for everything else. Besides being faster, the database will also be a lot easier to work with.
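A minimal sketch of that one-time import, written in Python with the built-in `sqlite3` module (in PHP, `XMLReader` plus PDO with the SQLite driver would fill the same roles). The schema is a hypothetical assumption: every leaf element under an `<item>` becomes one row keyed by house id and element path, so the page for a house needs a single indexed query. The element names in the sample are also hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

def leaf_paths(node, prefix=""):
    """Yield (path, text) for every leaf element below node, e.g. 'sauna/type'."""
    for child in node:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child):
            yield from leaf_paths(child, path)  # depth is small (max 3 here)
        else:
            yield path, (child.text or "").strip()

def load_xml(conn, source):
    """Import the XML once; run again whenever the file is updated."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS house_field (house_id TEXT, path TEXT, value TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_house ON house_field(house_id)")
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        item_id = elem.get("id")
        rows = [(item_id, path, value) for path, value in leaf_paths(elem)]
        conn.executemany("INSERT INTO house_field VALUES (?, ?, ?)", rows)
        elem.clear()  # keep memory flat while streaming
    conn.commit()

def load_house(conn, house_id):
    """Fetch all fields of one house, in document order."""
    cur = conn.execute(
        "SELECT path, value FROM house_field WHERE house_id = ? ORDER BY rowid",
        (house_id,),
    )
    return cur.fetchall()

sample = b"""<catalogue>
  <item id="122"><name>Villa A</name></item>
  <item id="123"><name>Villa B</name><sauna><type>finnish</type></sauna></item>
</catalogue>"""

conn = sqlite3.connect(":memory:")
load_xml(conn, io.BytesIO(sample))
print(load_house(conn, "123"))  # [('name', 'Villa B'), ('sauna/type', 'finnish')]
```

A real catalogue would more likely get proper tables and columns per section; the generic path/value table just keeps the sketch short.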

If you choose to stay with XML, you can get some help with your algorithm, but more information is needed. I suggest posting the portion of the code that you describe on Code Review.
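On the question of avoiding an `if` for every variable: one common pattern is to let the data drive the page. Loop over the sections that are actually present in the house and look each one up in a table of known sections; sections missing from the XML then simply produce no output. The sketch below (Python again, with a hypothetical `SECTION_TITLES` mapping and hypothetical element names) shows the idea; the same loop works over a `SimpleXMLElement` in PHP with `foreach` and `getName()`.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML section tag to the heading shown on the page.
SECTION_TITLES = {
    "sauna": "Sauna",
    "garden": "Garden",
}

def render_house(item):
    """Return HTML lines for whichever known sections this <item> contains."""
    lines = []
    for section in item:
        title = SECTION_TITLES.get(section.tag)
        if title is None:
            continue  # not a known section (e.g. <name>): skip it
        lines.append(f"<h2>{html.escape(title)}</h2>")
        for field in section:
            text = (field.text or "").strip()
            lines.append(f"<p>{html.escape(field.tag)}: {html.escape(text)}</p>")
    return lines

item = ET.fromstring(
    '<item id="123"><name>Villa B</name><sauna><type>finnish</type></sauna></item>'
)
print("\n".join(render_house(item)))
```

Adding a new section to the page then means adding one entry to the mapping rather than another `if` block.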
