PHP – How to Speed Up XML Parsing Operations

performance, PHP, xml

I currently have a PHP script set up to do some XML parsing. Sometimes the script is used as an on-page include, and other times it is accessed via an AJAX call. The problem is that the load time for this particular page is very long. I initially suspected that the PHP I had written to find what I need in the XML was written poorly and that my script was very resource-intensive. After much research and testing, however, the problem is not my scripting (well, perhaps you could consider it a problem with my scripting): it simply takes a long time to load the particular XML sources.

My code is like such:

$source_recent = 'my xml feed';
$source_additional = 'the other feed I need';

// Download and parse each feed on every request.
$xmlstr_recent = file_get_contents($source_recent);
$feed_recent = new SimpleXMLElement($xmlstr_recent);

$xmlstr_additional = file_get_contents($source_additional);
$feed_additional = new SimpleXMLElement($xmlstr_additional);

In all my testing, the above code is what takes the time, not the additional processing I do below.

Is there any way around this, or am I at the mercy of the load time of the XML URLs?

One crazy thought I had to get around it is to load the XML contents into a database every so often, then just query the database for what I need.

Thoughts? Ideas?

Best Answer

I suggest you look into caching. Chances are the feeds don't change much, and if they do, maybe you can afford getting the changes into your application a little bit later.

Basic caching would go something like this:

  • Do we have the XML data in the cache?
  • If we do, just use the cached data.
  • If we don't, load and parse the XML file, and store the resulting DOM tree in the cache, then use the parsed data.
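The steps above might be sketched in PHP like this. The function name `cached_fetch`, the cache path, and the 10-minute TTL are illustrative assumptions, not from the question; also note that `SimpleXMLElement` objects cannot be serialized, so this sketch caches the raw XML string and re-parses it on each request rather than caching the DOM tree itself:

```php
<?php
// Minimal file-based cache sketch. cached_fetch, the cache file path,
// and the 600-second TTL are illustrative assumptions.
function cached_fetch($url, $cache_file, $ttl = 600)
{
    // Do we have the data in the cache, and is it still fresh?
    if (file_exists($cache_file) && (time() - filemtime($cache_file)) < $ttl) {
        return file_get_contents($cache_file);
    }

    // Cache miss or stale entry: download the feed again.
    $xml = file_get_contents($url);
    if ($xml !== false) {
        file_put_contents($cache_file, $xml);
        return $xml;
    }

    // Download failed: fall back to a stale copy if one exists.
    return file_exists($cache_file) ? file_get_contents($cache_file) : false;
}

// Usage (the feed URL is the placeholder from the question):
// $xmlstr_recent = cached_fetch($source_recent, '/tmp/feed_recent.xml');
// $feed_recent   = new SimpleXMLElement($xmlstr_recent);
```

Caching the raw XML string still pays the parsing cost on every request, but it removes the remote download, which is where the time was going in the question.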

This would at least reduce your average response time; when the cache expires, one response will take longer, but the rest in between would completely skip the parsing step.

If you don't want any response to take longer, then you need to do the parsing asynchronously. Such a system requires three components:

  • your existing web application,
  • a daemon or cron job, and
  • some kind of shared data store - a plain file in an easy-to-parse format, a memory cache such as memcached, or a database.

The daemon process / cron job downloads and parses the XML files at regular intervals (say, every minute, or whatever makes sense) and updates the shared data store. If the data store update itself takes too long, consider using two data stores that you can swap atomically (e.g. using file renames or changing a symlink). The web application then never downloads or parses the XML itself; it simply queries the shared data store. Since the data there has already been parsed, the overhead is gone.
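A cron-side script along these lines could implement the download, parse, and atomic-swap steps. The file names, the JSON store format, and the `item`/`title` fields are illustrative assumptions; `rename()` is atomic on the same filesystem, which is what makes the swap safe:

```php
<?php
// Cron-side sketch: fetch and parse the feeds on a schedule, then
// atomically publish the result for the web app to read. The feed
// URLs are the placeholders from the question; the store path and
// the item/title field names are illustrative assumptions.
$feeds = [
    'recent'     => 'my xml feed',
    'additional' => 'the other feed I need',
];

$parsed = [];
foreach ($feeds as $name => $url) {
    $xml = @file_get_contents($url);
    if ($xml === false) {
        continue; // download failed: keep the previous data store as-is
    }
    $feed = new SimpleXMLElement($xml);
    // Reduce the DOM to just the fields the page actually needs.
    foreach ($feed->item as $item) {
        $parsed[$name][] = (string) $item->title;
    }
}

// Write to a temp file, then rename over the real store: rename() is
// atomic on the same filesystem, so readers never see a partial write.
$store = '/tmp/feed_store.json';
file_put_contents($store . '.tmp', json_encode($parsed));
rename($store . '.tmp', $store);

// Web-app side: no download, no XML parsing - just read the store.
$data = json_decode(file_get_contents($store), true);
```

On the web-application side the per-request cost drops to one local file read and a `json_decode()`, which is exactly the "query the shared data store" step described above.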
