PHP Performance – Handling Memory and Time Intensive Tasks

Tags: linux, performance, php

Sorry if this question has been asked before, but I couldn't find anything usable.

I'm working on a project for a client, and I currently have to loop through the users table, which holds about 3,000 records and is still growing.

I have to do some calculations on a nightly basis, which I'm going to run with cron and PHP. The calculation script uses about 3.5 MB of memory and takes about 1 second to run for a single user.

When loading individual users, my current PHP setup handles this fine, but if I try to loop through the whole user list, the script hits PHP's execution time limit.

After some searching, I've read that I can make the page reload itself after each user calculation and keep track of my place in the loop. This sounds like a good idea, but I'd like to hear from others who have dealt with similar situations and how you handled these kinds of tasks.

Thanks.

Best Answer

If you really expect your table to grow, you should start thinking about batching the process, that is, doing your calculations in steps. The simplest way would be to have a secondary table that holds the user id and the timestamp of when the user was last processed, and to limit your cron script to, for example, 500 users per run. The exact numbers will depend on what exactly you are doing; it'll take a bit of trial and error.
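A minimal sketch of that batching idea, under some assumptions: the `users` and `user_processing` tables, their columns, and the `calculateFor()` placeholder are all hypothetical names, and an in-memory SQLite database (via PDO) is used only so the example is self-contained. Your real script would connect to your own database instead.

```php
<?php
// Hypothetical schema; adapt table and column names to your own database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY)');
$db->exec('CREATE TABLE user_processing (user_id INTEGER PRIMARY KEY, processed_at INTEGER NOT NULL)');
for ($i = 1; $i <= 12; $i++) {
    $db->exec("INSERT INTO users (id) VALUES ($i)");
}

// Placeholder standing in for the real per-user calculations.
function calculateFor(int $userId): void
{
}

/**
 * Process up to $batchSize users that have not been processed since $cutoff
 * (a Unix timestamp). Returns the number of users handled in this run.
 */
function processBatch(PDO $db, int $batchSize, int $cutoff): int
{
    $select = $db->prepare(
        'SELECT u.id FROM users u
         LEFT JOIN user_processing p ON p.user_id = u.id
         WHERE p.processed_at IS NULL OR p.processed_at < :cutoff
         ORDER BY u.id
         LIMIT :lim'
    );
    $select->bindValue(':cutoff', $cutoff, PDO::PARAM_INT);
    $select->bindValue(':lim', $batchSize, PDO::PARAM_INT);
    $select->execute();
    $ids = $select->fetchAll(PDO::FETCH_COLUMN);

    $mark = $db->prepare(
        'REPLACE INTO user_processing (user_id, processed_at) VALUES (:id, :ts)'
    );
    foreach ($ids as $id) {
        calculateFor((int) $id);
        // Log each user right after processing, so a crash loses at most one user.
        $mark->execute([':id' => $id, ':ts' => time()]);
    }

    return count($ids);
}
```

Cron would then call something like `processBatch($db, 500, strtotime('today'))` several times a night; once it returns 0, everyone has been handled for that night.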

If you do decide to batch the process, you'll obviously need to run the cron script more than once. That's easy enough: only process users that weren't recently processed (by checking the timestamp) and, of course, log them as processed afterwards. If your user ids are sequential, you could save yourself the trouble of logging each processed user id and just log the last one of the batch. However, if something goes wrong in the middle of the batch, you wouldn't have any idea of where it stopped. Your choice ;)
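The "just log the last id of the batch" variant could look roughly like this. It is only a sketch: the function names are made up, the checkpoint is kept in a plain file, and the `$allIds` array stands in for a query such as `SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?`.

```php
<?php
// Hypothetical helpers; the checkpoint is simply the last processed id in a file.
function readCheckpoint(string $file): int
{
    return is_file($file) ? (int) file_get_contents($file) : 0;
}

/**
 * Process the next $batchSize ids greater than the stored checkpoint.
 * Returns how many ids were processed in this run.
 */
function processNextBatch(array $allIds, string $file, int $batchSize, callable $calculate): int
{
    $last = readCheckpoint($file);

    // Keep only ids that come after the checkpoint, in ascending order.
    $pending = array_filter($allIds, fn (int $id): bool => $id > $last);
    sort($pending);
    $batch = array_slice($pending, 0, $batchSize);

    foreach ($batch as $id) {
        $calculate($id);
        $last = $id;
    }

    // The checkpoint is written once per batch. If the script dies mid-batch,
    // the old checkpoint stays in place and the whole batch is redone next run.
    if ($batch !== []) {
        file_put_contents($file, (string) $last, LOCK_EX);
    }

    return count($batch);
}
```

That last comment is the trade-off mentioned above: less bookkeeping, but no record of where inside a batch things went wrong.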

Next you need to optimize the hell out of your loop. Start with the simple stuff: are you using for or foreach? There's an abundance of references claiming one is faster than the other, but the truth is you'll have to test them and find out which one is faster for you (if there's a difference at all). It depends on your PHP version, OS, and the structures you're looping over (iterable objects, for example). Obviously you should run your test on the server where your script will live, especially if that environment differs from your local development one.
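One way to run that comparison yourself is a tiny benchmark like the one below. It's a sketch, not a rigorous benchmark: `bestTimeMs()` is a made-up helper, the array of integers is arbitrary test data, and `hrtime()` needs PHP 7.3 or later. Swap in the structures your real loop iterates over.

```php
<?php
/** Run $fn $runs times and return the best wall-clock time in milliseconds. */
function bestTimeMs(callable $fn, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true);                 // monotonic clock, nanoseconds
        $fn();
        $best = min($best, (hrtime(true) - $start) / 1e6);
    }
    return $best;
}

$data = range(1, 50000);                       // arbitrary test data

$withFor = function () use ($data): int {
    $sum = 0;
    $n = count($data);                         // count once, not on every iteration
    for ($i = 0; $i < $n; $i++) {
        $sum += $data[$i];
    }
    return $sum;
};

$withForeach = function () use ($data): int {
    $sum = 0;
    foreach ($data as $value) {
        $sum += $value;
    }
    return $sum;
};

printf("for:     %.3f ms\n", bestTimeMs($withFor));
printf("foreach: %.3f ms\n", bestTimeMs($withForeach));
```

Run it on the production box, with the PHP binary cron will use, before drawing any conclusions.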

Then it's time to profile and optimize your calculations. You aren't telling us what you are doing, but 3.5 MB of memory sounds a bit much for a single iteration. It could be that your calculations really are that intensive and you've done your best, or there might be something obvious you're missing. In any case, that's something only a profiler can tell you.
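Before reaching for a full profiler, a rough first measurement with PHP's built-in functions can already show which step eats the time or memory. This is just a sketch: `measure()` is a made-up helper and the closure is a stand-in for one user's calculations.

```php
<?php
/** Measure wall time and memory growth around one call of $fn. */
function measure(callable $fn): array
{
    $memBefore = memory_get_usage();
    $start = microtime(true);
    $fn();
    return [
        'seconds'    => microtime(true) - $start,
        'mem_delta'  => memory_get_usage() - $memBefore,   // bytes still held after the call
        'peak_bytes' => memory_get_peak_usage(),           // highest usage seen so far in the script
    ];
}

// Hypothetical stand-in for the calculations done for a single user.
$stats = measure(function (): void {
    $rows = range(1, 50000);
    array_sum($rows);
});

printf(
    "%.4f s, delta %d bytes, peak %.2f MB\n",
    $stats['seconds'],
    $stats['mem_delta'],
    $stats['peak_bytes'] / 1048576
);
```

Wrapping individual steps of the calculation in `measure()` narrows things down; a real profiler will still tell you more.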

Although max_execution_time defaults to 0 (no limit) for the CLI SAPI regardless of php.ini, you might want to limit the execution time through set_time_limit($seconds) or ini_set('max_execution_time', $seconds) (much the same thing) for two reasons:

  1. It will help when testing the script via the browser, where there is a limit (in php.ini). It wouldn't be advisable to allow browser access to the production script, but during development it wouldn't make sense to set up cron just to test your script.
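  2. Although there's no limit for CLI scripts, it wouldn't hurt to impose one just in case something goes awry. Database servers do hiccup once in a while, and you don't really want your script to run ad infinitum (== until it runs out of memory). Bear in mind that on non-Windows systems the limit counts only the script's own execution time, not time spent waiting on database queries or other system calls.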
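As a sketch of how that might look in the cron script: the two limits below are made-up numbers, and `$userIds` is a hypothetical batch. The extra wall-clock check is my own addition rather than anything PHP requires; it lets the script stop cleanly before the hard limit kills it.

```php
<?php
// Assumed values; tune them to your batch size.
$hardLimit  = 300;   // seconds, enforced by PHP itself
$softBudget = 240;   // seconds of wall-clock time we allow ourselves

set_time_limit($hardLimit);        // also restarts PHP's timeout counter from zero
$startedAt = microtime(true);

$userIds = range(1, 500);          // hypothetical batch of user ids
$processed = 0;
foreach ($userIds as $id) {
    // Stop cleanly before the hard limit; the next cron run picks up the rest.
    if (microtime(true) - $startedAt > $softBudget) {
        break;
    }
    // ... per-user calculations would go here ...
    $processed++;
}

printf(
    "%s: processed %d users, max_execution_time is %s s\n",
    PHP_SAPI,
    $processed,
    ini_get('max_execution_time')
);
```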

If you are having memory problems, then it's time to do some garbage collecting. The naive approach would be to call gc_collect_cycles() at the end of your script, forcing the collection of any garbage cycles existing at that time. It wouldn't hurt to unset() any memory-hungry variables beforehand. Remember that PHP loops don't create their own scope; for example:

<?php

foreach($array as $key => $value) {
   doSomething($value);
}

var_dump($key, $value);

?>

will work, dumping the last $key and $value of the loop. This means that after the loop you have not one ($array) but three unused variables, which stay in memory until they are unset or go out of scope. To release them explicitly, do something like:

<?php

foreach($array as $key => $value) {
   doSomething($value);
}

unset($array, $key, $value);
gc_collect_cycles();

?>

I'm 99% certain that unset($array, $key, $value); is unnecessary here, but it's a favourite hack from the pre-PHP 5.3 days, and I'm sticking with it (at least until I fully understand how garbage collection works in PHP ;).
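If you want to see what each call actually does, a quick experiment along these lines helps. It uses only core PHP; the `Node` class is a throwaway example. As far as I know, unset() on a plain array releases the memory as soon as the last reference is gone, while gc_collect_cycles() only matters for values stuck in reference cycles.

```php
<?php
gc_enable();

// 1) Plain array: unset() drops the last reference and the memory is freed at once.
$before = memory_get_usage();
$array = range(1, 100000);
$afterAlloc = memory_get_usage();
unset($array);
$afterUnset = memory_get_usage();

printf("allocated %d bytes, %d bytes released by unset()\n",
    $afterAlloc - $before, $afterAlloc - $afterUnset);

// 2) Two objects pointing at each other: unset() alone leaves a cycle behind.
class Node
{
    public $other;
}

$a = new Node();
$b = new Node();
$a->other = $b;
$b->other = $a;
unset($a, $b);

// Returns how many values the cycle collector freed.
$collected = gc_collect_cycles();
printf("gc_collect_cycles() freed %d value(s)\n", $collected);
```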

For anything more than that, you really need to give us the specifics of your calculations, and show us your code.