If you really expect your table to grow, you should start thinking about batching the process and doing your calculations in steps. The simplest way would be to have a secondary table that holds each user's id and the timestamp of when that user was last processed, and to limit your cron script to, for example, 500 users at a time. The exact number will depend on what exactly you are doing; it'll take a bit of trial and error.
If you do decide to batch the process, you'll obviously need to run the cron script more than once. That's easy enough: only process users that weren't recently processed (by checking the timestamp) and, of course, mark them as processed afterwards. If your user ids are sequential, you could save yourself the trouble of logging each processed user id and just log the last one of the batch; however, if something goes wrong in the middle of the batch, you wouldn't have any idea of where it stopped. Your choice ;)
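The selection rule for a batch can be sketched as a pure function. In practice this would be a SQL query against the tracking table, and everything below (the `nextBatch` name, the sample ids and timestamps) is made up for illustration:

```php
<?php
// Pick the next $limit users whose "last processed" timestamp is older
// than the cutoff. $lastProcessed maps user id => unix timestamp
// (0 = never processed).
function nextBatch(array $lastProcessed, int $cutoff, int $limit): array
{
    $due = array_filter($lastProcessed, function ($ts) use ($cutoff) {
        return $ts < $cutoff;
    });
    asort($due); // oldest first, so nobody starves
    return array_slice(array_keys($due), 0, $limit);
}

$now  = 1000000;
$seen = [1 => $now - 10, 2 => 0, 3 => $now - 7200, 4 => $now - 30];

// Users not processed in the last hour, two at a time, oldest first:
var_dump(nextBatch($seen, $now - 3600, 2)); // user 2, then user 3
```

The same ORDER BY / LIMIT logic in SQL keeps the memory cost on the database side instead of in your script.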
Next you need to optimize the hell out of your loop. Start with the simple stuff: are you using for or foreach? There's an abundance of references claiming one is faster than the other, but the truth is you'll have to test them and find out which one is faster (if there's actually a difference) for you. It depends on your php version, OS, and the structures you're looping over (iterable objects, for example), and obviously you should run your test on the server where your script will live, especially if that environment is different from your local development one.
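A micro-benchmark for this is a few lines; the loop body here is a stand-in for whatever you actually do per user, and as with all micro-benchmarks, run it several times on the target server before believing it:

```php
<?php
$data = range(1, 100000);

// Time a plain for loop (count() hoisted out of the condition).
$start = microtime(true);
for ($i = 0, $n = count($data); $i < $n; $i++) {
    $x = $data[$i] * 2;
}
$forTime = microtime(true) - $start;

// Time the equivalent foreach.
$start = microtime(true);
foreach ($data as $value) {
    $x = $value * 2;
}
$foreachTime = microtime(true) - $start;

printf("for: %.4fs, foreach: %.4fs\n", $forTime, $foreachTime);
```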
Then it's time to profile and optimize your calculations. You aren't telling us what you are doing, but 3.5 MB of memory sounds a bit much for a single iteration. It could be that your calculations really are that intensive and you've done your best, or there might be something obvious you're missing; in any case, that's something only a profiler can tell you.
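A poor man's first check before reaching for a real profiler (Xdebug, XHProf) is to snapshot memory around one iteration; the 1 MB string below is just a stand-in for your actual calculations:

```php
<?php
$before = memory_get_usage();

// Stand-in for one iteration's worth of work.
$result = str_repeat('x', 1024 * 1024);

$after = memory_get_usage();

printf(
    "one iteration used ~%.2f MB (peak so far: %.2f MB)\n",
    ($after - $before) / 1048576,
    memory_get_peak_usage() / 1048576
);
```

This only tells you *how much*; a profiler tells you *where*.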
Although max_execution_time is hardcoded to 0 (no limit) for the CLI SAPI, you might want to limit the execution time through set_time_limit() or ini_set('max_execution_time', ...) (same thing) for two reasons:
- It will help when testing the script via the browser, where there is a limit (set in php.ini). It wouldn't be advisable to allow browser access to the production script, but during development it wouldn't make sense to set up cron just to test your script.
- Although there's no limit for CLI scripts, it wouldn't hurt to impose one just in case something awry happens. Database servers do hiccup once in a while, and you don't really want your script to run ad infinitum (== until it runs out of memory).
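Imposing the limit is a one-liner; the 10-minute cap below is an arbitrary example value:

```php
<?php
// Cap the run at 10 minutes even under CLI, so a hung database
// connection can't keep the script alive forever.
set_time_limit(600); // equivalent: ini_set('max_execution_time', '600');

// Would print 0 under a default CLI run; 600 after the call above.
echo ini_get('max_execution_time'), "\n";
```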
If you are having memory problems, then it's time to do some garbage collecting. The naive approach would be to call gc_collect_cycles at the end of your script, forcing the garbage collection of any existing cycles at that time. It wouldn't hurt to unset() any memory-hungry resources beforehand. Remember that php loops don't create their own scope; for example:
<?php
foreach ($array as $key => $value) {
    doSomething($value);
}
var_dump($key, $value);
?>
will work, dumping the last $key and $value of the loop, which means that at the end of the loop you don't have one unused variable ($array) but three, which will be collected whenever php decides it's a good time to collect garbage. To force it, do something like:
<?php
foreach ($array as $key => $value) {
    doSomething($value);
}
unset($array, $key, $value);
gc_collect_cycles();
?>
I'm 99% certain that unset($array, $key, $value); is unnecessary here; however, it's a favourite hack from the pre-php 5.3 days, and I'm sticking with it (at least until I fully understand how garbage collection works in php ;).
For anything more than that, you really need to give us the specifics of your calculations, and show us your code.
Approach 1: Subdivide sections.
You've got four sections, so each section is 25% of the total survey in this model. If you answer a question at the start of section 1 that leads down the "skip all the rest of section 1" path, you are then 25% done.
Within each section, you have some questions and maybe other branching paths. If section 1 has 5 questions, each question in section 1 is then 5% of the total survey... no matter how many questions are in section 2.
- Pro: easy to calculate
- Con: questions have different values
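Approach 1 can be sketched in a few lines; the function name and the section sizes below are made-up examples:

```php
<?php
// Progress under Approach 1: each section is an equal share of the bar,
// and each question an equal share of its own section.
// $section is 1-based; $questionInSection is how many questions of that
// section are done; $sectionSizes lists question counts per section.
function sectionProgress(int $section, int $questionInSection, array $sectionSizes): float
{
    $perSection = 100 / count($sectionSizes);
    $done = ($section - 1) * $perSection;                              // completed sections
    $done += ($questionInSection / $sectionSizes[$section - 1]) * $perSection; // partial section
    return $done;
}

// 4 sections; finishing (or skipping) all 5 questions of section 1 => 25%.
var_dump(sectionProgress(1, 5, [5, 10, 3, 8])); // float(25)
```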
Approach 2: Based on the long case
Let's say you have 100 questions that could be asked. Each question is 1% of the total.
If you answer question #5 in such a way that it skips to question #10 you go from 5% complete to 10% complete.
- Pro: easy-ish to calculate
- Con: still possible to have jumps
- Pro: smoother transitions for each unit done
Note that if you have two paths:
- Answer A (5%) -> 3 questions -> C (10%)
- Answer B (5%) -> 5 questions -> C (10%)
You've got the situation where, if the questions on path A are each worth 1%, you will either have non-unit increments or a skip (5%, 6%, 7%, 8%, 10%).
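Approach 2 is even simpler to compute; the function name below is made up, and "longest path" is just the 100-question count from the example:

```php
<?php
// Progress under Approach 2: your position over the longest possible
// path. A skip shows up as a jump in the returned percentage.
function longPathProgress(int $currentQuestion, int $totalQuestions = 100): float
{
    return ($currentQuestion / $totalQuestions) * 100;
}

// Answering question 5 in a way that skips to question 10:
echo longPathProgress(5), "% -> ", longPathProgress(10), "%\n"; // 5% -> 10%
```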
I think this might help you: it's a C# implementation of everything in your algorithm except for holidays (which you should be able to figure out, given the rest of the code).
You start off by modeling a working week (Monday-Friday, 8am-5pm, for example, with an hour a day for lunch). Then you can ask it questions like "if I have a task that takes 15 hours and I start at 9:22 on Tuesday, when will I finish?"
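The same idea can be sketched in php. The schedule below (Monday-Friday, 8:00-17:00, lunch 12:00-13:00) and both function names are assumptions, and the minute-by-minute walk is chosen for clarity, not speed:

```php
<?php
// Is this minute inside the (assumed) working week?
function isWorkingMinute(DateTime $t): bool
{
    $dow  = (int)$t->format('N'); // 1 = Monday ... 7 = Sunday
    $hour = (int)$t->format('G');
    if ($dow > 5) return false;               // weekend
    if ($hour < 8 || $hour >= 17) return false; // outside 8:00-17:00
    if ($hour === 12) return false;           // lunch break
    return true;
}

// Walk forward one minute at a time, consuming working minutes only.
function addWorkingHours(DateTime $start, float $hours): DateTime
{
    $t = clone $start;
    $minutesLeft = (int)round($hours * 60);
    while ($minutesLeft > 0) {
        if (isWorkingMinute($t)) {
            $minutesLeft--;
        }
        $t->modify('+1 minute');
    }
    return $t;
}

// "A 15-hour task started at 9:22 on a Tuesday finishes when?"
// Tue 9:22-17:00 is 6h38m, Wednesday is a full 8h day, so the
// remaining 22 minutes land on Thursday morning.
$start = new DateTime('2024-01-02 09:22'); // a Tuesday
echo addWorkingHours($start, 15)->format('D H:i'), "\n"; // Thu 08:22
```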
If you've done some Java before, hopefully you'll be able to read the C#. Alternatively, there's a JavaScript port here: