WSGIDaemonProcess – How Many Processes to Specify for Django

Tags: apache-2.2, django, mod-wsgi

Let's say I have two sites (Superuser and Serverfault), each running from its own Apache virtual host on one box. Both sites are powered by Django and run on Apache with mod_wsgi. A typical configuration for one of the sites looks like the following:

WSGIDaemonProcess serverfault.com user=www-data group=www-data processes=5

The host is a Linux machine with 4GB of RAM running Ubuntu. Can anyone suggest the number of processes I should specify above for my two sites? Let's assume they get the same traffic as the actual Superuser and Serverfault sites.

Best Answer

Well, how much traffic do the actual Superuser and Serverfault sites get? Hypotheticals aren't much use if they don't include enough information to actually answer the question...

Your worst-case process count should be the peak number of requests per second you want the site to be able to handle, divided by the number of requests per second that one process can handle if all those requests are made to your slowest action (so the reciprocal of the processing time of that action). Add whatever fudge factor you think is appropriate, based on the confidence interval of your req/sec and time measurements.
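As a back-of-the-envelope sketch (every number below is hypothetical, not a measurement from either site):

    peak_rps = 50               # peak requests/sec you want to survive (assumed)
    slowest_action_secs = 0.4   # processing time of your slowest action (assumed)
    fudge = 1.5                 # safety margin from your measurement confidence

    per_process_rps = 1.0 / slowest_action_secs       # 2.5 req/sec per process
    worst_case = peak_rps / per_process_rps * fudge   # 50 / 2.5 * 1.5 = 30 processes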

The average-case count is worked out the same way, except you divide your peak req/sec by the weighted mean of the per-action req/sec figures (the weight being the fraction of requests you expect to hit that particular action). Again, fudge factors are useful.
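The weighted mean works like this (per-action figures again hypothetical):

    # (req/sec one process can serve for the action, expected share of traffic)
    actions = [
        (50.0, 0.70),  # cheap page views
        (10.0, 0.25),  # search
        (2.5,  0.05),  # slow report
    ]
    weighted_rps = sum(rps * share for rps, share in actions)  # 37.625
    average_case = 50 / weighted_rps * 1.5                     # ~2 processes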

The actual upper bound on how many processes you can run on the machine is dictated by how much memory each process takes. Spool up one process, then run a variety of memory-hungry actions (typically ones that retrieve and process a lot of data) against it with a realistic data set, and see what the memory usage balloons out to. A toy data set of 50 or 100 rows is no use here: an action that retrieves and manipulates every row in a table will look cheap against 100 rows and very different once that table grows to 10,000. You can artificially constrain per-process memory usage with a script that reaps workers once they cross a memory threshold, at the risk of causing nasty problems if you set that threshold too low.
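A minimal sketch of such a reaper using psutil (the apache2 process-name match and the 200MB threshold are assumptions; adapt them to your setup):

    import os
    import signal
    import psutil

    LIMIT_BYTES = 200 * 1024 * 1024  # assumed per-process RSS ceiling

    for proc in psutil.process_iter(['pid', 'name', 'memory_info']):
        try:
            # mod_wsgi daemon processes usually appear as Apache children;
            # match on whatever identifies your daemon group in practice.
            if proc.info['name'] == 'apache2' and \
               proc.info['memory_info'].rss > LIMIT_BYTES:
                os.kill(proc.info['pid'], signal.SIGTERM)  # Apache respawns it
        except psutil.NoSuchProcess:
            pass  # process exited between listing and inspection

Run it from cron. Note that WSGIDaemonProcess also takes a maximum-requests option, which recycles workers after a fixed number of requests; it's a cruder but built-in way to cap memory growth.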

Once you've got your memory use figure, you deduct some amount of memory for system overhead (I like 512MB myself), deduct a pile more if you've got other processes running on the same machine (like a database), and then some more to make sure you don't run out of disk cache space (depends on your disk working set size, but again I'd go with no less than 512MB). That's the amount of memory that you divide by your per-process memory usage to get the ceiling.
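Concretely, on the 4GB box from the question (the per-process and database figures are assumptions for illustration):

    total_mb = 4096           # RAM on the box
    system_overhead_mb = 512  # OS and sundry
    disk_cache_mb = 512       # keep room for the page cache
    database_mb = 1024        # a co-located database (assumed)
    per_process_mb = 80       # measured as described above (assumed)

    budget_mb = total_mb - system_overhead_mb - disk_cache_mb - database_mb  # 2048
    ceiling = budget_mb // per_process_mb  # 25 processes, tops, on this box

That ceiling is shared across both sites' daemon process groups, not per site.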

If the number of processes you need to service your peak load is greater than the number of processes you can fit on the box, you need more machines (or to move the database to another machine, in the simplest case).

There you are, several years of experience scaling websites distilled into one small and simple SF post.