Postgresql – Server Clustering (Django, Apache, Nginx, Postgres)

cluster, django, load balancing, postgresql, scalability

I have a project deployed with Django, Apache, Nginx and Postgres. The project requires live data to be viewable by customers.
The project's main points are:
1. Devices in the field log in and send data to the server (devices are treated like website users).
2. A background import process loads the uploaded data into Postgres.
3. Web users of the system use this data and can send commands to the devices, which the devices read when they log in.
4. Background analysis routines also run on the data.

The entire setup described above is deployed on a single Amazon EC2 machine.
The project currently supports over 600 devices and 400 users, but as the number of devices grows over time, the server's performance is degrading.

We want to extend this project so that it can support many more devices.
My initial thinking is that we create one more server like the current one and divide the devices between the two servers. But we would still need a central user and device management point through the Django admin.

Any ideas?
What are the best ways to build a scalable architecture?
How can I create a Postgres cluster and use it with Django, if that is possible?

Best Answer

Your question is short on details and long on hand-waving, but it sounds like your initial thinking is a pretty sound start. Your app sounds pretty similar to the Zenoss monitoring suite, which uses essentially the same load-distribution architecture to scale up: Multiple monitoring hosts sharing the data collection workload, with a single admin interface, and a database on either the admin host or a separate system.

If your bottleneck is at point #1 (devices sending data to your server), splitting those tasks across a second machine should carve out some room for load growth. The biggest implementation obstacle is usually how to manage tasks across multiple Django servers. Celery, a distributed task queue engine, is probably the best option at the moment. It was originally designed around Django, which is good for you, and it has a very active and helpful community of developers and users.

If points #2 and #4 are your current limitation, though, you're probably talking about database scalability. This is just a hard problem, in general: There is no code-transparent, load-neutral, and cheap way to scale up database capacity.

If you only need more database "read" IO capacity, replication will probably do the trick. Postgres supports replication using an external tool called Slony-I. This is single-master replication, with multiple read-only "slave" hosts that get fed copies of the data changes made on the master (Slony-I captures row changes with triggers rather than replaying SQL statements). All of your app's writes (UPDATE, INSERT, DELETE...) go through the single master host, but you distribute your reads (SELECT...) across the master and all of the slaves.

The code modifications needed for distributed reads are usually pretty straightforward. Django recently added support for replicated databases, which I haven't used, but it's supposed to be pretty good.
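As a rough illustration of how small those code changes can be, here is a minimal sketch of a database router for the single-master setup described above. It assumes the router interface found in current Django versions; the aliases "default", "replica1" and "replica2" are hypothetical and would have to match keys in your settings.DATABASES.

```python
import random

# Hypothetical database aliases; they must match keys in settings.DATABASES.
PRIMARY = "default"
REPLICAS = ["replica1", "replica2"]


class PrimaryReplicaRouter:
    """Send all writes to the master and spread reads across every host."""

    def db_for_read(self, model, **hints):
        # Reads (SELECT) may be served by the master or any read-only replica.
        return random.choice([PRIMARY] + REPLICAS)

    def db_for_write(self, model, **hints):
        # Writes (INSERT, UPDATE, DELETE) always go through the single master.
        return PRIMARY

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only run schema migrations against the master.
        return db == PRIMARY
```

A router like this is enabled by listing its dotted path in the DATABASE_ROUTERS setting. Keep in mind that replicas can lag slightly behind the master, so a read issued right after a write may not see that write yet.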

If you need more database write IO capacity, sharding will probably work. Each host keeps a separate, unique chunk of each database table. The DB clients use a deterministic function that decides where any given record should reside, so the load distribution is effectively stateless and can scale up to huge numbers of DB servers. Django's new multi-database support (same link as above) also supports sharding. You'll need some code changes, but the pain should be limited.
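The "deterministic function" mentioned above could look something like the following sketch. It hashes a device identifier and maps it onto one of a fixed list of shards; the alias names and the choice of the device ID as the sharding key are assumptions made for illustration, not something the question specifies.

```python
import hashlib

# Hypothetical shard aliases, one per database host.
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(device_id: str, shards=SHARDS) -> str:
    """Return the shard alias that should hold this device's records."""
    # A stable hash (unlike Python's built-in hash()) gives the same answer
    # in every process and on every server.
    digest = hashlib.sha1(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

In Django, the returned alias could be passed to a queryset's using() method, for example Reading.objects.using(shard_for(device_id)), where Reading is a hypothetical model. One trade-off of plain modulo hashing is that changing the number of shards moves most records to a different host, so it is worth picking the shard count with future growth in mind.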

Also, I want to mention Memcached, which seems to be part of just about every highly scalable web application on the Internet today (Facebook, Google, Twitter...). A good caching implementation can cut your database requirements to a fraction of their original size by converting expensive, slow DB lookups into cheap, fast cache lookups. Django has supported Memcached integration for quite a while now.
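To make the "expensive lookup becomes cheap lookup" idea concrete, here is a small sketch of the usual check-the-cache-first pattern. The InMemoryCache class is only a stand-in so the example runs on its own; in a real deployment its place would be taken by a Memcached-backed cache such as Django's cache object, which offers similar get and set calls. The function name, key format, and 30-second timeout are all hypothetical.

```python
import time


class InMemoryCache:
    """Stand-in for a Memcached client, exposing get() and set()."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and expires < time.monotonic():
            # Entry has expired: drop it and report a miss.
            del self._data[key]
            return default
        return value

    def set(self, key, value, timeout=60):
        self._data[key] = (value, time.monotonic() + timeout)


def latest_reading(device_id, cache, load_from_db):
    """Return a device's latest reading, hitting the database only on a miss."""
    key = f"latest-reading:{device_id}"
    value = cache.get(key)
    if value is None:
        # Cache miss: do the slow database lookup once, then remember it.
        value = load_from_db(device_id)
        cache.set(key, value, timeout=30)
    return value
```

Because customers expect live data, the timeout here is deliberately short; a longer one saves more database work at the cost of staler results.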

I realize none of this is too specific, but it should give you a pretty good starting place for working out the details. Good luck with your project.