We are using Postgres as DB backend for our Django project. Everything works fine except from time to time Postgres jus hangs. Load average spikes to 80 points, and there are many posgtres processes. As far I see problematic is our table with users online. On every request there is update, we are updating "last seen" column. On every minute there is cronjob deleting users inactive for 1 minute. And in example today, when there was Postgres hang I've seen many updates and deletes to and from this table. I think it's some kind of race condition?
This table with users online is not big. In rush hours there are ~800 records. Here is schema.
Column | Type | Modifiers
------------+--------------------------+-----------------------------------------------------------------
id | integer | not null default nextval('spoleczniak_online_id_seq'::regclass)
postac_id | integer | not null
data | timestamp with time zone | not null
zalogowany | timestamp with time zone | not null
Indexes:
"spoleczniak_online_pkey" PRIMARY KEY, btree (id)
"spoleczniak_online_postac_id_key" UNIQUE CONSTRAINT, btree (postac_id)
"spoleczniak_online_data" btree (data)
Foreign-key constraints:
"spoleczniak_online_postac_id_fkey" FOREIGN KEY (postac_id) REFERENCES postac_postacie(id) DEFERRABLE INITIALLY DEFERRED
Normally Postgres is generating less than 1.5 load average points. We are running it on i7, 16 GB of RAM, and hardware RAID-10 (3×2 disks) for OS/data + RAID-1 (2 disks) for WAL. Hardware RAID has 512 MB of cache.
I've tried 9.0 and 9.1 beta (I even set no WAL logs for that table in 9.1).
I'm really thinking about moving to MySQL but don't know any conversion tools. 😛
PS. I don't know if it's more Serverfault or Stackoverflow… but as you can see I decided to put it on Serverfault. 😛
Edit:
Some information from logs:
Jul 31 20:37:16 postgres postgres[1420]: [3-1] LOG: 00000: process 1420 acquired ExclusiveLock on tuple (29,7) of relation 33107 of database 20005 after 2071.481 ms
Jul 31 20:37:16 postgres postgres[1420]: [3-2] LOCATION: ProcSleep, proc.c:1076
Jul 31 20:37:16 postgres postgres[1420]: [3-3] STATEMENT: UPDATE "spoleczniak_online" SET "postac_id" = 101651, "data" = '2011-07-31 20:39:18.000699', "zalogowany" = '2011-07-31 20:31:04.843741' WHERE "spoleczniak_online"."id" = 559650
Jul 31 20:37:16 postgres postgres[1493]: [3-1] LOG: 00000: process 1493 acquired ExclusiveLock on tuple (29,7) of relation 33107 of database 20005 after 1393.154 ms
Jul 31 20:37:16 postgres postgres[1493]: [3-2] LOCATION: ProcSleep, proc.c:1076
Jul 31 20:37:16 postgres postgres[1493]: [3-3] STATEMENT: UPDATE "spoleczniak_online" SET "postac_id" = 101651, "data" = '2011-07-31 20:39:15.646537', "zalogowany" = '2011-07-31 20:31:04.843741' WHERE "spoleczniak_online"."id" = 559650
So ok, it's locking issue… but how can I avoid that?
Best Answer
This seems like locking problem: some large query locks too much, so other running queries cannot complete until lock is freed. Turn on slow query logging, add more monitoring. I strongly suggest using some monitoring system (zabbix, zenoss, etc) to be aware of your postgresql database status.
As a quick 'monitoring', take a look to pg_stat_activity and pg_locks views during the hang. Here is a good query that I'm often starting with:
It often gives enough information, but sometimes I have to run more in order to understand what's going on.
Please update your question with output of this (or similar) query.