Postgresql 9 database size swelling after daily bulk updates

postgresql

I have a Postgres database that I'm importing/updating about 4 GB of data into daily. (Usually about 15-20 million rows.) I'm importing from a flat file dump of data using bulk inserts. I also have an "Insert On duplicate key, update" style trigger that will simply update a row if the id already exists. The import data itself doesn't grow much (as far as filesize), it's mostly just that the records are being updated with a small number of new records being inserted. Also, after each import I'm running VACUUM ANALYZE.

Today I noticed that the database size swelled to about 15GB. After doing some googling, I decided to try a VACUUM FULL and a REINDEX. The size of the database dropped significantly (about 50%). I read that this might be related to the max_fsm_pages setting but it looks like that was removed as of Postgres 8.4 so I'm not sure what's going on here. There is nothing in the postgres logs of particular interest that I could see.

Here's a transcript of the process: http://pastie.org/private/jt7mxajyyxuzvodch9goq

All the tables shrank considerably, but I thought it was interesting that before the vacuum and reindex the largest table was an index that was 3.7GB (bigger than even the largest content table which was 2.8GB) and shrank to 629 MB.

Is this normal or is there something wrong here? I'm trying to keep the on-disk file size down since this is running on a fairly small Linode VM.


UPDATE: Using pgAdmin, here are a few screenshots of the statistics tab:

So, from the last screenshot it definitely looks like it's index bloat that's causing the problem. The table size is 3.0GB while the index size is 4.4GB. The app_prices table, for example is the table with the most rows (about 35 million), has only 5 columns, and 3 indices. Does this still seem normal?

Best Answer

Because of the way that Postgresql implements MVCC, this is probably normal.

Edit: http://www.postgresql.org/docs/9.0/static/routine-vacuuming.html has a pretty good explanation of what's happening and how to deal with it.