PostgreSQL lots of large Arrays and Writes

databasedatabase-performancepostgresql

I am running a python program that spawns 8 threads and as each thread launch its own postmaster process via psycopg2. This is to maximize the use of my CPU-cores (8). Each thread call a series of SQL Functions. Most of these functions go through many thousands of rows each associated to a large FLOAT8[] Array (250-300) values by using unnest() and multiplying each FLOAT8 by an another FLOAT8 associated to each row. This Array approach minimized the size of the Indexes and the Tables. The Function ends with an Insert into another Table of a row of the same form (pk INT4, array FLOAT8[]). Some SQL Functions called by python will Update a row of these kind of Tables (with large Arrays).

Now I currently have configured PostgreSQL to use most of the memory for cache (effective_cache_size of 57 GB I think) and only a small amount of it for shared memory (1GB I think). First, I was wondering what the difference between Cache and Shared memory was in regards to PostgreSQL (and my application).

What I have noticed is that only about 20-40% of my total CPU processing power is used during the most Read intensive parts of the application (Select unnest(array) etc). So secondly, I was wondering what I could do to improve this so that 100% of the CPU is used. Based on my observations, it does not seem to have anything to do with python or its GIL.

Thanks

Best Answer

effective_cache_size does not change any memory setting, it's used only for estimation purposes in query planning. Crank up the shared_buffers to about 25% of your available RAM and see if there are any differences in speed.

Also, use EXPLAIN to get the queryplan and see if you need some extra indexes or better configuration.