PostgreSQL – How well does PostgreSQL perform with a large number of databases

database-administration, database-performance, postgresql

We have a web application whose architecture requires that every registered user (a company, actually) be isolated from the others, i.e., I'll run the same webapp with the same data models, but with a different data set for every customer.

So we have been thinking about creating a separate database in Postgres for every customer. Can this solution scale to, say, 10-20K databases? How well?

Does anyone have a better solution for this?

Thanks in advance.

Best Answer

At the low end, it basically boils down to "can you absolutely say that you have no shared data?" Unlike MySQL, the database is an absolute boundary in PostgreSQL. You cannot SELECT zip_code FROM common.city_zip WHERE city=... if you go with separate databases (at least not without dblink).
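For completeness, a minimal sketch of the dblink workaround, assuming a shared database named commondb and an illustrative city value standing in for the original "...":

    -- dblink lets one database query another over a separate connection.
    CREATE EXTENSION IF NOT EXISTS dblink;

    SELECT zip_code
    FROM dblink('dbname=commondb',
                'SELECT zip_code FROM city_zip WHERE city = ''Springfield''')
         AS t(zip_code text);  -- column definition list is required for dblink

It works, but every such call opens its own connection and the planner can't see inside the remote query, so it's a workaround rather than a substitute for genuinely shared tables.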

If you have any shared data at all, PostgreSQL's "schema" is similar to what MySQL calls a "database". You can CREATE SCHEMA clienta; CREATE TABLE clienta.customer (...);. You would create a schema for each client, that client's user would have their schema first in their search path, and permissions would be granted so that Client A's user would have access to the clienta and the public schemas (and their tables).
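A minimal sketch of that layout, assuming a client role named clienta and a shared city_zip table in the public schema (all names here are illustrative):

    -- One role and one schema per client:
    CREATE ROLE clienta LOGIN PASSWORD 'change_me';
    CREATE SCHEMA clienta AUTHORIZATION clienta;

    -- The client's copy of the data model lives inside their schema:
    CREATE TABLE clienta.customer (
        id   serial PRIMARY KEY,
        name text NOT NULL
    );
    GRANT ALL ON clienta.customer TO clienta;

    -- Resolve unqualified table names to the client's schema first:
    ALTER ROLE clienta SET search_path = clienta, public;

    -- Shared data stays in public, readable by every client:
    GRANT USAGE ON SCHEMA public TO clienta;
    GRANT SELECT ON public.city_zip TO clienta;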

Your issue is going to be that at the high end of your client count, each table is stored as a file, so whether you go with one database per client, one schema per client, or use something like ${client}_customer for your table names, you will likely run into file descriptor limits with 10k clients even if you only have one table per client (plus one file descriptor per connection). Of course, you can adjust the kernel's maximum number of file descriptors on the fly using sysctl, but the per-process limit (ulimit) will require restarting PostgreSQL if you set it too low the first time around.
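If you want a rough sense of how many files you'd be asking for, a couple of queries (relkinds 'r', 'i', and 't' are ordinary tables, indexes, and TOAST tables; each is at least one file on disk):

    -- Count of on-disk relations in the current database:
    SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');

    -- PostgreSQL's own per-backend cap on open files:
    SHOW max_files_per_process;

Multiply the first number by your client count (for the per-database or per-schema designs) to see how it compares with the kernel and per-process limits.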

The alternative is to have "one big table" with a client column that identifies which client each row belongs to (ideally keyed by username if you have one database user per client; that makes the scheme below a LOT easier). By not granting the clients any access at all to this table, you can create client-specific views (or use session_user to identify the current client). Updates can't be done directly through a view, though: you would need to define functions to insert/update/delete on the table (one set of functions per client, or else using session_user), with the functions declared SECURITY DEFINER so they execute as a special user that does have permission to insert/update/delete on the table. (Note: session_user is used because user and current_user are based on the current context, and within a SECURITY DEFINER function those would always be the user who defined the function.)
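A sketch of that setup, with assumed names (customer, client, my_customers, add_customer, clienta) rather than anything from the original:

    -- Base table holds every client's rows; clients get no direct access.
    CREATE TABLE customer (
        client text   NOT NULL,   -- matches the client's login role name
        id     serial PRIMARY KEY,
        name   text   NOT NULL
    );
    REVOKE ALL ON customer FROM PUBLIC;

    -- Read path: a view filtered on session_user shows each client only
    -- their own rows (session_user stays the logged-in role, even inside
    -- SECURITY DEFINER code).
    CREATE VIEW my_customers AS
        SELECT id, name FROM customer WHERE client = session_user;
    GRANT SELECT ON my_customers TO clienta;

    -- Write path: a SECURITY DEFINER function runs as its owner, who does
    -- have INSERT rights on the base table.
    CREATE FUNCTION add_customer(p_name text) RETURNS integer AS $$
        INSERT INTO customer (client, name)
        VALUES (session_user, p_name)
        RETURNING id;
    $$ LANGUAGE sql SECURITY DEFINER;
    GRANT EXECUTE ON FUNCTION add_customer(text) TO clienta;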

Performance-wise, beyond the file descriptor issue, I honestly don't know what would happen with 10,000 databases in PostgreSQL versus one large table holding 10,000 clients' worth of data. Proper index design should keep the large table from being slow to query.
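For example, on the big-table sketch above, leading each index with the client column keeps per-client lookups selective even as the table grows (column names again assumed):

    -- Per-client queries filter on client first, then the search column:
    CREATE INDEX customer_client_name_idx ON customer (client, name);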

I will say that I went with separate databases for each client here (we add servers to keep the system usable, shifting client databases to new servers as needed, so we will never get to 10k databases on one server). I've regularly had to restore individual clients' data from backups for debugging or because of user error, something that would be an absolute nightmare with the "one big table" design. Also, if you intend to sell customization of your product to your clients, the "one big table" design might end up hobbling your ability to customize the data model.