PostgreSQL – Multiple Postgres Servers (one writer, multiple readers) with Shared Disk

Tags: cluster, database-replication, postgresql

Here's the scenario:

  1. One shared disk (Gluster)
  2. Multiple Postgres servers

Requirements:

  1. Use the shared disk to store the database files
  2. Use a configuration that provides maximum efficiency

Findings so far:

  1. It's possible to use a shared disk to store the data, as this document says. But it also says that "Another issue is that the standby server should never access the shared storage while the primary server is running". That means all of the servers except the master are left unused, which is almost unacceptable for us.

  2. Since we are using a shared disk, there should be no need for replication. According to this document, some configurations (Raw and Master/Slave modes) look good enough, but they might run into the same issue as above.

Problems:

  1. There is a lot of documentation on the web, and it has left me confused about the requirements and features involved. Is my understanding correct?
  2. If so, is there any way to achieve this design (with pgpool or any other tools)?
  3. If so, would you please name the tools and/or keywords so I can find more information?

Note (for those who are eager to close as many questions as they can): this has happened to me before; some say I'm looking for opinion-based answers. In fact I'm not. What I'm looking for is the name of a technology, or some sort of keyword, so that I can use it to search for more information. It sometimes happens that you need to know the right keywords before you can find anything at all.

Thanks in advance.

Best Answer

It is not possible to run multiple PostgreSQL servers from the same data directory, even if all but one are read-only. Absolutely 100% unsupported. Cannot be done. Give up now.

Somebody might one day add such a feature, but it'd involve major changes to PostgreSQL, as Pg relies heavily on shared memory and signals for inter-process synchronization. Also, shared_buffers contains "dirty" buffers that haven't yet been written out; these can be written out lazily because PostgreSQL knows all backends will read from there and only go to disk if the data isn't in shared_buffers.
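You can see those dirty buffers for yourself with the contrib extension pg_buffercache (a sketch, assuming a local superuser connection; the user name is a placeholder):

```shell
# Requires superuser and the pg_buffercache contrib extension.
psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS pg_buffercache;"

# Count pages that have been modified in shared memory but not yet
# flushed to disk. A second server reading the data files directly
# would never see these changes -- hence the corruption risk.
psql -U postgres -c "SELECT count(*) FROM pg_buffercache WHERE isdirty;"
```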

It might be practical to do with minor changes to PostgreSQL if all the servers were read-only, but I haven't investigated it, as it's a pretty uninteresting use case.

The references to shared storage you've seen are only for failover, not concurrent operation. The manual is quite specific that you need to ensure there's proper fencing in place to prevent concurrent access to the storage by multiple DB servers and that major corruption will result if you don't.

You're going to have to rely on replication or use another DB engine that supports shared storage (and deals with the resulting performance impact).
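If read scaling is the goal, streaming replication with hot standby gets you close to the original design: each server keeps its own copy of the data on local disk, and the standbys serve read-only queries while replaying WAL from the primary. A minimal sketch (hostnames, user, and data directory are placeholders; exact settings vary by PostgreSQL version):

```shell
# On the primary (postgresql.conf):
#   wal_level = replica        # 'hot_standby' on versions before 9.6
#   max_wal_senders = 5
# And allow the standby to connect (pg_hba.conf on the primary):
#   host  replication  repuser  10.0.0.0/24  md5

# On each standby: clone the primary onto the standby's OWN local disk.
# -R writes the standby/recovery settings for you; -P shows progress.
pg_basebackup -h primary.example.com -U repuser \
    -D /var/lib/postgresql/data -R -P

# With hot_standby = on in the standby's postgresql.conf, starting it
# gives you a server that accepts read-only connections while it
# continuously streams and replays WAL from the primary.
pg_ctl -D /var/lib/postgresql/data start
```

A pooler such as pgpool-II or an application-side router can then send writes to the primary and spread reads across the standbys.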

Separately, though: DBs are often I/O limited. Shared storage doesn't gain you anything if you now have two servers capable of 1000 tps each instead of one server that can do 2000. Or, given the overheads of synchronisation of a shared storage system w/o a low-latency bus (think Infiniband/Myrinet), more like two servers capable of 200 tps each.
