Postgresql – PG pool failover leads to two independent masters with network failure

failoverpostgresql

I've got two servers set up with PG Pool to create a HA setup for a webapp.

PGPool and postgres run on both servers, using streaming replication from server 1 to server 2. The webapp on each machine connects to PgPool which then sends the request to the current master. It's set up to automatically failover should a database connection be interrupted, which runs a custom failover script to demote server 1 to slave and promote server 2 to master.

What happened this morning is that for 2 minutes the network went down, which means neither PGPool instance could talk to each other – so each PGPool thought the other machine was down.

Server 1 – Continued on as master, disconnecting server 2
Server 2 – Initated failover, disconnecting server 1 and making itself the master

Since the network was down the failover command couldn't get through to server 1 to make it the slave, and visa versa. So when the network came back up 2 minutes later what I had was two servers who both thought they were the master.

PgPool doesn't seem to have an automatic failback command, which could be used to force server 1 to become the master again when the network reconnects, which is the only real solution I can think of.

My question is how am I supposed to deal with this situation? Is this even the correct architecture for this setup? Surely this is a common scenario, I can't get my head around how this kind of this could be fixed.

Edit:
Would it be advisable to have pgpool run under a virtual ip under linux-ha ? That could solve things, and I already have that up and running for the public IP – that way only one pgpool instance gets accessed by either machine.

Best Answer

First, I think that pgpool2 does have a failback command, but this wouldn't help you much in that case. The problem is that chaos will result if both machines think they are the master. What's more, here you had a simple case: network went down. What if the network is partitioned? I.e., both machines are connected, but they somehow lose connectivity to each other. In that case, both machines will become the master, and they will serve different clients, and you will have a forked database. It's a rarer case, but are you certain it's so unlikely that you are prepared to risk the resulting chaos?

An alternative would be this:

                                    +- master db
                                    |
                ------ pgpool ------+
                                    |
                                    +- hot standby

In that case, however, you have a single point of failure, pgpool, which you probably don't want. I know only two ways to address this problem. The easiest is to only promote a standby to master manually, and this is applicable with your architecture. Your applications will need to go to read-only mode until human intervention.

The second way is to have quorums. One architecture that could work is this:

                +--- pgpool standing by -+    +- master db
                |                        |    |
   failover ip -+--- active pgpool      -+----+- hot standby 1
                |                        |    |
                +--- pgpool standing by -+    +- hot standby 2
                                              |
                                              +- hot standby 3

                                              (as many standby servers as
                                              you want, so that you have
                                              read-only load balancing)

The three pgpools are running on three different machines, each with its own IP address, but they also provide an additional failover IP address, taken only by the active machine, and it is the one used by clients. If the active pgpool fails, a standby pgpool takes it over. This can be accomplished with heartbeat.

In order to promote a hot standby to master, a quorum of pgpools (i.e. at least two of the three) must decide so; and they will implement the decision only after a delay of, say, 10 seconds after they decide. In addition, the active pgpool may not continue to use the existing master db for more than 10 seconds without getting confirmation from at least another pgpool (this is to guard against the case that the two standby pgpools lose their connection to the active pgpool and to the master at the same time, promote a hot standby to master, but the active pgpool continues to use the old master).

Actually the third pgpool need not participate in the failover IP, and just be there in order to help the quorum. In addition, I don't know if pgpool has enough features to do this. Maybe you need another daemon. A more general architecture is this:

              +--- active pgpool      -+          +- master db
              |                        |          | 
 failover ip -+                       -+----------+- hot standby 1
              |                        |          | 
              +--- pgpool standing by -+      +---+- hot standby 2
                                              |   | 
                                              |   +- hot standby 3
                monitoring daemon 1 ---+      |
                                       |      |
                monitoring daemon 2 ---+------+
                                       |
                monitoring daemon 3 ---+

In this case the load balancing done by pgpool is separated from the monitoring and promotion of standby to a master. Note that you can put pgpools, database servers, and monitoring daemons on the same machine, but the two pgpools must be on two different machines and the three monitoring daemons must be on three different machines. Note that I don't know whether a ready-made monitoring daemon with all necessary features exist.

The details can be changed, but I think that if you do automatic standby promotion to master without using a quorum, you are asking for trouble.