Message board

Sticky messages

The misconception of the etcd quorum when using Patroni and Postgres

JT wrote: 🕐 11-15-25 16:02

Don't be fooled!

 

There is a fundamental part of the Patroni architecture that is often grossly overlooked or misunderstood: the role and sizing of the Distributed Consensus Store. In the context of Patroni, this is typically etcd.

 

Patroni uses etcd to elect the primary, register cluster members, and ensure that only one node believes it is the leader at any given time. All of this depends on etcd maintaining a quorum, a majority of its members agreeing on the cluster state. If you come from the mindset that the number of etcd nodes you need is simply the number of Postgres nodes divided by two, plus one, you have been profoundly misled.

 

etcd nodes = ( postgres nodes / 2 ) + 1

 

This rule of thumb is a common source of confusion and instability! The size of your etcd cluster is independent of your Postgres node count and is governed only by the need to maintain a reliable quorum for etcd itself.

 

Understanding why and how to size your etcd cluster correctly is essential for true high availability, so read on for the proper methodology.

 

Why Your Patroni Node Count Doesn't Determine Your etcd Quorum

 

The core misunderstanding is failing to distinguish between the Patroni cluster (your data layer) and the etcd cluster (your consensus layer).

 

Patroni/Postgres Nodes (Data):

  • These nodes are the actual database servers. Their count determines how many copies of the data you have and where the Primary can run.

 

etcd Nodes (Consensus):

  • These nodes hold the metadata about the cluster (who the current Primary is, who the members are, etc.). They use the Raft consensus algorithm to ensure this metadata is consistently agreed upon by a quorum.

 

The availability of your Patroni cluster relies entirely on the availability of its etcd quorum. If the etcd cluster loses its quorum, Patroni cannot safely elect a new Primary or switch roles, even if the underlying Postgres data nodes are healthy.

 

On a side note, this heavy dependency on etcd and on the Patroni layer managing Postgres is why I favor Pgpool-II in some cases.

 

 

The Correct etcd Quorum Sizing Rule

 

The sizing of the etcd cluster is based on fault tolerance: the number of simultaneous etcd node failures you are willing to tolerate.

Let's take the common misconception scenario: a Postgres cluster of 3 database servers managed by Patroni. Most likely, you placed the etcd service on each of the Postgres database servers. You probably think you just need 3 etcd nodes, so why not use the Postgres servers to host them? After all, the etcd footprint is fairly light. No big deal.

 

ceil( 3 / 2 ) + 1 = 2 + 1 = 3

 

Well, if more than one of your Postgres servers were to go down, you would be in a crisis trying to figure out why you cannot use the last of your 3 database servers.

 

The fact is, you have to take into account how many etcd node failures you are willing to tolerate in order to do a proper calculation.

 

If you have etcd running on the 3 database servers and 2 of those servers go down, you have just lost 2 of your etcd nodes, leaving you with 1. One node out of an original 3 is not a majority, so the quorum is gone.
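To make the arithmetic concrete, here is a minimal sketch (in Python, purely illustrative) of the majority check that decides whether quorum survives: quorum for an n-member cluster is floor(n/2) + 1, a strict majority of the configured membership.

```python
def quorum(total_members: int) -> int:
    """Strict majority of the configured cluster size: floor(n/2) + 1."""
    return total_members // 2 + 1

# The colocated scenario: 3 etcd nodes, 2 of them lost with the database servers.
total, failed = 3, 2
remaining = total - failed

print(quorum(total))               # quorum for 3 members is 2
print(remaining >= quorum(total))  # 1 survivor < 2 -> False, quorum lost
```

Note that quorum is a majority of the configured members, not of the survivors; that is exactly why the lone remaining node cannot carry on.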

 

To survive 2 failures, you need a cluster where the surviving nodes can still form a majority of the original membership.

 

The correct formula for the number of etcd nodes N needed to survive F simultaneous etcd node failures is as follows:

 

N = ( 2 * F ) + 1

 

If F = 2 (two failures), then N = (2 * 2) + 1 = 5
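The rule tabulates quickly; a small sketch, assuming nothing beyond the N = (2 * F) + 1 formula above and the majority quorum rule:

```python
def nodes_for_failures(f: int) -> int:
    """Cluster size needed to tolerate f simultaneous member failures: N = 2f + 1."""
    return 2 * f + 1

for f in (1, 2, 3):
    n = nodes_for_failures(f)
    q = n // 2 + 1  # quorum is a strict majority of N
    print(f"tolerate {f} failure(s): {n} nodes, quorum {q}")
```

This also shows why even-sized clusters buy you nothing: 4 nodes tolerate only 1 failure, the same as 3, while adding one more failure point.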

 

You need to add enough extra nodes so that even when two are taken away, the remaining nodes still form a majority of the full cluster size.

 

Let's break it down.

  • Total etcd nodes needed = 5
  • Quorum needed = 3 (since floor of 5/2, plus 1, equals 3)
  • Failures Allowed = 2
  • If 2 nodes fail, 3 are left.
  • The remaining 3 nodes are still a majority of the original 5, so they can keep operating.
  • Lastly, the 5 nodes needed minus the original 3 means you need 2 extra etcd nodes: the original 3, one on each database server, plus 2 additional standalone etcd nodes.
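The breakdown above can be checked mechanically. A short sketch (the host names db1 through etcd5 are hypothetical, just to mirror the 3 colocated + 2 standalone layout described):

```python
def quorum(n: int) -> int:
    """Strict majority of the configured cluster size."""
    return n // 2 + 1

# 3 etcd members colocated with Postgres, plus 2 standalone etcd members.
members = {"db1", "db2", "db3", "etcd4", "etcd5"}

# Worst case from the post: both failures land on database servers.
failed = {"db1", "db2"}
alive = members - failed

print(len(alive) >= quorum(len(members)))  # 3 survivors >= quorum of 3 -> True
```

With 5 members, any 2 can fail, colocated or standalone, and the remaining 3 still hold a majority, so Patroni can keep electing a Primary.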

 

 
