[proxy] wiki.postgresql.org← back | site home | direct (HTTPS) ↗ | proxy home | ◑ dark◐ light

Replication, Clustering, and Connection Pooling

Introduction

There are many approaches available to scale PostgreSQL beyond running on a single server. An outline of the terminology and basic technologies involved is at High Availability and Load Balancing. There is a presentation covering some of these solutions.

There is no one-size fits all replication software. You have to understand your requirements and how various approaches fit into that. For example, here are two extremes in the replication problem space:

These are both database replication problems, but the best way to solve them is very different. And as you can see from these examples, replication has a lot of specific terminology that you'll have to understand to figure out what class of solution makes sense for your requirements. A great source for this background is in the Postgres-R Terms and Definitions for Database Replication. The main theoretical topic it doesn't mention is how to resolve conflict resolution in lazy replication cases like the laptop situation, which involves voting and similar schemes.

Features in the Core of PostgreSQL

Historically, the PostgreSQL core team considered replication and clustering technology outside the scope of the main project's focus but this changed in 2008, see the Core Team's statement. Replication is now a significant focus of ongoing PostgreSQL development.

Comparison matrix

This page is being overhauled at Clustering

Program License Maturity Replication Method Sync Connection Pooling Load Balancing Query Partitioning
PgCluster BSD Not production ready Master-Master Synchronous No Yes No
pgpool-I BSD Stable Statement-Based Middleware Synchronous Yes Yes No
Pgpool-II BSD Recent release Statement-Based Middleware Synchronous Yes Yes Yes
slony BSD Stable Primary-Replica Asynchronous No No No
Bucardo BSD Stable Master-Master, Primary-Replica Asynchronous No No No
Londiste BSD Stable Primary-Replica Asynchronous No No No
Mammoth BSD No longer maintained Primary-Replica Asynchronous No No No
rubyrep MIT No longer maintained Master-Master, Primary-Replica Asynchronous No No No
Bi-Directional Replication PostgreSQL (BSD) Recent release Master-Master
(no triggers needed)
Asynchronous No No No
pg_shard LGPL Recent release Statement-based Middleware (as an extension) Synchronous No Yes Yes
pglogical PostgreSQL Recent release Primary-Replica Asynchronous No No No
Postgres-XL PostgreSQL Recent release MPP Postgres, scalable writes & reads Synchronous Yes Yes Yes
Citus AGPL Recent release MPP Postgres, scalable writes & reads Asynchronous or Synchronous Yes Yes Yes

Replication

Aside from the in-core streaming replication, mentioned above...

Inactive projects

Clustering

Connection Pooling and Acceleration

Connection pooling programs let you reduce database-related overhead when it's the sheer number of physical connections dragging performance down. This is particularly important on Windows, where system limitations prevent large number of connections; see "I cannot run with more than about 125 connections at once" in Running & Installing PostgreSQL On Native Windows. It's also vital for web applications where the number of connections can get very large.

Some programs that implement connection pooling are:

Some people also or alternately use memcached in various ways to reduce the work the database handles directly by caching popular data. pgmemcache is a PostgreSQL extension for interfacing with memcached servers.

Cluster management

Other Resources

Sources for more information located but not yet integrated into here.

Articles

Video tutorials

Credits

Sources for the initial information on this page include: