Running PostgreSQL on Kubernetes
Running PostgreSQL on Kubernetes is becoming more common. One of the main reasons is to leverage Kubernetes's scalability and self-healing. The 2023 survey by Timescale shows that a number of users already deploy PostgreSQL on Kubernetes. People still use Helm charts to deploy PostgreSQL on Kubernetes (as we do), but more recently they are shifting to Operators.
Reference: The survey by Timescale in 2023
In this article, we explore various PostgreSQL Operators for Kubernetes.
PostgreSQL Operator
We investigated these 6 operators, which are popular open-source projects on GitHub or GitLab.
We tested each operator on the following aspects:
- High Availability: Failover
- Documentation
- Customization
- Backup
- Monitoring
- Other features (UI or additional features)
Patroni
Before going into each operator, let’s discuss Patroni because three of the operators (Zalando, CrunchyData, Stackgres) use it for their HA implementation.
Patroni provides HA for a PostgreSQL cluster by managing a leader key stored in etcd (or ZooKeeper, etc.). It automates failover, startup, shutdown, and configuration of the cluster. It was originally developed by Zalando.
Patroni was not originally designed for Kubernetes, but the two work well together. On Kubernetes, it does not require a dedicated etcd cluster; it simply uses the Kubernetes API to manage the cluster state.
While testing, we found it works well for managing the cluster state, and it handles failover smoothly.
The CrunchyData documentation explains how it works on Kubernetes:
Each HA PostgreSQL cluster maintains its availability by using Patroni to manage failover when the primary becomes compromised. Patroni stores the primary’s ID in annotations on a Kubernetes Endpoints object which acts as a lease. The primary must periodically renew the lease to signal that it’s healthy. If the primary misses its deadline, replicas compare their WAL positions to see who has the most up-to-date data. Instances with the latest data try to overwrite the ID on the lease. The first to succeed becomes the new primary, and all others follow the new primary.
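The lease mechanism described above can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not Patroni's actual code: the `Lease` class stands in for the annotations on the Endpoints object, the `version` field models the optimistic concurrency that the Kubernetes API provides, and all names (`Lease`, `Instance`, `try_acquire`, `elect`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    """Hypothetical stand-in for the lease annotations on the Endpoints object."""
    holder: Optional[str] = None
    expires_at: float = 0.0
    version: int = 0  # models optimistic concurrency on updates


@dataclass
class Instance:
    name: str
    wal_position: int  # how far this instance has progressed in the WAL


def try_acquire(lease: Lease, candidate: str, seen_version: int,
                now: float, ttl: float) -> bool:
    """Compare-and-swap: succeeds only if nobody updated the lease first."""
    if lease.version != seen_version:
        return False
    lease.holder = candidate
    lease.expires_at = now + ttl
    lease.version += 1
    return True


def elect(lease: Lease, replicas: List[Instance],
          now: float, ttl: float) -> Optional[str]:
    """Run one election round and return the new primary, if there is one."""
    if now < lease.expires_at:
        return None  # the primary renewed the lease in time
    seen = lease.version
    best = max(r.wal_position for r in replicas)
    winner = None
    # Only the replicas with the latest WAL position race for the lease.
    # Every candidate read the same lease version, so only the first write wins.
    for r in replicas:
        if r.wal_position == best and try_acquire(lease, r.name, seen, now, ttl):
            winner = r.name
    return winner


lease = Lease(holder="pg-0", expires_at=30.0)
replicas = [Instance("pg-1", 1000), Instance("pg-2", 1200), Instance("pg-3", 1200)]
healthy_round = elect(lease, replicas, now=10.0, ttl=30.0)
failover_round = elect(lease, replicas, now=45.0, ttl=30.0)
print(healthy_round, failover_round, lease.holder)
```

In the second round, `pg-2` and `pg-3` both have the latest data, but only the first write to the lease succeeds; the other candidate sees a changed version and stays a replica.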
Some resources about Patroni:
CrunchyData Postgres-operator
HA
Patroni. Failover process is smooth.
Documentation
Simple and easy to follow. Many examples so you can start using it quickly.
Customization
Good. Everything can be defined in a single manifest.
Backup
pgBackRest is used for physical and WAL backups of the database. It works fine with a PVC-based approach, but we could not get it working with S3-compatible storage such as Ceph S3. This requires configuring pgBackRest parameters, and we ran into issues because environment variables cannot be set and debug logging is lacking.
Monitoring
Metrics in Prometheus format are supported, but the operator does not automatically create a Service or PodMonitor resource. This makes it hard to use with an existing Prometheus and Grafana setup (discussed here).
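Since the PodMonitor is not created for you, one option is to generate it yourself. The following is a minimal sketch that builds a PodMonitor manifest as JSON, which `kubectl apply -f` accepts. The label key `postgres-operator.crunchydata.com/cluster` and the port name `exporter` are our assumptions about how the operator labels its Pods; check them against your own cluster before using this. The function name `build_pod_monitor` and the cluster name `hippo` are purely illustrative.

```python
import json


def build_pod_monitor(cluster: str, namespace: str, port: str = "exporter") -> dict:
    """Build a Prometheus Operator PodMonitor for one PostgreSQL cluster."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": f"{cluster}-metrics", "namespace": namespace},
        "spec": {
            # Assumed label: verify with `kubectl get pods --show-labels`.
            "selector": {
                "matchLabels": {"postgres-operator.crunchydata.com/cluster": cluster}
            },
            # Assumed name of the container port that serves the metrics.
            "podMetricsEndpoints": [{"port": port}],
        },
    }


pod_monitor = build_pod_monitor("hippo", "postgres")
print(json.dumps(pod_monitor, indent=2))
```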
Pooler
PgBouncer is supported. There are some issues with the login user for PgBouncer, and there is little documentation on how to work around them.
Other features
pgAdmin : You can deploy pgAdmin Web UI, but it does not automatically create a Service.
Concerns
License
CrunchyData publishes statements about its licensing. The operator itself is open source under the Apache License 2.0, but the container images have a different license. The terms are not entirely clear, but our understanding is that they limit production use.
Setting environment variables
It does not allow setting environment variables in the resource definition; you need to set them on the StatefulSet manually (discussed in a GitHub issue). This makes it difficult to override the S3 configuration so that backups work with S3-compatible storage.
TLS connection for PgBouncer
A TLS connection is required to access PgBouncer. It is a little complex to disable, and the documentation is not very clear about which certificate or key to use (GitHub issue link).
CloudNative PG
HA
It doesn’t use Patroni but its own HA implementation. It uses PostgreSQL’s native physical replication technology (more info in the doc). It also uses the Kubernetes API server to manage the cluster state.
Failover is slow by default. Based on the application requirements, you need to adjust .spec.failoverDelay. The right value depends on the balance between your data protection policy and your service availability policy. See some discussions in GitHub.
Documentation
Good.
Customization
Easy to customize: Helm charts are available for both the operator and the cluster.
Backup
S3 is supported for WAL and data backups. This works smoothly with S3-compatible storage such as Ceph S3 or MinIO.
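As an illustration, pointing the backup at S3-compatible storage mainly means setting an endpoint URL next to the bucket path. The sketch below generates a Cluster manifest as JSON. The field names under `backup.barmanObjectStore` are as we recall them from the CloudNativePG API and may differ between versions, so check them against the API reference for the version you run. The bucket, endpoint, Secret name, and Secret keys are placeholders, and `build_cluster` is a hypothetical helper.

```python
import json


def build_cluster(name: str, bucket: str, endpoint_url: str, secret_name: str) -> dict:
    """Build a CloudNativePG Cluster manifest with object-store backups."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": 3,
            "storage": {"size": "10Gi"},
            "backup": {
                "barmanObjectStore": {
                    "destinationPath": f"s3://{bucket}/{name}",
                    # Points at the S3-compatible service (Ceph S3, MinIO, ...).
                    "endpointURL": endpoint_url,
                    # Credentials are read from an existing Kubernetes Secret;
                    # the key names below are placeholders.
                    "s3Credentials": {
                        "accessKeyId": {"name": secret_name, "key": "ACCESS_KEY_ID"},
                        "secretAccessKey": {"name": secret_name, "key": "ACCESS_SECRET_KEY"},
                    },
                }
            },
        },
    }


cluster = build_cluster("pg-demo", "backups", "https://s3.example.internal", "s3-creds")
print(json.dumps(cluster, indent=2))
```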
Monitoring
Supported and easy to configure.
Pooler
PgBouncer is supported.
Other features
Does not use StatefulSet
It does not use a StatefulSet; instead, the operator manages the Pods of the PostgreSQL cluster directly. This makes it unique among the operators we tested. Here are some explanations about the reasons behind using Pods instead of a StatefulSet.
kubectl cnpg plugin
The kubectl cnpg plugin is very useful. It is full of utilities, from spinning up a new cluster to running benchmarks.
Concerns
Own HA implementation
It does not rely on Patroni but uses its own architecture for HA. You need to read the documentation carefully to understand how it works for failover.
Complex parameters
Configuration is complex because there are many parameters. For example, the default .spec.switchoverDelay makes failover slower, so it requires proper design and planning based on the application requirements.
Zalando Postgres Operator
HA
It uses Spilo, which bundles PostgreSQL and Patroni together.
Documentation
Insufficient, and it is hard to find what you are looking for.
Customization
Complex due to a lack of clarity in the documentation.
Backup
S3 is supported. To use S3-compatible storage such as Ceph S3, a workaround is required to add some environment variables. Unfortunately, this is not very clear from the documentation.
Monitoring
No built-in support. You need to add a sidecar container manually.
Pooler
PgBouncer is supported.
Other features
The Web UI is useful for deploying without writing a YAML file, and you can also check logs and so on. However, it comes without authentication or authorization.
Concerns
Operator performance
The operator's performance is a concern. Some child resources were not cleaned up when deleting parent resources (this looks like it has been fixed recently; see the GitHub issue and PR). Sometimes, terminating a Pod took a long time. The insufficient documentation and the lack of logging make debugging a little difficult.
kubegres
HA
Standard Postgres streaming replication. It is built with the native Postgres libraries only, with no custom or third-party library. The operator manages failover, but failover is slow.
Documentation
Good and simple.
Customization
Not many customizations are available, and its allowed parameters are limited. It uses the official PostgreSQL Docker image, so the container image is easy to customize.
Backup
Supported on the PVC. No S3 support.
Monitoring
No built-in support.
Pooler
No built-in support.
Other features
Nothing much.
Concerns
No sidecar container support
Sidecar containers are not supported. This makes it cumbersome to add a Prometheus exporter or other customizations. A lot of manual work is required to set up monitoring or a connection pooler.
Development community is not active
The development is not as active as others. (The last commit was a few months ago.)
Stackgres
HA
Patroni. The failover works smoothly.
Documentation
Good and easy to follow.
Customization
Very flexible and allows much customization, but it is more complex than the others because there are more features and more resources (SGInstanceProfile, SGPostgresConfig, SGPoolingConfig, SGObjectStorage, SGDistributedLogs, SGScript, SGCluster, etc.).
For example, creating a user or database has to be written as SQL in an SGScript resource.
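To give a feel for this, the sketch below generates a small SGScript manifest as JSON. The layout (`spec.scripts` as a list of entries with `name` and `script`) is our understanding of the SGScript resource and should be checked against the Stackgres CRD reference. The resource name and SQL statements are placeholders, and `build_sgscript` is a hypothetical helper.

```python
import json
from typing import Dict


def build_sgscript(name: str, statements: Dict[str, str]) -> dict:
    """Build an SGScript manifest from a mapping of entry name to SQL text."""
    return {
        "apiVersion": "stackgres.io/v1",
        "kind": "SGScript",
        "metadata": {"name": name},
        "spec": {
            # One entry per SQL script, kept in insertion order.
            "scripts": [
                {"name": entry, "script": sql} for entry, sql in statements.items()
            ]
        },
    }


sgscript = build_sgscript(
    "init-app",
    {
        "create-role": "CREATE ROLE app LOGIN;",
        "create-database": "CREATE DATABASE app OWNER app;",
    },
)
print(json.dumps(sgscript, indent=2))
```

In a real setup the role's password would come from a Secret rather than being written inline, which is why the placeholder statement does not set one.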
Backup
S3 is supported. It works fine with S3-compatible storage too.
Monitoring
Supported. It automatically integrates with an existing Prometheus and Grafana. This is very easy to configure.
Pooler
PgBouncer is implemented as a sidecar container in each Pod. This is different from other operators, which deploy PgBouncer outside of the PostgreSQL Pods.
Other features
Citus support
It supports sharded clusters using Citus.
Fancy Web UI
Fancy Web UI with OIDC login support. This friendly UI helps the developer team to manage PostgreSQL instances by themselves.
Concerns
License
The license is AGPLv3. (This is fine but different from others.)
Complexity
Complex architecture due to many components and features.
Conclusion
In conclusion, CloudNative PG is our first choice, as it covers most of our use cases and is easy to use and configure. It also has a lot of community support, good documentation, and very active development. The kubectl cnpg plugin reduces the management overhead.
The second option is Stackgres if you want to use Patroni, although it is complex. It is more robust and mature than the other operators.
I personally like the concept of kubegres for its minimalism and simplicity. Many operators try to implement as many features as possible, but kubegres is unique in keeping things very simple and focusing on a few critical things.
For us, the management cost compared with a Helm chart will be lower or about the same. Using an Operator adds another abstraction layer and another component to manage, and it takes some work and time to understand how it works. However, if you are going to deploy and destroy many PostgreSQL clusters, an Operator is very useful.