1.1.2 Running a PostgreSQL node

Concourse uses PostgreSQL for storing all data and coordinating work in a multi-web node installation.

Table of contents:
  1. 1.1.2.1 Prerequisites
  2. 1.1.2.2 Running PostgreSQL
    1. 1.1.2.2.1 Resource utilization

Prerequisites

PostgreSQL 9.5 or above is required, though the latest available version is recommended.

Running PostgreSQL

How this node is managed is up to you; Concourse doesn't actually have much of an opinion on it, it just needs a database.

How to install PostgreSQL is really dependent on your platform. Please refer to your Linux distribution or operating system's documentation.

For the most part, the instruction on Linux should look something like this:

sudo apt install postgresql
sudo su postgres -c "createuser $(whoami)"
sudo su postgres -c "createdb --owner=$(whoami) atc"

This will install PostgreSQL (assuming your distro uses apt), create a user, and create a database that the current UNIX user can access, assuming this same user is going to be running the web node. This is a reasonable default for distros like Ubuntu and Debian which default PostgreSQL to peer auth.

Resource utilization

CPU usage: this is one of the most volatile metrics, and one we try pretty hard to keep down. There will be near-constant database queries running, and while we try to keep them very simple, there is always more work to do. Expect to feed your database with at least a couple cores, ideally four to eight. Monitor this closely as the size of your deployment and the amount of traffic it's handling increases, and scale accordingly.

Memory usage: similar to CPU usage, but not quite as volatile.

Disk usage: pipeline configurations and various bookkeeping metadata for keeping track of jobs, builds, resources, containers, and volumes. In addition, all build logs are stored in the database. This is the primary source of disk usage. To mitigate this, users can configure job.build_logs_to_retain on a job, but currently there is no operator control for this setting. As a result, disk usage on the database can grow arbitrarily large.

Bandwidth usage: well, it's a database, so it most definitely uses the network. Something important to consider here is the number of simultaneous connections that the database server itself will allow. Postgres exposes a max_connections configuration variable, and depending on how many web nodes you are running and the size of their connection pool, you may need to tune these two numbers against each other.

Highly available: up to you. Clustered PostgreSQL is kind of new and probably tricky to deploy, but there are various cloud solutions for this.

Horizontally scalable: I...don't think so?

Outbound traffic:

  • none

Inbound traffic:

  • only ever from the web node