1.2.4 Running a web node
The web node is responsible for running the web UI and API, as well as performing all pipeline scheduling. It's basically the brain of Concourse.
- 1.2.4.1 Prerequisites
- 1.2.4.2 Running concourse web
  - 1.2.4.2.1 Resource utilization
- 1.2.4.3 Operating a web node
  - 1.2.4.3.1 Scaling
    - 1.2.4.3.1.1 Database connection pooling
  - 1.2.4.3.2 Reloading worker authorized key
  - 1.2.4.3.3 Restarting & Upgrading
  - 1.2.4.3.4 Downgrading
- 1.2.4.4 Configuring the web node
  - 1.2.4.4.1 Giving your cluster a name
  - 1.2.4.4.2 Configuring ingress traffic
  - 1.2.4.4.3 TLS via Let's Encrypt
  - 1.2.4.4.4 Build log retention
  - 1.2.4.4.5 Enabling audit logs
  - 1.2.4.4.6 Configuring defaults for resource types
Prerequisites
Nothing special - the web node is a pretty simple Go application that can be run like a 12-factor app.
Running concourse web
The concourse CLI can run as a web node via the web subcommand.
Before running it, let's configure a local user so we can log in:
CONCOURSE_ADD_LOCAL_USER=myuser:mypass
CONCOURSE_MAIN_TEAM_LOCAL_USER=myuser
This will configure a single user, myuser, with the password mypass. You'll probably want to change those to sensible values, and later you may want to configure a proper auth provider - check out Auth & Teams whenever you're ready.
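Once the web node is running (the full command is shown below), you can sanity-check that this user works by logging in with fly. A minimal example, assuming the node is reachable at http://localhost:8080 and that you kept the placeholder credentials above:
fly -t local login -c http://localhost:8080 -u myuser -p mypass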
Next, you'll need to configure the session signing key, the SSH key for the worker gateway, and the authorized worker key. Check Generating Keys to learn what these are and how they are created.
CONCOURSE_SESSION_SIGNING_KEY=path/to/session_signing_key
CONCOURSE_TSA_HOST_KEY=path/to/tsa_host_key
CONCOURSE_TSA_AUTHORIZED_KEYS=path/to/authorized_worker_keys
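If you haven't created these keys yet, the concourse binary can generate them for you - see Generating Keys for details. A rough sketch, with output paths chosen purely as examples:
concourse generate-key -t rsa -f ./session_signing_key
concourse generate-key -t ssh -f ./tsa_host_key
concourse generate-key -t ssh -f ./worker_key
cat ./worker_key.pub >> ./authorized_worker_keys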
Finally, web needs to know how to reach your Postgres database. This can be set like so:
CONCOURSE_POSTGRES_HOST=127.0.0.1 # default
CONCOURSE_POSTGRES_PORT=5432 # default
CONCOURSE_POSTGRES_DATABASE=atc # default
CONCOURSE_POSTGRES_USER=my-user
CONCOURSE_POSTGRES_PASSWORD=my-password
If you're running PostgreSQL locally, you can probably just point it to the socket and rely on the peer auth:
CONCOURSE_POSTGRES_SOCKET=/var/run/postgresql
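If the database itself doesn't exist yet, a minimal sketch using the standard PostgreSQL client tools (the user and database names simply mirror the examples above) would be:
createuser --pwprompt my-user
createdb --owner=my-user atc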
Now that everything's set, run:
concourse web
All logs will be emitted to stdout, with any panics or lower-level errors being emitted to stderr.
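In practice you'll usually want a process supervisor to keep the web node running and to hold its environment. As a rough sketch only - assuming a hypothetical /etc/concourse/web.env file containing the variables above and the binary installed at /usr/local/bin/concourse - a systemd unit might look like:
[Unit]
Description=Concourse web node
After=network-online.target

[Service]
EnvironmentFile=/etc/concourse/web.env
ExecStart=/usr/local/bin/concourse web
Restart=on-failure

[Install]
WantedBy=multi-user.target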
Resource utilization
CPU usage: peaks during pipeline scheduling, primarily when scheduling Jobs. Mitigated by adding more web nodes. In this regard, web nodes can be considered compute-heavy more than anything else at large scale.
Memory usage: not very well classified at the moment as it's not generally a concern. Give it a few gigabytes and keep an eye on it.
Disk usage: none
Bandwidth usage: aside from handling external traffic, the web node will at times have to stream bits out from one worker and into another while executing Steps.
Highly available: yes; web nodes can all be configured the same (aside from --peer-address) and placed behind a load balancer. Periodic tasks like garbage-collection will not be duplicated for each node.
Horizontally scalable: yes; they will coordinate workloads using the database, resulting in less work for each node and thus lower CPU usage.
Outbound traffic:
- db on its configured port for persistence
- db on its configured port for locking and coordinating in a multi-web node deployment
- other web nodes (possibly itself) on an ephemeral port when a worker is forwarded through the web node's TSA
Inbound traffic:
- worker connects to the TSA on port 2222 for registration
- worker downloads inputs from the ATC during fly execute via its external URL
- external traffic to the ATC API via the web UI and fly CLI
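If a host firewall sits in front of the web node, these are the ports to open. A quick sketch using ufw, assuming the default ATC port of 8080 (CONCOURSE_BIND_PORT) and the default TSA port of 2222 (CONCOURSE_TSA_BIND_PORT):
ufw allow 8080/tcp   # web UI, API, fly
ufw allow 2222/tcp   # worker registration via the TSA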
Operating a web node
The web nodes themselves are stateless - they don't store anything on disk, and coordinate entirely using the database.
Scaling
The web node can be scaled up for high availability. The nodes will also roughly share their scheduling workloads, using the database to synchronize. This is done by just running more web commands on different machines, and optionally putting them behind a load balancer.
To run a cluster of web nodes, you'll first need to ensure they're all pointing to the same PostgreSQL server.
Next, you'll need to configure a peer address. This is a DNS or IP address that can be used to reach this web node from other web nodes. Typically this uses a private IP, like so:
CONCOURSE_PEER_ADDRESS=10.10.0.1
This address will be used for forwarded worker connections, which listen on the ephemeral port range.
Finally, if all of these nodes are going to be accessed through a load balancer, you'll need to configure the external URL that will be used to reach your Concourse cluster:
CONCOURSE_EXTERNAL_URL=https://ci.example.com
Aside from the peer address, all configuration must be consistent across all web nodes in the cluster to ensure consistent results.
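Putting that together, a hypothetical two-node cluster behind a load balancer would share all configuration except the peer address:
# web node 1
CONCOURSE_PEER_ADDRESS=10.10.0.1
CONCOURSE_EXTERNAL_URL=https://ci.example.com

# web node 2
CONCOURSE_PEER_ADDRESS=10.10.0.2
CONCOURSE_EXTERNAL_URL=https://ci.example.com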
Database connection pooling
You may wish to configure the max number of parallel database connections that each node makes. There are two pools to configure: one for serving API requests, and one used for all the backend work such as pipeline scheduling.
CONCOURSE_API_MAX_CONNS=10 # default
CONCOURSE_BACKEND_MAX_CONNS=50 # default
There are some non-configurable connection pools. They take up the following number of connections per pool:
Garbage Collection: 5
Lock: 1
Worker Registration: 1
The sum of these numbers across all web nodes should not be greater than the maximum number of simultaneous connections your Postgres server will allow. See db node resource utilization for more information.
For example, if 3 web nodes are configured with the values shown above then your PostgreSQL server should be configured with a connection limit of at least 201: (10 + 50 + 5 + 1 + 1) * 3 web nodes.
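Continuing that example, the PostgreSQL server's connection limit would need to be at or above that total; in postgresql.conf this is roughly (leaving a little headroom for other clients):
max_connections = 210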
Reloading worker authorized key
When running concourse web, the authorized worker key file, which contains all public keys for the workers, is loaded at startup. During the lifecycle of a web node, new worker keys might be added or old ones removed. To perform a live reload of this file you can send a SIGHUP signal to the concourse web process. The process will remain running and Concourse will reload the authorized worker key file.
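For example, if the web node is running as a plain process, something along these lines would trigger the reload (under systemd, systemctl kill -s HUP on the unit is the equivalent):
pkill -HUP -f "concourse web"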
Restarting & Upgrading
The web nodes can be killed and restarted willy-nilly. No draining is necessary; if the web node was orchestrating a build it will continue where it left off when it comes back, or the build will be picked up by one of the other web nodes.
To upgrade a web node, stop its process and start a new one using the newly installed concourse. Any database migrations will be run automatically on start. If web nodes are started in parallel, only one will run the migrations.
We don't currently guarantee a lack of funny-business if you're running mixed Concourse versions - database migrations can perform modifications that confuse other web nodes. So there may be some turbulence during a rolling upgrade, but everything should stabilize once all web nodes are running the latest version.
If you want more control over when the database migrations happen, and to know whether they were successful, you can use the concourse migrate command. The migrate command accepts the same CONCOURSE_POSTGRES_* env vars as the concourse web command.
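For example, the migrate command's --current-db-version flag reports which migration version the database is currently at; with the Postgres settings from Running concourse web it would look roughly like:
CONCOURSE_POSTGRES_HOST=127.0.0.1 \
CONCOURSE_POSTGRES_DATABASE=atc \
CONCOURSE_POSTGRES_USER=my-user \
CONCOURSE_POSTGRES_PASSWORD=my-password \
  concourse migrate --current-db-version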
Downgrading
If you're stuck in a pinch and need to downgrade from one version of Concourse to another, you can use the concourse migrate command.
First, grab the desired migration version by running the following:
# make sure this is the *old* Concourse binary
$ concourse migrate --supported-db-version
1551110547
That number (yours will be different) is the expected migration version for that version of Concourse.
Next, run the following with the new Concourse binary:
$ concourse migrate --migrate-db-to-version=1551110547
This will need the same CONCOURSE_POSTGRES_* configuration described in Running concourse web.
Once this completes, switch all web nodes back to the older concourse binary and you should be good to go.
Configuring the web node
Giving your cluster a name
If you've got many Concourse clusters that you switch between, you can make it slightly easier to notice which one you're on by giving each cluster a name:
CONCOURSE_CLUSTER_NAME=production
When set, this name will be shown in the top bar when viewing the dashboard.
Configuring ingress traffic
If your web nodes are going to be accessed across multiple network layers, you will need to set CONCOURSE_EXTERNAL_URL to a URL accessible by your Concourse users. If you don't set this property, logging in will incorrectly redirect to its default value of 127.0.0.1.
If your web node(s) will be behind a load balancer or reverse proxy, you will need to ensure connections made by fly intercept are properly handled by upgrading the connection. Here is a sample nginx configuration that upgrades connections made by fly intercept:
server {
  server_name ci.example.com;

  add_header Strict-Transport-Security "max-age=31536000" always;
  ssl_stapling on;
  ssl_stapling_verify on;

  # Proxy main concourse traffic
  location / {
    proxy_pass http://concourse.local:8080/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Protocol $scheme;
    proxy_set_header X-Forwarded-Host $http_host;
  }

  # Proxy fly intercept traffic
  location ~ /hijack$ {
    proxy_pass http://concourse.local:8080;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Protocol $scheme;
    proxy_set_header X-Forwarded-Host $http_host;

    # Upgrade connection
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
  }
}
TLS via Let's Encrypt
Concourse can be configured to automatically acquire a TLS certificate via Let's Encrypt:
# Enable TLS
CONCOURSE_TLS_BIND_PORT=443
# Enable Let's Encrypt
CONCOURSE_ENABLE_LETS_ENCRYPT=true
Concourse's Let's Encrypt integration works by storing the TLS certificate and key in the database, so it is imperative that you enable database encryption as well.
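Database encryption itself is covered elsewhere in the docs; as a reminder, it boils down to handing the web node an encryption key (the value below is just a placeholder for a 16 or 32 character key):
CONCOURSE_ENCRYPTION_KEY=0123456789abcdef0123456789abcdef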
By default, Concourse will reach out to Let's Encrypt's ACME CA directory. An alternative URL can be configured like so:
CONCOURSE_LETS_ENCRYPT_ACME_URL=https://acme.example.com/directory
In order to negotiate the certificate, your web node must be reachable by the ACME server. There are intentionally no publicly listed IP addresses to whitelist, so this typically means just making your web node publicly reachable.
Build log retention
Build logs are stored in the DB - if they are not cleaned up every once in a while, the storage usage for build logs will continue to grow as more builds run. While this is usually fine for small Concourse instances, as you scale up you may run into storage concerns.
To clean up old build logs, you can configure Concourse to periodically scan for builds whose logs should be reaped based on a log retention policy, skipping over any paused pipelines and jobs. When a build's logs are reaped, they are no longer visible in the UI.
Concourse can be configured with a default build log retention policy for all jobs:
CONCOURSE_DEFAULT_BUILD_LOGS_TO_RETAIN=50
CONCOURSE_DEFAULT_DAYS_TO_RETAIN_BUILD_LOGS=14
With these settings, Concourse will keep the latest 50 builds for each job. If a job runs more than 50 builds in 14 days, all of those builds will be retained until 14 days after they ran.
Some jobs have differing retention requirements - build log retention can also be configured on a job-by-job basis via the build_log_retention_policy schema.
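For reference, a job-level policy in a pipeline config looks roughly like this - the job itself is just an illustrative stub:
jobs:
- name: smoke-tests
  build_log_retention:
    days: 7
    builds: 20
  plan:
  - task: run-tests
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: busybox}
      run: {path: echo, args: ["tests passed"]}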
You can also configure Concourse with maximum values for build log retention policies to prevent jobs from retaining their build logs for too long:
CONCOURSE_MAX_BUILD_LOGS_TO_RETAIN=100
CONCOURSE_MAX_DAYS_TO_RETAIN_BUILD_LOGS=30
With these settings, the number of build logs retained per job is capped at 100, and the number of days to retain build logs is capped at 30.
Enabling audit logs
A very simplistic form of audit logging can be enabled with the following vars:
# Enable auditing for all api requests connected to builds.
CONCOURSE_ENABLE_BUILD_AUDITING=true
# Enable auditing for all api requests connected to containers.
CONCOURSE_ENABLE_CONTAINER_AUDITING=true
# Enable auditing for all api requests connected to jobs.
CONCOURSE_ENABLE_JOB_AUDITING=true
# Enable auditing for all api requests connected to pipelines.
CONCOURSE_ENABLE_PIPELINE_AUDITING=true
# Enable auditing for all api requests connected to resources.
CONCOURSE_ENABLE_RESOURCE_AUDITING=true
# Enable auditing for all api requests connected to system transactions.
CONCOURSE_ENABLE_SYSTEM_AUDITING=true
# Enable auditing for all api requests connected to teams.
CONCOURSE_ENABLE_TEAM_AUDITING=true
# Enable auditing for all api requests connected to workers.
CONCOURSE_ENABLE_WORKER_AUDITING=true
# Enable auditing for all api requests connected to volumes.
CONCOURSE_ENABLE_VOLUME_AUDITING=true
When enabled, API requests will result in an info-level log line like so:
{"timestamp":"2019-05-09T14:41:54.880381537Z","level":"info","source":"atc","message":"atc.audit","data":{"action":"Info","parameters":{},"user":"test"}}
{"timestamp":"2019-05-09T14:42:36.704864093Z","level":"info","source":"atc","message":"atc.audit","data":{"action":"GetPipeline","parameters":{":pipeline_name":["booklit"],":team_name":["main"]},"user":"test"}}
Configuring defaults for resource types
Defaults for the "core" resource types (those that show up under the Concourse org) that come with Concourse can be set cluster-wide by passing in a configuration file. The format of the file is the name of the resource type followed by an arbitrary configuration.
Documentation for each resource type's configuration is in each implementation's README.
CONCOURSE_BASE_RESOURCE_TYPE_DEFAULTS=./defaults.yml
For example, a defaults.yml that configures the entire cluster to use a registry mirror would have:
registry-image:
  registry_mirror:
    host: https://registry.mirror.example.com