1.1.3 Running a web node

The web node is responsible for running the web UI and API, as well as performing all pipeline scheduling. It's basically the brain of Concourse.

Prerequisites

Nothing special - the web node is a pretty simple Go application that can be run like a 12-factor app.

Running

The concourse CLI can run as a web node via the web subcommand.

Before running it, let's configure a local user so we can log in:

CONCOURSE_ADD_LOCAL_USER=myuser:mypass
CONCOURSE_MAIN_TEAM_LOCAL_USER=myuser

This will configure a single user, myuser, with the password mypass. You'll probably want to change those to sensible values, and later you may want to configure a proper auth provider - check out Auth & Teams whenever you're ready.

Next, you'll need to configure the session signing key, the SSH key for the worker gateway, and the authorized worker keys. Check Generating Keys to learn what these are and how they are created.

CONCOURSE_SESSION_SIGNING_KEY=path/to/session_signing_key
CONCOURSE_TSA_HOST_KEY=path/to/tsa_host_key
CONCOURSE_TSA_AUTHORIZED_KEYS=path/to/authorized_worker_keys
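
As a quick sketch, generating these with the generate-key subcommand bundled in the concourse binary looks roughly like the following; the filenames are just examples, and Generating Keys remains the authoritative reference:

# session signing key (RSA)
concourse generate-key -t rsa -f ./session_signing_key

# worker gateway (TSA) host key
concourse generate-key -t ssh -f ./tsa_host_key

# a worker key, whose public half gets authorized on the web node
concourse generate-key -t ssh -f ./worker_key
cp ./worker_key.pub ./authorized_worker_keys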

Finally, web needs to know how to reach your Postgres database. This can be set like so:

CONCOURSE_POSTGRES_HOST=127.0.0.1 # default
CONCOURSE_POSTGRES_PORT=5432      # default
CONCOURSE_POSTGRES_DATABASE=atc   # default
CONCOURSE_POSTGRES_USER=my-user
CONCOURSE_POSTGRES_PASSWORD=my-password

If you're running PostgreSQL locally, you can probably just point Concourse at the socket and rely on peer authentication:

CONCOURSE_POSTGRES_SOCKET=/var/run/postgresql
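
If the database and role don't exist yet, standard PostgreSQL tooling can create them; the concourse role name below is just an example, and atc matches the default database name above:

# create a role (prompting for its password) and a database owned by it
sudo -u postgres createuser --pwprompt concourse
sudo -u postgres createdb --owner=concourse atc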

Now that everything's set, run:

concourse web

All logs will be emitted to stdout, with any panics or lower-level errors being emitted to stderr.
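
Once it's up, you can sanity-check the local user configured earlier by logging in with fly; this assumes the default bind port of 8080, and the target name ci is arbitrary:

fly -t ci login -c http://127.0.0.1:8080 -u myuser -p mypass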

Ingress

If your web nodes are going to be accessed over the network, you will need to set CONCOURSE_EXTERNAL_URL to a URL accessible by your consumers. If you don't set this property, logins will redirect to its default value of 127.0.0.1.

If your instance is available on the public internet, you may wish to prevent the Concourse UI from being nefariously embedded in an iframe by setting CONCOURSE_X_FRAME_OPTIONS to deny (to prevent any iframe embedding) or sameorigin (to only allow iframe embedding in pages served from the same origin). This protects against clickjacking.

Note: if you set the value to allow-from, be aware that not all browsers support it; when unsupported, the header is ignored by the browser entirely.
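
Putting those together, an internet-facing deployment might be configured like so; the hostname is illustrative:

CONCOURSE_EXTERNAL_URL=https://ci.example.com
CONCOURSE_X_FRAME_OPTIONS=deny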

Properties

CPU usage: peaks during pipeline scheduling, primarily when scheduling jobs; this can be mitigated by adding more web nodes. At large scale, web nodes are compute-heavy more than anything else.

Memory usage: not well characterized at the moment, as it's not generally a concern. Give it a few gigabytes and keep an eye on it.

Disk usage: none

Bandwidth usage: aside from handling external traffic, the web node will at times have to stream bits out from one worker and into another while executing Steps.

Highly available: yes; web nodes can all be configured the same (aside from the peer URL) and placed behind a load balancer. Periodic tasks like garbage-collection will not be duplicated for each node.

Horizontally scalable: yes; they will coordinate workloads using the database, resulting in less work for each node and thus lower CPU usage.

Outbound traffic:

  • db on its configured port for persistence

  • db on its configured port for locking and coordinating in a multi-web node deployment

  • directly-registered worker nodes on ports 7777, 7788, and 7799 for checking resources, executing builds, and performing garbage-collection

  • other web nodes (possibly itself) on an ephemeral port when a worker is forwarded through the web node's TSA

Inbound traffic:

  • worker connects to the TSA on port 2222 for registration

  • worker downloads inputs from the ATC during fly execute via its external URL

  • external traffic to the ATC API via the web UI and fly CLI

Operation

The web nodes themselves are stateless - they don't store anything on disk, and coordinate entirely using the database.

Scaling

Web nodes can be scaled up for high availability. They'll also roughly share their scheduling workloads, using the database to synchronize. This is done by simply running more web commands on different machines and optionally putting them behind a load balancer.

To run a cluster of web nodes, you'll first need to ensure they're all pointing to the same PostgreSQL server.

Next, you'll need to configure a peer URL. This is a URL that can be used to reach this web node's web server from other web nodes. Typically this uses a private IP, like so:

CONCOURSE_PEER_URL=http://10.10.0.1:8080

Finally, if all of these nodes are going to be accessed through a load balancer, you'll need to configure the external URL that will be used to reach your Concourse cluster:

CONCOURSE_EXTERNAL_URL=https://ci.example.com

Aside from the peer URL, all configuration must be consistent across all web nodes in the cluster to ensure consistent results.
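
For example, two web nodes behind a load balancer might differ only in their peer URLs; the IPs and hostname here are illustrative:

# web node 1
CONCOURSE_PEER_URL=http://10.10.0.1:8080
CONCOURSE_EXTERNAL_URL=https://ci.example.com

# web node 2 - identical configuration aside from the peer URL
CONCOURSE_PEER_URL=http://10.10.0.2:8080
CONCOURSE_EXTERNAL_URL=https://ci.example.com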

Restarting & Upgrading

The web nodes can be killed and restarted willy-nilly. No draining is necessary; if a web node was orchestrating a build, it will simply continue where it left off when it comes back up, or the build will be picked up by one of the other web nodes.

To upgrade a web node, stop its process and start a new one using the newly installed concourse. Any migrations will be run automatically on start. If web nodes are started in parallel, only one will run the migrations.
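
As a rough sketch, assuming the web node runs under a systemd unit named concourse-web and the new release has been unpacked to /tmp/concourse-new (both hypothetical):

# stop the old process, swap in the new binary, start again;
# migrations run automatically on startup
systemctl stop concourse-web
cp /tmp/concourse-new/bin/concourse /usr/local/bin/concourse
systemctl start concourse-web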

Note that we don't currently guarantee a lack of funny-business if you're running mixed Concourse versions - database migrations can perform modifications that confuse other web nodes. So there may be some turbulence during a rolling upgrade, but everything should stabilize once all web nodes are running the latest version.

Downgrading

If you're stuck in a pinch and need to downgrade from one version of Concourse to another, you can use the concourse migrate command.

Note: support for down migrations is a fairly recent addition to Concourse; it is not supported for downgrading to v3.6.0 and below.

First, grab the desired migration version by running the following:

# make sure this is the *old* Concourse binary
$ concourse migrate --supported-db-version
1551110547

That number (yours will be different) is the expected migration version for that version of Concourse.

Next, run the following with the new Concourse binary:

$ concourse migrate --migrate-db-to-version=1551110547

This will need the same CONCOURSE_POSTGRES_* configuration described in Running.
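
In other words, the invocation looks roughly like this; reuse whatever Postgres settings your web nodes already run with, and substitute your own migration version:

CONCOURSE_POSTGRES_HOST=127.0.0.1 \
CONCOURSE_POSTGRES_USER=my-user \
CONCOURSE_POSTGRES_PASSWORD=my-password \
CONCOURSE_POSTGRES_DATABASE=atc \
concourse migrate --migrate-db-to-version=1551110547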

Once this completes, switch all web nodes back to the older concourse binary and you should be good to go.

Configuration

Giving your cluster a name

If you've got many Concourse clusters that you switch between, you can make it slightly easier to notice which one you're on by giving each cluster a name:

CONCOURSE_CLUSTER_NAME=production

When set, this name will be shown in the top bar when viewing the dashboard.

Enabling audit logs

A very simplistic form of audit logging can be enabled with the following vars:

# Enable auditing for all api requests connected to builds.
CONCOURSE_ENABLE_BUILD_AUDITING=true

# Enable auditing for all api requests connected to containers.
CONCOURSE_ENABLE_CONTAINER_AUDITING=true

# Enable auditing for all api requests connected to jobs.
CONCOURSE_ENABLE_JOB_AUDITING=true

# Enable auditing for all api requests connected to pipelines.
CONCOURSE_ENABLE_PIPELINE_AUDITING=true

# Enable auditing for all api requests connected to resources.
CONCOURSE_ENABLE_RESOURCE_AUDITING=true

# Enable auditing for all api requests connected to system transactions.
CONCOURSE_ENABLE_SYSTEM_AUDITING=true

# Enable auditing for all api requests connected to teams.
CONCOURSE_ENABLE_TEAM_AUDITING=true

# Enable auditing for all api requests connected to workers.
CONCOURSE_ENABLE_WORKER_AUDITING=true

# Enable auditing for all api requests connected to volumes.
CONCOURSE_ENABLE_VOLUME_AUDITING=true

When enabled, API requests will result in an info-level log line like so:

{"timestamp":"2019-05-09T14:41:54.880381537Z","level":"info","source":"atc","message":"atc.audit","data":{"action":"Info","parameters":{},"user":"test"}}
{"timestamp":"2019-05-09T14:42:36.704864093Z","level":"info","source":"atc","message":"atc.audit","data":{"action":"GetPipeline","parameters":{":pipeline_name":["booklit"],":team_name":["main"]},"user":"test"}}