1.10.3 Metrics
Metrics are essential for understanding how any large system is behaving and performing. Concourse can emit metrics about both the health of the system itself and the builds that it is running. Operators can tap into these metrics to observe the health of the system.
In the spirit of openness, the metrics from our deployment are public. We consider it a bug to emit anything sensitive or secret into our metrics pipeline.
Configuring Metrics
The web node can be configured to emit metrics on start.
Currently supported metrics emitters are InfluxDB, NewRelic, Prometheus, Datadog, and Riemann. There is also a dummy emitter, enabled with the --emit-to-logs flag, that will just spit the metrics out into the logs at DEBUG level.
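As a quick sanity check, a minimal sketch of enabling the dummy emitter is shown below; combine it with whatever flags your web node already uses, since only the --emit-to-logs flag shown here is specific to metrics:

  # emit metrics into the web node's own logs at DEBUG level
  concourse web --emit-to-logs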
Regardless of your metrics emitter, you can set CONCOURSE_METRICS_BUFFER_SIZE to determine how many metrics emissions are sent at a time. Increasing this number can be helpful if sending metrics is regularly failing (due to rate limiting or network failures) or if latency is particularly high.
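For example, a hedged sketch of raising the buffer size via the environment variable named above (the value 100 is arbitrary, chosen purely for illustration):

  # batch up to 100 metric emissions per send
  CONCOURSE_METRICS_BUFFER_SIZE=100 concourse web --emit-to-logs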
There are various flags for different emitters; run concourse web --help and look for "Metric Emitter" to see what's available.
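For instance, one way to list those flags without reading the full help output (the grep invocation below is just a sketch; adjust the number of context lines to taste):

  # show the emitter-specific flag groups from the web node's help text
  concourse web --help 2>&1 | grep -A 20 'Metric Emitter'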
What's emitted?
This reference section lists all of the metrics that Concourse emits. We don't include the warning and critical levels, as they will keep changing as we optimise the system. To find those, please refer to the source of truth: the code.
This is the time taken (in milliseconds) to schedule an entire pipeline including the time taken to load the version information from the database and calculate the latest valid versions for each job.
Attributes
pipeline
The pipeline which was being scheduled.
This is the time taken (in milliseconds) to load the version information from the database.
Attributes
pipeline
The pipeline which was being scheduled.
This is the time taken (in milliseconds) to calculate the set of valid input versions when scheduling a job. It is emitted once for each job per pipeline scheduling tick.
Attributes
pipeline
The pipeline which was being scheduled.
job
The job which was being scheduled.
The number of containers that are currently running on your workers.
Attributes
worker
The name of the worker.
The number of volumes that are currently present on your workers.
Attributes
worker
The name of the worker.
This event is emitted when a build starts. Its value is the build ID of the build. However, it is most useful for annotating your metrics with the start and end of different jobs.
Attributes
pipeline
The pipeline which contains the build being started.
job
The job which configured the build being started.
build_name
The name of the build being started. (Remember that build numbers in Concourse are actually names and are strings).
build_id
The ID of the build being started.
This event is emitted when a build ends. Its value is the duration of the build in milliseconds. You can use this metric in conjunction with build started to annotate your metrics with when builds started and stopped.
Attributes
pipeline
The pipeline which contains the build that finished.
job
The job which configured the build that finished.
build_name
The name of the build that finished. (Remember that build numbers in Concourse are actually names and are strings).
build_id
The ID of the build that finished.
build_status
The resulting status of the build; one of "succeeded", "failed", "errored", or "aborted".
This metric is emitted for each HTTP request to an ATC (both API and web requests). It contains the duration (in milliseconds) for each request and is useful for finding slow requests.
Attributes
route
The route which the HTTP request matched, e.g. /builds/:id.
path
The literal path of the HTTP request, e.g. /builds/1234.
This metric is emitted each time a worker registers or heartbeats. It contains the number of active task containers on the worker in question. Particularly if max-active-tasks-per-worker is set, this can be useful for debugging containers stuck in a pending state.
Attributes
worker
The name of the worker registering/heartbeating.
platform
The OS platform of the worker in question (i.e. windows/linux/darwin).