1.15.1 Metrics

Metrics are essential to understanding how any large system is behaving and performing. Concourse can emit metrics about both the health of the system itself and the builds that it is running. Operators can tap into these metrics to observe the health of the system.

In the spirit of openness, the metrics from our deployment are public. We consider it a bug to emit anything sensitive or secret into our metrics pipeline.

Configuring Metrics

The web node can be configured to emit metrics on start.

Currently supported metrics emitters are InfluxDB, NewRelic, Prometheus, and Datadog. There is also a dummy emitter that writes the metrics to the logs at DEBUG level, which can be enabled with the --emit-to-logs flag.

Regardless of your metrics emitter, you can set CONCOURSE_METRICS_BUFFER_SIZE to determine how many metrics emissions are sent at a time. Increasing this number can be helpful if sending metrics is regularly failing (due to rate limiting or network failures) or if latency is particularly high.

There are various flags for different emitters; run concourse web --help and look for "Metric Emitter" to see what's available.

What's emitted?

This reference section lists all of the metrics that Concourse emits via the Prometheus emitter.

To make this document easy to maintain, Prometheus is used as the "source of truth" - primarily because it has help text built-in, making this list easy to generate. Treat this list as a reference when looking for the equivalent metric names for your emitter of choice.
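As a rough illustration of why built-in help text makes such a list easy to generate, the sketch below pulls the `# HELP` lines out of a Prometheus text-format scrape. The sample payload and its `example_*` metric names are hypothetical and are not copied from Concourse. The function name `extract_help` is likewise an assumption made for this example, not part of any Concourse tooling.

```python
import re

# Hypothetical sample of Prometheus text exposition output. The metric
# names are illustrative assumptions, not real Concourse metric names.
SAMPLE = """\
# HELP example_builds_started_total Total number of builds started.
# TYPE example_builds_started_total counter
example_builds_started_total 42
# HELP example_builds_running Number of builds currently running.
# TYPE example_builds_running gauge
example_builds_running 3
"""

# A HELP line has the form: "# HELP <metric_name> <free-form help text>".
HELP_LINE = re.compile(r"^# HELP (\S+) (.*)$")


def extract_help(exposition: str) -> dict[str, str]:
    """Map each metric name to its help text, in order of appearance."""
    help_by_name: dict[str, str] = {}
    for line in exposition.splitlines():
        match = HELP_LINE.match(line)
        if match:
            name, text = match.groups()
            help_by_name[name] = text
    return help_by_name


if __name__ == "__main__":
    for name, text in extract_help(SAMPLE).items():
        print(f"{name}: {text}")
```

This sketch leaves backslash escapes in help text as they appear in the payload. In practice, the input would be the body returned by scraping the Prometheus endpoint of a web node rather than a hard-coded string.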

Total number of Concourse builds aborted.
Total number of Concourse builds errored.
Total number of Concourse builds failed.
Total number of Concourse builds finished.
Status of Latest Completed Build. 0=Success, 1=Failed, 2=Aborted, 3=Errored.
Number of Concourse builds currently running.
Total number of Concourse builds started.
Total number of Concourse builds succeeded.
Number of concurrent requests being served by endpoints that have a specified limit of concurrent requests.
Total number of requests rejected because the server was already serving too many concurrent requests.
Current number of Concourse database connections
Total number of Concourse database queries
Response time in seconds
Total number of Concourse jobs scheduled.
Number of Concourse jobs currently being scheduled.
The size of the checks queue
Total number of checks enqueued
Total number of checks finished
Total number of checks started
Database locks held
Elapsed time waiting for execution
Number of Concourse tasks currently waiting.
Total number of volumes streamed from one worker to the other
Number of containers per worker
Number of workers per state as seen by the database
Number of active tasks per worker
Number of volumes per worker
A summary of the GC invocation durations.
Number of goroutines that currently exist.
Information about the Go environment.
Number of bytes allocated and still in use.
Total number of bytes allocated, even if freed.
Number of bytes used by the profiling bucket hash table.
Total number of frees.
The fraction of this program's available CPU time used by the GC since the program started.
Number of bytes used for garbage collection system metadata.
Number of heap bytes allocated and still in use.
Number of heap bytes waiting to be used.
Number of heap bytes that are in use.
Number of allocated objects.
Number of heap bytes released to OS.
Number of heap bytes obtained from system.
Number of seconds since 1970 of last garbage collection.
Total number of pointer lookups.
Total number of mallocs.
Number of bytes in use by mcache structures.
Number of bytes used for mcache structures obtained from system.
Number of bytes in use by mspan structures.
Number of bytes used for mspan structures obtained from system.
Number of heap bytes when next garbage collection will take place.
Number of bytes used for other system allocations.
Number of bytes in use by the stack allocator.
Number of bytes obtained from system for stack allocator.
Number of bytes obtained from system.
Number of OS threads created.
Total user and system CPU time spent in seconds.
Maximum number of open file descriptors.
Number of open file descriptors.
Resident memory size in bytes.
Start time of the process since unix epoch in seconds.
Virtual memory size in bytes.
Maximum amount of virtual memory available in bytes.
Current number of scrapes being served.
Total number of scrapes by HTTP status code.
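The "Status of Latest Completed Build" entry in the list above encodes the outcome as a number, so dashboards or alerting scripts usually need to translate it back into a label. A minimal sketch of that translation follows. The mapping comes from the help text itself, while the function name `describe_build_status` and the handling of unexpected values are assumptions made for this example.

```python
import math

# Mapping taken from the help text: 0=Success, 1=Failed, 2=Aborted, 3=Errored.
BUILD_STATUS = {0: "Success", 1: "Failed", 2: "Aborted", 3: "Errored"}


def describe_build_status(value: float) -> str:
    """Translate the numeric gauge value into a human-readable label.

    Prometheus sample values are floats, so anything that is not a whole
    number listed in the mapping is reported as unknown (an assumption of
    this sketch, not documented Concourse behaviour).
    """
    if not math.isfinite(value) or value != int(value):
        return f"Unknown ({value})"
    return BUILD_STATUS.get(int(value), f"Unknown ({value})")


if __name__ == "__main__":
    for sample in (0.0, 2.0, 7.0):
        print(sample, "->", describe_build_status(sample))
```

Reporting unexpected values as unknown, rather than raising an error, means the helper keeps working if further status codes are ever added.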