Prometheus: scrape all Docker Swarm tasks
We have a server with Docker Swarm initialized, and a few services are running on it. We need Prometheus to scrape their metrics, and we want it to connect to the service tasks directly, without going through a load balancer.
I’ve spent a bunch of time figuring out how to do it, so here is a tutorial just for my future self.
There are several ways to scrape metrics with service discovery:
dns_sd_config
docker_sd_config
dockerswarm_sd_config
The first one doesn’t work well, since Docker Swarm doesn’t expose SRV DNS records for services. I have no idea why; exposing them seems like the logical thing to do.
The second one, docker_sd_config, is less useful than dockerswarm_sd_config, since it scans all containers, even those that are not part of a Swarm service.
dockerswarm_sd_config has a field called filters, which just didn’t work for me at all. I tried several options, such as label=x=y and label:x=y, and all the combinations of these. It just doesn’t work.
So, the solution is to use relabel_configs and its action field to filter out the services that we don’t want to scrape.
There are other roles, such as services and nodes, but we need tasks, since a container is a task in Docker Swarm terminology.
- job_name: 'swarm'
  scrape_interval: 1s
  dockerswarm_sd_configs:
    - host: unix:///var/run/docker.sock
      role: tasks
  relabel_configs:
    # Keep only tasks that are supposed to be running.
    - source_labels: [__meta_dockerswarm_task_desired_state]
      regex: running
      action: keep
    # Keep only tasks whose service sets a job name label.
    - source_labels: [__meta_dockerswarm_service_label_metrics_job]
      regex: .+
      action: keep
    # Use that label as the job name instead of "swarm".
    - source_labels: [__meta_dockerswarm_service_label_metrics_job]
      target_label: job
    # Override the metrics path when the service provides one.
    - action: replace
      regex: (.+)
      source_labels:
        - __meta_dockerswarm_service_label_metrics_path
      target_label: __metrics_path__
    # Rewrite the address to <task IP>:<port from the label>.
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
        - __address__
        - __meta_dockerswarm_service_label_metrics_port
      target_label: __address__
- The first relabel config keeps only tasks whose desired state is “running”.
- The second one keeps only tasks whose service has a metrics-job label, so services without it are not scraped.
- The third one copies that label into job, because we don’t want all the metrics to carry the same job name, “swarm”.
- The fourth one uses the metrics-path label as the path to the metrics. When the label is missing, the default /metrics is kept.
- The fifth one joins the internal Docker IP of the task with the metrics-port label value.
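To make the labels concrete, here is a minimal sketch of a stack file for a service that opts in to scraping. The service name, image, port, and network are made-up placeholders, and the hyphenated label keys follow the naming used above, on the assumption that Prometheus exposes them with underscores (metrics-job becomes __meta_dockerswarm_service_label_metrics_job). Note that the labels sit under deploy, which is what makes them service labels.

```yaml
# docker-stack.yml (hypothetical example; names, image and port are placeholders)
version: "3.8"

services:
  api:
    image: example/api:latest
    networks:
      - monitoring            # assumed overlay network that Prometheus can also reach
    deploy:
      labels:                 # service labels, not container labels
        metrics-job: api      # becomes the Prometheus job name
        metrics-path: /metrics
        metrics-port: "8080"  # port the app serves metrics on inside the container

networks:
  monitoring:
    driver: overlay
```

With these labels, a task discovered at, say, 10.0.1.7:80 should end up as the target 10.0.1.7:8080 with the path /metrics and job="api": the last rule joins the address and the port label into 10.0.1.7:80;8080, and the regex keeps only the IP and the label’s port.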
Note about Ansible
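Do not use container_labels to put these labels on containers. Just use labels instead. I don’t know exactly why, but container_labels aren’t picked up at all. Most likely it is because the relabel rules above read service labels (__meta_dockerswarm_service_label_*), which labels sets, while container_labels only sets labels on the containers themselves.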
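As a rough sketch, an Ansible task along these lines should produce the labels the scrape job expects. It assumes the community.docker.docker_swarm_service module; the service name, image, and port are placeholders, not values from my setup.

```yaml
# Hypothetical Ansible task; name, image and port are placeholders.
- name: Deploy a service that Prometheus should scrape
  community.docker.docker_swarm_service:
    name: api
    image: example/api:latest
    labels:                   # service labels; do not put these under container_labels
      metrics-job: api
      metrics-path: /metrics
      metrics-port: "8080"
```

The port is quoted so that it is passed as a string, since label values are strings.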
Follow me on Twitter: @reconquestio