
Prometheus Metrics

TaskDaemon exposes metrics in the Prometheus format so you can monitor task throughput, latency, failures, and queue health.

Metrics Endpoint

http://localhost:8080/metrics
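
To inspect the endpoint without a Prometheus server, a small script can fetch and parse the text exposition format. The following is a minimal sketch, not part of TaskDaemon: `parse_metrics`, `fetch_metrics`, and the `SAMPLE` payload (including its bucket boundaries and values) are hypothetical names and made-up data for illustration.

```python
import urllib.request

# Made-up sample in the Prometheus text exposition format (values are illustrative).
SAMPLE = """\
# HELP taskdaemon_queue_size Current queue size
# TYPE taskdaemon_queue_size gauge
taskdaemon_queue_size 42
# TYPE taskdaemon_task_duration_seconds histogram
taskdaemon_task_duration_seconds_bucket{le="0.5"} 3
taskdaemon_task_duration_seconds_bucket{le="+Inf"} 7
taskdaemon_task_duration_seconds_sum 9.5
taskdaemon_task_duration_seconds_count 7
"""


def parse_metrics(text):
    """Parse exposition-format text into a {series: value} dict.

    Simplified: skips comment lines and ignores optional timestamps.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or # HELP / # TYPE metadata
        if "}" in line:
            # Labelled series: everything up to the last "}" is the series name.
            idx = line.rindex("}") + 1
            series, rest = line[:idx], line[idx:]
        else:
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue  # malformed line without a value
            series, rest = parts
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])  # float() accepts "+Inf" and "NaN"
    return samples


def fetch_metrics(url="http://localhost:8080/metrics", timeout=5.0):
    """Fetch and parse the metrics endpoint (assumes TaskDaemon is running locally)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

Against a running instance, `fetch_metrics()["taskdaemon_queue_size"]` would return the current queue depth as a float.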

Available Metrics

| Metric | Type | Description |
|---|---|---|
| taskdaemon_tasks_queued_total | Counter | Total tasks queued |
| taskdaemon_tasks_completed_total | Counter | Total tasks completed successfully |
| taskdaemon_tasks_failed_total | Counter | Total tasks failed |
| taskdaemon_task_duration_seconds | Histogram | Task processing duration |
| taskdaemon_queue_size | Gauge | Current queue size |
| taskdaemon_active_workers | Gauge | Number of active workers |

Prometheus Configuration

Create prometheus.yml:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'taskdaemon'
    static_configs:
      - targets: ['taskdaemon:8080']

Docker Compose Setup

services:
  taskdaemon:
    image: mshelia/taskdaemon
    ports:
      - "8080:8080"
    volumes:
      - ./handlers.toml:/app/handlers.toml
      - /var/run/docker.sock:/var/run/docker.sock

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    depends_on:
      - taskdaemon

Useful Queries

Throughput

# Tasks completed per second
rate(taskdaemon_tasks_completed_total[1m])

# Tasks queued per second
rate(taskdaemon_tasks_queued_total[1m])
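
The idea behind `rate()` on a counter can be sketched in a few lines of Python: sum the increases between consecutive samples, treat a drop as a counter reset, and divide by the elapsed time. This is a simplified illustration only. Prometheus additionally extrapolates to the edges of the range window, which is omitted here, and `simple_rate` is a hypothetical helper name.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value) tuples, sorted by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # The counter went backwards: assume the process restarted at zero.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


if __name__ == "__main__":
    # Five scrapes at 15s intervals, 30 tasks completed between each scrape.
    window = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
    print(simple_rate(window))
```

With a 15s scrape interval, a `[1m]` window holds roughly four or five samples, which is why very short windows give noisy or empty results.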

Latency

# Average task duration in seconds over the last 5 minutes
rate(taskdaemon_task_duration_seconds_sum[5m]) / rate(taskdaemon_task_duration_seconds_count[5m])

# 95th percentile duration
histogram_quantile(0.95, rate(taskdaemon_task_duration_seconds_bucket[5m]))
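
The percentile query estimates a value from cumulative bucket counts rather than from individual task durations. The sketch below shows the linear interpolation used for classic histograms, under stated assumptions: the bucket boundaries and counts are made up, all bounds are positive, and the function is a simplified Python stand-in rather than Prometheus's implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs that includes
    the (math.inf, total_count) bucket. Assumes positive upper bounds.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("the +Inf bucket is required")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == math.inf:
        # The quantile falls beyond the last finite bucket:
        # report the highest finite bound as a lower estimate.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower = buckets[index - 1][0] if index > 0 else 0.0
    prev_count = buckets[index - 1][1] if index > 0 else 0.0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


if __name__ == "__main__":
    # Hypothetical duration buckets in seconds: (le, cumulative count).
    example = [(1, 10), (5, 60), (10, 90), (30, 98), (math.inf, 100)]
    print(histogram_quantile(0.95, example))
```

Because the result is interpolated within a bucket, its accuracy depends on how closely the bucket boundaries bracket the durations you care about.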

Error Rate

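# Failure ratio: tasks failed per task queued over the same window
# (approximate, since tasks finishing in a window may have been queued earlier)
rate(taskdaemon_tasks_failed_total[5m]) / rate(taskdaemon_tasks_queued_total[5m])

# Success ratio: tasks completed per task queued over the same window
rate(taskdaemon_tasks_completed_total[5m]) / rate(taskdaemon_tasks_queued_total[5m])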

Queue Health

# Current queue depth
taskdaemon_queue_size

# Queue growth rate
deriv(taskdaemon_queue_size[5m])

Alerting Rules

Example Prometheus alerting rules:
groups:
  - name: taskdaemon
    rules:
      - alert: HighQueueDepth
        expr: taskdaemon_queue_size > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TaskDaemon queue depth is high"

      - alert: HighErrorRate
        expr: rate(taskdaemon_tasks_failed_total[5m]) / rate(taskdaemon_tasks_queued_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "TaskDaemon error rate above 10%"

      - alert: SlowTasks
        expr: histogram_quantile(0.95, rate(taskdaemon_task_duration_seconds_bucket[5m])) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile task duration above 30s"
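
Before committing an alert expression, it can help to evaluate it against a running Prometheus through its HTTP API (`/api/v1/query`). The following is a minimal sketch that assumes Prometheus is reachable at `http://localhost:9090`, as in the Docker Compose setup above. `build_query_url`, `extract_values`, and `instant_query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def extract_values(payload):
    """Return [(labels, value)] from an instant-query response of type 'vector'."""
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error", "unknown error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got %s" % data["resultType"])
    # Each sample's value is a [timestamp, "value-as-string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(expr, base_url="http://localhost:9090", timeout=5.0):
    """Evaluate a PromQL expression now and return the resulting samples."""
    url = build_query_url(expr, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_values(json.load(resp))


if __name__ == "__main__":
    # Print the URL only; calling instant_query() requires a running Prometheus.
    print(build_query_url("taskdaemon_queue_size > 1000"))
```

An empty list from `instant_query("taskdaemon_queue_size > 1000")` means the condition currently matches no series, so the `HighQueueDepth` alert would not be pending.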