Skip to main content

Grafana Dashboards

Visualize TaskDaemon metrics with Grafana.

Quick Setup

services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
Access at http://localhost:3001 (admin/admin)

Add Prometheus Data Source

  1. Go to ConfigurationData Sources
  2. Click Add data source
  3. Select Prometheus
  4. Set URL: http://prometheus:9090
  5. Click Save & Test

Dashboard Panels

Tasks Overview

Tasks per Second
rate(taskdaemon_tasks_completed_total[1m])
Visualization: Graph Queue Size
taskdaemon_queue_size
Visualization: Stat or Gauge Active Workers
taskdaemon_active_workers
Visualization: Stat

Performance

Task Duration (95th percentile)
histogram_quantile(0.95, rate(taskdaemon_task_duration_seconds_bucket[5m]))
Visualization: Graph Average Duration
rate(taskdaemon_task_duration_seconds_sum[5m]) / rate(taskdaemon_task_duration_seconds_count[5m])
Visualization: Graph

Health

Success Rate
rate(taskdaemon_tasks_completed_total[5m]) / rate(taskdaemon_tasks_queued_total[5m]) * 100
Visualization: Gauge (0-100%) Error Rate
rate(taskdaemon_tasks_failed_total[5m]) / rate(taskdaemon_tasks_queued_total[5m]) * 100
Visualization: Gauge with thresholds (green < 1%, yellow < 5%, red > 5%)

Totals

Total Tasks Processed
taskdaemon_tasks_completed_total + taskdaemon_tasks_failed_total
Visualization: Stat

Sample Dashboard JSON

Import this dashboard in Grafana:
{
  "title": "TaskDaemon",
  "panels": [
    {
      "title": "Tasks/sec",
      "type": "graph",
      "targets": [
        {"expr": "rate(taskdaemon_tasks_completed_total[1m])", "legendFormat": "Completed"},
        {"expr": "rate(taskdaemon_tasks_failed_total[1m])", "legendFormat": "Failed"}
      ]
    },
    {
      "title": "Queue Size",
      "type": "stat",
      "targets": [{"expr": "taskdaemon_queue_size"}]
    },
    {
      "title": "P95 Latency",
      "type": "graph",
      "targets": [
        {"expr": "histogram_quantile(0.95, rate(taskdaemon_task_duration_seconds_bucket[5m]))"}
      ]
    },
    {
      "title": "Success Rate",
      "type": "gauge",
      "targets": [
        {"expr": "rate(taskdaemon_tasks_completed_total[5m]) / rate(taskdaemon_tasks_queued_total[5m]) * 100"}
      ]
    }
  ]
}

Alerts

Configure Grafana alerts:
  1. Edit a panel
  2. Go to Alert tab
  3. Create alert rule:
    • Condition: WHEN avg() OF query(A) IS ABOVE 1000
    • Evaluate every: 1m
    • For: 5m
Example alerts:
  • Queue size > 1000 for 5 minutes
  • Error rate > 5% for 5 minutes
  • P95 latency > 30s for 5 minutes