Health & Readiness

Health, readiness, and cluster status endpoints for monitoring and orchestration

FluxMQ exposes HTTP endpoints for liveness probes, readiness probes, and cluster status queries. They are designed to integrate with Kubernetes, load balancers, and monitoring systems.

All endpoints are served by the health server, which runs when server.health_enabled is true (the default). Its listen address is set by server.health_addr (default :8081).

Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /health | GET | Liveness probe: is the process alive? |
| /ready | GET | Readiness probe: can the node accept traffic? |
| /cluster/status | GET | Cluster membership and leadership info |

GET /health

Returns 200 OK if the process is running. Use this for liveness probes — a failure means the process should be restarted.

{
  "status": "healthy"
}

This endpoint performs no dependency checks. If the HTTP server can respond, the process is alive.


GET /ready

Returns a composite readiness assessment of the node's subsystems. Use this for readiness probes — a failure means the node should be removed from the load-balancer rotation until it recovers.

Response fields

| Field | Type | Description |
|---|---|---|
| status | string | "ready" or "not_ready" |
| mode | string | "nominal" or "degraded" (present when status is "ready") |
| details | string | Human-readable reason (present when status is "not_ready") |
| checks | object | Per-component check results |

Each entry in checks contains:

| Field | Type | Description |
|---|---|---|
| status | string | "up" or "down" |
| details | string | Optional detail (e.g. peer reachability count) |

Components checked

| Component | Result | Condition |
|---|---|---|
| broker | 503, not_ready | Broker not initialized |
| storage | 503, not_ready | Storage not initialized, or storage.Ping() returns an error (BadgerDB unreachable, closed, etc.) |
| cluster | 503, not_ready | Clustering enabled but NodeID is empty (node still initializing) |
| cluster | 200, degraded | Some peers are unreachable (node can still serve local traffic) |

Nominal (all healthy)

HTTP 200:

{
  "status": "ready",
  "mode": "nominal",
  "checks": {
    "broker": { "status": "up" },
    "storage": { "status": "up" },
    "cluster": { "status": "up" }
  }
}

Degraded (partial peer loss)

HTTP 200 — the node is still operational but cross-node routing is impaired:

{
  "status": "ready",
  "mode": "degraded",
  "checks": {
    "broker": { "status": "up" },
    "storage": { "status": "up" },
    "cluster": { "status": "up", "details": "1/2 peers reachable" }
  }
}

Load balancers should keep routing traffic to degraded nodes: the node can still serve local clients, accept new connections, and route messages to reachable peers.

Not ready (hard failure)

HTTP 503:

{
  "status": "not_ready",
  "details": "storage unavailable",
  "checks": {
    "broker": { "status": "up" },
    "storage": { "status": "down", "details": "badger: closed" }
  }
}

Single-node mode

When clustering is disabled, the cluster check is omitted:

{
  "status": "ready",
  "mode": "nominal",
  "checks": {
    "broker": { "status": "up" },
    "storage": { "status": "up" }
  }
}

GET /cluster/status

Returns cluster membership information. Useful for dashboards and debugging but not intended as a probe target.

Single-node response

{
  "node_id": "single-node",
  "is_leader": false,
  "cluster_mode": false,
  "sessions": 42
}

Cluster response

{
  "node_id": "node-1",
  "is_leader": true,
  "cluster_mode": true,
  "sessions": 128
}
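
For dashboards, a client only needs to decode the four fields shown in the examples. A minimal Go sketch follows; ClusterStatus, parseClusterStatus, and describe are hypothetical client-side names, and the struct covers only the fields documented above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterStatus mirrors the fields in the /cluster/status examples.
// encoding/json ignores any additional fields the endpoint may return.
type ClusterStatus struct {
	NodeID      string `json:"node_id"`
	IsLeader    bool   `json:"is_leader"`
	ClusterMode bool   `json:"cluster_mode"`
	Sessions    int    `json:"sessions"`
}

// parseClusterStatus decodes a /cluster/status response body.
func parseClusterStatus(data []byte) (ClusterStatus, error) {
	var s ClusterStatus
	err := json.Unmarshal(data, &s)
	return s, err
}

// describe renders a one-line summary for a dashboard. In single-node
// mode is_leader is false, so it is reported as "standalone" instead.
func describe(s ClusterStatus) string {
	role := "standalone"
	if s.ClusterMode {
		role = "non-leader"
		if s.IsLeader {
			role = "leader"
		}
	}
	return fmt.Sprintf("%s (%s): %d sessions", s.NodeID, role, s.Sessions)
}

func main() {
	raw := []byte(`{"node_id": "node-1", "is_leader": true, "cluster_mode": true, "sessions": 128}`)
	s, err := parseClusterStatus(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(describe(s)) // node-1 (leader): 128 sessions
}
```

Checking cluster_mode before is_leader avoids mislabeling a single-node deployment as a cluster member that lost leadership.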

Kubernetes integration

Pod spec example

livenessProbe:
  httpGet:
    path: /health
    port: 8081
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8081
  initialDelaySeconds: 10
  periodSeconds: 5

Interpreting degraded mode

Kubernetes HTTP probes ignore the response body and treat any status code from 200 to 399 as success. Because degraded mode returns 200, a degraded pod stays in the ready pool. To alert on degraded state, scrape /ready from your monitoring system and check the mode field:

# Example alerting rule: fire when a FluxMQ node has reported degraded for more than 5 minutes
- alert: FluxMQNodeDegraded
  expr: fluxmq_ready_mode{mode="degraded"} == 1
  for: 5m

(This assumes you export the mode field as a metric, for example with an exporter that scrapes the /ready JSON. The metric name fluxmq_ready_mode is illustrative.)


Configuration

server:
  health_enabled: true    # Enable health endpoints (default: true)
  health_addr: ":8081"    # Listen address (default: ":8081")
