Clustering
Embedded etcd metadata, gRPC routing, session takeover, and cluster message flow
Last Updated: 2026-02-12
FluxMQ clustering is designed around one idea: keep coordination data consistent across nodes, and keep payload data fast and local wherever possible.
This page explains how etcd, the inter-node transport, and the protocol brokers work together to route publishes, take over sessions, and (optionally) replicate durable queue logs.
Two Planes: Metadata vs Data
In clustered mode there are two planes:
- Metadata plane (etcd): “who owns what”, “who is subscribed”, “who is consuming”.
- Data plane (gRPC transport): the actual routed messages and takeover payloads.
The system stays understandable if you keep this split in mind: etcd does not stream user traffic; it tells nodes where to send it.
etcd Keyspace: What Lives Where
etcd is the source of truth for cluster metadata. FluxMQ maintains local in-memory caches (subscription and session-owner caches) to reduce etcd round trips; caches are kept up-to-date via etcd watches.
Key prefixes:
| Prefix | Meaning |
|---|---|
| `/mqtt/sessions/<client>/owner` | Session ownership (written with a lease so it expires on node death) |
| `/mqtt/subscriptions/<client>/<filter>` | Subscription registry for cross-node pub/sub routing |
| `/mqtt/queue-consumers/<queue>/<group>/<consumer>` | Queue consumer registry for cross-node queue delivery |
| `/mqtt/retained-data/*` and `/mqtt/retained-index/*` | Hybrid retained store (small payloads replicated; large payloads indexed) |
| `/mqtt/will-data/*` and `/mqtt/will-index/*` | Hybrid will store (same strategy as retained) |
| `/mqtt/leader` | Cluster leader election (used for coordination and visibility) |
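The key shapes above can be sketched as simple builders. This is an illustrative sketch, not FluxMQ's internal code; the helper names are ours, but the key layouts match the table.

```go
package main

import "fmt"

// sessionOwnerKey builds the session-ownership key for a client.
func sessionOwnerKey(clientID string) string {
	return fmt.Sprintf("/mqtt/sessions/%s/owner", clientID)
}

// subscriptionKey builds the registry key for one client's topic filter.
func subscriptionKey(clientID, filter string) string {
	return fmt.Sprintf("/mqtt/subscriptions/%s/%s", clientID, filter)
}

// queueConsumerKey builds the registry key for a queue consumer.
func queueConsumerKey(queue, group, consumer string) string {
	return fmt.Sprintf("/mqtt/queue-consumers/%s/%s/%s", queue, group, consumer)
}

func main() {
	fmt.Println(sessionOwnerKey("sensor-1"))
	fmt.Println(subscriptionKey("sensor-1", "telemetry/+"))
	fmt.Println(queueConsumerKey("orders", "billing", "worker-3"))
}
```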
Session Ownership: Why It Exists
MQTT sessions are stateful (inflight QoS 1/2, offline queue, subscriptions, will). In a cluster, you need a single node to be “the owner” at any time so publishes, acks, and retained/will management don’t split-brain.
Ownership is stored in etcd and written with a lease:
- If a node crashes, its lease expires and ownership keys disappear automatically.
- Nodes cache ownership locally for fast routing, but can fall back to etcd when needed.
Session Takeover (MQTT): End-to-End Flow
Takeover happens when a client reconnects to a node that is not the current owner.
The goal: move session state from the old owner to the new owner, and guarantee that only one node continues the session.
Important details:
- The takeover request uses the gRPC transport, not etcd.
- Ownership is updated after the state transfer completes (so the new owner can safely overwrite).
- The old owner closes the session as part of preparing the state.
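The ordering above can be sketched as a small sequence. This is a toy model under stated assumptions: the function and type names are ours, the remote call is mocked with an in-process struct, and the real transfer happens over gRPC with ownership recorded in etcd.

```go
package main

import (
	"errors"
	"fmt"
)

// sessionState stands in for the transferred snapshot
// (inflight QoS 1/2, offline queue, subscriptions, will).
type sessionState struct {
	clientID string
	inflight int
}

// oldOwner models the previous owner's side: it closes the live
// session as part of preparing the state, then hands back a snapshot.
type oldOwner struct {
	sessions map[string]*sessionState
}

func (o *oldOwner) PrepareTakeover(clientID string) (*sessionState, error) {
	st, ok := o.sessions[clientID]
	if !ok {
		return nil, errors.New("session not found")
	}
	delete(o.sessions, clientID) // old owner closes the session first
	return st, nil
}

// takeover follows the order described above: transfer state over the
// data plane first, update ownership metadata only after that succeeds,
// so the new owner can safely overwrite.
func takeover(clientID, newNode string, remote *oldOwner, owners map[string]string) error {
	st, err := remote.PrepareTakeover(clientID) // 1. state transfer (gRPC in reality)
	if err != nil {
		return err
	}
	owners[clientID] = newNode // 2. ownership update happens last
	fmt.Printf("took over %s (%d inflight) on %s\n", st.clientID, st.inflight, newNode)
	return nil
}

func main() {
	remote := &oldOwner{sessions: map[string]*sessionState{
		"veh-42": {clientID: "veh-42", inflight: 3},
	}}
	owners := map[string]string{"veh-42": "node-a"}
	_ = takeover("veh-42", "node-b", remote, owners)
	fmt.Println("owner now:", owners["veh-42"])
}
```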
Pub/Sub Routing Across Nodes
For “normal” pub/sub topics, the originating node does two things:
- Deliver locally to matching subscriptions.
- Forward to remote nodes that own sessions with matching subscriptions.
The routing decision is based on the subscription registry and the session-owner map.
Remote RoutePublish deliveries are dispatched to the local protocol broker by client ID namespace:
- MQTT clients: no protocol prefix
- AMQP 1.0 clients: `amqp:` prefix
- AMQP 0.9.1 clients: `amqp091-` prefix
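The dispatch rule can be sketched as a single prefix check. The helper name and returned broker labels are ours, not FluxMQ's; only the prefixes come from this page.

```go
package main

import (
	"fmt"
	"strings"
)

// brokerFor dispatches a remote RoutePublish delivery to a local protocol
// broker based on the client-ID namespace prefixes listed above.
func brokerFor(clientID string) string {
	switch {
	case strings.HasPrefix(clientID, "amqp091-"):
		return "amqp-0-9-1"
	case strings.HasPrefix(clientID, "amqp:"):
		return "amqp-1-0"
	default:
		return "mqtt" // no prefix: plain MQTT client
	}
}

func main() {
	for _, id := range []string{"sensor-1", "amqp:billing-svc", "amqp091-legacy-worker"} {
		fmt.Printf("%s -> %s broker\n", id, brokerFor(id))
	}
}
```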
Hybrid Retained and Will Storage
Retained messages and wills need to be available cluster-wide, but replicating large payloads through etcd is expensive.
FluxMQ uses a hybrid strategy (threshold configurable):
- Small payloads: store metadata + payload in etcd (replicated to all nodes).
- Large payloads: store payload in the owner’s local store; store metadata in etcd; fetch payload on-demand via gRPC.
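The placement decision can be sketched as a threshold check against the key prefixes from the table above. The function name, struct, and the 16 KiB cutoff are illustrative — the real threshold is configurable.

```go
package main

import "fmt"

// placement describes where a retained message lands under the hybrid strategy.
type placement struct {
	etcdKey     string // always written: metadata (and payload, if small)
	payloadHere bool   // true when the payload stays in the owner's local store
}

func planRetained(topic string, payload []byte, threshold int) placement {
	if len(payload) <= threshold {
		// Small: metadata + payload replicate to every node via etcd.
		return placement{etcdKey: "/mqtt/retained-data/" + topic}
	}
	// Large: only an index entry goes to etcd; peers fetch the payload
	// on demand over gRPC from the owning node.
	return placement{etcdKey: "/mqtt/retained-index/" + topic, payloadHere: true}
}

func main() {
	small := planRetained("status/node-a", make([]byte, 512), 16*1024)
	large := planRetained("firmware/latest", make([]byte, 1<<20), 16*1024)
	fmt.Println(small.etcdKey, small.payloadHere)
	fmt.Println(large.etcdKey, large.payloadHere)
}
```

The same split applies to wills via the `/mqtt/will-data/*` and `/mqtt/will-index/*` prefixes.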
Durable Queues Across Nodes
Queue consumers are registered cluster-wide so a node receiving a queue publish can find where consumption is happening.
Two distribution styles exist (configured via `cluster.raft.distribution_mode`):
- `forward`: the node that appends to the queue also delivers (or routes) messages to remote consumers.
- `replicate`: the queue log is replicated (Raft), so each node with consumers can deliver from its local log.
Queue Flow: Storage vs Real-Time Delivery
For replicated queues, FluxMQ intentionally separates:
- Storage path (durable): publish -> append to queue log (Raft for replicated queues)
- Delivery path (real-time): claim from log -> deliver to connected consumers (local or remote)
This keeps durability decisions in one path and transport fanout in another.
Scenario: Producer on node A, consumer connected on node B (B is not queue leader)
Assume queue replication is enabled and `cluster.raft.write_policy=forward`.
What Gets Stored vs What Gets Routed
- The queue log and consumer-group progress are stored (and replicated when enabled).
- Deliveries to live consumers are routed (they are not persisted as “delivery events”).
Acks, Cursors, and Consistency
For replicated queues:
- Acks, cursor updates, and PEL changes are leader-owned and replicated.
- If a client is connected to a non-leader node, the mutation is forwarded to the leader.
- If the leader is unreachable, the mutation fails rather than being applied locally.
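The follower-side rule can be sketched as follows. Function and type names are illustrative; the point is the two branches: forward to the leader, or fail — never apply locally.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a minimal stand-in for a cluster member.
type node struct {
	id       string
	isLeader bool
}

// applyMutation models how acks, cursor updates, and PEL changes are
// handled: only the queue leader applies them. A follower forwards; if
// the leader is unreachable, the mutation fails rather than being
// applied locally, so follower state never diverges from the committed log.
func applyMutation(local node, leaderReachable bool, mutation string) error {
	if local.isLeader {
		fmt.Printf("%s: applied %q, replicating to followers\n", local.id, mutation)
		return nil
	}
	if !leaderReachable {
		return errors.New("leader unreachable: " + mutation + " rejected")
	}
	fmt.Printf("%s: forwarded %q to leader\n", local.id, mutation)
	return nil
}

func main() {
	_ = applyMutation(node{id: "node-a", isLeader: true}, true, "ack msg-17")
	_ = applyMutation(node{id: "node-b"}, true, "ack msg-18")
	err := applyMutation(node{id: "node-b"}, false, "ack msg-19")
	fmt.Println("err:", err)
}
```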
This keeps failover behavior predictable: on leadership change, the new leader’s state matches what was committed.
Performance Summary
- `write_policy=forward` adds an extra hop on non-leader publishes.
- `sync_mode=true` increases publish latency (the leader waits for commit up to `ack_timeout`).
- `distribution_mode=forward` increases cross-node delivery traffic.
- `distribution_mode=replicate` increases cross-node replication traffic (but can make delivery more local).
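The knobs above might be combined like this. This is a hypothetical snippet: the key names follow this page, but the exact file layout and value formats may differ — see the configuration reference.

```yaml
cluster:
  raft:
    write_policy: forward        # non-leader publishes take an extra hop
    sync_mode: true              # leader waits for commit, up to ack_timeout
    ack_timeout: 5s
    distribution_mode: replicate # nodes deliver from their local replicated log
```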
Choosing distribution_mode: Forward vs Replicate
Both modes preserve the same queue semantics (consumer groups, ack/nack/reject, retry). The difference is primarily where delivery happens and which network traffic you pay for.
| Mode | forward | replicate |
|---|---|---|
| Where delivery happens | The node that appends also delivers, routing to remote consumers as needed | Nodes deliver from their local replicated log |
| What crosses the network | Per-message delivery traffic (leader -> consumer’s node) | Replication traffic (Raft log + snapshots) |
| Typical benefits | Simple operationally; good when most consumers are local to the leader; avoids replicating logs just for delivery | More local delivery; scales better when consumers are spread across nodes; reduces per-message routing load |
| Typical costs | More cross-node traffic when consumers are spread out; delivery depends on transport health to consumer nodes | More background overhead (replication, snapshots); delivery can lag behind replication during spikes |
| Best when | Consumers are concentrated on one node/zone; you want minimal replication overhead | Consumers are distributed; you want to minimize cross-node delivery fan-out |
Practical guidance:
- Start with `forward` unless you have a clear need for `replicate`.
- Prefer `replicate` for high fan-out queue consumption across many nodes.
- Measure: the right choice depends on your workload shape (publish rate, consumer placement, and payload sizes).
For multi-queue workloads, use replication groups to shard hot queues and reduce contention. See Queue Replication Groups (Raft).
Configuration Entry Points
- Cluster basics: Cluster configuration
- Replication and tuning: Configuration reference