Clio  develop
The XRP Ledger API server.
cluster::WriterDecider Class Reference

Decides which node in the cluster should be the writer based on cluster state.

#include <WriterDecider.hpp>

Public Member Functions

 WriterDecider (boost::asio::thread_pool &ctx, std::unique_ptr< etl::WriterStateInterface > writerState, std::chrono::steady_clock::duration recoveryTime=kRECOVERY_TIME)
 Constructs a WriterDecider.
void onNewState (ClioNode::CUuid selfId, std::shared_ptr< Backend::ClusterData const > clusterData)
 Handles cluster state changes and decides whether this node should be the writer.

Static Public Attributes

static constexpr std::chrono::steady_clock::duration kRECOVERY_TIME = std::chrono::hours{1}

Detailed Description

Decides which node in the cluster should be the writer based on cluster state.

This class monitors cluster state changes and determines whether the current node should act as the writer to the database.

Election (normal operation)

All non-ReadOnly nodes are sorted by UUID. The first node with etlStarted and cacheIsFull is elected writer. If no fully-ready node exists, the first node with etlStarted is chosen. All others give up writing.
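
The election rule above can be illustrated with a minimal, self-contained sketch. It is not Clio's implementation: NodeInfo and electWriter are hypothetical names, the UUID is modelled as a plain string rather than ClioNode::CUuid, and returning std::nullopt when no node has etlStarted is an assumption, since the text does not say what happens in that case.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a cluster node (not Clio's ClioNode type).
struct NodeInfo {
    std::string uuid;  // stand-in for ClioNode::CUuid
    bool readOnly = false;
    bool etlStarted = false;
    bool cacheIsFull = false;
};

// Returns the UUID of the elected writer.
// Assumption: std::nullopt is returned when no candidate has etlStarted.
std::optional<std::string> electWriter(std::vector<NodeInfo> nodes)
{
    // ReadOnly nodes never take part in the election.
    nodes.erase(
        std::remove_if(nodes.begin(), nodes.end(), [](NodeInfo const& n) { return n.readOnly; }),
        nodes.end()
    );

    // Every node sorts the same way, so every node reaches the same answer.
    std::sort(nodes.begin(), nodes.end(), [](NodeInfo const& a, NodeInfo const& b) {
        return a.uuid < b.uuid;
    });

    // First choice: the first fully-ready node.
    auto it = std::find_if(nodes.begin(), nodes.end(), [](NodeInfo const& n) {
        return n.etlStarted && n.cacheIsFull;
    });

    // Second choice: the first node that has at least started ETL.
    if (it == nodes.end()) {
        it = std::find_if(nodes.begin(), nodes.end(), [](NodeInfo const& n) { return n.etlStarted; });
    }

    if (it == nodes.end())
        return std::nullopt;
    return it->uuid;
}
```

A node would compare the returned UUID with its own identifier: on a match it takes the writer role, otherwise it gives up writing. Because the input is the same sorted list on every node, no extra messages are needed to agree on the result.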

Fallback mode

Fallback is the slower but more reliable mechanism based on database write-conflict detection (a node waits ~10 s of DB silence before writing). The cluster enters fallback whenever any non-ReadOnly node publishes DbRole::Fallback — for example during a rolling upgrade when an old node without cluster-coordination support is present.

Fallback recovery

To avoid the cluster staying in fallback indefinitely, a recovery timer is started when this node enters fallback. After the timer fires, the node enters DbRole::FallbackRecovery and coordinates with peers to return to election mode. If any peer is already in FallbackRecovery, the node joins immediately (contagion rule) and cancels its own pending timer.

State machine for onNewState

                      sees any Fallback node
[election mode] ──────────────────────────────► [Fallback]
(NotWriter /                                        │
 Writer)                                  recovery timer fires
     ▲                                          (1 hour)
     │                                  OR sees FallbackRecovery
     │                                   node (contagion rule)
     │                                              │
     │                                              ▼
     │     no Fallback nodes visible        [FallbackRecovery]
     └─────────────────────────────────────────────────

Nodes in FallbackRecovery continue the fallback write-race so there is no write availability gap during the coordination phase.
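
The three states and their transitions can be written as a small pure function. This is only a sketch of the state machine as documented here: Mode, Observation and nextMode are hypothetical names, and the real class acts through etl::WriterStateInterface and a timer rather than returning a value. The sketch also moves straight from FallbackRecovery to election mode, whereas the class documentation says the node enters election mode on the next cycle.

```cpp
#include <cassert>

// Hypothetical names; the real roles are published as DbRole values.
enum class Mode { Election, Fallback, FallbackRecovery };

// What this node observes in one cluster-state update.
struct Observation {
    bool seesFallback = false;          // some node publishes Fallback
    bool seesFallbackRecovery = false;  // some node publishes FallbackRecovery
    bool recoveryTimerFired = false;    // the local recovery timer has expired
};

Mode nextMode(Mode current, Observation const& obs)
{
    switch (current) {
        case Mode::Election:
            // Any visible Fallback node pulls this node into fallback.
            return obs.seesFallback ? Mode::Fallback : Mode::Election;
        case Mode::Fallback:
            // Leave fallback when the timer fires or a peer is already
            // recovering (contagion rule).
            return (obs.recoveryTimerFired || obs.seesFallbackRecovery) ? Mode::FallbackRecovery
                                                                        : Mode::Fallback;
        case Mode::FallbackRecovery:
            // Recovery is complete once no Fallback node remains visible.
            return obs.seesFallback ? Mode::FallbackRecovery : Mode::Election;
    }
    return current;  // not reached for valid enum values
}
```

Keeping the transition logic free of side effects makes each arrow of the diagram checkable in isolation, for example that a node in Fallback stays there while the timer is pending and no peer has started recovery.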

Constructor & Destructor Documentation

◆ WriterDecider()

cluster::WriterDecider::WriterDecider ( boost::asio::thread_pool & ctx,
std::unique_ptr< etl::WriterStateInterface > writerState,
std::chrono::steady_clock::duration recoveryTime = kRECOVERY_TIME )

Constructs a WriterDecider.

Parameters
ctx           Thread pool for executing asynchronous operations
writerState   Writer state interface for controlling write operations
recoveryTime  How long to wait in Fallback before attempting recovery (defaults to kRECOVERY_TIME; pass a short duration in tests)

Member Function Documentation

◆ onNewState()

void cluster::WriterDecider::onNewState ( ClioNode::CUuid selfId,
std::shared_ptr< Backend::ClusterData const > clusterData )

Handles cluster state changes and decides whether this node should be the writer.

Spawns an asynchronous task that applies the state machine described in the class documentation. Decisions are based on the clusterData snapshot:

  • If clusterData has no value (communication failure), no action is taken.
  • If self is ReadOnly, writing is given up unconditionally.
  • If self is Fallback and a FallbackRecovery node is visible, the contagion rule applies: this node also enters FallbackRecovery and the recovery timer is cancelled.
  • If self is Fallback and the recovery timer is not running, it is started (handles the case where fallback was triggered externally, e.g. by Monitor).
  • If self is FallbackRecovery and no Fallback nodes are visible, the recovery coordination is complete: writing is given up and the fallback recovery flag is cleared so the node enters election mode on the next cycle.
  • If self is in election mode and any Fallback node is visible, this node switches to Fallback and the recovery timer is started.
  • Otherwise, election proceeds: non-ReadOnly nodes are sorted by UUID and the first fully-ready (etlStarted && cacheIsFull) node is elected writer; if no node is fully ready, the first node with etlStarted is chosen.
Parameters
selfId       The UUID of the current node
clusterData  Shared pointer to current cluster data; may be empty if communication failed
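
The order in which the rules above are checked matters, so it may help to see them as one ordered chain of guards. The sketch below is a hypothetical model, not the code in WriterDecider.cpp: Role, Action, Snapshot, SelfState and decide are invented for illustration, and the cluster snapshot is reduced to two booleans. Two cases are not spelled out by the list (self in Fallback with the timer already running, and self in FallbackRecovery while Fallback nodes are still visible); treating both as "no action" is an assumption.

```cpp
#include <cassert>

// Hypothetical simplifications of the real types.
enum class Role { ReadOnly, Election, Fallback, FallbackRecovery };

enum class Action {
    None,                  // keep the current state
    GiveUpWriting,         // ReadOnly nodes never write
    JoinFallbackRecovery,  // contagion rule; also cancels the recovery timer
    StartRecoveryTimer,    // fallback was entered without a running timer
    FinishRecovery,        // give up writing and clear the recovery flag
    EnterFallback,         // switch to Fallback and start the recovery timer
    RunElection            // sort by UUID and pick the writer
};

struct Snapshot {
    bool anyFallback;          // a Fallback node is visible
    bool anyFallbackRecovery;  // a FallbackRecovery node is visible
};

struct SelfState {
    Role role;
    bool recoveryTimerRunning;
};

// A null snapshot models a clusterData pointer with no value.
Action decide(SelfState const& self, Snapshot const* snapshot)
{
    if (snapshot == nullptr)
        return Action::None;  // communication failure
    if (self.role == Role::ReadOnly)
        return Action::GiveUpWriting;
    if (self.role == Role::Fallback && snapshot->anyFallbackRecovery)
        return Action::JoinFallbackRecovery;
    if (self.role == Role::Fallback && !self.recoveryTimerRunning)
        return Action::StartRecoveryTimer;
    if (self.role == Role::FallbackRecovery && !snapshot->anyFallback)
        return Action::FinishRecovery;
    if (self.role == Role::Election && snapshot->anyFallback)
        return Action::EnterFallback;
    if (self.role == Role::Election)
        return Action::RunElection;
    return Action::None;  // assumption: remaining cases keep the current state
}
```

Checking the contagion rule before the timer check means a node that sees a recovering peer never starts a timer it would immediately cancel.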

The documentation for this class was generated from the following files:
  • /__w/clio/clio/src/cluster/WriterDecider.hpp
  • /__w/clio/clio/src/cluster/WriterDecider.cpp