MONITORING AND RECOVERY
When an oracle drops, a settlement root lags, a queue backs up, or a stress event widens spreads, the only question that matters for a funded trader is whether the cashier still works. This page lists the four monitoring tracks that watch every payout pipeline, the recovery posture for each failure class, the guardian pause authority that can halt deposits or withdrawals within a single transaction, and the published SLA for restoring service. The targets are concrete: a 24-hour payout window after Sumsub-class KYC clears, sub-minute detection on every feed and queue signal, and a fully visible post-incident reconciliation posted to the operations channel.
Four monitoring tracks run continuously. Each has a defined detection threshold, an on-call routing rule, and a documented response posture, so that degraded venue state always becomes visible before it reaches the withdrawal pipelines.
| Operational signal | Detection target | Response posture |
|---|---|---|
| Feed freshness and market posture | Oracle staleness above the per-source freshness budget; cross-source divergence above tolerance; mark drift versus reference exchanges | Affected markets switch to constrained mode (reduced max leverage, widened bands, no new opens) within one block |
| Root publication and liveness | Time since the last `commitSettlement`, queue depth of pending withdrawals, and runtime publication lag | Operations paged at threshold; guardian pause armed if lag exceeds `disputeWindowSec` without a recovered publication |
| Gateway and auth health | Per-tier auth error rate, cross-tier permission leak attempts, signed-order replay attempts, rate-limit saturation | Surface-level circuit breaker per tier; admission-tier failures never cascade to operator or multi-sig surfaces |
| Treasury and reconciliation state | Operating-wallet vs deep-cold-vault balance drift, fee-sweep accounting variance, insurance-reserve exposure ratio | Daily reconciliation report to the ops multi-sig; any unreconciled drift above $1K blocks the next fee sweep until resolved |
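Two of the table's rules reduce to simple threshold checks: the feed-posture row and the $1K fee-sweep block. The sketch below is illustrative only. The names `SourceReading`, `market_posture`, and `fee_sweep_allowed` are hypothetical, and the freshness budgets and tolerances are assumed parameters because the page does not publish them; only the $1K threshold comes from the table. Measuring divergence against the cross-source median is one reasonable choice, not necessarily the venue's.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class SourceReading:
    source: str                # hypothetical oracle source id
    price: float
    updated_at: float          # unix seconds of the source's last update
    freshness_budget_s: float  # per-source freshness budget (assumed value)

def market_posture(readings, now, divergence_tol, mark_price, reference_price, mark_drift_tol):
    """Return ("normal" | "constrained", reasons) for one market."""
    reasons = []
    # Staleness: any source older than its own freshness budget.
    if any(now - r.updated_at > r.freshness_budget_s for r in readings):
        reasons.append("oracle staleness")
    # Divergence: largest relative distance of any source from the cross-source median.
    mid = median(r.price for r in readings)
    if max(abs(r.price - mid) / mid for r in readings) > divergence_tol:
        reasons.append("source divergence")
    # Mark drift: venue mark versus a reference-exchange price.
    if abs(mark_price - reference_price) / reference_price > mark_drift_tol:
        reasons.append("mark drift")
    # Any reason at all switches the market to constrained mode
    # (reduced max leverage, widened bands, no new opens).
    return ("constrained" if reasons else "normal"), reasons

FEE_SWEEP_DRIFT_LIMIT_USD = 1_000  # "$1K" from the table

def fee_sweep_allowed(unreconciled_drift_usd: float) -> bool:
    """Unreconciled drift above $1K (in either direction) blocks the next fee sweep."""
    return abs(unreconciled_drift_usd) <= FEE_SWEEP_DRIFT_LIMIT_USD
```

Returning the list of reasons alongside the posture lets the same check drive both the market state and the labelled reason shown on order tickets.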
# What the venue watches continuously
Every signal feeds the same paging system, staffed by a single rotation across the founding team and the engineering lead. There is no quiet hour. Within that rotation, alarms route by track: oracle and feed alarms go directly to the trading-services on-call; root and publication alarms go to the runtime on-call with the operations lead in CC; gateway and auth anomalies go to the platform on-call; and treasury and reconciliation drift goes to the operations multi-sig signers, because anything that ends in a Base transaction needs the signer who will be asked to co-sign the remediation.
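The routing rules in the paragraph above can be written as a small lookup table. This is a minimal sketch: the track keys, role strings, and the `Route`/`route_alarm` names are hypothetical labels, not the venue's actual paging configuration.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Route:
    page: str                 # who is paged
    cc: Tuple[str, ...] = ()  # who is copied on the page

# One rotation, routed by track. Keys and role strings are hypothetical labels.
ROUTES = {
    "feed": Route(page="trading-services on-call"),
    "root": Route(page="runtime on-call", cc=("operations lead",)),
    "gateway": Route(page="platform on-call"),
    # Treasury drift goes to the people who would co-sign the fix on Base.
    "treasury": Route(page="operations multi-sig signers"),
}

def route_alarm(track: str) -> Route:
    """Look up the route for a track; there is deliberately no quiet-hours branch."""
    try:
        return ROUTES[track]
    except KeyError:
        # An unrouted alarm is a configuration bug, so fail loudly rather than drop it.
        raise ValueError(f"no route for monitoring track {track!r}") from None
```

Raising on an unknown track, rather than falling back to a default recipient, keeps a misconfigured alarm from being silently swallowed.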
Each signal also feeds the public venue surface. Constrained markets show their state on every order ticket and in the order book header, with the reason (oracle staleness, source divergence, mark drift) labelled. Pending withdrawals show the gating condition (root not yet committed, disputeWindowSec elapsing, manual review, KYC pending) so a funded trader can read why their cashier is paused without contacting support. A degraded venue that still presents itself as normal is treated as a worse failure than the underlying degradation.
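The rule that a paused cashier always names its reason can be sketched as a status label built from the gating conditions listed above. The `Gate` enum, `withdrawal_label`, and the optional "seconds left" detail are hypothetical illustrations; only the four condition names come from this page.

```python
from enum import Enum
from typing import Optional, Set

class Gate(Enum):
    # Declaration order doubles as the display order.
    ROOT_NOT_COMMITTED = "root not yet committed"
    DISPUTE_WINDOW = "disputeWindowSec elapsing"
    MANUAL_REVIEW = "manual review"
    KYC_PENDING = "KYC pending"

def withdrawal_label(gates: Set[Gate], dispute_remaining_s: Optional[int] = None) -> str:
    """User-facing status for a pending withdrawal that always names the reason."""
    if not gates:
        return "no gating condition"
    parts = []
    for gate in Gate:  # iterate in declaration order for a stable label
        if gate in gates:
            text = gate.value
            if gate is Gate.DISPUTE_WINDOW and dispute_remaining_s is not None:
                text += f" ({dispute_remaining_s}s left)"
            parts.append(text)
    return "Paused: " + ", ".join(parts)
```

Every active gate appears in the label, so a withdrawal held by two conditions never shows only one of them.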
Releases follow the same discipline. Runtime and contract changes pass through regression and smoke suites before deploy; contract upgrades pass through an additional security-review window with the 3-of-5 governance multi-sig and a 24-hour proposal-visibility delay. Operational drift — a queue tuning, a feed-source weight change, a permission rotation — is logged in the ops channel with the responsible signer and the rollback path. The protocol is not secured by code alone; it is secured by the discipline with which code is changed, published, and monitored in production.
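The contract-upgrade gate combines three independent requirements: a passed security review, 3-of-5 governance approvals, and the 24-hour proposal-visibility delay. In production this would be enforced on-chain by the multi-sig and its delay mechanism; the sketch below only illustrates the gating logic off-chain, and `UpgradeProposal` and `can_execute` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Set

GOVERNANCE_THRESHOLD = 3            # 3-of-5 governance multi-sig
VISIBILITY_DELAY_S = 24 * 60 * 60   # 24-hour proposal-visibility delay

@dataclass
class UpgradeProposal:
    visible_at: float                 # unix seconds when the proposal was published
    security_review_passed: bool = False
    approvals: Set[str] = field(default_factory=set)  # signer ids that approved

def can_execute(p: UpgradeProposal, governance_signers: Set[str], now: float) -> bool:
    """Review, 3-of-5 approvals, and the visibility delay must all pass."""
    valid = p.approvals & governance_signers  # approvals from non-signers do not count
    return (
        p.security_review_passed
        and len(valid) >= GOVERNANCE_THRESHOLD
        and now - p.visible_at >= VISIBILITY_DELAY_S
    )
```

Intersecting approvals with the signer set means a stray approval from an operations key cannot stand in for a governance signature.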
signal turns unhealthy
-> market posture tightens or gateway breaker trips
-> affected surfaces label their state for users
-> on-call paged, signer set notified for treasury-class events
-> guardian pause armed if degradation crosses safety threshold
-> recovery waits for fresh data + healthy publication + reconciled treasury
-> only the 3-of-5 governance multi-sig can unpause a guardian halt
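The flow above can be sketched as an ordered list of actions per signal, plus a separate unpause check. This is a hypothetical illustration: the track labels, the `Signal` type, and the action strings are assumptions, and the page does not specify which step applies to root or treasury signals beyond paging and signer notification.

```python
from dataclasses import dataclass
from typing import List, Set

GOVERNANCE_THRESHOLD = 3  # only the 3-of-5 governance multi-sig can unpause

@dataclass
class Signal:
    track: str                       # hypothetical: "feed", "root", "gateway", "treasury"
    unhealthy: bool
    crosses_safety_threshold: bool

def escalation_steps(sig: Signal) -> List[str]:
    """Ordered response actions for one signal, following the flow."""
    if not sig.unhealthy:
        return []
    steps = []
    if sig.track == "feed":
        steps.append("tighten market posture")
    elif sig.track == "gateway":
        steps.append("trip gateway breaker")
    steps.append("label affected surfaces")
    steps.append("page on-call")
    if sig.track == "treasury":
        steps.append("notify signer set")
    if sig.crosses_safety_threshold:
        steps.append("arm guardian pause")
    return steps

def can_unpause(approvals: Set[str], governance_signers: Set[str], recovery_ready: bool) -> bool:
    """A guardian halt lifts only with 3 governance approvals and a converged recovery."""
    return recovery_ready and len(approvals & governance_signers) >= GOVERNANCE_THRESHOLD
```

Keeping the unpause check separate from the escalation path mirrors the flow: escalation is automatic, while lifting a halt needs both reconverged conditions and governance signatures.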
# How recovery is supposed to work
Recovery is staged, not flipped. Before any protected surface returns to full authority, four conditions must hold simultaneously: (1) oracle freshness has been continuously inside budget for at least the post-recovery soak window; (2) the runtime has committed a settlement root that the contract accepts as in-history; (3) gateway tier health is green across the public, authenticated, signed-order, and operator surfaces; and (4) the operating-wallet to deep-cold-vault reconciliation matches with zero unresolved drift. Any single missing condition keeps the affected surface in constrained mode.
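The four-condition recovery gate can be expressed as a function that reports which conditions are still missing. This is a minimal sketch: `RecoveryState`, `missing_conditions`, and `surface_mode` are hypothetical names, the soak window is an assumed parameter, and drift is held in integer cents so that "zero unresolved drift" is an exact comparison.

```python
from dataclasses import dataclass
from typing import Dict, List

REQUIRED_TIERS = ("public", "authenticated", "signed-order", "operator")

@dataclass
class RecoveryState:
    fresh_for_s: float           # continuous time oracle freshness has stayed inside budget
    root_in_history: bool        # contract accepts the latest committed root as in-history
    tier_green: Dict[str, bool]  # gateway tier -> health is green
    unresolved_drift_cents: int  # operating wallet vs deep-cold vault, integer cents

def missing_conditions(s: RecoveryState, soak_window_s: float) -> List[str]:
    """Return every recovery condition that does not yet hold."""
    missing = []
    if s.fresh_for_s < soak_window_s:
        missing.append("oracle freshness soak")
    if not s.root_in_history:
        missing.append("accepted settlement root")
    # A tier that is absent from the health map counts as not green.
    if not all(s.tier_green.get(t, False) for t in REQUIRED_TIERS):
        missing.append("gateway tier health")
    if s.unresolved_drift_cents != 0:
        missing.append("treasury reconciliation")
    return missing

def surface_mode(s: RecoveryState, soak_window_s: float) -> str:
    # All four at once, or the surface stays constrained.
    return "constrained" if missing_conditions(s, soak_window_s) else "full authority"
```

Returning the full list of missing conditions, rather than a single boolean, gives the degraded surface something concrete to label while it waits.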
For funded traders the published SLA is the bound that matters. Withdrawals of up to $5K by traders outside the leaderboard podium (top three) target release within 24 hours of Sumsub-class KYC clearing or, for users already past first-payout review, within four hours of the gating condition clearing. Withdrawals above $5K, or by traders on the podium, target the same 24-hour window, but the clock starts when the operations 2-of-3 multi-sig opens the review ticket; any extra wait is review time, not signing time. After any incident that pauses withdrawals, the unpause is accompanied by a published reconciliation showing the cause, the impacted balance state, and the per-account remediation. No payout that was owed before the incident is reduced because of it; insurance reserves backstop any shortfall produced by the recovery.
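The SLA rules amount to choosing a start event and a window length. The sketch below is a hypothetical illustration (`payout_target` and its parameters are not part of any published API). It returns `None` when the relevant clock has not started yet, and it assumes that exactly $5K falls on the standard path, reading "up to $5K" as inclusive.

```python
from datetime import datetime, timedelta
from typing import Optional

REVIEW_AMOUNT_USD = 5_000  # above this, the review path applies
PODIUM_SIZE = 3            # leaderboard podium top three

def payout_target(
    amount_usd: float,
    podium_rank: Optional[int],
    kyc_cleared_at: datetime,
    past_first_payout_review: bool,
    gate_cleared_at: Optional[datetime] = None,
    review_ticket_opened_at: Optional[datetime] = None,
) -> Optional[datetime]:
    """Target release time, or None if the relevant clock has not started."""
    on_podium = podium_rank is not None and podium_rank <= PODIUM_SIZE
    if amount_usd > REVIEW_AMOUNT_USD or on_podium:
        # Same 24-hour window, but it starts when the 2-of-3 ops multi-sig opens review.
        if review_ticket_opened_at is None:
            return None
        return review_ticket_opened_at + timedelta(hours=24)
    if past_first_payout_review:
        # Users past first-payout review: four hours from the gating condition clearing.
        if gate_cleared_at is None:
            return None
        return gate_cleared_at + timedelta(hours=4)
    # Otherwise: 24 hours from Sumsub-class KYC clearing.
    return kyc_cleared_at + timedelta(hours=24)
```

For example, a $6K withdrawal whose review ticket opens at 15:00 targets 15:00 the next day. The window length is the same as on the standard path; only the start event moves.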
This is the standard Dexter operates to: sub-minute detection, posture changes within one block, transparency on every degraded surface, recovery only after all four independent conditions reconverge, and a 24-hour cashier target that the multi-sig will meet or, if it cannot, publish the reason why.