
State Reconciliation

This page expands Section 5 from Architecture Overview.

Reliability lens

Think of this subsystem as enforcing eventual consistency between the desired state recorded in data stores and the actual state observed in the Kubernetes cluster.

State Reconciliation Model (Deployment Listener)

The Deployment Listener is the core reliability subsystem: it keeps API-visible runtime state aligned with real cluster state.

Why it exists

Asynchronous deploy systems drift when:

  • watch streams disconnect,
  • pods crash or restart unexpectedly,
  • namespaces are deleted out-of-band,
  • cache and real cluster state diverge.

Deployment Listener converges observed Kubernetes state back to the runtime state used by APIs.

Observed state wins

When request intent and cluster reality diverge, reconciliation trusts observed Kubernetes state; this prevents ghost runtime records.

Watch architecture

  • Watches pods across all namespaces, filtered by the ctf/kind=challenge label.
  • Uses Kubernetes resourceVersion streaming with reconnect/resync logic.
  • Handles 410 Gone and stale resourceVersion errors by forcing a full relist.
  • Shards events into channels by pod UID hash, parallelizing processing while preserving per-shard ordering.
  • Worker count is configurable via CHALLENGE_WATCHER_WORKER_COUNT (default 20).
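The per-shard ordering guarantee above can be sketched with a stable hash over the pod UID. This is an illustrative sketch, not the actual implementation; the function name `shard_for` and the hashing scheme are assumptions, and `WORKER_COUNT` mirrors the CHALLENGE_WATCHER_WORKER_COUNT setting.

```python
# Illustrative sketch (assumed names): route pod events to per-shard
# worker queues keyed by a hash of the pod UID, so events for the same
# pod are always processed in order while shards run in parallel.
import hashlib

WORKER_COUNT = 20  # stand-in for CHALLENGE_WATCHER_WORKER_COUNT


def shard_for(pod_uid: str, shards: int = WORKER_COUNT) -> int:
    """Stable shard index derived from the pod UID."""
    digest = hashlib.sha256(pod_uid.encode()).digest()
    return int.from_bytes(digest[:4], "big") % shards
```

Because the index depends only on the UID, every event for a given pod lands on the same shard, which is what preserves per-pod ordering without a global lock.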

Event handling semantics

Deleted event

  • Removes the deployment reservation from the Redis ZSET.
  • Marks the status as STOPPED via the status callback path.
  • Updates challenge_start_tracking.StoppedAt for the matching namespace label.
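The Deleted-event steps can be sketched as follows, using in-memory stand-ins for the Redis ZSET and the challenge_start_tracking table. The names `reservations`, `tracking`, and the row fields are illustrative assumptions, not the real identifiers.

```python
# Hypothetical sketch of the Deleted-event path. `reservations` stands in
# for the Redis ZSET (pop ~ ZREM); `tracking` stands in for
# challenge_start_tracking rows.
from datetime import datetime, timezone


def handle_pod_deleted(namespace: str, reservations: dict, tracking: list) -> None:
    # 1. Drop the deployment reservation for this namespace.
    reservations.pop(namespace, None)
    # 2. Mark the run STOPPED and stamp StoppedAt on the matching row.
    for row in tracking:
        if row["namespace"] == namespace and row["stopped_at"] is None:
            row["status"] = "STOPPED"
            row["stopped_at"] = datetime.now(timezone.utc)
```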

Pod restart detection

  • Detects a pod UID change and updates the cached pod_id.
  • Extends the cache TTL based on the remaining runtime window.
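The TTL extension amounts to recomputing the cache lifetime from the time left in the runtime window, so the cache entry never outlives the deployment's deadline. A minimal sketch, assuming a `remaining_ttl_seconds` helper (the name and signature are illustrative):

```python
# Illustrative: after a pod restart (UID change), the new cache TTL is
# clamped to the remaining runtime window, never negative.
from datetime import datetime, timezone
from typing import Optional


def remaining_ttl_seconds(deadline: datetime, now: Optional[datetime] = None) -> int:
    """Seconds left until the runtime deadline, floored at zero."""
    now = now or datetime.now(timezone.utc)
    return max(0, int((deadline - now).total_seconds()))
```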

Stuck pod detection

Evaluates pod/container status and reason, including:

  • ImagePullBackOff / ErrImagePull / InvalidImageName.
  • CrashLoopBackOff / OOMKilled with restart threshold.
  • Long ContainerCreating timeout.
  • Running but not ready beyond threshold.

When a pod is classified as stuck, the listener can delete its namespace and clean up the associated state.
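The classification rules above can be sketched as a predicate. The thresholds, reason strings grouped into sets, and field names here are illustrative assumptions (e.g. the real restart threshold and timeouts are presumably configurable), not the actual values.

```python
# Sketch of stuck-pod classification mirroring the rules above.
# Thresholds are assumed example values.
FATAL_WAITING_REASONS = {"ImagePullBackOff", "ErrImagePull", "InvalidImageName"}
RESTART_THRESHOLD = 3        # CrashLoopBackOff / OOMKilled restarts tolerated
CREATING_TIMEOUT_S = 600     # max seconds in ContainerCreating
NOT_READY_TIMEOUT_S = 300    # max seconds Running but not ready


def is_stuck(waiting_reason, restart_count, seconds_in_state, ready):
    if waiting_reason in FATAL_WAITING_REASONS:
        return True
    if waiting_reason in {"CrashLoopBackOff", "OOMKilled"} \
            and restart_count >= RESTART_THRESHOLD:
        return True
    if waiting_reason == "ContainerCreating" \
            and seconds_in_state > CREATING_TIMEOUT_S:
        return True
    if waiting_reason is None and not ready \
            and seconds_in_state > NOT_READY_TIMEOUT_S:
        return True
    return False
```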

Ghost resource cleanup

If a pod exists but the corresponding deployment cache entry is missing:

  • The namespace is treated as a ghost.
  • The listener attempts cleanup and removes any residual deployment reservation.
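Ghost detection reduces to a set difference between what the cluster shows and what the cache knows. A minimal sketch, assuming the inputs are collections of namespace names (the function name and input shapes are illustrative):

```python
# Illustrative ghost detection: namespaces with live challenge pods but no
# corresponding deployment cache entry.
def find_ghost_namespaces(pod_namespaces, cached_deployments):
    """Namespaces observed in the cluster but absent from the cache."""
    return set(pod_namespaces) - set(cached_deployments)
```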

Reconciliation on startup/reconnect

  • On initial list, listener compares active pod namespaces with challenge_start_tracking rows where StoppedAt is null.
  • Rows with missing pods are marked stopped (orphan reconciliation).

Result: prevents long-lived false "running" records after missed watch events.
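The orphan reconciliation above can be sketched as a single pass over unstopped tracking rows, stamping StoppedAt for any namespace with no live pod. The row shape and field names are illustrative assumptions.

```python
# Sketch of startup orphan reconciliation: any tracking row still marked
# running (StoppedAt null) whose namespace has no live pod gets stamped
# stopped.
from datetime import datetime, timezone


def reconcile_orphans(tracking_rows, active_namespaces):
    """Mark rows without a live pod as stopped; return orphan namespaces."""
    orphans = []
    for row in tracking_rows:
        if row["stopped_at"] is None and row["namespace"] not in active_namespaces:
            row["stopped_at"] = datetime.now(timezone.utc)
            orphans.append(row["namespace"])
    return orphans
```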

Operational outcome

Event-driven, best-effort orchestration remains eventually consistent, with less manual recovery needed during CTF peaks.

Operator signal

A growing mismatch between active namespaces and unstopped tracking rows is an early warning that watch or callback paths need attention.