Runtime and Lifecycle Flows

This page expands on Section 4 of the Architecture Overview.

Flow interpretation

Each flow describes a single crossing of a trust and latency boundary; use it to reason about where failures are handled and which component owns retries.

Data Flow (Key Paths)

User Flow

  1. User reaches Ingress NGINX.
  2. Request is routed to Contestant Portal or Admin Portal.
  3. Frontend calls Contestant Service APIs.
  4. Contestant Service validates auth/session and serves challenge, team, and scoreboard operations.

Outcome: user operations stay on the low-latency request path.
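
As a concrete illustration of step 4, the sketch below shows one way the Contestant Service could validate a session against Redis before serving challenge, team, or scoreboard data; the key layout and function name are assumptions, not the platform's actual API.

```python
# Hypothetical sketch: session lookup on the low-latency request path.
# The "session:<id>" key layout and payload fields are illustrative only.
import json
from typing import Optional

import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

def validate_session(session_id: str) -> Optional[dict]:
    """Return the stored session payload (team/user ids) or None if missing/expired."""
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```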

Deployment Flow

  1. Contestant Service or Admin Portal requests challenge start/stop via Deployment Center.
  2. Deployment Center validates request signature and writes initial cache state.
  3. Deployment Center publishes a job to RabbitMQ (deployment_exchange -> deployment_queue).
  4. Deployment Consumer dequeues, checks workflow concurrency budget, and submits an Argo workflow.
  5. Argo applies challenge manifests and creates namespace/service/job.
  6. Deployment Listener observes pod readiness/deletion and finalizes lifecycle state.

Outcome: deployment work is queue-mediated and isolated from frontend latency.
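
The sketch below illustrates steps 2-3 of this flow. The exchange and queue names come from the description above; the routing key, Redis key layout, payload fields, and TTL values are illustrative assumptions.

```python
# A sketch of steps 2-3: write provisional cache state, then publish the job.
import json

import pika
import redis

r = redis.Redis(host="redis", port=6379)

def enqueue_deployment(team_id: int, challenge_id: int, action: str) -> None:
    # Provisional cache state first, so the portal can show "pending" even
    # before a consumer picks up the job; the TTL guards stale deploy intents.
    r.setex(f"deploy:{team_id}:{challenge_id}", 300, "pending")

    conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    ch = conn.channel()
    ch.exchange_declare(exchange="deployment_exchange",
                        exchange_type="direct", durable=True)
    ch.queue_declare(queue="deployment_queue", durable=True)
    ch.queue_bind(queue="deployment_queue", exchange="deployment_exchange",
                  routing_key="deploy")
    ch.basic_publish(
        exchange="deployment_exchange",
        routing_key="deploy",
        body=json.dumps({"team_id": team_id, "challenge_id": challenge_id,
                         "action": action}),
        properties=pika.BasicProperties(delivery_mode=2, expiration="60000"),  # ms
    )
    conn.close()
```

Publishing persistent messages with an expiration keeps the queue path durable while still preventing long-stale deploy intents from being consumed after the fact.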

Do not short-circuit the queue path

Directly creating workloads from user request paths can introduce contest-time latency spikes and inconsistent lifecycle records.

Runtime Access Flow

  1. User receives signed challenge token URL.
  2. User accesses Challenge Gateway with token.
  3. Gateway verifies token signature and expiry.
  4. Gateway routes HTTP/TCP traffic to internal challenge service endpoint.
  5. Gateway enforces token/IP rate limits and connection caps.

Outcome: runtime traffic is token-validated at the gateway boundary.
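
A minimal sketch of steps 2-3, assuming an HMAC-signed token of the form "<base64 payload>.<hex signature>" whose payload carries an "exp" timestamp; the real token format, fields, and secret handling may differ.

```python
# Signature check first, then expiry; both must pass before any routing happens.
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"gateway-signing-key"  # assumed shared secret

def verify_token(token: str) -> Optional[dict]:
    try:
        payload_b64, sig_hex = token.rsplit(".", 1)
        expected = hmac.new(SECRET, payload_b64.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, sig_hex):
            return None  # signature mismatch
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
        if payload.get("exp", 0) < time.time():
            return None  # token expired
        return payload  # e.g. {"team_id": ..., "challenge_id": ..., "exp": ...}
    except ValueError:
        return None  # malformed token
```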

State Sync Flow

  1. Deployment Listener watches Kubernetes pod events with sharded workers.
  2. Event handlers map namespace names back to team/challenge IDs.
  3. Listener updates Redis deployment state and MariaDB tracking rows.
  4. On inconsistencies (ghost pod, stuck pod, orphan DB tracking), listener performs cleanup/reconciliation.

Outcome: observed cluster state continuously corrects the cache and the tracking records.
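
The sketch below illustrates steps 2-3, assuming namespaces are named like "chal-<challenge_id>-team-<team_id>"; the naming scheme, Redis keys, and the MariaDB update (omitted here) are placeholders for the real handler.

```python
# Map a namespace back to team/challenge ids and mirror the pod phase into Redis.
import re

import redis

r = redis.Redis(host="redis", port=6379)
NS_PATTERN = re.compile(r"^chal-(?P<challenge_id>\d+)-team-(?P<team_id>\d+)$")

def handle_pod_event(namespace: str, phase: str) -> None:
    m = NS_PATTERN.match(namespace)
    if not m:
        return  # not a challenge namespace
    team_id, challenge_id = m["team_id"], m["challenge_id"]
    if phase == "Running":
        r.set(f"deploy:{team_id}:{challenge_id}", "running")
    elif phase in ("Succeeded", "Failed", "Deleted"):
        r.delete(f"deploy:{team_id}:{challenge_id}")
    # A real handler would also update the MariaDB tracking row and schedule
    # reconciliation for ghost/stuck pods and orphaned tracking rows.
```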

Deployment and Execution Model (Argo + Kubernetes)

Build path

  • Challenge package is stored on NFS.
  • up-challenge-template runs Kaniko to build the image and push it to the Harbor/internal registry.
  • Registry credentials are distributed through global-regcred for downstream image pulls.
  • Workflow exit hook posts status back to Deployment Center.

Trade-off: NFS + Kaniko simplifies self-hosted build context handling, but elevated privileges and host networking in the current template increase hardening requirements.
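
To make the exit hook concrete, the sketch below shows the general shape of a status callback from the workflow's exit handler to the Deployment Center; the endpoint path, payload fields, and auth header are assumptions, not the actual API.

```python
# Hypothetical exit-hook callback: report the terminal build status upstream.
import requests

def post_build_status(challenge_id: int, workflow_name: str, status: str) -> None:
    requests.post(
        "http://deployment-center/api/internal/build-status",  # hypothetical URL
        json={"challenge_id": challenge_id,
              "workflow": workflow_name,
              "status": status},
        headers={"X-Internal-Token": "<service-token>"},  # assumed auth scheme
        timeout=5,
    )
```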

Start runtime path

  • Deployment Consumer computes deployment parameters from challenge metadata (CPU/memory, USE_GVISOR, harden flag, timeout).
  • start-chal-v2-template applies namespace first, then namespaced resources.
  • When USE_GVISOR=true, the template injects runtimeClassName: gvisor so challenge pods run under runsc sandbox.
  • The workflow copies image pull secret into target namespace before workload creation.
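
The sketch below shows how the consumer might derive those parameters from challenge metadata. USE_GVISOR and runtimeClassName: gvisor come from the description above; the other field names and defaults are illustrative.

```python
# Derive workload parameters (limits, hardening, sandboxing) from challenge metadata.
def build_deploy_params(meta: dict) -> dict:
    params = {
        "cpu_limit": meta.get("cpu", "500m"),
        "memory_limit": meta.get("memory", "256Mi"),
        "harden": bool(meta.get("harden", False)),
        "timeout_seconds": int(meta.get("timeout", 1800)),
    }
    # USE_GVISOR=true -> pods get runtimeClassName: gvisor and run under runsc.
    if str(meta.get("USE_GVISOR", "false")).lower() == "true":
        params["runtime_class_name"] = "gvisor"
    return params
```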

Runtime resource model

Each deployment namespace includes:

  • Namespace labels for challenge lifecycle metadata.
  • ClusterIP service exposing challenge internal port as service port 3333.
  • Network policies (deny-all baseline + explicit allows).
  • Job for challenge container with resource limits and probe settings.
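
A sketch of two of these objects built with the Kubernetes Python client: the ClusterIP service on port 3333 and the deny-all NetworkPolicy baseline. Label names and the internal container port are assumptions.

```python
# Build the per-namespace Service and baseline NetworkPolicy objects.
from kubernetes import client

def challenge_service(namespace: str, internal_port: int) -> client.V1Service:
    return client.V1Service(
        metadata=client.V1ObjectMeta(name="challenge", namespace=namespace),
        spec=client.V1ServiceSpec(
            type="ClusterIP",
            selector={"app": "challenge"},
            ports=[client.V1ServicePort(port=3333, target_port=internal_port)],
        ),
    )

def deny_all_policy(namespace: str) -> client.V1NetworkPolicy:
    return client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="deny-all", namespace=namespace),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),  # selects all pods in the namespace
            policy_types=["Ingress", "Egress"],     # no rules listed => deny all
        ),
    )
```

Explicit allow policies are then layered on top of the deny-all baseline, which keeps the default posture restrictive even if an allow rule is forgotten.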

Lifecycle and TTL

  • Queue message expiry and Redis provisional TTL guard against stale deploy intents.
  • Challenge job has ttlSecondsAfterFinished for workload object cleanup.
  • Argo workflow templates define TTL strategy for workflow object retention.
  • Additional cron cleanup removes expired temporary challenge namespaces by label.

TTL tuning practice

Tune queue expiry, cache TTL, and workflow TTL together. Misaligned values can leave stale "running" state behind or trigger overly aggressive cleanup.
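
One way to keep these knobs aligned is to define them in a single place, as in the sketch below; the concrete values are illustrative, not recommendations.

```python
# Centralize the TTL knobs so they are reviewed and tuned together.
QUEUE_MESSAGE_TTL_MS = 60_000           # RabbitMQ per-message expiration
REDIS_PROVISIONAL_TTL_S = 300           # provisional "pending" cache entry
JOB_TTL_AFTER_FINISHED_S = 600          # Job.spec.ttlSecondsAfterFinished
WORKFLOW_TTL_AFTER_COMPLETION_S = 3600  # Argo ttlStrategy.secondsAfterCompletion

# Example invariant: a still-consumable queued job should never point at an
# already-expired provisional cache entry.
assert REDIS_PROVISIONAL_TTL_S * 1000 > QUEUE_MESSAGE_TTL_MS
```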