Runtime and Lifecycle Flows
This page expands on Section 4 of the Architecture Overview.
Each flow describes one trust and latency boundary crossing; use it to reason about failure handling and retry ownership.
Data Flow (Key Paths)
User Flow
- User reaches Ingress NGINX.
- Request is routed to Contestant Portal or Admin Portal.
- Frontend calls Contestant Service APIs.
- Contestant Service validates auth/session and serves challenge, team, and scoreboard operations (session check sketched below).
Outcome: user operations stay on the low-latency request path.
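A minimal sketch of the session check on this path, assuming sessions live in Redis under a hypothetical session:&lt;id&gt; key layout (the key scheme and redis-py usage here are illustrative, not the platform's actual schema):

```python
import json
import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

def validate_session(session_id: str) -> dict | None:
    """Return the session payload if it exists and is unexpired.

    Redis key expiry enforces the session TTL, so a missing key means
    the session is invalid or expired. The key layout is hypothetical.
    """
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```

Keeping this check against Redis on the hot path, rather than hitting the database, is part of what keeps user operations low-latency.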
Deployment Flow
- Contestant Service or Admin Portal requests challenge start/stop via Deployment Center.
- Deployment Center validates request signature and writes initial cache state.
- Deployment Center publishes job to RabbitMQ deployment_exchange -> deployment_queue.
- Deployment Consumer dequeues, checks the workflow concurrency budget, and submits an Argo workflow (see the sketch after this list).
- Argo applies challenge manifests and creates namespace/service/job.
- Deployment Listener observes pod readiness/deletion and finalizes lifecycle state.
Outcome: deployment work is queue-mediated and isolated from frontend latency.
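A condensed sketch of both sides of this handoff, using pika for RabbitMQ and the Argo CLI for submission. The exchange, queue, and template names come from this page; the concurrency budget, routing-key binding, workflow parameter names, and the running_workflow_count helper are assumptions:

```python
import json
import subprocess
import pika

MAX_CONCURRENT_WORKFLOWS = 20  # assumed per-cluster budget

def running_workflow_count() -> int:
    # Hypothetical helper: the real consumer tracks in-flight Argo
    # workflows; stubbed here so the sketch is self-contained.
    return 0

conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
ch = conn.channel()

def publish_deploy_job(team_id: int, challenge_id: int) -> None:
    """Deployment Center side: enqueue a deploy intent with message expiry."""
    ch.basic_publish(
        exchange="deployment_exchange",
        routing_key="deployment_queue",  # assumed binding key
        body=json.dumps({"team_id": team_id, "challenge_id": challenge_id}),
        properties=pika.BasicProperties(expiration="60000"),  # 60 s, in ms
    )

def on_deploy_job(channel, method, properties, body) -> None:
    """Deployment Consumer side: enforce the budget, then submit the workflow."""
    job = json.loads(body)
    if running_workflow_count() >= MAX_CONCURRENT_WORKFLOWS:
        channel.basic_nack(method.delivery_tag, requeue=True)  # retry later
        return
    subprocess.run(
        ["argo", "submit", "--from", "workflowtemplate/start-chal-v2-template",
         "-p", f"team_id={job['team_id']}",         # parameter names assumed
         "-p", f"challenge_id={job['challenge_id']}"],
        check=True,
    )
    channel.basic_ack(method.delivery_tag)

ch.basic_consume(queue="deployment_queue", on_message_callback=on_deploy_job)
ch.start_consuming()  # in production, publisher and consumer are separate services
```

Nacking over-budget jobs back onto the queue, rather than dropping them, keeps retry ownership with the queue rather than the frontend.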
Directly creating workloads from user request paths can introduce contest-time latency spikes and inconsistent lifecycle records.
Runtime Access Flow
- User receives signed challenge token URL.
- User accesses Challenge Gateway with token.
- Gateway verifies the token signature and expiry (see the sketch after this list).
- Gateway routes HTTP/TCP traffic to internal challenge service endpoint.
- Gateway enforces token/IP rate limits and connection caps.
Outcome: runtime traffic is token-validated at the gateway boundary.
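This page does not specify the token format, so the sketch below assumes a simple HMAC-signed payload.expiry.signature scheme to show where the signature and expiry checks sit at the gateway boundary:

```python
import hashlib
import hmac
import time

SECRET = b"contest-signing-key"  # placeholder; the real key comes from config

def verify_challenge_token(token: str) -> str | None:
    """Verify a 'payload.expiry.signature' token; the format is illustrative.

    Returns the payload (e.g. a deployment ID) if the signature matches
    and the expiry has not passed, else None.
    """
    try:
        payload, expiry, sig = token.rsplit(".", 2)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET, f"{payload}.{expiry}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: reject
    if time.time() > int(expiry):
        return None  # token expired: reject
    return payload
```

Rate limits and connection caps would then key off the verified token (and client IP) only after this check passes.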
State Sync Flow
- Deployment Listener watches Kubernetes pod events with sharded workers.
- Event handlers map namespace names back to team/challenge IDs.
- Listener updates Redis deployment state and MariaDB tracking rows (see the sketch after this list).
- On inconsistencies (ghost pods, stuck pods, orphaned DB tracking rows), the listener performs cleanup and reconciliation.
Outcome: observed cluster state continuously corrects cache and tracking truth.
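A sketch of the watch-and-sync loop using the official kubernetes Python client; the namespace naming pattern and the mark_* helpers are hypothetical stand-ins for the listener's real mapping and state writes:

```python
import re
from kubernetes import client, config, watch

# Assumed namespace convention: one namespace per (team, challenge) pair.
NS_PATTERN = re.compile(r"^chal-(?P<challenge_id>\d+)-team-(?P<team_id>\d+)$")

def mark_running(team_id: int, challenge_id: int) -> None:
    ...  # hypothetical helper: update Redis deployment state + MariaDB row

def mark_stopped(team_id: int, challenge_id: int) -> None:
    ...  # hypothetical helper: clear cache entry, finalize tracking row

def handle_pod_event(event: dict) -> None:
    """Map a pod event back to team/challenge IDs and sync the state stores."""
    pod = event["object"]
    match = NS_PATTERN.match(pod.metadata.namespace or "")
    if match is None:
        return  # not a challenge namespace; ignore
    team_id, challenge_id = int(match["team_id"]), int(match["challenge_id"])
    if event["type"] == "DELETED":
        mark_stopped(team_id, challenge_id)
    elif pod.status.phase == "Running":
        mark_running(team_id, challenge_id)

config.load_incluster_config()
for event in watch.Watch().stream(client.CoreV1Api().list_pod_for_all_namespaces):
    handle_pod_event(event)
```

Sharding (distributing namespaces across workers) and the ghost/stuck/orphan cleanup branches would hang off this same event handler.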
Deployment and Execution Model (Argo + Kubernetes)
Build path
- Challenge package is stored on NFS.
- up-challenge-template runs Kaniko to build and push the image to the Harbor/internal registry (flag assembly sketched below).
- Registry credentials are distributed through global-regcred for downstream image pulls.
- Workflow exit hook posts status back to Deployment Center.
Trade-off: NFS + Kaniko simplifies self-hosted build context handling, but elevated privileges and host networking in the current template increase hardening requirements.
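To make the build step concrete, the sketch below assembles standard Kaniko executor flags from an NFS-mounted build context and a registry destination; the mount path and image tag convention are assumptions, and the real template wires these as Argo workflow parameters:

```python
def kaniko_args(challenge_id: int, nfs_context_dir: str, registry: str) -> list[str]:
    """Arguments for the kaniko executor container.

    --context/--dockerfile/--destination are standard Kaniko flags; the
    NFS mount path and tag scheme here are illustrative only.
    """
    image = f"{registry}/challenges/chal-{challenge_id}:latest"
    return [
        f"--context=dir://{nfs_context_dir}",    # build context from the NFS mount
        f"--dockerfile={nfs_context_dir}/Dockerfile",
        f"--destination={image}",                # push to the Harbor/internal registry
    ]
```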
Start runtime path
- Deployment Consumer computes deployment parameters from challenge metadata (CPU/memory, USE_GVISOR, harden flag, timeout).
- start-chal-v2-template applies the namespace first, then namespaced resources.
- When USE_GVISOR=true, the template injects runtimeClassName: gvisor so challenge pods run under the runsc sandbox (pod-spec sketch after this list).
- The workflow copies the image pull secret into the target namespace before workload creation.
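A sketch of the conditional runtime-class injection using the kubernetes Python client's models; the container name and resource limits are illustrative, and the real template renders this as a manifest:

```python
from kubernetes import client

def challenge_pod_spec(image: str, use_gvisor: bool) -> client.V1PodSpec:
    """Pod spec for the challenge job. When use_gvisor is set, the gvisor
    RuntimeClass routes the pod onto the runsc sandbox."""
    return client.V1PodSpec(
        runtime_class_name="gvisor" if use_gvisor else None,
        containers=[
            client.V1Container(
                name="challenge",  # illustrative name
                image=image,
                resources=client.V1ResourceRequirements(
                    limits={"cpu": "500m", "memory": "512Mi"},  # illustrative limits
                ),
            )
        ],
        restart_policy="Never",  # lifecycle is owned by the Job controller
    )
```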
Runtime resource model
Each deployment namespace includes:
- Namespace labels for challenge lifecycle metadata.
- ClusterIP service exposing the challenge's internal port as service port 3333 (service and policy sketched after this list).
- Network policies (deny-all baseline + explicit allows).
- Job for challenge container with resource limits and probe settings.
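A sketch of the two manifest pieces that carry the most security weight here, again via the kubernetes Python client; the pod selector label is an assumption:

```python
from kubernetes import client

def challenge_service(namespace: str, internal_port: int) -> client.V1Service:
    """ClusterIP service exposing the challenge's internal port on 3333."""
    return client.V1Service(
        metadata=client.V1ObjectMeta(name="challenge", namespace=namespace),
        spec=client.V1ServiceSpec(
            type="ClusterIP",
            selector={"app": "challenge"},  # assumed pod label
            ports=[client.V1ServicePort(port=3333, target_port=internal_port)],
        ),
    )

def deny_all_policy(namespace: str) -> client.V1NetworkPolicy:
    """Deny-all baseline: selects every pod, allows no ingress or egress."""
    return client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="deny-all", namespace=namespace),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),  # empty selector = all pods
            policy_types=["Ingress", "Egress"],     # no rules listed = deny all
        ),
    )
```

An empty pod selector matches every pod in the namespace, so listing both policy types with no rules yields the deny-all baseline; gateway access is then granted by a separate explicit allow policy.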
Lifecycle and TTL
- Queue message expiry and Redis provisional TTL guard against stale deploy intents.
- Challenge job has ttlSecondsAfterFinished for workload object cleanup.
- Argo workflow templates define TTL strategy for workflow object retention.
- Additional cron cleanup removes expired temporary challenge namespaces by label.
Tune queue expiry, cache TTL, and workflow TTL together; misaligned values can leave stale "running" state or trigger overly aggressive cleanup.
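A minimal sketch of what "tuned together" can mean, with illustrative values; the invariants matter more than the numbers:

```python
# Illustrative TTL alignment; real values are deployment-specific.
QUEUE_MESSAGE_EXPIRY_MS = 60_000   # RabbitMQ per-message expiration
REDIS_PROVISIONAL_TTL_S = 90       # outlives queue expiry to cover consumer lag
JOB_TTL_AFTER_FINISHED_S = 300     # ttlSecondsAfterFinished on the challenge Job
WORKFLOW_TTL_S = 600               # Argo workflow object retention

# A job dequeued at the expiry deadline must still find its provisional cache entry.
assert REDIS_PROVISIONAL_TTL_S * 1000 > QUEUE_MESSAGE_EXPIRY_MS
# The workflow record should outlive its job objects for post-cleanup debugging.
assert WORKFLOW_TTL_S >= JOB_TTL_AFTER_FINISHED_S
```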