Service Architecture
This page expands on Section 3 of the Architecture Overview.
Read this page by service boundary, not by technology stack. It reflects ownership and failure domains.
Component Breakdown
Entry Layer
Ingress NGINX
- Terminates TLS for incoming HTTP(S) traffic.
- Routes traffic by host to the organizer (Admin Portal), Contestant Portal, API backend, and observability UIs.
- Uses cert-manager annotations for certificate automation.
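A minimal sketch of the host-based routing and certificate automation above, expressed as an Ingress manifest in Python dict form; the hostname, issuer, and backend service names are placeholders, not the platform's actual values.

```python
# Illustrative only: hostnames, issuer, and service names are placeholders.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {
        "name": "platform-ingress",
        "annotations": {
            # cert-manager picks up this annotation and provisions the TLS secret.
            "cert-manager.io/cluster-issuer": "letsencrypt-prod",
        },
    },
    "spec": {
        "ingressClassName": "nginx",
        "tls": [{"hosts": ["portal.example.ctf"], "secretName": "portal-tls"}],
        "rules": [{
            # One host-based rule per UI/API; only the Contestant Portal is shown.
            "host": "portal.example.ctf",
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": "contestant-portal",
                                        "port": {"number": 80}}},
            }]},
        }],
    },
}
```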
Frontend Layer
Admin Portal
- CTFd-based organizer/admin UI and management backend.
- Handles challenge authoring, event configuration, and admin workflows.
- Reads/writes challenge files via NFS-backed PVCs.
- Calls Deployment Center for runtime operations.
Contestant Portal
- Player-facing frontend (React + Vite).
- Talks to Contestant Service for competition operations.
- Uses Challenge Gateway domain/ports for runtime challenge access.
Core Backend
Contestant Service
Primary competition API for:
- Authentication and token/session checks.
- Teams, challenge discovery, prerequisites, files, hints, submissions.
- Scoreboard and ticket APIs.
- Challenge lifecycle user actions (start/stop/status) through Deployment Center.
Notable behavior:
- Rate limiting via AspNetCoreRateLimit backed by Redis.
- Token integrity checks that validate tokenUuid against the database/cache.
- Redis Lua scripts plus lock-based race protection for submissions and deployment quotas (sketched below).
- Shared-instance mode supported via special team ID handling.
Challenge Gateway is the only intended external entry boundary for challenge runtime traffic.
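A minimal sketch of the Redis Lua race protection noted above, assuming redis-py and a hypothetical deploy:quota:<team> key layout; the real service's key names, limits, and lock strategy may differ.

```python
# Illustrative sketch of atomic quota enforcement with a Redis Lua script;
# key names, limits, and TTLs are assumptions, not the service's actual schema.
import redis

r = redis.Redis()

# Atomically increment the team's running-deployment count only if it is
# below the limit, so two concurrent start requests cannot both pass the check.
QUOTA_SCRIPT = r.register_script("""
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
local limit = tonumber(ARGV[1])
if current >= limit then
    return 0
end
redis.call('INCR', KEYS[1])
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
return 1
""")

def try_reserve_deployment(team_id: str, limit: int = 3, ttl: int = 3600) -> bool:
    """Return True if the team may start another challenge instance."""
    key = f"deploy:quota:{team_id}"  # hypothetical key layout
    return bool(QUOTA_SCRIPT(keys=[key], args=[limit, ttl]))
```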
Deployment Center
Control API for deployment orchestration:
- Handles start/stop/status/log requests.
- Publishes deploy jobs to the RabbitMQ exchange deployment_exchange with routing key deploy (sketched below).
- Persists/reads deployment state in Redis.
- Calls Kubernetes and Argo APIs (status, logs, namespace operations).
- Exposes callback endpoint for workflow status messages.
Deployment Center should remain the single control API for start and stop orchestration so retries, audit, and status transitions stay consistent.
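A minimal publish sketch for the deploy path above, using pika; the broker host and message payload are assumptions, while the vhost, exchange, and routing key follow the topology listed under the Async Layer.

```python
import json
import pika

# Assumes the exchange/queue topology described under the Async Layer already exists.
params = pika.ConnectionParameters(host="rabbitmq", virtual_host="fctf_deploy")
conn = pika.BlockingConnection(params)
ch = conn.channel()
ch.basic_publish(
    exchange="deployment_exchange",
    routing_key="deploy",
    body=json.dumps({"team_id": "t-123", "challenge_id": "web-01",
                     "action": "start"}),              # hypothetical payload shape
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
conn.close()
```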
Challenge Gateway
Runtime access gateway for deployed challenge instances:
- HTTP gateway on port 8080 (reverse proxy with token-cookie flow).
- TCP gateway on port 1337 (token-authenticated stream proxy).
- Uses HMAC-signed challenge tokens (signed with PRIVATE_KEY) instead of exposing pods directly (sketched below).
- Redis-backed rate and connection limiting:
- token + IP request limits.
- per-IP, per-token, and global TCP connection caps.
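A minimal sketch of the HMAC token flow, assuming a dot-separated payload of team, challenge, and expiry; the gateway's actual token layout, expiry handling, and key loading may differ.

```python
# Illustrative token sign/verify sketch; the payload format is an assumption.
import hashlib
import hmac
import os
import time

PRIVATE_KEY = os.environ.get("PRIVATE_KEY", "dev-only-secret").encode()

def sign_token(team_id: str, challenge: str, ttl: int = 3600) -> str:
    """Produce a token bound to a team/challenge pair with an expiry timestamp."""
    payload = f"{team_id}.{challenge}.{int(time.time()) + ttl}"
    sig = hmac.new(PRIVATE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_token(token: str) -> bool:
    """Check the HMAC signature and reject expired tokens."""
    try:
        team_id, challenge, expiry, sig = token.rsplit(".", 3)
        payload = f"{team_id}.{challenge}.{expiry}"
        expected = hmac.new(PRIVATE_KEY, payload.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(sig, expected) and int(expiry) > time.time()
    except ValueError:
        return False
```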
Async Layer
RabbitMQ
Deploy queue topology:
- Vhost: fctf_deploy.
- Exchange: deployment_exchange (direct).
- Queue: deployment_queue.
- Binding: routing key deploy.
- Queue policy includes x-max-length=300 with reject-publish overflow behavior.
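A declaration sketch matching the topology above; the document describes the length cap as a queue policy, so setting it as declare-time arguments here (and marking the queue durable) is an assumption for illustration.

```python
import pika

params = pika.ConnectionParameters(host="rabbitmq", virtual_host="fctf_deploy")
conn = pika.BlockingConnection(params)
ch = conn.channel()

ch.exchange_declare(exchange="deployment_exchange",
                    exchange_type="direct", durable=True)
ch.queue_declare(
    queue="deployment_queue",
    durable=True,
    arguments={
        "x-max-length": 300,             # cap queue depth at 300 messages
        "x-overflow": "reject-publish",  # refuse new publishes once full
    },
)
ch.queue_bind(queue="deployment_queue",
              exchange="deployment_exchange", routing_key="deploy")
conn.close()
```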
Deployment Consumer
Worker process that:
- Consumes deployment_queue with manual ack/nack semantics.
- Applies a prefetch QoS of 40 and processes messages in batches.
- Enforces a workflow concurrency limit by querying Argo for currently running workflows.
- Submits start workflow templates to Argo.
- Updates deployment cache state and TTL.
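A consumer sketch showing the manual ack/nack and prefetch behavior above; the concurrency check and workflow submission are stubs standing in for the worker's real Argo calls, and the requeue-on-limit choice is an assumption.

```python
import json
import pika

MAX_RUNNING_WORKFLOWS = 20  # hypothetical limit

def count_running_argo_workflows() -> int:
    # In the real worker this queries Argo for running workflows; stubbed here.
    return 0

def submit_start_workflow(job: dict) -> None:
    # Stand-in for submitting the start workflow template to Argo.
    print("would submit workflow for", job)

def on_message(ch, method, properties, body):
    job = json.loads(body)
    if count_running_argo_workflows() >= MAX_RUNNING_WORKFLOWS:
        # Over the concurrency limit: requeue the job instead of dropping it.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        return
    submit_start_workflow(job)
    ch.basic_ack(delivery_tag=method.delivery_tag)

params = pika.ConnectionParameters(host="rabbitmq", virtual_host="fctf_deploy")
conn = pika.BlockingConnection(params)
ch = conn.channel()
ch.basic_qos(prefetch_count=40)  # matches the prefetch QoS described above
ch.basic_consume(queue="deployment_queue", on_message_callback=on_message)
ch.start_consuming()
```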
Execution Layer
Argo Workflows
Two primary templates:
- up-challenge-template:
- Builds challenge images from the NFS build context using Kaniko.
- Pushes images to Harbor/internal registry and relies on registry pull secrets for runtime workloads.
- Calls Deployment Center callback on exit with workflow status.
- start-chal-v2-template:
- Applies challenge namespace/service/network policy/job manifests from NFS templates.
- Chooses hardened vs plain challenge manifest.
- Uses USE_GVISOR to decide whether to inject runtimeClassName: gvisor.
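A sketch of the USE_GVISOR decision in start-chal-v2-template: when the flag is set, runtimeClassName: gvisor is injected into the Job's pod spec. The image, naming, and TTL values are placeholders.

```python
import os

def build_challenge_job(name: str, image: str) -> dict:
    """Assemble a Job manifest; field names follow the Kubernetes Job schema."""
    pod_spec = {
        "restartPolicy": "Never",
        "containers": [{"name": "challenge", "image": image}],
    }
    if os.environ.get("USE_GVISOR", "false").lower() == "true":
        # Run the challenge under the gVisor sandbox instead of the default runtime.
        pod_spec["runtimeClassName"] = "gvisor"
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "ttlSecondsAfterFinished": 600,  # hypothetical TTL cleanup value
            "template": {"spec": pod_spec},
        },
    }
```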
Kubernetes Challenge Runtime
For each deployment instance:
- Creates a dedicated namespace (derived from team/challenge naming).
- Service is internal ClusterIP (${CHALLENGE_NAME}-svc).
- Challenge workload runs as a Job with TTL cleanup.
- NetworkPolicies enforce default-deny ingress and gateway-only access.
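A sketch of the gateway-only isolation above: with policyTypes restricted to Ingress and a single allow rule, all other ingress to the challenge namespace is denied. The gateway namespace and pod labels are placeholders.

```python
def gateway_only_network_policy(namespace: str) -> dict:
    """NetworkPolicy allowing ingress only from the Challenge Gateway pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-gateway-only", "namespace": namespace},
        "spec": {
            # Empty podSelector matches every pod in the challenge namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # Only pods carrying the (hypothetical) gateway label in the
            # (hypothetical) platform namespace may reach the challenge workload.
            "ingress": [{
                "from": [{
                    "namespaceSelector": {"matchLabels": {
                        "kubernetes.io/metadata.name": "fctf"}},
                    "podSelector": {"matchLabels": {"app": "challenge-gateway"}},
                }],
            }],
        },
    }
```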
State Reconciliation
Deployment Listener
Watches pod events carrying the ctf/kind=challenge label and reconciles system state (sketched below):
- Detects pod deletions, restarts, and stuck states.
- Cleans ghost resources (pods/namespaces without valid cache state).
- Updates stopped tracking records when workloads terminate.
- Reconciles orphaned DB entries after watch stream disruptions.
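A watch-loop sketch for the label selector above, using the official Kubernetes Python client; the event handlers are stubs, and the real reconciliation against cache and database state is omitted.

```python
from kubernetes import client, config, watch

config.load_incluster_config()  # the listener runs inside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_pod_for_all_namespaces,
                      label_selector="ctf/kind=challenge"):
    pod = event["object"]
    if event["type"] == "DELETED":
        # A challenge pod disappeared: clean up cache/DB state for its instance.
        print("reconcile deletion of", pod.metadata.namespace, pod.metadata.name)
    elif pod.status.phase in ("Failed", "Unknown"):
        # Stuck or crashed workload: flag it for cleanup handling.
        print("flag stuck pod", pod.metadata.namespace, pod.metadata.name)
```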
Shared Infrastructure Library
ResourceShared
The shared cross-service library includes:
- Redis helper with atomic Lua scripts for deployment quota and lifecycle state.
- Kubernetes service wrapper for workflow status/logs, namespace operations, and pod health checks.
- Token and challenge naming helpers.
- MultiServiceConnector for service-to-service HTTP calls.
Put cross-service consistency logic here only when at least two services must enforce identical behavior.
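A sketch in the spirit of MultiServiceConnector, assuming plain HTTP over in-cluster DNS names; the actual connector's base-URL configuration, retries, and authentication are not shown.

```python
from typing import Optional

import requests

SERVICE_BASE_URLS = {
    # Hypothetical in-cluster DNS names and ports.
    "deployment-center": "http://deployment-center:8080",
    "contestant-service": "http://contestant-service:8080",
}

def call_service(service: str, path: str, payload: Optional[dict] = None,
                 timeout: float = 5.0) -> dict:
    """POST to another platform service and return its JSON body."""
    url = f"{SERVICE_BASE_URLS[service]}{path}"
    resp = requests.post(url, json=payload or {}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```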