
Overview

This page is the contributor entry point. It provides a high-level system map and links to the canonical detail pages.

How to read this section

Use this page as the map, then drill down to the linked pages for implementation-level detail.

Section Map

| Section | What it answers | Canonical details |
| --- | --- | --- |
| 1. Architecture Overview | Why the platform is split this way | This page |
| 2. Infrastructure | What on-prem/k3s baseline the system assumes | Infrastructure |
| 3. Service Architecture | Who owns what behavior across services | Service Architecture |
| 4. Runtime and Lifecycle Flows | How requests/deployments/access move | Runtime and Lifecycle Flows |
| 5. State Reconciliation | How desired and actual state are converged | State Reconciliation |
| 6. Security | Where trust boundaries and controls are enforced | Security |
| 7. Storage and Data | Why each data technology exists in the design | Storage and Data |
| 8. Scalability | Which limits and quotas govern throughput | Scalability |
| 9. Observability and Operations | How to diagnose and operate safely under load | Observability and Operations |
| 10. Design Principles | Why major architectural decisions were chosen | Design Principles |

1. Architecture Overview

FCTF high-level architecture

The platform is organized into four primary blocks with clear ownership boundaries.

| Block | Responsibility |
| --- | --- |
| Management & Business | User-facing entry point. Contestants interact through the Contestant Portal, and requests are handled by backend services. The Admin Portal is based on open-source CTFd and provides management interfaces for Jury, Challenge Writers, and Platform Admins. |
| Deployment Orchestration | Coordinates challenge build and deployment through Deployment Center and Argo Workflows. Manages challenge lifecycle operations and exposes control APIs such as stop, status check, and log retrieval. |
| Infrastructure (Kubernetes) | Runs each challenge as an isolated pod and performs expired challenge cleanup via scheduled cron jobs. |
| Shared Persistence & Caching | Provides shared state infrastructure. MariaDB stores durable data, while Redis supports caching, locking, and temporary lifecycle state management. |

2. Infrastructure

| Aspect | Current design | Why it matters |
| --- | --- | --- |
| Platform | Self-managed on-prem k3s | No cloud-managed dependency assumptions. |
| Networking baseline | flannel disabled, Traefik disabled, Calico installed explicitly | Predictable CNI behavior and policy control. |
| Ingress | Ingress NGINX via Helm | Consistent routing for app and runtime access boundary. |
| Runtime sandbox | Optional RuntimeClass gvisor via USE_GVISOR | Additional isolation option for untrusted challenge workloads. |
| Service exposure | ClusterIP + Ingress (primary), NodePort (fallback) | Works in constrained lab DNS/ingress environments. |

Key namespaces: app, argo, db, monitoring, registry, cattle-system, challenge (dynamic labeled namespaces), storage.

Canonical details: Infrastructure.

3. Service Architecture

| Service boundary | Primary responsibility |
| --- | --- |
| Contestant Service | Competition domain APIs and race-safe quota logic |
| Deployment Center | Deployment intent, orchestration coordination, workflow/pod status surfaces |
| Challenge Gateway | Only runtime ingress boundary (HTTP/TCP token-based access) |
| Deployment Consumer + Argo | Async deploy/build execution pipeline |
| Deployment Listener | Reconciles Redis/MariaDB lifecycle truth against Kubernetes events |
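The Deployment Listener's event sharding can be sketched as a stable hash of the pod name, so every event for a given pod is always handled by the same worker. This is an illustrative sketch, not the actual implementation; the shard count and function names are assumptions.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical worker count


def shard_for(pod_name: str) -> int:
    """Stable shard assignment: all events for one pod land on one worker.

    Uses a content hash rather than Python's built-in hash(), which is
    randomized per process and would break cross-restart stability.
    """
    digest = hashlib.sha256(pod_name.encode("utf-8")).digest()
    return digest[0] % NUM_SHARDS


# Events for the same pod always map to the same shard,
# so per-pod state transitions are processed in order by a single worker.
events = ["chall-pod-abc", "chall-pod-xyz", "chall-pod-abc"]
shards = [shard_for(name) for name in events]
```

Keeping per-pod ordering inside one shard avoids cross-worker races when the same pod emits several lifecycle events in quick succession.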

Boundary reminder: Challenge Gateway is the only runtime ingress boundary for challenge traffic.

Canonical details: Service Architecture.

4. Runtime and Lifecycle Flows

| Flow | Path |
| --- | --- |
| User request | ingress -> frontend -> Contestant Service |
| Deployment | Deployment Center -> RabbitMQ -> Deployment Consumer -> Argo -> Kubernetes |
| Runtime access | signed token -> Challenge Gateway -> internal challenge service |
| Stop/cleanup | control intent -> runtime termination -> reconciled STOPPED state |
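The deployment flow's key property is that the Deployment Center enqueues an intent and returns immediately, while a consumer executes it asynchronously. The sketch below illustrates that decoupling with an in-memory `queue.Queue` standing in for RabbitMQ; the message fields and function names are assumptions, not the real API.

```python
import json
import queue
import threading

deploy_queue: "queue.Queue[str]" = queue.Queue()  # stand-in for a RabbitMQ queue
results: dict = {}


def publish_deploy_intent(team_id: str, challenge_id: str) -> None:
    """Deployment Center side: enqueue the intent and return immediately."""
    deploy_queue.put(json.dumps({"team": team_id, "challenge": challenge_id}))


def consume_once() -> None:
    """Deployment Consumer side: pull one intent and execute it.

    In FCTF this step would trigger an Argo workflow that builds and
    deploys the challenge; here we just record the outcome.
    """
    intent = json.loads(deploy_queue.get())
    results[(intent["team"], intent["challenge"])] = "DEPLOYED"


publish_deploy_intent("team-1", "pwn-01")
worker = threading.Thread(target=consume_once)
worker.start()
worker.join()
```

Because the publish side never blocks on the deploy itself, user-facing API latency stays flat even when builds are slow, and the queue absorbs bursts.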

Canonical details: Runtime and Lifecycle Flows.

5. State Reconciliation

| Concept | Implementation in FCTF |
| --- | --- |
| Desired state | Redis lifecycle keys + MariaDB tracking rows |
| Actual state | Kubernetes namespaces/pods/readiness/restarts |
| Control loop | desired -> execute -> observe -> reconcile |
| Drift sources | watch disconnect, stale resourceVersion, out-of-band deletes, partial workflow completion |
| Reconciler | Deployment Listener shards pod events and applies cleanup/state correction |
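One pass of the control loop above can be sketched as a pure function that diffs desired state (what Redis/MariaDB say should exist) against actual state (what Kubernetes reports) and emits corrective actions. This is a minimal sketch assuming two-valued states and hypothetical action names; the real reconciler deals with readiness, restarts, and partial workflow completion as well.

```python
def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """Diff desired vs actual state and return corrective actions.

    desired: deployment name -> "RUNNING" | "STOPPED" (the lifecycle truth)
    actual:  deployment name -> observed state from cluster events
    """
    actions: list[tuple[str, str]] = []
    for name, state in desired.items():
        if state == "RUNNING" and actual.get(name) != "RUNNING":
            actions.append(("start", name))       # missing or crashed pod
        elif state == "STOPPED" and actual.get(name) == "RUNNING":
            actions.append(("stop", name))        # stop intent not yet applied
    for name in actual:
        if name not in desired:
            actions.append(("cleanup", name))     # out-of-band / orphaned resource
    return actions
```

Because the function only compares the two views, it is safe to rerun after watch disconnects or missed events: re-observing and re-diffing converges to the same corrective set, which is the point of reconciliation over assumption.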

Canonical details: State Reconciliation.

6. Security

| Layer | Control |
| --- | --- |
| Runtime exposure | Challenge pods are never directly exposed; gateway-only boundary |
| Access tokens | HMAC-signed token flow for HTTP and TCP challenge access |
| Network isolation | Role-scoped NetworkPolicies in app and challenge namespaces |
| Data-plane authz | Redis ACL, MariaDB least privilege, scoped RabbitMQ topology |
| Runtime hardening | Optional gVisor; RBAC scoped by service role |
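The HMAC-signed token flow can be illustrated with Python's standard `hmac` module. The token format, field layout, key, and TTL below are assumptions for the sketch; only the general shape (signed payload verified at the gateway, with expiry) reflects the design described above.

```python
import hashlib
import hmac
import time

SECRET = b"gateway-signing-key"  # hypothetical shared secret


def sign_access_token(team_id: str, challenge_id: str, ttl: int = 600) -> str:
    """Issue a token of the illustrative form payload.signature."""
    expires = int(time.time()) + ttl
    payload = f"{team_id}:{challenge_id}:{expires}"
    sig = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_access_token(token: str) -> bool:
    """Gateway-side check: signature must match and token must not be expired."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest prevents timing side-channels on the signature check.
    if not hmac.compare_digest(sig, expected):
        return False
    expires = int(payload.rsplit(":", 1)[1])
    return time.time() < expires
```

Since verification needs only the shared secret, the gateway can validate tokens without a database round trip, which matters for the per-connection TCP path as much as for HTTP.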

Canonical details: Security.

7. Storage and Data

| Store | Role in architecture |
| --- | --- |
| MariaDB | Durable business truth and audit history |
| Redis | Low-latency coordination, quotas, lifecycle state |
| RabbitMQ | Async buffering and backpressure for deploy intents |
| NFS | Shared challenge assets and workflow build contexts |
| Harbor | Private image distribution for challenge runtime namespaces |

Canonical details: Storage and Data.

8. Scalability

| Control area | Current approach |
| --- | --- |
| Queue pressure | Bounded queue length and reject-publish behavior |
| Worker throughput | Prefetch/batch controls and workflow concurrency gate |
| Runtime entry load | Gateway connection and rate limits (global, per-IP, per-token) |
| Fairness | Team-level concurrent and per-challenge limits via Redis |
| Scaling style | Hybrid: HPA for selected stateless services, manual tuning for worker/reconciler components |
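The team-level fairness limit boils down to an atomic check-and-increment: a deploy slot is granted only if the team is below its cap, and check plus increment must happen as one step so two concurrent requests cannot both pass. The sketch below emulates that atomicity with a lock on an in-memory counter; in the real design Redis provides the atomicity (the class and method names here are illustrative, not the actual API).

```python
import threading


class QuotaCounter:
    """In-memory stand-in for an atomic Redis-backed concurrency quota."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()  # plays the role Redis atomicity plays in FCTF

    def try_acquire(self) -> bool:
        """Atomically grant a slot iff the team is under its limit."""
        with self._lock:
            if self.count >= self.limit:
                return False
            self.count += 1
            return True

    def release(self) -> None:
        """Return a slot when a challenge instance stops."""
        with self._lock:
            self.count = max(0, self.count - 1)
```

The race-safety requirement is why a naive read-then-write against Redis is not enough: two requests reading `count == limit - 1` would both increment past the cap unless the check and increment execute atomically.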

Canonical details: Scalability.

9. Observability and Operations

| Capability | Stack or surface |
| --- | --- |
| Metrics | Prometheus |
| Visualization | Grafana |
| Log aggregation | Loki |
| Runtime diagnostics | Deployment Center workflow/pod/log query endpoints |
| Cluster operations | Scripted bootstrap/install path for on-prem environments |

Canonical details: Observability and Operations.

10. Design Principles

| Principle | Practical effect on implementation |
| --- | --- |
| Async-first control path | Keeps user-facing APIs responsive under deployment load |
| Isolation by default | Limits blast radius and cross-team interference |
| Reconciliation over assumption | Prevents stale runtime state from persisting |
| Defense in depth | Applies layered controls at gateway/network/data/runtime |
| On-premise-first operability | Works without managed cloud dependencies |
| Contributor-first boundaries | Keeps service, flow, and consistency concerns separate |

Canonical details: Design Principles.

Contributor shortcut

If you are new to the codebase, read sections in this order: Overview -> Service Architecture -> Runtime and Lifecycle Flows -> State Reconciliation.