Overview
This page is the contributor entry point. It gives a fast system map and links to canonical detail pages.
Use this page as the map, then drill down to the linked pages for implementation-level detail.
Section Map
| Section | What it answers | Canonical details |
|---|---|---|
| 1. Architecture Overview | Why the platform is split this way | This page |
| 2. Infrastructure | What on-prem/k3s baseline the system assumes | Infrastructure |
| 3. Service Architecture | Who owns what behavior across services | Service Architecture |
| 4. Runtime and Lifecycle Flows | How requests/deployments/access move | Runtime and Lifecycle Flows |
| 5. State Reconciliation | How desired and actual state are converged | State Reconciliation |
| 6. Security | Where trust boundaries and controls are enforced | Security |
| 7. Storage and Data | Why each data technology exists in the design | Storage and Data |
| 8. Scalability | Which limits and quotas govern throughput | Scalability |
| 9. Observability and Operations | How to diagnose and operate safely under load | Observability and Operations |
| 10. Design Principles | Why major architectural decisions were chosen | Design Principles |
1. Architecture Overview
The platform is organized into four primary blocks with clear ownership boundaries.
| Block | Responsibility |
|---|---|
| Management & Business | User-facing entry point. Contestants interact through the Contestant Portal, and their requests are served by the backend services. The Admin Portal is built on the open-source CTFd platform and provides management interfaces for the Jury, Challenge Writers, and Platform Admins. |
| Deployment Orchestration | Coordinates challenge build and deployment through Deployment Center and Argo Workflows. Manages challenge lifecycle operations and exposes control APIs such as stop, status check, and log retrieval. |
| Infrastructure (Kubernetes) | Runs each challenge as an isolated pod and cleans up expired challenges via scheduled CronJobs. |
| Shared Persistence & Caching | Provides shared state infrastructure. MariaDB stores durable data, while Redis supports caching, locking, and temporary lifecycle state management. |
2. Infrastructure
| Aspect | Current design | Why it matters |
|---|---|---|
| Platform | Self-managed on-prem k3s | Assumes no cloud-managed dependencies. |
| Networking baseline | Flannel and Traefik disabled; Calico installed explicitly | Predictable CNI behavior and policy control. |
| Ingress | Ingress NGINX via Helm | Consistent routing for the app and the runtime access boundary. |
| Runtime sandbox | Optional gVisor RuntimeClass, gated by USE_GVISOR (opt-in sketched below) | Additional isolation for untrusted challenge workloads. |
| Service exposure | ClusterIP + Ingress (primary), NodePort (fallback) | Works in constrained lab DNS/ingress environments. |
Key namespaces: app, argo, db, monitoring, registry, cattle-system, challenge (dynamically created, labeled namespaces), storage.
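The USE_GVISOR toggle maps onto the standard Kubernetes RuntimeClass opt-in: when set, challenge pods request the gvisor runtime and the kubelet runs them under the runsc handler. A minimal sketch with the official Python client; the helper name and pod layout are illustrative, not the platform's actual code:

```python
import os
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() in-cluster

def challenge_pod(name: str, image: str) -> client.V1Pod:
    """Build a challenge pod spec, opting into the gVisor sandbox when enabled."""
    spec = client.V1PodSpec(
        containers=[client.V1Container(name="challenge", image=image)],
    )
    # Gate the sandbox on USE_GVISOR; a pod referencing a missing
    # RuntimeClass fails scheduling, which keeps the flag truly optional.
    if os.environ.get("USE_GVISOR") == "true":
        spec.runtime_class_name = "gvisor"  # backed by the runsc handler
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name=name, namespace="challenge"),
        spec=spec,
    )
```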
Canonical details: Infrastructure.
3. Service Architecture
| Service boundary | Primary responsibility |
|---|---|
| Contestant Service | Competition domain APIs and race-safe quota logic (atomicity sketched below) |
| Deployment Center | Deployment intent, orchestration coordination, workflow/pod status surfaces |
| Challenge Gateway | Sole runtime ingress boundary (token-based HTTP/TCP access) |
| Deployment Consumer + Argo | Async deploy/build execution pipeline |
| Deployment Listener | Reconciliation of Redis/MariaDB lifecycle truth against Kubernetes events |
Boundary reminder: Challenge Gateway is the only runtime ingress boundary for challenge traffic.
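The Contestant Service row's race-safe quota logic means the quota check and the increment must be a single atomic step, or two concurrent deploys can both claim the same last slot. A minimal sketch of one common pattern, a Redis Lua check-and-increment; the key layout and limit are hypothetical, not the service's actual schema:

```python
import redis

r = redis.Redis()

# The script executes atomically inside Redis, so two concurrent requests
# can never both pass a limit of N while only N-1 slots are taken.
QUOTA_SCRIPT = r.register_script("""
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current >= tonumber(ARGV[1]) then
    return 0  -- quota exhausted: reject
end
redis.call('INCR', KEYS[1])
return 1      -- slot reserved
""")

def try_reserve_slot(team_id: str, limit: int = 3) -> bool:
    """Reserve one concurrent-deployment slot for a team, race-free."""
    key = f"quota:team:{team_id}:running"  # hypothetical key layout
    return bool(QUOTA_SCRIPT(keys=[key], args=[limit]))

def release_slot(team_id: str) -> None:
    r.decr(f"quota:team:{team_id}:running")
```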
Canonical details: Service Architecture.
4. Runtime and Lifecycle Flows
| Flow | Path |
|---|---|
| User request | ingress -> frontend -> Contestant Service |
| Deployment | Deployment Center -> RabbitMQ -> Deployment Consumer -> Argo -> Kubernetes (intent publish sketched below) |
| Runtime access | signed token -> Challenge Gateway -> internal challenge service |
| Stop/cleanup | control intent -> runtime termination -> reconciled STOPPED state |
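The deployment path is async-first: the API records intent, publishes a message, and returns, while the consumer drives Argo separately. A minimal producer-side sketch with pika; the queue name and message fields are illustrative assumptions (the bounded queue declaration itself appears in the Scalability sketch further down):

```python
import json
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = conn.channel()
channel.confirm_delivery()  # surfaces a reject-publish overflow as an error

def publish_deploy_intent(team_id: str, challenge_id: str) -> None:
    """Hand the deploy to the async pipeline; the API returns immediately."""
    channel.basic_publish(
        exchange="",              # default exchange routes by queue name
        routing_key="deploy.intents",
        body=json.dumps({"team_id": team_id, "challenge_id": challenge_id}),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent
    )
```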
Canonical details: Runtime and Lifecycle Flows.
5. State Reconciliation
| Concept | Implementation in FCTF |
|---|---|
| Desired state | Redis lifecycle keys + MariaDB tracking rows |
| Actual state | Kubernetes namespaces/pods/readiness/restarts |
| Control loop | desired -> execute -> observe -> reconcile |
| Drift sources | watch disconnect, stale resourceVersion, out-of-band deletes, partial workflow completion |
| Reconciler | Deployment Listener shards pod events and applies cleanup/state correction (watch loop sketched below) |
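The drift sources above map directly onto how such a watch loop has to be written: streams disconnect routinely, and a stale resourceVersion comes back as HTTP 410, which forces a fresh list. A minimal sketch with the official Python client; reconcile() is a hypothetical stand-in for the Listener's actual correction logic:

```python
from kubernetes import client, config, watch

config.load_incluster_config()
v1 = client.CoreV1Api()

def reconcile(pod) -> None:
    """Hypothetical stand-in: compare the observed pod against desired
    state in Redis/MariaDB and apply cleanup/state correction."""

def watch_challenge_pods() -> None:
    resource_version = None
    while True:  # watches disconnect routinely; always resume or re-list
        try:
            stream = watch.Watch().stream(
                v1.list_namespaced_pod,
                namespace="challenge",
                resource_version=resource_version,
                timeout_seconds=300,
            )
            for event in stream:
                pod = event["object"]
                resource_version = pod.metadata.resource_version
                reconcile(pod)
        except client.ApiException as exc:
            if exc.status == 410:  # stale resourceVersion: re-list from scratch
                resource_version = None
            else:
                raise
```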
Canonical details: State Reconciliation.
6. Security
| Layer | Control |
|---|---|
| Runtime exposure | Challenge pods are never directly exposed; gateway-only boundary |
| Access tokens | HMAC-signed token flow for HTTP and TCP challenge access (sketched below) |
| Network isolation | Role-scoped NetworkPolicies in app and challenge namespaces |
| Data-plane authz | Redis ACL, MariaDB least privilege, scoped RabbitMQ topology |
| Runtime hardening | Optional gVisor; RBAC scoped by service role |
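The access-token row is standard HMAC construction: the issuer signs the claims, the gateway recomputes the signature over what it received, and comparison is constant-time. A minimal sketch; the payload fields and secret handling are illustrative assumptions:

```python
import hashlib
import hmac
import time

SECRET = b"shared-gateway-secret"  # illustrative; load from a real secret store

def sign_token(team_id: str, challenge_id: str, ttl: int = 600) -> str:
    """Issue 'payload.signature' granting time-limited challenge access."""
    payload = f"{team_id}:{challenge_id}:{int(time.time()) + ttl}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_token(token: str) -> bool:
    """Gateway side: recompute the HMAC, then check expiry."""
    try:
        payload, sig = token.rsplit(".", 1)
        expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):  # constant-time compare
            return False
        return int(payload.rsplit(":", 1)[1]) > time.time()
    except ValueError:
        return False
```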
Canonical details: Security.
7. Storage and Data
| Store | Role in architecture |
|---|---|
| MariaDB | Durable business truth and audit history |
| Redis | Low-latency coordination, quotas, lifecycle state (TTL pattern sketched below) |
| RabbitMQ | Async buffering and backpressure for deploy intents |
| NFS | Shared challenge assets and workflow build contexts |
| Harbor | Private image distribution for challenge runtime namespaces |
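The MariaDB/Redis split shows up as a dual write with different guarantees: a durable tracking row for audit history, plus a TTL'd Redis key so a lifecycle entry whose explicit cleanup is missed still ages out on its own. A minimal sketch of the pattern; the table, key, and state names are hypothetical:

```python
import redis

r = redis.Redis()

def record_deployment(dep_id: str, team_id: str, ttl: int = 3600) -> None:
    # Durable business truth: an audit row in MariaDB (driver call elided).
    #   INSERT INTO deployments (id, team_id, state) VALUES (%s, %s, 'RUNNING')
    #
    # Low-latency lifecycle state: a Redis key with a TTL, so the entry
    # expires by itself even if explicit cleanup never runs.
    r.set(f"lifecycle:{dep_id}", "RUNNING", ex=ttl)
```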
Canonical details: Storage and Data.
8. Scalability
| Control area | Current approach |
|---|---|
| Queue pressure | Bounded queue length and reject-publish behavior (sketched below) |
| Worker throughput | Prefetch/batch controls and workflow concurrency gate |
| Runtime entry load | Gateway connection and rate limits (global, per-IP, per-token) |
| Fairness | Team-level concurrent and per-challenge limits via Redis |
| Scaling style | Hybrid: HPA for selected stateless services, manual tuning for worker/reconciler components |
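Queue pressure and worker throughput reduce to a few concrete RabbitMQ knobs: a length-bounded queue that rejects publishes at the cap, and consumer prefetch to limit in-flight work per worker. A minimal sketch with pika; the names and numbers are illustrative, not the deployed configuration:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = conn.channel()

# Bound the queue: at the cap, the broker rejects new publishes instead of
# growing without limit, so backpressure surfaces at the producer.
channel.queue_declare(
    queue="deploy.intents",
    durable=True,
    arguments={"x-max-length": 200, "x-overflow": "reject-publish"},
)

# Cap unacked deliveries per worker so a burst cannot pile onto one consumer.
channel.basic_qos(prefetch_count=4)

def on_intent(ch, method, properties, body):
    # ... submit the Argo workflow, then ack only on success ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="deploy.intents", on_message_callback=on_intent)
channel.start_consuming()
```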
Canonical details: Scalability.
9. Observability and Operations
| Capability | Stack or surface |
|---|---|
| Metrics | Prometheus (instrumentation sketched below) |
| Visualization | Grafana |
| Log aggregation | Loki |
| Runtime diagnostics | Deployment Center workflow/pod/log query endpoints |
| Cluster operations | Scripted bootstrap/install path for on-prem environments |
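On the service side, the Prometheus row usually just means each service exposes a scrape endpoint. A minimal sketch with prometheus_client; the metric names are illustrative, not the platform's actual series:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names, not the platform's actual series.
DEPLOYS = Counter("fctf_deployments_total",
                  "Deploy intents processed", ["outcome"])
DEPLOY_LATENCY = Histogram("fctf_deploy_seconds", "End-to-end deploy time")

start_http_server(9100)  # expose /metrics for the Prometheus scraper

@DEPLOY_LATENCY.time()
def handle_deploy(intent) -> None:
    try:
        ...  # run the deployment
        DEPLOYS.labels(outcome="success").inc()
    except Exception:
        DEPLOYS.labels(outcome="error").inc()
        raise
```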
Canonical details: Observability and Operations.
10. Design Principles
| Principle | Practical effect on implementation |
|---|---|
| Async-first control path | Keeps user-facing APIs responsive under deployment load |
| Isolation by default | Limits blast radius and cross-team interference |
| Reconciliation over assumption | Prevents stale runtime state from persisting |
| Defense in depth | Applies layered controls at gateway/network/data/runtime |
| On-premise-first operability | Works without managed cloud dependencies |
| Contributor-first boundaries | Keeps service, flow, and consistency concerns separate |
Canonical details: Design Principles.
If you are new to the codebase, read sections in this order: Overview -> Service Architecture -> Runtime and Lifecycle Flows -> State Reconciliation.