
System Architecture

This page explains Agenta's system architecture: what each component does and how they connect.

System Overview

Agenta uses a microservices architecture deployed as Docker containers. The diagram below shows how the main layers connect.

                   ┌─────────────────────────────────────┐
                   │                Users                │
                   │     (Developers, AI Engineers)      │
                   └──────────────────┬──────────────────┘
                                      │
                   ┌──────────────────▼──────────────────┐
                   │        Load Balancer / Proxy        │
                   │         (Traefik or Nginx)          │
                   │       Handles SSL and routing       │
                   └──────────────────┬──────────────────┘
         ┌────────────────────────────┼────────────────────────────┐
         │                            │                            │
         ▼                            ▼                            ▼
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│    Frontend     │          │   API Backend   │          │  Services API   │
│    (Web UI)     │◄────────►│    (FastAPI)    │◄────────►│    (FastAPI)    │
│                 │          │                 │          │                 │
│ • Next.js App   │          │ • REST API      │          │ • Completion    │
│ • Playground    │          │ • Core logic    │          │ • Chat          │
│ • Admin UI      │          │ • Persistence   │          │ • LLM adapters  │
└────────┬────────┘          └────────┬────────┘          └────────┬────────┘
         │                            │                            │
         │                            ▼                            ▼
         │               ┌─────────────────────────┐      ┌─────────────────┐
         │               │       Worker Pool       │      │  runner :8765   │
         │               │   (background procs)    │      │  (agent runs)   │
         │               │ • worker-evaluations    │      └────────┬────────┘
         │               │ • worker-tracing        │               │
         │               │ • worker-webhooks       │               │
         │               │ • worker-events         │               │
         │               │ • worker-records        │               │
         │               │ • worker-interactions   │               │
         │               │ • worker-triggers       │               │
         │               │ • cron                  │               │
         │               └────────────┬────────────┘               │
         │                            │                            │
         ▼                            ▼                            ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                           Infrastructure Layer                            │
│                                                                           │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────┐ ┌──────────┐  │
│  │    PostgreSQL    │ │      Redis       │ │ SuperTokens  │ │seaweedfs │  │
│  │                  │ │                  │ │              │ │  :8333   │  │
│  │ • Core DB        │ │ • Task queues    │ │ • Auth       │ │ (bundled │  │
│  │ • Tracing DB     │ │ • Streams        │ │ • Sessions   │ │or ext S3)│  │
│  │ • Auth DB        │ │ • Caching        │ │              │ │          │  │
│  └──────────────────┘ └──────────────────┘ └──────────────┘ └──────────┘  │
└───────────────────────────────────────────────────────────────────────────┘

Frontend Components

Web UI (Next.js Application)

  • Technology: React, TypeScript, Next.js
  • Port: 3000 (internal)
  • Purpose: Primary user interface for the Agenta platform

Key Responsibilities:

  • User Interface: Provides a web interface for application management
  • Playground: Interactive environment for testing and evaluating LLM applications
  • Evaluation Dashboard: Visualizations and metrics for application performance
  • Application Management: Create, configure, and deploy AI applications
  • User Authentication: Login, registration, and session management

Backend Components

API Service (FastAPI)

  • Technology: Python, FastAPI, SQLAlchemy
  • Port: 8000 (internal)
  • Purpose: Core business logic and API endpoints

Key Responsibilities:

  • REST API: Provides RESTful endpoints for frontend and external integrations
  • Business Logic: Implements core platform functionality
  • Data Management: Handles CRUD operations for applications, evaluations, experiments, etc.
  • Authentication: Integrates with SuperTokens for user authentication
  • Application Orchestration: Manages application lifecycle and deployment
  • Evaluation Management: Coordinates evaluation runs and result collection

Worker Services (TaskIQ + Async Consumers)

  • Technology: Python workers, TaskIQ, asyncio consumers, Redis, PostgreSQL
  • Purpose: Background processing for evaluations, tracing, webhooks, events, session records, interactions, and triggers

Key Responsibilities:

  • Evaluation Execution: worker-evaluations runs asynchronous evaluation workloads
  • Tracing Ingestion: worker-tracing ingests OTLP trace data
  • Webhook Delivery: worker-webhooks dispatches outbound webhook notifications
  • Event Processing: worker-events processes internal event streams
  • Session Records: worker-records persists agent session records from the streams:records Redis stream
  • Interaction Dispatch: worker-interactions consumes the queues:interactions queue and dispatches async session interactions
  • Trigger Processing: worker-triggers processes trigger events for automated workflow execution

TaskIQ Integration:

  • Broker: Uses Redis streams for queueing and task distribution
  • Task Registration: Evaluation tasks are registered at worker startup
  • Execution: Workers consume Redis-backed jobs and process them asynchronously
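The consume-and-process pattern described above can be sketched with the standard library alone. This is a minimal illustration, not Agenta's worker code: asyncio.Queue stands in for the Redis-backed broker, and the names worker, run_pool, and the "eval-N" job ids are hypothetical.

```python
import asyncio


async def worker(name: str, queue: asyncio.Queue, results: list) -> None:
    """Consume jobs from the queue until a None sentinel arrives."""
    while True:
        job = await queue.get()
        try:
            if job is None:  # sentinel: shut this worker down
                return
            await asyncio.sleep(0)  # placeholder for real asynchronous work
            results.append((name, job))
        finally:
            queue.task_done()


async def run_pool(jobs: list, concurrency: int = 2) -> list:
    """Distribute jobs across a small pool of concurrent consumers."""
    queue: asyncio.Queue = asyncio.Queue()  # in-memory stand-in for the Redis broker
    results: list = []
    workers = [
        asyncio.create_task(worker(f"worker-{i}", queue, results))
        for i in range(concurrency)
    ]
    for job in jobs:
        queue.put_nowait(job)
    for _ in workers:
        queue.put_nowait(None)  # one sentinel per worker
    await queue.join()
    await asyncio.gather(*workers)
    return results


processed = asyncio.run(run_pool(["eval-1", "eval-2", "eval-3"]))
```

Each job is handled by exactly one worker, which is the property a shared queue or stream consumer group is meant to provide. Which worker picks up which job depends on scheduling.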

Agent Runner

  • Technology: Node.js TypeScript sidecar
  • Port: 8765 (internal)
  • Purpose: Executes agent workflows on behalf of the Services API

The runner receives /run requests from the Services API (routed via AGENTA_RUNNER_URL) and starts harness processes (Pi, Claude Code, or other supported adapters) in local or remote sandboxes. It mounts durable working directories from the store into each sandbox and relays server-side tools back to the Services API without exposing the full stack environment to the harness.

Sandbox matrix:

  • local — in-process on the runner host; the default for compose and Kubernetes deployments.
  • daytona — a remote Daytona cloud sandbox; requires SANDBOX_AGENT_PROVIDER=daytona on the runner.
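As a rough sketch of how these two settings fit together, the snippet below resolves the /run endpoint from AGENTA_RUNNER_URL and selects a sandbox from SANDBOX_AGENT_PROVIDER. It is illustrative only: the runner itself is a Node.js service, the helper names are hypothetical, and the fallback URL http://runner:8765 is an assumed hostname rather than a documented default.

```python
import os
from urllib.parse import urljoin

# Sandbox types listed in the matrix above.
SUPPORTED_SANDBOXES = {"local", "daytona"}


def runner_run_url(env=os.environ) -> str:
    """Build the URL the Services API would call to start a run."""
    # Fallback host name is an assumption made for this sketch.
    base = env.get("AGENTA_RUNNER_URL", "http://runner:8765")
    return urljoin(base.rstrip("/") + "/", "run")


def sandbox_provider(env=os.environ) -> str:
    """Pick the sandbox type, falling back to the local default."""
    provider = env.get("SANDBOX_AGENT_PROVIDER", "local")
    if provider not in SUPPORTED_SANDBOXES:
        raise ValueError(f"unsupported sandbox provider: {provider!r}")
    return provider
```

Passing a plain dict instead of os.environ makes both helpers easy to exercise without touching the real environment, for example sandbox_provider({"SANDBOX_AGENT_PROVIDER": "daytona"}).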

See Deploy the agent runner.

Services Backend

Services API (FastAPI)

  • Technology: Python, FastAPI
  • Port: 8080 (internal)
  • Purpose: LLM-facing endpoints and service-layer APIs exposed under /services/*

Key Responsibilities:

  • LLM Integration: Connects to various LLM providers (OpenAI, Anthropic, etc.)
  • Prompt Processing: Handles prompt templates and variable substitution
  • Response Generation: Manages LLM API calls and response handling
  • Provider Abstraction: Unified interface across different LLM providers
  • Error Handling: Robust error handling for LLM API failures
  • Endpoint Groups: Includes /services/completion/* and /services/chat/*
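Prompt processing and provider abstraction can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the Provider protocol, EchoProvider, and the helper functions are not Agenta APIs, and the $variable syntax is Python's string.Template, which may differ from the template syntax Agenta uses.

```python
from string import Template
from typing import Protocol


class Provider(Protocol):
    """Minimal interface every provider adapter would implement."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider that makes no network call."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def render_prompt(template: str, variables: dict) -> str:
    # Template.substitute raises KeyError when a variable is missing.
    return Template(template).substitute(variables)


def run_completion(provider: Provider, template: str, variables: dict) -> str:
    """Render the prompt, then delegate to whichever provider was passed in."""
    try:
        prompt = render_prompt(template, variables)
    except KeyError as exc:
        raise ValueError(f"missing prompt variable: {exc.args[0]}") from exc
    return provider.complete(prompt)


answer = run_completion(
    EchoProvider(), "Summarize $topic in one line.", {"topic": "tracing"}
)
```

Because run_completion only relies on the complete method, swapping in a different provider adapter would not change the calling code.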

Infrastructure Services

PostgreSQL (Database)

  • Technology: PostgreSQL 17
  • Port: 5432
  • Purpose: Primary data storage

Databases:

  • Core Database: Application data, datasets, evaluations, users and profiles, etc.
  • Tracing Database: Execution traces and performance metrics
  • SuperTokens Database: Authentication and user management data

Redis (Task Queue, Caching & Sessions)

  • Technology: Redis
  • Ports: 6379 (volatile), 6381 (durable)
  • Purpose: Task queue, caching, pub/sub, streams

Use Cases:

  • Task Queue: TaskIQ broker for background job distribution and processing
  • Application Caching: Frequently accessed data
  • Session Storage: User sessions and temporary data
  • Task Results: TaskIQ task results and status
  • Real-time Data: Live updates and notifications
  • Rate Limiting: API rate limit counters
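To make the rate-limit counter use case concrete, here is a minimal fixed-window limiter. It is a sketch under stated assumptions: a Python dict stands in for Redis (where the same idea is commonly built from INCR plus EXPIRE on a per-window key), and the class name, limits, and keys are hypothetical rather than taken from Agenta.

```python
import time


class FixedWindowLimiter:
    """Count requests per key inside fixed time windows."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # (key, window index) -> request count

    def allow(self, key: str) -> bool:
        bucket = int(self.clock() // self.window)
        # Drop counters from earlier windows (Redis would expire these keys).
        self.counters = {k: v for k, v in self.counters.items() if k[1] == bucket}
        count = self.counters.get((key, bucket), 0) + 1
        self.counters[(key, bucket)] = count
        return count <= self.limit


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
first_window = [limiter.allow("user-1") for _ in range(3)]
now[0] = 61.0  # move into the next window
next_window = limiter.allow("user-1")
```

A fixed window is the simplest variant. It allows short bursts at window boundaries, which a sliding-window or token-bucket scheme would smooth out.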

SuperTokens (Authentication)

  • Technology: SuperTokens
  • Port: 3567
  • Purpose: Authentication and user management

Features:

  • User Authentication: Login/logout, password management
  • Session Management: Secure session handling with JWT
  • OAuth Integration: Google and GitHub
  • User Management: User registration, profile management

Durable Store (SeaweedFS / S3)

  • Technology: SeaweedFS (bundled) or any S3-compatible store (AWS S3, Cloudflare R2, MinIO)
  • Port: 8333 (bundled SeaweedFS)
  • Purpose: S3-compatible object store backing durable agent workspaces

Files written during an agent run are stored here and remounted automatically on the next turn, so agent workspaces survive sandbox teardown.

The store.seaweedfs.enabled Helm toggle controls whether the chart bundles a SeaweedFS StatefulSet or points store.endpointUrl at an external store. This mirrors the postgresql.enabled pattern. The endpoint URL is always explicit; a remote S3-compatible store (AWS, MinIO) must set it.
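The rule can be expressed as a small validation sketch over a parsed values dictionary. This is an interpretation, not chart code: it reads "always explicit" as requiring store.endpointUrl in both modes, assumes the toggle defaults to off, and uses a hypothetical resolve_store helper with example endpoints.

```python
def resolve_store(values: dict) -> dict:
    """Decide between bundled SeaweedFS and an external S3-compatible store."""
    store = values.get("store", {})
    # Default of False is an assumption made for this sketch.
    bundled = bool(store.get("seaweedfs", {}).get("enabled", False))
    endpoint = store.get("endpointUrl", "")
    if not endpoint:
        # Read here as: the endpoint is never inferred, in either mode.
        raise ValueError("store.endpointUrl must be set explicitly")
    return {
        "mode": "bundled-seaweedfs" if bundled else "external-s3",
        "endpoint_url": endpoint,
    }


# Example values; the host names are placeholders.
bundled = resolve_store(
    {"store": {"seaweedfs": {"enabled": True}, "endpointUrl": "http://seaweedfs:8333"}}
)
external = resolve_store(
    {"store": {"seaweedfs": {"enabled": False}, "endpointUrl": "https://s3.example.com"}}
)
```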

Per-deployment default:

  • Dev compose: SeaweedFS container bundled.
  • Railway: SeaweedFS service and volume (publicly reachable, no tunnel needed).
  • Kubernetes (gh self-host): no bundled SeaweedFS; supply external S3 credentials via store.* values.
  • Kubernetes (operator choice): enable via store.seaweedfs.enabled=true.
  • Live / private cloud: external AWS S3 (store.seaweedfs.enabled=false).

See the Store configuration reference.

Service Dependencies

Frontend Dependencies

Web UI depends on:
├── API Service (primary backend)
├── Services API (playground and model calls)
└── Authentication (SuperTokens via API)

Backend Dependencies

API Service depends on:
├── PostgreSQL (data persistence)
├── Redis (task queue, caching, sessions)
├── SuperTokens (authentication)
└── Worker pool (async task execution)

Services API depends on:
├── PostgreSQL (agent and service state)
├── LLM providers (model calls)
└── runner sidecar (agent workflow execution via AGENTA_RUNNER_URL)

Worker Dependencies

Worker pool depends on:
├── Redis (queues and streams)
├── PostgreSQL (state and persistence)
├── API backend (coordination and config)
├── worker-records (streams:records stream → session persistence)
└── Services API / external endpoints (workload-specific processing)
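
The dependency lists above can also be written as data, which makes it easy to ask which components are affected when one service is unavailable. The sketch below is illustrative: the component labels follow this page, the Web UI's authentication dependency is folded into the API Service edge because it goes through the API, and impacted_by is a hypothetical helper. Note that the API Service and the worker pool depend on each other, so the traversal tracks visited components instead of assuming an acyclic graph.

```python
# Direct dependencies, transcribed from the lists above.
DEPENDS_ON = {
    "Web UI": ["API Service", "Services API"],
    "API Service": ["PostgreSQL", "Redis", "SuperTokens", "Worker pool"],
    "Services API": ["PostgreSQL", "LLM providers", "Agent runner"],
    "Worker pool": ["Redis", "PostgreSQL", "API Service", "Services API"],
}


def impacted_by(service: str) -> set[str]:
    """Return every component that directly or transitively depends on `service`."""
    impacted: set[str] = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for component, deps in DEPENDS_ON.items():
            if current in deps and component not in impacted:
                impacted.add(component)
                frontier.append(component)
    impacted.discard(service)  # a cycle can lead back to the starting service
    return impacted


redis_impact = impacted_by("Redis")
runner_impact = impacted_by("Agent runner")
```

Under this model, losing Redis affects the API Service, the worker pool, and the Web UI, while losing the agent runner reaches all four application components through the Services API.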