Docker Deployment
Docker Engine setup, agent container lifecycle, bollard integration, NFS volume mounting, and systemd service configuration.
Docker is the Phase 1 container runtime for AEGIS. The orchestrator communicates with Docker via the bollard Rust library (Docker API over the Unix socket). This page covers production Docker deployment, container lifecycle, and the NFS volume mounting model.
Prerequisites
- Docker Engine 24.0+
- The AEGIS daemon process has read/write access to /var/run/docker.sock
- Agent container images are accessible from the daemon host (either locally present or pullable)
- NFS traffic (TCP port 2049) is routable between agent containers and the daemon host
Docker Socket Access
The AEGIS daemon must be able to connect to the Docker socket. In production, run the daemon as a user in the docker group, or configure a dedicated socket permission:
```shell
# Add the aegis service user to the docker group
sudo usermod -aG docker aegis

# Verify
sudo -u aegis docker ps
```

Never run the AEGIS daemon as root.
Container Lifecycle
When an execution starts, the orchestrator:

1. Pulls the image (respecting `image_pull_policy` in `aegis-config.yaml`).
2. Creates the container with:
   - CPU quota and memory limit from `spec.resources`
   - NFS volume mounts (described below)
   - Network configuration from `spec.security.network_policy`
   - Environment variables from `spec.environment`
   - The container UID/GID stored in the `Execution` metadata for UID/GID squashing
3. Starts the container; `bootstrap.py` begins executing.
4. Monitors the container for the duration of the iteration.
5. Stops and removes the container after the iteration completes or times out.

Containers are removed immediately after each iteration. A fresh container is created for each iteration in the 100monkeys loop.
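The per-iteration lifecycle above can be sketched as follows. The `runtime` object and its method names (`pull`, `create`, `start`, `wait`, `remove`) are hypothetical stand-ins for the real Docker client (bollard on the Rust side), not AEGIS APIs; the point is the call sequence and the guaranteed removal:

```python
def run_iteration(runtime, image, spec):
    """Create, start, monitor, and always remove a fresh container."""
    runtime.pull(image)                # step 1: pull (per image_pull_policy)
    cid = runtime.create(image, spec)  # step 2: create with limits and mounts
    try:
        runtime.start(cid)             # step 3: bootstrap.py begins executing
        # step 4: monitor until completion or timeout_secs
        return runtime.wait(cid, timeout=spec["timeout_secs"])
    finally:
        runtime.remove(cid, force=True)  # step 5: always removed, even on error

class FakeRuntime:
    """In-memory stand-in that records the call sequence, for illustration."""
    def __init__(self):
        self.calls = []
    def pull(self, image):
        self.calls.append(("pull", image))
    def create(self, image, spec):
        self.calls.append(("create", image))
        return "c1"
    def start(self, cid):
        self.calls.append(("start", cid))
    def wait(self, cid, timeout):
        self.calls.append(("wait", cid))
        return 0
    def remove(self, cid, force):
        self.calls.append(("remove", cid))

rt = FakeRuntime()
run_iteration(rt, "agent:latest", {"timeout_secs": 300})
```

The `try`/`finally` mirrors the guarantee that containers are removed after every iteration, whether it completes or times out.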
Resource Limits
Manifest resource limits are translated to Docker container constraints:
```yaml
spec:
  resources:
    cpu_quota: 1.0            # → Docker --cpus=1.0
    memory_bytes: 1073741824  # → Docker --memory=1073741824
    timeout_secs: 300
```

`timeout_secs` is enforced by the ExecutionSupervisor. If the inner loop has not produced a final response within `timeout_secs`, the container is force-killed and the iteration is failed.
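As a sketch of this translation: the Docker Engine API represents `--cpus` as `NanoCpus` (units of 10⁻⁹ CPUs) and `--memory` as `Memory` in bytes. The helper below is illustrative, not the actual AEGIS code:

```python
def host_config_from_resources(res: dict) -> dict:
    """Translate manifest resource limits into Docker HostConfig fields."""
    return {
        # --cpus=1.0 becomes NanoCpus=1_000_000_000
        "NanoCpus": int(res["cpu_quota"] * 1_000_000_000),
        # --memory is passed through as bytes
        "Memory": res["memory_bytes"],
    }

limits = host_config_from_resources({"cpu_quota": 1.0, "memory_bytes": 1073741824})
# → {'NanoCpus': 1000000000, 'Memory': 1073741824}
```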
NFS Volume Mounting
Agent containers mount their volumes via the kernel NFS client to the orchestrator's NFS server gateway (port 2049):
Example Docker API mount configuration produced by AEGIS for a volume named "workspace":

```json
{
  "Target": "/workspace",
  "Type": "volume",
  "VolumeOptions": {
    "DriverConfig": {
      "Name": "local",
      "Options": {
        "type": "nfs",
        "o": "addr=<orchestrator-host>,nfsvers=3,proto=tcp,soft,timeo=10,nolock",
        "device": ":/<tenant_id>/<volume_id>"
      }
    }
  }
}
```

The agent container does not require CAP_SYS_ADMIN or any elevated capabilities for NFS mounts. The kernel NFS client is used (as opposed to a FUSE client, which runs in user space), and the mount is performed on the host by Docker's `local` volume driver, so the container itself needs no special privileges.
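A helper that produces the mount configuration above can be sketched as follows. The function name is illustrative, and the placeholders from the example (`<orchestrator-host>`, `<tenant_id>`, `<volume_id>`) become parameters:

```python
def nfs_mount(target: str, tenant_id: str, volume_id: str, nfs_addr: str) -> dict:
    """Build a Docker API volume mount backed by the orchestrator's NFS gateway."""
    return {
        "Target": target,
        "Type": "volume",
        "VolumeOptions": {
            "DriverConfig": {
                "Name": "local",
                "Options": {
                    "type": "nfs",
                    # Same mount options as the example above
                    "o": f"addr={nfs_addr},nfsvers=3,proto=tcp,soft,timeo=10,nolock",
                    # NFS export path is namespaced by tenant and volume
                    "device": f":/{tenant_id}/{volume_id}",
                },
            }
        },
    }

mount = nfs_mount("/workspace", "tenant-a", "vol-1", "172.17.0.1")
```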
Network Reachability of NFS
Agent containers must be able to reach the orchestrator host on TCP port 2049. In single-host deployments using the Docker bridge network, the orchestrator host is typically reachable at 172.17.0.1 (Docker default bridge gateway).
Configure the NFS listen address in aegis-config.yaml:
```yaml
storage:
  nfs_listen_addr: "0.0.0.0:2049"
```

In multi-host deployments, use the orchestrator host's external IP or hostname.
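A quick way to verify reachability is a plain TCP connect to the NFS port from inside an agent container. This standalone check is illustrative, not part of AEGIS:

```python
import socket

def nfs_reachable(addr: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to "host:port" (e.g. "172.17.0.1:2049") succeeds."""
    host, _, port = addr.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False
```

Running it against the configured NFS address helps rule out routing or firewall problems before debugging mount options.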
Network Isolation
Each agent container is placed on a user-defined Docker bridge network. Network egress is controlled by the manifest network_policy:
```yaml
spec:
  security:
    network_policy:
      mode: allow
      allowlist:
        - pypi.org
        - api.github.com
```

The AEGIS daemon enforces network policy at the SMCP layer (per tool call), not via Docker network rules. In high-security environments, you may additionally configure Docker network rules or firewall rules to enforce the allowlist at the kernel level.
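The per-tool-call check can be sketched as a simple predicate. The function and its exact-match semantics are assumptions for illustration; the real SMCP-layer matcher may treat wildcards or subdomains differently:

```python
def egress_allowed(policy: dict, host: str) -> bool:
    """Return True if an outbound tool call to `host` passes the network policy."""
    if policy.get("mode") == "allow":
        # Allowlist mode: only listed hosts may be contacted (exact match assumed)
        return host in policy.get("allowlist", [])
    return True  # other policy modes are not modeled in this sketch

policy = {"mode": "allow", "allowlist": ["pypi.org", "api.github.com"]}
```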
systemd Service
For production deployments, run the AEGIS daemon as a systemd service:
```ini
# /etc/systemd/system/aegis.service
[Unit]
Description=AEGIS Orchestrator Daemon
After=network-online.target docker.service
Requires=docker.service

[Service]
User=aegis
Group=docker
WorkingDirectory=/opt/aegis
ExecStart=/usr/local/bin/aegis daemon --config /etc/aegis/config.yaml
Restart=on-failure
RestartSec=10s
LimitNOFILE=65535

# Environment variables for secrets (avoid plaintext in config)
EnvironmentFile=/etc/aegis/env

[Install]
WantedBy=multi-user.target
```

```shell
# /etc/aegis/env (chmod 600)
DATABASE_URL=postgresql://aegis:password@localhost:5432/aegis
OPENAI_API_KEY=sk-...
OPENBAO_ROLE_ID=...
OPENBAO_SECRET_ID=...
```

```shell
# Enable and start
sudo systemctl enable aegis
sudo systemctl start aegis

# Check status
sudo systemctl status aegis

# Follow logs
sudo journalctl -u aegis -f
```

Health Checks
The AEGIS daemon exposes health endpoints:
```shell
# Liveness (daemon process alive)
curl http://localhost:8080/health/live

# Readiness (daemon ready to accept requests; all dependencies connected)
curl http://localhost:8080/health/ready
```

Use these in load balancer health check configuration or Kubernetes readiness probes.
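For scripted deployments, the readiness endpoint can be polled until the daemon reports healthy. This helper is an illustrative sketch; the `probe` parameter is an assumption added so the HTTP call can be stubbed out:

```python
import time
import urllib.request

def wait_until_ready(url="http://localhost:8080/health/ready",
                     attempts=30, interval=1.0, probe=None) -> bool:
    """Poll the readiness endpoint until it returns HTTP 200 or attempts run out."""
    if probe is None:
        def probe():
            try:
                with urllib.request.urlopen(url, timeout=2) as resp:
                    return resp.status == 200
            except OSError:  # connection refused, timeout, URLError, etc.
                return False
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(interval)
    return False
```

A deployment script might call `wait_until_ready()` right after `systemctl start aegis` and fail the rollout if it returns False.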