
Self-Hosting 🟡 Intermediate ⏱️ 15 min read 📅 2026-02-24

Docker + AI: Containerizing Your Intelligent Services

You have a server running an API, a database, a reverse proxy, and maybe a language model. Everything is installed directly on the OS. One day, you update Python and everything breaks. Your API won't start, your dependencies are in conflict, and you spend 4 hours fixing everything.

With Docker, this scenario is a thing of the past. Each service runs in its isolated container, with its own dependencies, its own Python version, and its own environment. Update whatever you want — the other services remain unaffected.

In this guide, we'll containerize concrete AI services: a FastAPI API, an LLM pipeline, and the entire ecosystem around them. Real, clean, and reproducible self-hosting.

🐳 Why Docker for AI

The 4 Pillars

| Pillar | Without Docker | With Docker |
|---|---|---|
| Isolation | Python 3.11 breaks your Python 3.9 app | Each service has its own version |
| Reproducibility | "It works on my machine" | Same image = same result everywhere |
| Scaling | Restart the entire server | Scale just the service that needs it |
| Deployment | 47 commands to type in order | docker compose up -d |

Why It's Particularly Important for AI

AI projects have heavy and conflicting dependencies:

# Project A wants:
torch==2.1.0
numpy==1.24.0
transformers==4.35.0

# Project B wants:
torch==1.13.0
numpy==1.21.0
tensorflow==2.14.0

Without Docker, it's a nightmare of virtual environments. With Docker:

# Project A
docker run -d --name project-a project-a:latest

# Project B (completely different versions, no conflicts)
docker run -d --name project-b project-b:latest

Docker vs VM

| Aspect | Docker | VM (VirtualBox, etc.) |
|---|---|---|
| Startup | Seconds | Minutes |
| Size | MBs (image) | GBs (full OS) |
| Performance | Near-native | Virtualization overhead |
| Isolation | Process-level | Full OS |
| AI Use Case | ✅ Ideal | Overkill |

📦 Docker Basics in 5 Minutes

Essential Vocabulary

Image      = The build plan (like a class)
Container  = The running instance (like an object)
Dockerfile = The recipe to create an image
Volume     = Persistent storage (survives container restart)
Network    = Virtual network between containers
Compose    = Multi-container orchestration

Survival Commands

# See running containers
docker ps

# See ALL containers (including stopped ones)
docker ps -a

# See downloaded images
docker images

# Logs of a container
docker logs my-container
docker logs -f my-container  # Follow (real-time)

# Enter a container
docker exec -it my-container bash

# Stop / remove
docker stop my-container
docker rm my-container

# Clean up (unused images/containers)
docker system prune -a

Installing Docker

# Ubuntu/Debian
curl -fsSL https://get.docker.com | sh

# Add your user to the docker group (avoids sudo)
sudo usermod -aG docker $USER

# Install the Docker Compose plugin (bundled with Docker Desktop; otherwise:)
sudo apt install docker-compose-plugin

# Verify
docker --version
docker compose version

🏗️ Docker Compose: Orchestrating Multiple Services

Why Compose?

A typical AI project needs:
- An API (FastAPI, Flask)
- A database (SQLite, PostgreSQL)
- A reverse proxy (Nginx, Caddy)
- Maybe a cache (Redis)
- Maybe a worker (Celery, async tasks)

Managing all this with individual docker run commands is unmanageable. Docker Compose defines everything in a single file:

# docker-compose.yml

services:
  api:
    build: ./api
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data
    environment:
      - DATABASE_URL=postgresql://app:${DB_PASSWORD}@db:5432/myapp
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
    depends_on:
      - db
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 10s
      timeout: 5s
      retries: 5

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/certs:/etc/nginx/certs:ro
    depends_on:
      - api
    restart: unless-stopped

volumes:
  pgdata:

Essential Compose Commands

# Start all services
docker compose up -d

# See logs of all services
docker compose logs -f

# Logs of a specific service
docker compose logs -f api

# Restart a service
docker compose restart api

# Stop everything
docker compose down

# Stop everything AND remove volumes (WARNING: data loss)
docker compose down -v

# Rebuild after modifying the Dockerfile
docker compose up -d --build

🤖 Example 1: FastAPI + SQLite Containerized API

Project Structure

my-ai-api/
├── docker-compose.yml
├── .env
├── api/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py
│   └── models.py
├── nginx/
│   └── nginx.conf
└── data/
    └── (SQLite DB will be created here)

The API Dockerfile

# api/Dockerfile
FROM python:3.11-slim

# Avoid interactive questions
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the code
COPY . .

# Exposed port
EXPOSE 8000

# Healthcheck
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the API
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
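Since the Dockerfile ends with COPY . ., everything in api/ is copied into the image. A .dockerignore keeps caches, secrets, and local data out of the build context — a minimal sketch, to be adjusted to your project:

```
# api/.dockerignore
__pycache__/
*.pyc
.env
.git/
data/
```

This also keeps builds fast: entries listed here are never sent to the Docker daemon in the first place.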

Dependencies

# api/requirements.txt
fastapi==0.109.0
uvicorn[standard]==0.27.0
sqlalchemy==2.0.25
httpx==0.26.0
pydantic==2.5.3

The FastAPI API

# api/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sqlalchemy import create_engine, Column, Integer, String, DateTime, Text
from sqlalchemy.orm import declarative_base, sessionmaker
from datetime import datetime
import httpx
import os

app = FastAPI(title="My AI API", version="1.0.0")

# Database
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///data/app.db")
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)
Base = declarative_base()

class Query(Base):
    __tablename__ = "queries"
    id = Column(Integer, primary_key=True)
    question = Column(Text, nullable=False)
    answer = Column(Text)
    model = Column(String(100))
    created_at = Column(DateTime, default=datetime.utcnow)

Base.metadata.create_all(engine)

# Schemas
class QuestionRequest(BaseModel):
    question: str
    model: str = "anthropic/claude-sonnet-4"

class QuestionResponse(BaseModel):
    answer: str
    model: str
    query_id: int

@app.get("/health")
async def health():
    return {"status": "healthy", "timestamp": datetime.utcnow().isoformat()}

@app.post("/ask", response_model=QuestionResponse)
async def ask_question(req: QuestionRequest):
    api_key = os.getenv("OPENROUTER_API_KEY")
    if not api_key:
        raise HTTPException(status_code=500, detail="API key not configured")

    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": req.model,
                "messages": [{"role": "user", "content": req.question}],
                "max_tokens": 1000
            },
            timeout=30.0
        )

    if response.status_code != 200:
        raise HTTPException(status_code=502, detail="LLM API error")

    answer = response.json()["choices"][0]["message"]["content"]

    # Save to database
    db = SessionLocal()
    try:
        query = Query(question=req.question, answer=answer, model=req.model)
        db.add(query)
        db.commit()
        query_id = query.id
    finally:
        db.close()

    return QuestionResponse(answer=answer, model=req.model, query_id=query_id)

@app.get("/queries")
async def list_queries(limit: int = 10):
    db = SessionLocal()
    try:
        queries = db.query(Query).order_by(Query.created_at.desc()).limit(limit).all()
    finally:
        db.close()
    return [{"id": q.id, "question": q.question[:100], "model": q.model, "date": q.created_at.isoformat()} for q in queries]

Nginx Configuration

# nginx/nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream api {
        server api:8000;
    }

    server {
        listen 80;
        server_name _;

        # Request size limit
        client_max_body_size 10M;

        location / {
            proxy_pass http://api;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Timeouts for AI requests (can be slow)
            proxy_read_timeout 60s;
            proxy_connect_timeout 10s;
        }

        location /health {
            proxy_pass http://api/health;
            access_log off;
        }
    }
}

The .env File

# .env (DO NOT commit this file!)
OPENROUTER_API_KEY=sk-or-v1-your-key-here
DB_PASSWORD=a-strong-password-here

Running Everything

# Create the data directory
mkdir -p data

# Run
docker compose up -d --build

# Verify
docker compose ps

# Test
curl http://localhost/health
curl -X POST http://localhost/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "Explain Docker in one sentence"}'

🧠 Example 2: Containerized LLM Pipeline

Architecture

A typical LLM pipeline:

User Request
    ↓
[Container: API Gateway]
    ↓
[Container: Preprocessing]
  - Text cleaning
  - Intent extraction
    ↓
[Container: LLM Service]
  - Model call
  - Cache management
    ↓
[Container: Postprocessing]
  - Response formatting
  - Logging
    ↓
Response
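The stages above can be sketched as plain functions chained in-process. This is a toy sketch, not the real service code (which isn't shown in this article): each stage is a stub, and in the containerized version each function would live in its own service behind HTTP.

```python
def preprocess(raw: str) -> dict:
    """Preprocessing stage: clean the text and extract a naive intent."""
    text = " ".join(raw.split())  # collapse runs of whitespace
    intent = "question" if text.endswith("?") else "statement"
    return {"text": text, "intent": intent}

def call_llm(payload: dict) -> dict:
    """LLM stage stub: the real service would call OpenRouter here."""
    payload["answer"] = f"[{payload['intent']}] echo: {payload['text']}"
    return payload

def postprocess(payload: dict) -> str:
    """Postprocessing stage: format the final response."""
    return payload["answer"].strip()

def pipeline(raw: str) -> str:
    # The gateway chains the stages; here it is a simple composition.
    return postprocess(call_llm(preprocess(raw)))

print(pipeline("  What is   Docker? "))  # → [question] echo: What is Docker?
```

Splitting the stages into containers keeps this exact data flow, but each arrow becomes a network call, so each service can be scaled or updated independently.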

Docker Compose Multi-Services

# docker-compose.yml

services:
  gateway:
    build: ./gateway
    ports:
      - "8080:8080"
    environment:
      - LLM_SERVICE_URL=http://llm:8001
      - REDIS_URL=redis://cache:6379
    depends_on:
      cache:
        condition: service_healthy
      llm:
        condition: service_healthy
    restart: unless-stopped

  llm:
    build: ./llm-service
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - REDIS_URL=redis://cache:6379
      - DEFAULT_MODEL=anthropic/claude-sonnet-4
    depends_on:
      cache:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
      interval: 15s
      timeout: 5s
      retries: 3

  cache:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

  worker:
    build: ./worker
    environment:
      - REDIS_URL=redis://cache:6379
      - DATABASE_URL=sqlite:///data/logs.db
    volumes:
      - ./data:/app/data
    depends_on:
      cache:
        condition: service_healthy
    restart: unless-stopped

volumes:
  redis_data:
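The compose file wires the llm service to Redis for "cache management", but the service code itself isn't shown. Here is a minimal sketch of prompt-level caching, with a plain dict standing in for Redis (the real service would use a Redis client and the same key scheme):

```python
import hashlib
import json

cache: dict[str, str] = {}  # stand-in for Redis

def cache_key(model: str, prompt: str) -> str:
    """Stable key derived from model + prompt (what you'd SET in Redis)."""
    raw = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def ask(model: str, prompt: str) -> tuple[str, bool]:
    """Return (answer, from_cache); only miss on the first identical request."""
    key = cache_key(model, prompt)
    if key in cache:
        return cache[key], True
    answer = f"answer to: {prompt}"  # stand-in for the real LLM API call
    cache[key] = answer
    return answer, False

first = ask("claude", "What is Docker?")
second = ask("claude", "What is Docker?")
print(first[1], second[1])  # → False True
```

With `allkeys-lru` and a `maxmemory` cap, as configured on the cache service above, Redis evicts the least recently used keys on its own, so the cache never needs manual cleanup.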