📑 Table of Contents

AI APIs: OpenRouter vs Direct Calls

Self-Hosting 🟡 Intermediate ⏱️ 15 min read 📅 2026-02-24

When building an AI-based project, a critical question quickly arises: how to call language models? Directly via official APIs (OpenAI, Anthropic, Google) or through an aggregator like OpenRouter?

This choice has profound implications for reliability, cost, flexibility, and code maintainability. In this article, we compare both approaches in detail, with concrete Python code and a production-ready fallback pattern.


🎯 The Problem: Too Many Providers, Not Enough Standards

The LLM market in 2025 is a zoo. Each provider has:
- Its own endpoint (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com...)
- Its own request/response format
- Its own error handling, rate limits, and authentication
- Its own models with different names and versions

For developers, this means: N integrations to maintain, N API keys to manage, N formats to parse.


🔌 Direct Calls: The Classic Method

How It Works

You create an account with each provider, get an API key, and call their endpoint directly.

Direct OpenAI Call

```python
import httpx
import os

async def call_openai(prompt: str, model: str = "gpt-4o") -> str:
    """Direct call to the OpenAI API."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.openai.com/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
                "temperature": 0.7,
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
```

Direct Anthropic Call

```python
async def call_anthropic(prompt: str, model: str = "claude-sonnet-4-20250514") -> str:
    """Direct call to the Anthropic API."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": os.getenv("ANTHROPIC_API_KEY"),
                "anthropic-version": "2023-06-01",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "max_tokens": 1000,
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["content"][0]["text"]
```

Direct Google (Gemini) Call

```python
async def call_google(prompt: str, model: str = "gemini-2.0-flash") -> str:
    """Direct call to the Google Gemini API."""
    api_key = os.getenv("GOOGLE_API_KEY")
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={api_key}",
            json={
                "contents": [{"parts": [{"text": prompt}]}],
                "generationConfig": {
                    "maxOutputTokens": 1000,
                    "temperature": 0.7,
                },
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["candidates"][0]["content"]["parts"][0]["text"]
```

See the problem? Three different functions, three request formats, three response formats, three authentication systems. And we've only covered 3 providers out of dozens.
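To make the maintenance burden concrete, here is a minimal sketch of the parsing layer you end up maintaining with direct calls: one function per response shape, plus a dispatch table. The helper names (`parse_openai`, `extract_text`, etc.) are mine, not from any library; only the JSON paths come from the three functions above.

```python
# Hypothetical helpers: one response parser per provider, because each
# API nests the generated text at a different JSON path.

def parse_openai(data: dict) -> str:
    # OpenAI: choices[0].message.content
    return data["choices"][0]["message"]["content"]

def parse_anthropic(data: dict) -> str:
    # Anthropic: content[0].text
    return data["content"][0]["text"]

def parse_google(data: dict) -> str:
    # Gemini: candidates[0].content.parts[0].text
    return data["candidates"][0]["content"]["parts"][0]["text"]

PARSERS = {
    "openai": parse_openai,
    "anthropic": parse_anthropic,
    "google": parse_google,
}

def extract_text(provider: str, data: dict) -> str:
    """Dispatch a raw response body to the right provider parser."""
    return PARSERS[provider](data)
```

Every new provider means another entry in this table, and every provider-side format change means another function to patch.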

Advantages of Direct Calls

| Advantage | Detail |
|---|---|
| Minimal latency | No intermediary, direct connection |
| Fewer dependencies | No aggregator in the path; only the provider itself can fail |
| Immediate access | New models available at launch |
| No extra cost | Raw provider pricing, no markup |
| Full control | Access to all parameters, even experimental ones |

Disadvantages of Direct Calls

| Disadvantage | Detail |
|---|---|
| Heavy maintenance | N integrations = N times more code to maintain |
| N API keys | One key per provider to manage and secure |
| N bills | One billing account per provider |
| No native fallback | If a provider fails, your code crashes |
| No unified view | Hard to compare costs and usage across providers |

🌐 OpenRouter: The Universal Aggregator

How It Works

OpenRouter is an API proxy that exposes a single OpenAI-compatible endpoint. You send your requests to OpenRouter, which routes them to the correct provider.

```
Your code ---> OpenRouter ---> OpenAI / Anthropic / Google / Mistral / ...
```

One API key, one format, one endpoint.

Call via OpenRouter

```python
async def call_openrouter(
    prompt: str,
    model: str = "anthropic/claude-sonnet-4-20250514"
) -> str:
    """Call via OpenRouter, one endpoint for all models."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}",
                "Content-Type": "application/json",
                "HTTP-Referer": "https://mysite.com",
                "X-Title": "My AI App",
            },
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
                "temperature": 0.7,
            },
            timeout=60.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
```

The same code works for ALL models — just change the model parameter:

```python
# Claude
result = await call_openrouter(prompt, "anthropic/claude-sonnet-4-20250514")

# GPT-4o
result = await call_openrouter(prompt, "openai/gpt-4o")

# Gemini
result = await call_openrouter(prompt, "google/gemini-2.0-flash")

# Mistral
result = await call_openrouter(prompt, "mistralai/mistral-large")

# FREE models
result = await call_openrouter(prompt, "google/gemini-2.0-flash-exp:free")
```

Full Comparison Table

| Criteria | Direct Calls | OpenRouter |
|---|---|---|
| Endpoint | 1 per provider | 1 unified |
| API keys | 1 per provider | 1 single key |
| Format | Varies per provider | Unified (OpenAI-compatible) |
| Latency | Minimal | +50-100 ms |
| Fallback | Self-coded | Built-in (auto-routing) |
| Billing | N invoices | 1 invoice |
| Free models | Rare | Yes, several available |
| New models | Available immediately | Delay of hours to days |
| Cost | Raw pricing | Raw + small markup (~5-15%) |
| Availability | Direct | Depends on OpenRouter |
| Dashboard | N dashboards | 1 unified |

⚡ Real Advantages of OpenRouter

1. Free Models

OpenRouter offers free models, perfect for prototyping and simple tasks:

```python
FREE_MODELS = [
    "google/gemini-2.0-flash-exp:free",
    "meta-llama/llama-3.1-8b-instruct:free",
    "mistralai/mistral-7b-instruct:free",
    "huggingfaceh4/zephyr-7b-beta:free",
]
```
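Free models are rate-limited and can disappear without notice, so a pragmatic pattern when prototyping is to walk the list until one answers. A sketch: the `try_free_models` helper and the injected `call` function are mine (injection keeps it testable without the network); real code would catch specific HTTP errors rather than bare `Exception`.

```python
from typing import Callable

# Free-tier model IDs from the list above (subject to change on OpenRouter's side).
FREE_MODELS = [
    "google/gemini-2.0-flash-exp:free",
    "meta-llama/llama-3.1-8b-instruct:free",
    "mistralai/mistral-7b-instruct:free",
]

def try_free_models(prompt: str, call: Callable[[str, str], str]) -> str:
    """Try each free model in order; `call(prompt, model)` is whatever
    function actually performs the request (injected for testability)."""
    last_error: Exception | None = None
    for model in FREE_MODELS:
        try:
            return call(prompt, model)
        except Exception as exc:  # rate limit, outage, model removed...
            last_error = exc
    raise RuntimeError("All free models failed") from last_error
```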

2. Automatic Fallback

If a provider is down, OpenRouter can automatically switch:

```python
# With the "route" parameter
response = await client.post(
    "https://openrouter.ai/api/v1/chat/completions",
    json={
        "model": "anthropic/claude-sonnet-4-20250514",
        "route": "fallback",  # Auto-switch if primary model is down
        "messages": [{"role": "user", "content": prompt}],
    },
)
```
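OpenRouter's routing docs also describe an explicit `models` list that is tried in order, which gives you control over the fallback sequence rather than letting OpenRouter pick (worth verifying against the current docs before relying on it). A sketch of the request body:

```python
def build_fallback_payload(prompt: str) -> dict:
    """Request body with an explicit fallback list: OpenRouter tries
    each entry of `models` in order until one responds."""
    return {
        "models": [
            "anthropic/claude-sonnet-4-20250514",  # primary
            "openai/gpt-4o",                       # fallback 1
            "google/gemini-2.0-flash",             # fallback 2
        ],
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1000,
    }
```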

3. Unified Billing

One invoice, one dashboard to see:
- How much you spend per model
- Which models are most used
- Real-time consumption

4. Easy Comparison

Testing a new model = changing a string. No need to create an account, get an API key, or integrate a new format.


⚠️ Real Disadvantages of OpenRouter

Let's be honest, it's not all perfect.

1. Additional Latency (+50-100ms)

Every request goes through an intermediary. For real-time chat, this can be noticeable. For batch processing, it's negligible.

```
Direct call:  You ---- 120ms ----> Anthropic
OpenRouter:   You -- 50ms --> OpenRouter -- 120ms --> Anthropic
              Total: ~170ms (+50ms)
```
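The relative overhead shrinks as the underlying call gets longer, which is why batch workloads barely notice it. A quick back-of-the-envelope helper (the 50 ms and 120 ms figures are the illustration above, not a benchmark):

```python
def overhead_pct(provider_ms: float, proxy_ms: float = 50.0) -> float:
    """Relative latency overhead of adding a proxy hop in front of a call."""
    return 100.0 * proxy_ms / provider_ms

# For a 120 ms provider call, a 50 ms hop adds ~42% latency;
# for a 5-second long-form generation, it adds only 1%.
```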

2. Single Point of Failure

If OpenRouter goes down, all your calls fail. This is the main risk of centralization.

3. Model Availability

Some models aren't available on OpenRouter, or with a delay after release. Very recent or preview models may be missing.

4. Price Markup

OpenRouter adds a markup (typically 5-15%) over direct prices. For large volumes, this can add up.
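To see what "adds up" means, here is the arithmetic as a tiny helper (the volume and markup figures are illustrative; the 5-15% range is the article's estimate, not an official price sheet):

```python
def monthly_cost(tokens_per_month: float, price_per_1k: float,
                 markup: float = 0.0) -> float:
    """Monthly spend in dollars for a token volume, price per 1k tokens,
    and optional aggregator markup (e.g. 0.05 for 5%)."""
    return tokens_per_month / 1000 * price_per_1k * (1 + markup)

# Example: 100M input tokens at $0.003/1k costs $300 direct,
# and roughly $315-345 with a 5-15% markup.
```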


🏗️ Production Pattern: ModelManager with Fallback

Here's the pattern we use in production. It combines the best of both worlds: OpenRouter as primary, direct calls as fallback.

```python
import httpx
import asyncio
import time
import os
from dataclasses import dataclass, field
from typing import Optional
import logging

logger = logging.getLogger(__name__)

@dataclass
class ModelConfig:
    """Configuration for a model with its provider."""
    name: str
    provider: str  # "openrouter", "openai", "anthropic", "google"
    model_id: str
    max_tokens: int = 2000
    temperature: float = 0.7
    cost_per_1k_input: float = 0.0
    cost_per_1k_output: float = 0.0

@dataclass
class RateLimitState:
    """Rate limiting state for a provider."""
    is_limited: bool = False
    retry_after: float = 0.0
    limited_at: float = 0.0
    consecutive_errors: int = 0

class ModelManager:
    """Model manager with fallback chain and rate-limit detection."""

    def __init__(self):
        self.providers: dict[str, RateLimitState] = {}
        self.fallback_chains: dict[str, list[ModelConfig]] = {}
        self._setup_default_chains()

    def _setup_default_chains(self):
        """Configure default fallback chains."""

        # "smart" chain: best models
        self.fallback_chains["smart"] = [
            ModelConfig(
                name="Claude Sonnet (OpenRouter)",
                provider="openrouter",
                model_id="anthropic/claude-sonnet-4-20250514",
                cost_per_1k_input=0.003,
                cost_per_1k_output=0.015,
            ),
            ModelConfig(
                name="Claude Sonnet (Direct)",
                provider="anthropic",
                model_id="claude-sonnet-4-20250514",
                cost_per_1k_input=0.003,
                cost_per_1k_output=0.015,
            ),
            ModelConfig(
                name="GPT-4o (Direct)",
                provider="openai",
                model_id="gpt-4o",
                cost_per_1k_input=0.005,
                cost_per_1k_output=0.015,
            ),
        ]

        # "fast" chain: fast and economical models
        self.fallback_chains["fast"] = [
            ModelConfig(
                name="Gemini Flash (OpenRouter)",
                provider="openrouter",
                model_id="google/gemini-2.0-flash",
            ),
            ModelConfig(
                name="GPT-4o-mini (Direct)",
                provider="openai",
                model_id="gpt-4o-mini",
                cost_per_1k_input=0.00015,
                cost_per_1k_output=0.0006,
            ),
        ]

        # "free" chain: free models only