When building an AI-based project, a critical question quickly arises: how should you call language models? Directly via the official APIs (OpenAI, Anthropic, Google), or through an aggregator like OpenRouter?
This choice has profound implications for reliability, cost, flexibility, and code maintainability. In this article, we compare both approaches in detail, with concrete Python code and a production-ready fallback pattern.
## 🎯 The Problem: Too Many Providers, Not Enough Standards
The LLM market in 2025 is a zoo. Each provider has:
- Its own endpoint (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com...)
- Its own request/response format
- Its own error handling, rate limits, and authentication
- Its own models with different names and versions
For developers, this means: N integrations to maintain, N API keys to manage, N formats to parse.
## 🔌 Direct Calls: The Classic Method

### How It Works
You create an account with each provider, get an API key, and call their endpoint directly.
### Direct OpenAI Call

```python
import httpx
import os

async def call_openai(prompt: str, model: str = "gpt-4o") -> str:
    """Direct call to the OpenAI API."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.openai.com/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
                "temperature": 0.7,
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
```
### Direct Anthropic Call

```python
async def call_anthropic(prompt: str, model: str = "claude-sonnet-4-20250514") -> str:
    """Direct call to the Anthropic API."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": os.getenv("ANTHROPIC_API_KEY"),
                "anthropic-version": "2023-06-01",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "max_tokens": 1000,
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["content"][0]["text"]
```
### Direct Google (Gemini) Call

```python
async def call_google(prompt: str, model: str = "gemini-2.0-flash") -> str:
    """Direct call to the Google Gemini API."""
    api_key = os.getenv("GOOGLE_API_KEY")
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={api_key}",
            json={
                "contents": [{"parts": [{"text": prompt}]}],
                "generationConfig": {
                    "maxOutputTokens": 1000,
                    "temperature": 0.7,
                },
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["candidates"][0]["content"]["parts"][0]["text"]
```
See the problem? Three different functions, three request formats, three response formats, three authentication systems. And we've only covered three providers out of dozens.
### Advantages of Direct Calls
| Advantage | Detail |
|---|---|
| Minimal latency | No intermediary, direct connection |
| Less dependency | Single point of failure per provider |
| Immediate access | New models available at launch |
| No extra cost | Raw provider pricing, no markup |
| Full control | Access to all parameters, even experimental ones |
### Disadvantages of Direct Calls
| Disadvantage | Detail |
|---|---|
| Heavy maintenance | N integrations = N times more code to maintain |
| N API keys | One key per provider to manage and secure |
| N bills | One billing account per provider |
| No native fallback | If a provider fails, your code crashes |
| No unified view | Hard to compare costs and usage |
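Concretely, "no native fallback" means you have to chain providers yourself. Here is a minimal sketch of what that looks like, reusing the three functions defined above (error handling deliberately simplified):

```python
async def call_with_manual_fallback(prompt: str) -> str:
    """Try each provider in order; raise only if all of them fail."""
    last_error: Exception | None = None
    for call in (call_anthropic, call_openai, call_google):
        try:
            return await call(prompt)
        except (httpx.HTTPStatusError, httpx.TimeoutException) as exc:
            last_error = exc  # log and move on to the next provider
    raise RuntimeError("All providers failed") from last_error
```

And you still have to keep all three integrations alive for this to work, which is exactly the maintenance burden the table describes.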
## 🌐 OpenRouter: The Universal Aggregator

### How It Works
OpenRouter is an API proxy that exposes a single OpenAI-compatible endpoint. You send your requests to OpenRouter, which routes them to the correct provider.
```
Your code ---> OpenRouter ---> OpenAI / Anthropic / Google / Mistral / ...
```
One API key, one format, one endpoint.
### Call via OpenRouter

```python
async def call_openrouter(
    prompt: str,
    model: str = "anthropic/claude-sonnet-4-20250514",
) -> str:
    """Call via OpenRouter, one endpoint for all models."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}",
                "Content-Type": "application/json",
                "HTTP-Referer": "https://mysite.com",  # optional: app attribution
                "X-Title": "My AI App",
            },
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
                "temperature": 0.7,
            },
            timeout=60.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
```
The same code works for ALL models. Just change the `model` parameter:

```python
# Claude
result = await call_openrouter(prompt, "anthropic/claude-sonnet-4-20250514")

# GPT-4o
result = await call_openrouter(prompt, "openai/gpt-4o")

# Gemini
result = await call_openrouter(prompt, "google/gemini-2.0-flash")

# Mistral
result = await call_openrouter(prompt, "mistralai/mistral-large")

# FREE models
result = await call_openrouter(prompt, "google/gemini-2.0-flash-exp:free")
```
## Full Comparison Table
| Criteria | Direct Calls | OpenRouter |
|---|---|---|
| Endpoint | 1 per provider | 1 unified |
| API Keys | 1 per provider | 1 single |
| Format | Variable | Unified (OpenAI-compatible) |
| Latency | Minimal | +50-100ms |
| Fallback | Self-coded | Built-in (auto-routing) |
| Billing | N invoices | 1 invoice |
| Free models | Rare | Yes, several available |
| New models | Immediate | Delay of hours/days |
| Cost | Raw pricing | Raw + small markup (~5-15%) |
| Availability | Direct | Depends on OpenRouter |
| Dashboard | N dashboards | 1 unified |
## ⚡ Real Advantages of OpenRouter

### 1. Free Models
OpenRouter offers free models, perfect for prototyping and simple tasks:
```python
FREE_MODELS = [
    "google/gemini-2.0-flash-exp:free",
    "meta-llama/llama-3.1-8b-instruct:free",
    "mistralai/mistral-7b-instruct:free",
    "huggingfaceh4/zephyr-7b-beta:free",
]
```
### 2. Automatic Fallback
If a provider is down, OpenRouter can automatically switch:
```python
# With the "route" parameter
response = await client.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}"},
    json={
        "model": "anthropic/claude-sonnet-4-20250514",
        "route": "fallback",  # auto-switch if the primary model is down
        "messages": [{"role": "user", "content": prompt}],
    },
)
```
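For finer control, OpenRouter also documents a `models` parameter: an explicit, ordered fallback list. Check the current docs for exact semantics; this sketch assumes the documented behavior:

```python
# Explicit ordered fallback list via the "models" parameter
response = await client.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}"},
    json={
        "models": [  # tried in order until one succeeds
            "anthropic/claude-sonnet-4-20250514",
            "openai/gpt-4o",
        ],
        "messages": [{"role": "user", "content": prompt}],
    },
)
```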
### 3. Unified Billing
One invoice, one dashboard to see:
- How much you spend per model
- Which models are most used
- Real-time consumption
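The same data is available programmatically. A hedged sketch using OpenRouter's key-info endpoint (the response shape noted in the comment is an assumption to verify against the docs):

```python
async def get_key_usage() -> dict:
    """Fetch usage and limit info for the current API key."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://openrouter.ai/api/v1/auth/key",
            headers={"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}"},
        )
        response.raise_for_status()
        return response.json()["data"]  # assumed to include "usage" and "limit"
```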
### 4. Easy Comparison
Testing a new model = changing a string. No need to create an account, get an API key, or integrate a new format.
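In code, that comparison is just a loop. A quick sketch (the candidate list is illustrative) that fans the same prompt out to several models concurrently with `call_openrouter`:

```python
import asyncio

CANDIDATES = [
    "anthropic/claude-sonnet-4-20250514",
    "openai/gpt-4o",
    "google/gemini-2.0-flash",
]

async def compare_models(prompt: str) -> dict[str, str | BaseException]:
    """Run the same prompt through several models for side-by-side review."""
    results = await asyncio.gather(
        *(call_openrouter(prompt, m) for m in CANDIDATES),
        return_exceptions=True,  # failures come back as exception objects
    )
    return dict(zip(CANDIDATES, results))
```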
## ⚠️ Real Disadvantages of OpenRouter
Let's be honest, it's not all perfect.
### 1. Additional Latency (+50-100ms)
Every request goes through an intermediary. For real-time chat, this can be noticeable. For batch processing, it's negligible.
```
Direct call:  You ---- 120ms ----> Anthropic
OpenRouter:   You -- 50ms --> OpenRouter -- 120ms --> Anthropic
              Total: ~170ms (+50ms)
```
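You can measure the overhead on your own stack; just compare the same model through both paths, since generation time dominates and varies run to run. A simple timing helper (ours):

```python
import time

async def timed(call, *args) -> tuple[str, float]:
    """Return the call's result and elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    result = await call(*args)
    return result, (time.perf_counter() - start) * 1000

# Same model, two paths:
# _, direct_ms = await timed(call_anthropic, prompt)
# _, proxied_ms = await timed(call_openrouter, prompt, "anthropic/claude-sonnet-4-20250514")
```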
### 2. Single Point of Failure
If OpenRouter goes down, all your calls fail at once. This is the main risk of centralization, and the reason the production pattern below keeps direct calls as a fallback.
### 3. Model Availability
Some models aren't available on OpenRouter, or with a delay after release. Very recent or preview models may be missing.
### 4. Price Markup

OpenRouter adds a markup (typically 5-15%) over direct prices. For large volumes, this adds up: at $2,000/month of usage at direct prices, a 5% markup costs an extra $100/month, and 15% an extra $300.
## 🏗️ Production Pattern: ModelManager with Fallback
Here's the pattern we use in production. It combines the best of both worlds: OpenRouter as primary, direct calls as fallback.
```python
import httpx
import asyncio
import time
import os
from dataclasses import dataclass, field
from typing import Optional
import logging

logger = logging.getLogger(__name__)


@dataclass
class ModelConfig:
    """Configuration for a model with its provider."""
    name: str
    provider: str  # "openrouter", "openai", "anthropic", "google"
    model_id: str
    max_tokens: int = 2000
    temperature: float = 0.7
    cost_per_1k_input: float = 0.0
    cost_per_1k_output: float = 0.0


@dataclass
class RateLimitState:
    """Rate limiting state for a provider."""
    is_limited: bool = False
    retry_after: float = 0.0
    limited_at: float = 0.0
    consecutive_errors: int = 0


class ModelManager:
    """Model manager with fallback chain and rate-limit detection."""

    def __init__(self):
        self.providers: dict[str, RateLimitState] = {}
        self.fallback_chains: dict[str, list[ModelConfig]] = {}
        self._setup_default_chains()

    def _setup_default_chains(self):
        """Configure default fallback chains."""
        # "smart" chain: best models
        self.fallback_chains["smart"] = [
            ModelConfig(
                name="Claude Sonnet (OpenRouter)",
                provider="openrouter",
                model_id="anthropic/claude-sonnet-4-20250514",
                cost_per_1k_input=0.003,
                cost_per_1k_output=0.015,
            ),
            ModelConfig(
                name="Claude Sonnet (Direct)",
                provider="anthropic",
                model_id="claude-sonnet-4-20250514",
                cost_per_1k_input=0.003,
                cost_per_1k_output=0.015,
            ),
            ModelConfig(
                name="GPT-4o (Direct)",
                provider="openai",
                model_id="gpt-4o",
                cost_per_1k_input=0.005,
                cost_per_1k_output=0.015,
            ),
        ]
        # "fast" chain: fast and economical models
        self.fallback_chains["fast"] = [
            ModelConfig(
                name="Gemini Flash (OpenRouter)",
                provider="openrouter",
                model_id="google/gemini-2.0-flash",
            ),
            ModelConfig(
                name="GPT-4o-mini (Direct)",
                provider="openai",
                model_id="gpt-4o-mini",
                cost_per_1k_input=0.00015,
                cost_per_1k_output=0.0006,
            ),
        ]
        # "free" chain: free models only