When building an AI-based project, a critical question quickly arises: how should you call language models? Directly via the official APIs (OpenAI, Anthropic, Google), or through an aggregator like OpenRouter?
This choice has profound implications for reliability, cost, flexibility, and code maintainability. In this article, we compare both approaches in detail, with concrete Python code and a production-ready fallback pattern.
## 🎯 The Problem: Too Many Providers, Not Enough Standards
The LLM market in 2025 is a zoo. Each provider has:
- Its own endpoint (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com...)
- Its own request/response format
- Its own error handling, rate limits, and authentication
- Its own models with different names and versions
For developers, this means: N integrations to maintain, N API keys to manage, N formats to parse.
## 🔌 Direct Calls: The Classic Method

### How It Works
You create an account with each provider, get an API key, and call their endpoint directly.
### Direct OpenAI Call

```python
import httpx
import os

async def call_openai(prompt: str, model: str = "gpt-4o") -> str:
    """Direct call to the OpenAI API."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.openai.com/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
                "temperature": 0.7,
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
```
### Direct Anthropic Call

```python
async def call_anthropic(prompt: str, model: str = "claude-sonnet-4-20250514") -> str:
    """Direct call to the Anthropic API."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": os.getenv("ANTHROPIC_API_KEY"),
                "anthropic-version": "2023-06-01",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "max_tokens": 1000,
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["content"][0]["text"]
```
### Direct Google (Gemini) Call

```python
async def call_google(prompt: str, model: str = "gemini-2.0-flash") -> str:
    """Direct call to the Google Gemini API."""
    api_key = os.getenv("GOOGLE_API_KEY")
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={api_key}",
            json={
                "contents": [{"parts": [{"text": prompt}]}],
                "generationConfig": {
                    "maxOutputTokens": 1000,
                    "temperature": 0.7,
                },
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["candidates"][0]["content"]["parts"][0]["text"]
```
See the problem? Three different functions, three request formats, three response formats, three authentication systems. And we've only covered three providers out of dozens.
### Advantages of Direct Calls
| Advantage | Detail |
|---|---|
| Minimal latency | No intermediary, direct connection |
| Less dependency | Single point of failure per provider |
| Immediate access | New models available at launch |
| No extra cost | Raw provider pricing, no markup |
| Full control | Access to all parameters, even experimental ones |
### Disadvantages of Direct Calls
| Disadvantage | Detail |
|---|---|
| Heavy maintenance | N integrations = N times more code to maintain |
| N API keys | One key per provider to manage and secure |
| N bills | One billing account per provider |
| No native fallback | If a provider fails, your code crashes |
| No unified view | Hard to compare costs and usage |
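Concretely, "no native fallback" means you have to chain providers yourself. Here is a minimal sketch of what that looks like, reusing the three functions defined above (error handling deliberately simplified):

```python
async def call_with_manual_fallback(prompt: str) -> str:
    """Try each provider in order; raise only if all of them fail."""
    last_error: Exception | None = None
    for call in (call_anthropic, call_openai, call_google):
        try:
            return await call(prompt)
        except (httpx.HTTPStatusError, httpx.TimeoutException) as exc:
            last_error = exc  # log and move on to the next provider
    raise RuntimeError("All providers failed") from last_error
```

And you still have to keep all three integrations alive for this to work, which is exactly the maintenance burden the table describes.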
## 🌐 OpenRouter: The Universal Aggregator

### How It Works
OpenRouter is an API proxy that exposes a single OpenAI-compatible endpoint. You send your requests to OpenRouter, which routes them to the correct provider.
```
Your code ---> OpenRouter ---> OpenAI / Anthropic / Google / Mistral / ...
```
One API key, one format, one endpoint.
### Call via OpenRouter

```python
async def call_openrouter(
    prompt: str,
    model: str = "anthropic/claude-sonnet-4-20250514",
) -> str:
    """Call via OpenRouter, one endpoint for all models."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}",
                "Content-Type": "application/json",
                "HTTP-Referer": "https://mysite.com",  # optional: app attribution
                "X-Title": "My AI App",
            },
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
                "temperature": 0.7,
            },
            timeout=60.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
```
The same code works for ALL models. Just change the `model` parameter:

```python
# Claude
result = await call_openrouter(prompt, "anthropic/claude-sonnet-4-20250514")

# GPT-4o
result = await call_openrouter(prompt, "openai/gpt-4o")

# Gemini
result = await call_openrouter(prompt, "google/gemini-2.0-flash")

# Mistral
result = await call_openrouter(prompt, "mistralai/mistral-large")

# FREE models
result = await call_openrouter(prompt, "google/gemini-2.0-flash-exp:free")
```
## Full Comparison Table
| Criteria | Direct Calls | OpenRouter |
|---|---|---|
| Endpoint | 1 per provider | 1 unified |
| API Keys | 1 per provider | 1 single |
| Format | Variable | Unified (OpenAI-compatible) |
| Latency | Minimal | +50-100ms |
| Fallback | Self-coded | Built-in (auto-routing) |
| Billing | N invoices | 1 invoice |
| Free models | Rare | Yes, several available |
| New models | Immediate | Delay of hours/days |
| Cost | Raw pricing | Raw + small markup (~5-15%) |
| Availability | Direct | Depends on OpenRouter |
| Dashboard | N dashboards | 1 unified |
## ⚡ Real Advantages of OpenRouter

### 1. Free Models
OpenRouter offers free models, perfect for prototyping and simple tasks:
```python
FREE_MODELS = [
    "google/gemini-2.0-flash-exp:free",
    "meta-llama/llama-3.1-8b-instruct:free",
    "mistralai/mistral-7b-instruct:free",
    "huggingfaceh4/zephyr-7b-beta:free",
]
```
### 2. Automatic Fallback
If a provider is down, OpenRouter can automatically switch:
```python
# With the "route" parameter
response = await client.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}"},
    json={
        "model": "anthropic/claude-sonnet-4-20250514",
        "route": "fallback",  # auto-switch if the primary model is down
        "messages": [{"role": "user", "content": prompt}],
    },
)
```
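For finer control, OpenRouter also documents a `models` parameter: an explicit, ordered fallback list. Check the current docs for exact semantics; this sketch assumes the documented behavior:

```python
# Explicit ordered fallback list via the "models" parameter
response = await client.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}"},
    json={
        "models": [  # tried in order until one succeeds
            "anthropic/claude-sonnet-4-20250514",
            "openai/gpt-4o",
        ],
        "messages": [{"role": "user", "content": prompt}],
    },
)
```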
### 3. Unified Billing
One invoice, one dashboard to see:
- How much you spend per model
- Which models are most used
- Real-time consumption
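The same data is available programmatically. A hedged sketch using OpenRouter's key-info endpoint (the response shape noted in the comment is an assumption to verify against the docs):

```python
async def get_key_usage() -> dict:
    """Fetch usage and limit info for the current API key."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://openrouter.ai/api/v1/auth/key",
            headers={"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}"},
        )
        response.raise_for_status()
        return response.json()["data"]  # assumed to include "usage" and "limit"
```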
### 4. Easy Comparison
Testing a new model = changing a string. No need to create an account, get an API key, or integrate a new format.
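In code, that comparison is just a loop. A quick sketch (the candidate list is illustrative) that fans the same prompt out to several models concurrently with `call_openrouter`:

```python
import asyncio

CANDIDATES = [
    "anthropic/claude-sonnet-4-20250514",
    "openai/gpt-4o",
    "google/gemini-2.0-flash",
]

async def compare_models(prompt: str) -> dict[str, str | BaseException]:
    """Run the same prompt through several models for side-by-side review."""
    results = await asyncio.gather(
        *(call_openrouter(prompt, m) for m in CANDIDATES),
        return_exceptions=True,  # failures come back as exception objects
    )
    return dict(zip(CANDIDATES, results))
```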
## ⚠️ Real Disadvantages of OpenRouter
Let's be honest, it's not all perfect.
### 1. Additional Latency (+50-100ms)
Every request goes through an intermediary. For real-time chat, this can be noticeable. For batch processing, it's negligible.
```
Direct call:  You ---- 120ms ----> Anthropic
OpenRouter:   You -- 50ms --> OpenRouter -- 120ms --> Anthropic
              Total: ~170ms (+50ms)
```
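You can measure the overhead on your own stack; just compare the same model through both paths, since generation time dominates and varies run to run. A simple timing helper (ours):

```python
import time

async def timed(call, *args) -> tuple[str, float]:
    """Return the call's result and elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    result = await call(*args)
    return result, (time.perf_counter() - start) * 1000

# Same model, two paths:
# _, direct_ms = await timed(call_anthropic, prompt)
# _, proxied_ms = await timed(call_openrouter, prompt, "anthropic/claude-sonnet-4-20250514")
```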
### 2. Single Point of Failure
If OpenRouter goes down, all your calls fail at once. This is the main risk of centralization, and the reason the production pattern below keeps direct calls as a fallback.
### 3. Model Availability
Some models aren't available on OpenRouter, or with a delay after release. Very recent or preview models may be missing.
### 4. Price Markup

OpenRouter adds a markup (typically 5-15%) over direct prices. For large volumes, this adds up: at $2,000/month of usage at direct prices, a 5% markup costs an extra $100/month, and 15% an extra $300.
## 🏗️ Production Pattern: ModelManager with Fallback
Here's the pattern we use in production. It combines the best of both worlds: OpenRouter as primary, direct calls as fallback.
```python
import httpx
import asyncio
import time
import os
from dataclasses import dataclass, field
from typing import Optional
import logging

logger = logging.getLogger(__name__)


@dataclass
class ModelConfig:
    """Configuration for a model with its provider."""
    name: str
    provider: str  # "openrouter", "openai", "anthropic", "google"
    model_id: str
    max_tokens: int = 2000
    temperature: float = 0.7
    cost_per_1k_input: float = 0.0
    cost_per_1k_output: float = 0.0


@dataclass
class RateLimitState:
    """Rate limiting state for a provider."""
    is_limited: bool = False
    retry_after: float = 0.0
    limited_at: float = 0.0
    consecutive_errors: int = 0


class ModelManager:
    """Model manager with fallback chain and rate-limit detection."""

    def __init__(self):
        self.providers: dict[str, RateLimitState] = {}
        self.fallback_chains: dict[str, list[ModelConfig]] = {}
        self._setup_default_chains()

    def _setup_default_chains(self):
        """Configure default fallback chains."""
        # "smart" chain: best models
        self.fallback_chains["smart"] = [
            ModelConfig(
                name="Claude Sonnet (OpenRouter)",
                provider="openrouter",
                model_id="anthropic/claude-sonnet-4-20250514",
                cost_per_1k_input=0.003,
                cost_per_1k_output=0.015,
            ),
            ModelConfig(
                name="Claude Sonnet (Direct)",
                provider="anthropic",
                model_id="claude-sonnet-4-20250514",
                cost_per_1k_input=0.003,
                cost_per_1k_output=0.015,
            ),
            ModelConfig(
                name="GPT-4o (Direct)",
                provider="openai",
                model_id="gpt-4o",
                cost_per_1k_input=0.005,
                cost_per_1k_output=0.015,
            ),
        ]
        # "fast" chain: fast and economical models
        self.fallback_chains["fast"] = [
            ModelConfig(
                name="Gemini Flash (OpenRouter)",
                provider="openrouter",
                model_id="google/gemini-2.0-flash",
            ),
            ModelConfig(
                name="GPT-4o-mini (Direct)",
                provider="openai",
                model_id="gpt-4o-mini",
                cost_per_1k_input=0.00015,
                cost_per_1k_output=0.0006,
            ),
        ]
        # "free" chain: free models only