πŸ“‘ Table of contents

APIs AI: OpenRouter vs direct calls

APIs AI: OpenRouter vs direct calls

Self-Hosting 🟑 Intermediate ⏱️ 9 min read πŸ“… 2026-02-24

🎯 The problem: too many providers, not enough standards

The LLM market in 2025 is a zoo. Each provider has:
- Its own endpoint (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com...)
- Its own request and response format
- Its own error handling, rate limits, and authentication
- Its own models with different names and versions

For a developer, this means: N integrations to maintain, N API keys to manage, N formats to parse.


πŸ”Œ Direct calls: the classic method

How it works

You create an account with each provider, get an API key, and call their endpoint directly.

Direct call to OpenAI

The tool call_openai is an asynchronous function that sends an HTTP POST request to the api.openai.com/v1/chat/completions endpoint. It authenticates the request via a Bearer header with your API key, then builds a JSON payload containing the target model (by default gpt-4o), the prompt, and generation parameters like max_tokens and temperature. The response is parsed to extract the generated text from the choices[0].message.content key.

Direct call to Anthropic

The tool call_anthropic works similarly, but targets the api.anthropic.com/v1/messages endpoint. Authentication uses a specific x-api-key header, accompanied by a mandatory anthropic-version header. The payload and response format differs slightly: the generated text is located in content[0].text.

Direct call to Google (Gemini)

The tool call_google interacts with the Gemini API via generativelanguage.googleapis.com. The API key is passed directly as a URL parameter. The payload format is specific to Google: the prompt is encapsulated in contents[0].parts[0].text and the generation parameters are grouped under generationConfig. The result is extracted from candidates[0].content.parts[0].text.

Do you see the problem? Three different functions, three request formats, three response formats, three authentication systems. And we've only covered 3 providers out of dozens.

Advantages of direct calls

Advantage Detail
Minimal latency No intermediary, direct connection
Less dependency Single point of failure per provider
Immediate access New models available as soon as they are released
No extra cost Raw provider price, no margin
Total control Access to all parameters, even experimental ones

Disadvantages of direct calls

Disadvantage Detail
Heavy maintenance N integrations = N times more code to maintain
N API keys One key per provider to manage and secure
N invoices One billing account per provider
No native fallback If a provider goes down, your code crashes
No unified view Hard to compare costs and usage

🌐 OpenRouter: the universal aggregator

How it works

OpenRouter is an API proxy that exposes a single OpenAI-compatible endpoint. You send your requests to OpenRouter, which routes them to the right provider.

Your code ---> OpenRouter ---> OpenAI / Anthropic / Google / Mistral / ...

A single API key, a single format, a single endpoint.

Calling via OpenRouter

The call_openrouter tool centralizes all calls to a single endpoint (openrouter.ai/api/v1/chat/completions) using a request format identical to OpenAI's. In addition to the API key, it accepts optional headers like HTTP-Referer and X-Title for identification. The huge advantage lies in the model parameter: switching providers simply means changing a string (e.g., anthropic/claude-sonnet-4-20250514, openai/gpt-4o, google/gemini-2.0-flash). OpenRouter even offers free models suffixed with :free.

Complete comparison table

Criterion Direct calls OpenRouter
Endpoint 1 per provider 1 unique
API keys 1 per provider 1 single
Format Variable Unified (OpenAI compatible)
Latency Minimal +50-100ms
Fallback Must code it yourself Built-in (auto route)
Billing N invoices 1 invoice
Free models Rare Yes, several available
New models Immediate Delay of a few hours/days
Cost Raw price Raw price + small margin (~5-15%)
Availability Direct Depends on OpenRouter
Dashboard N dashboards 1 unified

⚑ The real benefits of OpenRouter

1. Free models

OpenRouter offers free models, perfect for prototyping and simple tasks. The FREE_MODELS list includes notable references such as google/gemini-2.0-flash-exp:free, meta-llama/llama-3.1-8b-instruct:free, mistralai/mistral-7b-instruct:free, and huggingfaceh4/zephyr-7b-beta:free. These models are ideal for development and testing phases without incurring any costs.

2. Automatic fallback

If a provider is down, OpenRouter can automatically switch over. The fallback route mechanism is activated by adding the "route": "fallback" parameter in the JSON payload of your request. If the primary model (for example Claude) is unavailable, OpenRouter automatically redirects the call to an equivalent backup model without your code needing to handle the error.

3. Unified billing

A single bill, a single dashboard to see:
- How much you spend per model
- Which models are the most used
- Your real-time consumption

4. Easy comparison

Testing a new model = changing a string. No need to create an account, get an API key, or integrate a new format.


⚠️ The real downsides of OpenRouter

Let's be honest, it's not all roses.

1. Extra latency (+50-100ms)

Every request goes through an intermediary. For real-time chat, you can feel it. For batch processing, it's negligible.

Direct call:    You ---- 120ms ----> Anthropic
OpenRouter:      You -- 50ms --> OpenRouter -- 120ms --> Anthropic
                         Total: ~170ms (+50ms)

2. Single point of failure

If OpenRouter goes down, all your calls fail. This is the main risk of centralization.

3. Model availability

Some models are not available on OpenRouter, or there is a delay after their release. Very recent or preview models might be missing.

4. Price margin

OpenRouter adds a margin (usually 5-15%) on top of direct prices. At high volumes, this can add up.


πŸ—οΈ Pattern production : ModelManager avec fallback

Here is the pattern we use in production. It combines the best of both worlds: OpenRouter as primary, direct calls as fallback.

The ModelManager class is a complete orchestrator that manages fallback chains. Internally, it configures three default chains: "smart" (for the best models, mixing OpenRouter and direct Anthropic/OpenAI calls), "fast" (for fast and economical models), and "free" (for free models). It maintains a state of rate-limits per provider: when a call returns a 429 error, the provider is marked as unavailable for the duration indicated by the retry-after header. The main complete() method iterates through the requested chain, ignores limited providers, and attempts the next call in case of failure.

Usage examples for this manager simply come down to instantiating the class and calling the complete method with a list of messages and the name of the desired chain (for example "smart"). The system silently takes care of finding the first available model and returning the response, offering total abstraction from backend outages.

How does it work?

  1. Fallback chains: each chain (smart, fast, free) defines an ordered list of models
  2. Rate-limit detection: when a provider returns 429, it is marked as limited for the indicated duration
  3. Automatic fallback: if a model fails or is rate-limited, we move on to the next one
  4. Mixed providers: the same chain can mix OpenRouter and direct calls

πŸ“Š When to use what?

Situation Recommendation Why
Prototyping / side project OpenRouter only Single account, free models, quick to setup
Production with 1 model Direct call Lower latency, fewer dependencies
Multi-model production OpenRouter + direct fallback Best of both worlds
High volumes (>100k req/day) Direct calls Savings on the OpenRouter margin
Tight budget OpenRouter free models Zero cost for free-tier models
Testing / model comparison OpenRouter Switching models = changing a string

πŸ”§ Configuration in OpenClaw

If you are using OpenClaw on your VPS, configuring the providers is simple. OpenClaw natively uses OpenRouter. The configuration takes the form of a YAML file declaring the OpenRouter API key via an environment variable and the default model. You can add an optional direct fallback section for Anthropic to it, injecting your own secure API key. If you are looking to deploy this stack, our article on VPS + AI: the complete setup to self-host everything details the server installation, and the guide Docker + AI: containerizing your smart services explains how to isolate all of this in containers.


❌ Common mistakes

  • Ignoring rate limits: Calling a provider without handling 429 errors will eventually block your application. Always use a detection system like the one in ModelManager.
  • Hardcoding API keys: Never leave your keys in plain text in the code. Always use environment variables.
  • Forgetting the fallback: In production, a single provider will go down sooner or later. Always plan a fallback chain.
  • Underestimating OpenRouter latency: On highly sensitive synchronous applications (e.g., real-time voice generation), the extra 50-100ms can be a dealbreaker.

πŸ“‹ Key takeaways

  • Direct calls offer minimal latency and total control, but require heavy maintenance (N integrations, N keys).
  • OpenRouter unifies everything into a single endpoint, a single key, a single format, at the cost of slight latency and a margin on prices.
  • The ModelManager pattern combines both: OpenRouter as primary, direct calls as fallback, with intelligent rate limit management.
  • To get started, jump straight into OpenRouter. To scale, add direct fallbacks.

❓ FAQ

Is OpenRouter reliable in production?
Yes, but like any intermediary, it can experience outages. This is precisely why it is recommended to couple OpenRouter with direct calls as a fallback.

Is OpenRouter's margin a drawback?
It sits between 5 and 15%. For small volumes or prototyping, it is negligible. Beyond 100,000 requests per day, it becomes more cost-effective to switch to direct calls for the most heavily used models.

Can you mix OpenRouter and direct calls in the same project?
This is actually the recommendation for production. The ModelManager pattern presented in this article does exactly that.

Do OpenRouter's free models have limitations?
Yes, they are subject to stricter rate limits and can be slower. They are perfect for development or non-critical tasks, but should be avoided for production customer service.


🎯 Conclusion

For most self-hosted projects:

  1. Start with OpenRouter -- it's the simplest and most flexible
  2. Add direct calls as a fallback for your critical models
  3. Use the ModelManager pattern to automatically handle outages
  4. Monitor your costs via the OpenRouter dashboard and adjust

The pattern is tested in production. Over to you.