Don't Lock Into One LLM — Use Each for a Different Purpose

Everyone's obsessed with picking the "best" AI model. The one model to rule them all. Spoiler: it doesn't exist. And the more you try to find it, the more you handicap yourself.

The winning strategy is the opposite: use many models. Each for what it's actually good at.

The Lock-In Trap

I see this constantly with clients and builders: they pick one model — usually Claude or GPT-4 — and force every problem through it. Why? Because it's convenient. You get good at one thing, you use that thing for everything.

This makes sense until it doesn't.

Claude is brilliant at reasoning, writing, analysis. GPT-4 is flexible and fast. But neither is the best at everything. Neither should be.

Lock-in happens when:

- Your prompts are tuned to one model's quirks and output style.
- Your code calls one vendor's API directly, with no abstraction in between.
- Your evals, workflows, and tooling all assume that one model's behavior.

At that point, switching costs real engineering work. You're locked in. And the vendor knows it.

Why Different Models Exist

There's a reason we have Claude, GPT, Llama, Gemini, Mistral, specialized models. They're not accidents. They're tradeoffs.

Different models optimize for different things:

- Reasoning depth vs. speed and latency.
- Raw capability vs. cost per token.
- Broad generality vs. domain specialization.
- Long context windows vs. cheap high-volume throughput.

These tradeoffs don't resolve. It's not "Claude's better, use it everywhere." They're structural.

My Actual Routing Strategy

I don't use one model. I use a router. Here's how I think about it:

Claude for Reasoning

Complex analysis, multi-step problem solving, writing that requires depth. Claude is genuinely the best I've used. Yes, it costs more. But when you need serious thinking, it's worth it.

Example: analyzing a contract for risk, synthesizing research across multiple sources, building a strategic recommendation. That's Claude work.

GPT-4o for Speed & Flexibility

I need something fast that handles a wide range of tasks reasonably well. GPT-4o is my pick. It's not the best at any one thing, but it's surprisingly competent across domains.

Example: summarizing a document, categorizing support tickets, simple extraction. Get the job done fast, spend less.

Smaller Models for Scale

When you need to process thousands of items and the task is straightforward, small models shine. Cost per request drops 10x.

Example: classifying customer feedback, validating outputs, simple formatting. High volume, low complexity.

Specialized Models for Domain Work

Code generation? Use a model optimized for it. Medical text analysis? Use one trained on medical data. You get better results and you're not paying for unused capability.
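The routing strategy above boils down to a lookup from task type to model. A minimal sketch, where the model names and task categories are illustrative placeholders rather than exact API identifiers:

```python
# Illustrative task-to-model routing table. The names here are
# placeholders, not real API model identifiers.
ROUTING_TABLE = {
    "reasoning": "claude-sonnet",   # deep analysis, multi-step work
    "general": "gpt-4o",            # fast, broadly competent tasks
    "bulk": "small-model",          # high-volume, low-complexity tasks
    "code": "code-specialist",      # domain-optimized generation
}

def choose_model(task_type: str) -> str:
    """Return the model for a task type, defaulting to the general model."""
    return ROUTING_TABLE.get(task_type, ROUTING_TABLE["general"])
```

The default case matters: unknown task types should fall through to a broadly competent model rather than raise an error.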

How to Actually Implement Routing

This sounds complicated. It's not. You need three things:

1. An Abstraction Layer

Don't call models directly from your code. Wrap them. You're calling a reasoning service, a summarization service, a classification service. Those services decide which model to use internally.

Pseudocode:

result = reason(prompt, context)

Inside, it picks Claude or GPT-4 based on latency requirements, context size, cost budget. Your code doesn't care. Your code doesn't know.
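A minimal sketch of what that abstraction could look like. The two client functions are hypothetical stand-ins you would implement against the real vendor SDKs:

```python
# Placeholder vendor clients -- in practice these would wrap the
# actual Anthropic and OpenAI SDK calls.
def call_claude(prompt: str) -> str:
    return f"[claude] {prompt}"

def call_gpt4o(prompt: str) -> str:
    return f"[gpt-4o] {prompt}"

def reason(prompt: str, context: str = "", max_latency_ms: int = 5000) -> str:
    """Reasoning service: picks a backend internally; callers never know which."""
    full_prompt = f"{context}\n\n{prompt}" if context else prompt
    # Tight latency budgets go to the faster model; deep work goes to Claude.
    if max_latency_ms < 1000:
        return call_gpt4o(full_prompt)
    return call_claude(full_prompt)
```

Callers depend only on `reason()`; swapping a backend is a one-line change inside the service, not a refactor across the codebase.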

2. Cost & Latency Constraints

Tag each task with what matters: "I need this in 500ms" or "I have a $0.01 budget." Your router uses those constraints to pick models.

Doesn't need to be complex. Simple rules work:

- Tight latency budget? Route to the fast model.
- Hard cost ceiling? Route to the small model.
- Huge context or deep reasoning? Route to Claude.
- Everything else? A sensible default.

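Those simple rules fit in a few lines. In this sketch the thresholds and model names are assumptions to tune against your own cost and latency data:

```python
# Constraint-driven model picker. Thresholds are illustrative --
# calibrate them against your own traffic.
def pick_model(latency_ms: int, budget_usd: float, context_tokens: int) -> str:
    if budget_usd < 0.001:            # hard cost ceiling -> smallest model
        return "small-model"
    if latency_ms < 500:              # tight deadline -> fast model
        return "gpt-4o"
    if context_tokens > 100_000:      # huge context -> long-context model
        return "claude-sonnet"
    return "gpt-4o"                   # sensible default
```

Note the rule order is itself a policy decision: here cost wins over latency, which wins over context size.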
3. Monitoring & Adjustment

Track which model you used, what it cost, whether the output was good. Over time, you optimize your routing rules. You learn that 80% of customer tickets can be handled by a small model. You use Claude more selectively.

This is not set-and-forget. It's empirical. You adjust based on what actually works.
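A bare-bones version of that tracking, assuming you log each call yourself rather than using a dedicated observability tool:

```python
import time
from collections import defaultdict

# In-memory call log: model used, cost, latency, and a quality flag.
# In production this would go to your metrics pipeline instead.
calls = []

def record_call(model: str, cost_usd: float, latency_s: float, ok: bool) -> None:
    calls.append({"model": model, "cost": cost_usd,
                  "latency": latency_s, "ok": ok, "ts": time.time()})

def cost_by_model() -> dict:
    """Aggregate spend per model -- the first question routing data answers."""
    totals = defaultdict(float)
    for c in calls:
        totals[c["model"]] += c["cost"]
    return dict(totals)
```

Even this crude aggregate is enough to spot the "80% of tickets fit a small model" pattern and tighten your rules around it.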

The Real Benefit: Resilience

Here's why this matters beyond efficiency: diversification is risk management.

If Claude has an outage, your whole system can fail if you're locked in. With routing, you degrade gracefully. "Claude's slow? Use GPT. Degrade quality but stay online."
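A fallback chain is the simplest form of graceful degradation. In this sketch the first client simulates an outage so the chain visibly falls through; real clients would wrap actual SDK calls:

```python
# Stand-in client that simulates a Claude outage so the fallback
# behavior is visible. Real code would call the vendor SDKs.
def call_model(model: str, prompt: str) -> str:
    if model == "claude-sonnet":
        raise TimeoutError("simulated outage")
    return f"[{model}] {prompt}"

def resilient_call(prompt: str,
                   chain=("claude-sonnet", "gpt-4o", "small-model")) -> str:
    """Try each model in order; degrade quality but stay online."""
    last_error = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except Exception as err:   # outage, rate limit, timeout, etc.
            last_error = err
    raise RuntimeError("all models in chain failed") from last_error
```

The chain encodes your quality preference; the loop encodes your availability guarantee.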

If pricing changes dramatically, you have optionality. You can shift workloads. You're not hostage to one vendor's pricing decisions.

If a new model comes out that's better for your use case, you can integrate it into your router without rewriting everything. You're building for portability.

What This Requires

The honest downside:

- More infrastructure to build and maintain.
- Prompts that need testing against multiple models, not one.
- More monitoring, more billing relationships, more surface area.

If you're building a weekend project, this overhead isn't worth it. Pick a model, use it. Ship it.

If you're building something that needs to scale, needs to survive outages, needs to remain economically viable as volume grows: this is table stakes.

The Industry Momentum

This is where the industry is heading anyway. Companies like OpenRouter, Together, and Replicate are building exactly this: abstraction layers that let you route to different models transparently.

The future isn't "which model is best?" It's "which model for which job?" And tooling to make that decision automatically.

If you build with that architecture from day one, you're not fighting against the momentum. You're swimming with it.

My Challenge to You

If you're standardized on one model, ask yourself: "Am I doing this because it's optimal, or because it's convenient?"

If it's convenient, start thinking about abstraction. It's a one-week project to build a basic router. It'll save you months of regret.

Lock-in is a slow trap. By the time you notice, you're already paying for it.

Don't be locked in. Stay flexible. Use the right tool for the job.