Engineering · 2026-03-10 · 5 min read

How Free-First AI Routing Saves 60% on LLM Costs

Most AI platforms default to expensive flagship models. ORIS takes the opposite approach: every request hits the cheapest capable model first.

The routing waterfall:

1. Free models: Google Gemini Flash, Groq Llama 8B, and Mistral Small at $0 per request.
2. Ultra-cheap providers: Qwen Flash at $0.10/M tokens and DeepSeek V3 at $0.28/M tokens.
3. Budget options: Kimi K2.5 and xAI Grok Fast.
4. Premium models: Claude Sonnet/Opus and GPT-4.1/5.4, only when quality demands it.
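The waterfall can be sketched as an ordered list of tiers walked cheapest-first. This is a minimal illustration, not the ORIS implementation: the model identifiers are hypothetical placeholders, and the budget/premium prices are made-up examples (the post only gives prices for the ultra-cheap tier).

```python
# Illustrative routing waterfall: tiers are tried in order, cheapest first.
# Model IDs and budget/premium prices are placeholders, not real ORIS config.
WATERFALL = [
    ("free",        [("gemini-flash", 0.00), ("groq-llama-8b", 0.00), ("mistral-small", 0.00)]),
    ("ultra-cheap", [("qwen-flash", 0.10), ("deepseek-v3", 0.28)]),
    ("budget",      [("kimi-k2.5", 0.50), ("grok-fast", 0.60)]),      # prices hypothetical
    ("premium",     [("claude-sonnet", 3.00), ("gpt-4.1", 2.00)]),    # prices hypothetical
]

def candidates_in_order():
    """Yield (tier, model, price_per_million_tokens), cheapest tier first."""
    for tier, models in WATERFALL:
        for model, price in models:
            yield tier, model, price
```

A router consuming this generator simply takes the first candidate that is healthy and meets the quality bar, which is what the next two sections add.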

The cost optimizer scores each candidate model against a per-tier quality threshold: Starter requires 0.70, Pro requires 0.85, Enterprise requires 0.95. Among the models that qualify, it picks the cheapest.
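In code, that selection rule is a filter followed by a min. A small sketch, using the thresholds from the post; the quality scores attached to each model below are invented for illustration.

```python
# Per-tier quality thresholds from the post.
THRESHOLDS = {"starter": 0.70, "pro": 0.85, "enterprise": 0.95}

def pick_model(candidates, tier):
    """candidates: list of (name, quality_score, price_per_million_tokens).
    Keep models at or above the tier's threshold, return the cheapest."""
    qualifying = [c for c in candidates if c[1] >= THRESHOLDS[tier]]
    if not qualifying:
        raise RuntimeError(f"no model meets the {tier} quality threshold")
    return min(qualifying, key=lambda c: c[2])

# Example candidates; quality scores are illustrative, not measured.
models = [
    ("gemini-flash", 0.72, 0.00),
    ("deepseek-v3", 0.88, 0.28),
    ("claude-sonnet", 0.97, 3.00),
]
```

With these example scores, a Starter request lands on the free model, Pro steps up to the $0.28/M option, and only Enterprise pays for the premium model.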

The circuit breaker tracks provider health in Redis. If a provider's error rate hits 30% within a 60-second window, it is marked unhealthy for 120 seconds. The fallback chain transparently reroutes requests to the next candidate, so callers never know which model answered.
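The breaker logic is simple enough to sketch in-memory. This toy version keeps the sliding window in a Python dict rather than Redis (production would use Redis counters with TTLs), but the thresholds match the post: 30% errors in a 60-second window trips the breaker, and the provider stays unhealthy for 120 seconds.

```python
import time

ERROR_RATE = 0.30   # trip when errors reach 30% of the window
WINDOW_S = 60       # sliding error-rate window
COOLDOWN_S = 120    # how long a tripped provider stays unhealthy

class CircuitBreaker:
    """In-memory sketch; production state lives in Redis."""

    def __init__(self, now=time.monotonic):
        self.now = now          # injectable clock for testing
        self.events = {}        # provider -> [(timestamp, ok)]
        self.tripped = {}       # provider -> time the breaker tripped

    def record(self, provider, ok):
        t = self.now()
        # Keep only events inside the sliding window, then append this one.
        window = [(ts, o) for ts, o in self.events.get(provider, []) if t - ts < WINDOW_S]
        window.append((t, ok))
        self.events[provider] = window
        errors = sum(1 for _, o in window if not o)
        if errors / len(window) >= ERROR_RATE:
            self.tripped[provider] = t   # mark unhealthy

    def healthy(self, provider):
        t0 = self.tripped.get(provider)
        return t0 is None or self.now() - t0 >= COOLDOWN_S
```

The fallback chain then just skips any candidate whose `healthy()` check fails and tries the next one in the waterfall.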

Real-world impact: organisations on the Starter tier run entirely on free models. The Pro tier averages $0.001 per request. Enterprise gets frontier models at a 15% markup over wholesale.