The best model for each task, automatically. Four routing strategies. Real-time health monitoring. Automatic failover in under 100ms.
Outages bring you down
If your single provider has an outage, your whole product stops. No fallback, no warning, no recovery. Just downtime.
Price hikes are non-negotiable
Your provider raises prices? You pay them. No alternatives, no negotiation. Your cost structure is outside your control.
Better models require rewrites
A competing provider ships a better model. Your hardcoded integration means months of migration work to adopt it.
EnGenAI abstracts all providers behind a single routing layer. Swap, fallback, and route — without changing a single line of your agent code.
Choose how traffic is distributed across your providers. Change strategy per agent, per task type, or globally.
Routes to the highest-priority provider in your list. Predictable and consistent — always your first choice unless it fails.
Best for
Regulated environments, provider lock-in preferences, audit trails requiring consistent model use.
Routes to the cheapest provider that meets the task's quality requirements. Cost savings without sacrificing output quality.
Best for
High-volume workloads, documentation generation, bulk processing, cost-sensitive environments.
Routes to the provider with the highest quality score for this specific task type. Quality scores updated from real usage data.
Best for
Architecture decisions, complex reasoning, code review, security analysis, critical business logic.
Routes to the provider with the lowest current response time. Continuously tracked. Adapts as provider performance changes.
Best for
Real-time interactions, streaming responses, time-sensitive tasks, user-facing completions.
Define your own routing logic with weighted combinations. Example: 60% priority, 30% price, 10% latency. Route specific task types to specific providers. Override routing for individual agents.
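A minimal sketch of how a weighted custom strategy could combine per-strategy scores, assuming each strategy yields a normalised score in [0, 1]. The provider names and score values below are illustrative, not EnGenAI's actual API:

```python
# Weights from the 60% priority / 30% price / 10% latency example.
WEIGHTS = {"priority": 0.6, "price": 0.3, "latency": 0.1}

providers = [
    # Per-strategy scores, 1.0 = best. Values here are made up for the sketch.
    {"name": "anthropic", "priority": 1.0, "price": 0.4, "latency": 0.6},
    {"name": "openai",    "priority": 0.8, "price": 0.7, "latency": 0.8},
    {"name": "cohere",    "priority": 0.5, "price": 1.0, "latency": 1.0},
]

def route(providers, weights):
    """Pick the provider with the highest weighted combined score."""
    return max(providers, key=lambda p: sum(w * p[k] for k, w in weights.items()))

best = route(providers, WEIGHTS)
print(best["name"])  # anthropic: 0.78 beats openai's 0.77 and cohere's 0.70
```

With these weights the priority score dominates, so the top-priority provider wins even though it is neither the cheapest nor the fastest; shifting weight toward price or latency changes the ranking.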
Watch it live. Simulate a provider failure and see traffic shift automatically. Real-time latency, uptime, and circuit breaker state for every connected provider.
Anthropic
Claude Opus 4.6
Latency: 3.2s
Uptime: 99.8%
OpenAI
GPT-4o
Latency: 2.4s
Uptime: 99.6%
Google
Gemini 2.5 Pro
Latency: 6.8s
Uptime: 97.2%
Cohere
Command R+
Latency: 1.8s
Uptime: 99.1%
Mistral
Mistral Large
Latency: N/A
Last seen: 4m ago
Every request has a fallback chain. Primary → Secondary → Tertiary. If all fail, the circuit breaker opens and the user gets a clean error — not a hang.
INCOMING REQUEST
Agent task requires LLM response
ERROR: No providers available
Clean error returned to user. No silent hang. Circuit breakers remain OPEN until recovery.
Normal operation. Requests flow through. Failures counted but not yet triggering.
Provider taken offline after N consecutive failures. Cooldown period begins (60s default).
Single test request allowed. Success = CLOSED again. Failure = OPEN again.
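The fallback chain above can be sketched as a simple loop: try each provider in order, and if every one fails, return a clean error rather than hanging. The function and provider names here are hypothetical:

```python
def call_with_fallback(prompt, chain):
    """Try each provider in the chain (primary first); return the first success.

    If every provider fails, raise a clean error instead of hanging.
    """
    errors = []
    for name, call in chain:
        try:
            return call(prompt)
        except RuntimeError as exc:
            errors.append(f"{name}: {exc}")  # record the failure and move on
    raise RuntimeError("No providers available: " + "; ".join(errors))

# Simulated providers: the first two are down, the third responds.
def down(_prompt): raise RuntimeError("timeout")
def up(prompt): return f"ok: {prompt}"

chain = [("primary", down), ("secondary", down), ("tertiary", up)]
print(call_with_fallback("hello", chain))  # ok: hello
```

In the real system each `call` would also consult the provider's circuit breaker first, so an already-open provider is skipped without spending a request on it.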
Not all models are equal on latency. With latency-based routing, EnGenAI automatically selects the fastest healthy provider for each request.
p50 latency in seconds — illustrative benchmark
EnGenAI routes to the fastest healthy provider automatically. When Claude Opus is rate-limited, traffic shifts to Sonnet without interruption. Latency routing monitors real-time p50 values — not static estimates.
Prevents cascade failures. When a provider fails repeatedly, the circuit opens automatically. No thundering herd. No resource exhaustion. Clean recovery.
CLOSED
Normal operation
Requests flow through normally. Failures are counted but don't yet block the provider. Default state after recovery.
OPEN
Provider halted
After N consecutive failures, the circuit opens. All requests skip this provider immediately. Cooldown timer starts (default: 60s).
HALF-OPEN
Testing recovery
After cooldown, a single test request is allowed through. Success: CLOSED. Failure: OPEN with extended cooldown.
Circuit breakers operate per-provider, per-tenant. One customer's provider issues do not affect other tenants. State transitions are logged and visible in the observability dashboard.
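The three-state machine above can be sketched in a few lines. Thresholds and method names are assumptions for illustration; only the CLOSED → OPEN → HALF-OPEN transitions follow the description:

```python
import time

class CircuitBreaker:
    """Minimal per-provider circuit breaker: CLOSED -> OPEN -> HALF_OPEN."""

    def __init__(self, failure_threshold=5, cooldown=60.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.state = "HALF_OPEN"  # cooldown elapsed: allow a test request
                return True
            return False                  # still cooling down: skip this provider
        return True                       # CLOSED (or HALF_OPEN test in flight)

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"             # recovery confirmed

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"           # halt the provider, start cooldown
            self.opened_at = time.monotonic()

cb = CircuitBreaker(failure_threshold=3, cooldown=0.1)
for _ in range(3):
    cb.record_failure()
print(cb.state)            # OPEN after 3 consecutive failures
time.sleep(0.15)
print(cb.allow_request())  # True: half-open test request allowed
cb.record_success()
print(cb.state)            # CLOSED again
```

A production breaker would also extend the cooldown after a failed half-open probe, as described above, and persist state per provider and per tenant.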
Every provider enforces rate limits. EnGenAI tracks them proactively with sliding-window counters — requests per minute (RPM) and tokens per minute (TPM) — so your agents never hit a 429 error.
Sliding window tracks requests per minute per provider. When usage exceeds 80%, the router shifts new requests to an alternate provider before hitting the limit.
Token throughput tracked separately. Large prompts can consume the token budget even when request count is low. Both counters must be green to route.
When a provider exceeds 90% of its rate limit window, it receives a 10x routing penalty — effectively deprioritised until capacity frees up. This prevents clustering requests on a nearly-full provider.
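A sketch of the sliding-window counter and the resulting routing penalty. The 90% / 10x values come from the description above; the soft penalty applied at 80% is an assumption for the sketch (the real router shifts traffic at that point):

```python
import time
from collections import deque

class SlidingWindow:
    """Per-provider sliding-window request counter (RPM)."""

    def __init__(self, limit_rpm: int, window: float = 60.0):
        self.limit = limit_rpm
        self.window = window
        self.events = deque()  # timestamps of recent requests

    def record(self, now=None):
        self.events.append(time.monotonic() if now is None else now)

    def usage(self, now=None) -> float:
        now = time.monotonic() if now is None else now
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()  # drop requests outside the window
        return len(self.events) / self.limit

    def routing_penalty(self, now=None) -> float:
        u = self.usage(now)
        if u >= 0.9:
            return 10.0  # near the limit: heavy deprioritisation
        if u >= 0.8:
            return 2.0   # assumed soft penalty: start shifting traffic away
        return 1.0

w = SlidingWindow(limit_rpm=100)
for _ in range(85):
    w.record(now=0.0)
print(w.usage(now=0.0))           # 0.85
print(w.routing_penalty(now=0.0)) # 2.0
```

The TPM counter works the same way but records token counts per event instead of a single request; both counters must be under threshold for the provider to route normally.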
Not every plan tier gets every model. Model access groups gate which models are available based on the organisation's subscription tier — enforced at the routing layer, not the UI.
Efficient models — Claude Haiku 4.5, GPT-4o mini, Gemini 2.0 Flash
All Starter models + Claude Sonnet 4.6, GPT-4o, Gemini 2.5 Pro
All models — Claude Opus 4.6, GPT-4.5, o3 + BYO API keys
Access groups are enforced at the routing decision point. If an agent requests a model its organisation cannot access, the router substitutes the best available model from the allowed group — logged and visible in the observability dashboard.
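A sketch of tier-gated access with routing-layer substitution. The group contents mirror the tiers listed above; the "best available" ordering (best-first within each group) and the function name are assumptions:

```python
ACCESS_GROUPS = {
    # Allowed models per tier, ordered best-first (ordering assumed for the sketch).
    "starter": ["claude-haiku-4.5", "gpt-4o-mini", "gemini-2.0-flash"],
    "pro": ["claude-sonnet-4.6", "gpt-4o", "gemini-2.5-pro",
            "claude-haiku-4.5", "gpt-4o-mini", "gemini-2.0-flash"],
}

def resolve_model(requested: str, tier: str, log: list) -> str:
    """Return the requested model if the tier allows it, else substitute."""
    allowed = ACCESS_GROUPS[tier]
    if requested in allowed:
        return requested
    substitute = allowed[0]  # best available model in the allowed group
    log.append(f"substituted {substitute} for {requested} (tier={tier})")
    return substitute

log = []
print(resolve_model("claude-opus-4.6", "pro", log))  # claude-sonnet-4.6
print(log[0])  # substitution is logged, as in the observability dashboard
```

Enforcing this at the routing decision point (rather than the UI) means a request crafted against the API directly still cannot reach a gated model.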
failover time
From failure detection to traffic shift
providers supported
Anthropic, OpenAI, Google, Cohere, Mistral
routing strategies
Priority, Price, Quality, Latency + Custom
Routing gets requests to the right model. Skills give agents the capabilities to act. Discover how every skill is vetted before it can run.