
Multi-Model Consensus

4 LLM providers × 2-stage deliberation: a 28% disagreement rate surfaces errors that a single model misses

At a glance

A/B test: 150 decisions, false approvals 22% → 13%

Architect & sole developer

3 months · solo

Python
asyncio
OpenAI API
Anthropic API
Google GenAI
DeepSeek API

Multi-model board — provider voting config


Problem

What doesn't work

When a single LLM makes critical business decisions, errors are inevitable: it hallucinates, it is biased toward its own patterns, and it has no way to verify itself. In an A/B test on 150 decisions, the single-model setup gave 22% false approvals, roughly one decision in five.

Solution

Architectural approach

A panel of 4 independent providers (OpenAI o4-mini, Claude Opus + thinking, Gemini 2.5 Pro + thinking, DeepSeek Reasoner) evaluates each decision in parallel. A decision needs a quorum of at least 3 of 4 votes. In round 2, the models see each other's arguments and refine their positions.
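The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `ask`, `tally`, `deliberate`, and the `SCRIPT` table are hypothetical names, the provider calls are replaced by scripted answers so the example runs without API keys, and running both rounds unconditionally before applying the quorum is an assumption.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass
from typing import Optional

PROVIDERS = ["openai", "claude", "gemini", "deepseek"]
QUORUM = 3  # from the text: at least 3 of 4 votes must agree

# Scripted (round 1, round 2) verdicts so the sketch is deterministic.
SCRIPT = {
    "openai": ("reject", "reject"),
    "claude": ("reject", "reject"),
    "gemini": ("approve", "reject"),   # changes its mind after reading peers
    "deepseek": ("approve", "approve"),
}


@dataclass
class Vote:
    provider: str
    verdict: str
    argument: str


async def ask(provider: str, prompt: str, round_no: int) -> Vote:
    """Hypothetical stand-in for a real provider API call."""
    await asyncio.sleep(0)  # a real call would await network I/O here
    verdict = SCRIPT[provider][round_no - 1]
    return Vote(provider, verdict, f"{provider} says {verdict} in round {round_no}")


def tally(votes: list) -> Optional[str]:
    """Return the verdict backed by at least QUORUM votes, else None."""
    verdict, count = Counter(v.verdict for v in votes).most_common(1)[0]
    return verdict if count >= QUORUM else None


async def deliberate(case: str) -> tuple:
    # Round 1: every provider evaluates the case independently, in parallel.
    round1 = await asyncio.gather(*(ask(p, case, 1) for p in PROVIDERS))

    # Round 2: each provider re-evaluates with the other providers' arguments.
    async def revisit(p: str) -> Vote:
        peers = "\n".join(v.argument for v in round1 if v.provider != p)
        return await ask(p, f"{case}\n\nPeer arguments:\n{peers}", 2)

    round2 = await asyncio.gather(*(revisit(p) for p in PROVIDERS))
    return tally(round1), tally(round2)


if __name__ == "__main__":
    print(asyncio.run(deliberate("Approve the vendor contract?")))
```

In this scripted run the first round splits 2 to 2, so no quorum exists; after the argument exchange one provider switches and the second round reaches a 3 of 4 quorum for "reject".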

My role & contribution

Architect & sole developer

Designed and implemented the entire architecture: selected 4 providers with different strengths (reasoning tokens, thinking blocks), defined the ≥3/4 quorum protocol, built the 2-stage deliberation with argument exchange, and added MIN_PROVIDERS=2 fault tolerance. Ran an A/B test on 150 decisions, single-model vs consensus: false approvals dropped from 22% to 13%.
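One way the MIN_PROVIDERS=2 fault tolerance could work is sketched below, under stated assumptions: `call_provider`, `collect_votes`, and `TIMEOUT_S` are hypothetical, failures are simulated instead of coming from real APIs, and the source does not say how the quorum is adjusted when fewer than four providers answer, so the sketch only enforces the minimum number of usable responses.

```python
import asyncio

MIN_PROVIDERS = 2   # from the text: minimum number of usable responses
TIMEOUT_S = 0.05    # assumed per-call timeout, kept tiny for the demo


async def call_provider(name: str) -> str:
    """Hypothetical provider call; "gemini" fails and "deepseek" hangs."""
    if name == "gemini":
        raise RuntimeError("simulated API error")
    if name == "deepseek":
        await asyncio.sleep(1)  # exceeds TIMEOUT_S, so wait_for cancels it
    return "reject"


async def collect_votes(providers: list) -> dict:
    """Poll all providers in parallel and keep only the successful answers."""
    results = await asyncio.gather(
        *(asyncio.wait_for(call_provider(p), TIMEOUT_S) for p in providers),
        return_exceptions=True,  # failures come back as exception objects
    )
    votes = {
        p: r for p, r in zip(providers, results)
        if not isinstance(r, BaseException)
    }
    if len(votes) < MIN_PROVIDERS:
        raise RuntimeError(f"only {len(votes)} provider(s) responded")
    return votes


if __name__ == "__main__":
    print(asyncio.run(collect_votes(["openai", "claude", "gemini", "deepseek"])))
```

With `return_exceptions=True`, one failing or slow provider does not abort the whole round; the panel proceeds as long as at least MIN_PROVIDERS answers survive.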
