DAG Orchestration + Kill Gates · AI

Autonomous Code Factory

Multi-agent pipeline with kill gates — Flutter app of 25-40 files in 15-30 min, 3× faster than sequential generation

Problem

What doesn't work

Single-call LLM code generation produces files that don't compile and aren't wired together. Every error means regenerating the entire project, which is expensive ($0.50–2.00) and slow (30+ min). Without quality evaluation up front, bad ideas also reach the expensive stages.

Solution

Architectural approach

Multi-agent orchestration with built-in self-evaluation.

Generation: PlannerAgent → PlanReviewPanel (4 models, ≥3/4 quorum) → DAG sort → TaskExecutor in parallel → TaskReviewer (retry ≤3×).

Evaluation: a Critic (score 0-10 across 5 criteria) plus 3 kill gates (min_critic_score, compliance_risk, cannibalization) kill weak ideas before the expensive stages. A Judge (26 checkpoints, 100 points) verifies final quality.
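The quorum rule and the three kill gates can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the threshold values, the 0-1 scales for compliance_risk and cannibalization, and the names `Evaluation`, `plan_approved` and `failed_gates` are all assumptions. Only the gate names and the ≥3/4 quorum come from the description above.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the gates are named in the text, their values are not.
MIN_CRITIC_SCORE = 6.0
MAX_COMPLIANCE_RISK = 0.5
MAX_CANNIBALIZATION = 0.5


@dataclass
class Evaluation:
    critic_score: float     # Critic output on the 0-10 scale
    compliance_risk: float  # assumed 0-1 scale, higher means riskier
    cannibalization: float  # assumed 0-1 overlap with existing products


def plan_approved(votes: list[bool], quorum: int = 3) -> bool:
    """Quorum rule: the plan passes when at least `quorum` reviewers approve."""
    return sum(votes) >= quorum


def failed_gates(ev: Evaluation) -> list[str]:
    """Return the names of the kill gates an idea trips (empty list = proceed)."""
    failed = []
    if ev.critic_score < MIN_CRITIC_SCORE:
        failed.append("min_critic_score")
    if ev.compliance_risk > MAX_COMPLIANCE_RISK:
        failed.append("compliance_risk")
    if ev.cannibalization > MAX_CANNIBALIZATION:
        failed.append("cannibalization")
    return failed
```

An orchestrator would call `failed_gates` before dispatching to the next stage and stop the pipeline on any non-empty result, which is what makes the check fail-fast.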

Challenges

What made it hard

The main challenge was debugging a multi-agent system of 11 microservices, where a bug could sit in any agent or in the interactions between them. The first version (a single LLM call) produced non-working code 60% of the time, so the architecture had to be completely rewritten as a DAG with per-file retry. The 553 tests were written in parallel with the code, because refactoring a system of this scale solo is impossible without them.

Role

My role & contribution

Architect & sole developer

Designed and built from scratch: PlannerAgent, DAG scheduler with topological sort, TaskExecutor/TaskReviewer, PlanReviewPanel (4 models), kill gates in the orchestrator, Redis pub/sub for real-time progress. 11 microservices, 553 tests. The entire codebase is my own work.

Architecture

System architecture

[Diagram: Autonomous App Factory, 11-microservice pipeline]
Pipeline: idea-svc, critic-svc, board-svc, spec-svc, codegen-svc, review-svc, publish-svc, with two kill gates marked along the pipeline.
Infrastructure: gateway:8000, orchestrator:8001, shared, DB + Redis, analytics-svc:8019, React Frontend.
11 services | 553 tests | PostgreSQL + Redis Streams | Event-driven FSM

Implementation

How it works

Tasks are topologically sorted by their dependencies, then each DAG level runs in parallel via asyncio.gather(). Claude CLI serves as the agent engine, Redis pub/sub carries real-time progress, and PostgreSQL persists plans and tasks. Kill gates are built into the orchestrator router, so the decision is made before any expensive stage is launched.
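The level-by-level scheduling described above can be sketched as follows. This is an illustrative sketch, not the project's scheduler: `dag_levels`, `run_dag` and the stand-in `fake_generate` are hypothetical names, and the real TaskExecutor would invoke the agent engine where the stand-in just returns a string. Only `asyncio.gather()` and the topological grouping come from the text.

```python
import asyncio
from collections.abc import Awaitable, Callable


def dag_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into levels; each task depends only on earlier levels."""
    remaining = {task: set(d) for task, d in deps.items()}
    levels: list[list[str]] = []
    while remaining:
        # Tasks with no unmet dependencies are ready to run now.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for unmet in remaining.values():
            unmet.difference_update(ready)
    return levels


async def run_dag(
    deps: dict[str, set[str]],
    execute: Callable[[str], Awaitable[str]],
) -> dict[str, str]:
    """Run each level concurrently; levels themselves run in order."""
    results: dict[str, str] = {}
    for level in dag_levels(deps):
        outputs = await asyncio.gather(*(execute(task) for task in level))
        results.update(zip(level, outputs))
    return results


async def fake_generate(task: str) -> str:
    """Stand-in for a real executor call (assumption, for the demo only)."""
    await asyncio.sleep(0)
    return f"generated {task}"


if __name__ == "__main__":
    plan = {
        "lib/models/user.dart": set(),
        "lib/services/api.dart": set(),
        "lib/main.dart": {"lib/models/user.dart", "lib/services/api.dart"},
    }
    print(asyncio.run(run_dag(plan, fake_generate)))
```

Grouping by level is simpler than a fully dynamic scheduler: a level finishes only when its slowest task does, but independent files within a level never wait on each other.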

Architecture Decision

Why this way

DAG scheduler instead of queue

Alternative

Sequential file generation or batch prompt for entire project

Why it didn't fit

Sequential: independent files wait on each other for no reason. Batch: the whole project overflows the context window. DAG: topological sort, parallelism within each level, and per-file retry.

Result

3× faster code generation. Retries happen per file, not per project.
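Per-file retry can be illustrated with a small generate-and-review loop. This is a sketch under assumptions: "retry ≤3×" is read here as up to three retries after the first attempt, and `generate_with_review`, `flaky_generate` and `simple_review` are hypothetical stand-ins for the TaskExecutor and TaskReviewer roles rather than their real interfaces.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional

Generate = Callable[[str, Optional[str]], Awaitable[str]]
Review = Callable[[str, str], Awaitable[tuple[bool, Optional[str]]]]


async def generate_with_review(
    path: str, generate: Generate, review: Review, max_retries: int = 3
) -> str:
    """Regenerate a single file until the reviewer accepts it or retries run out."""
    feedback: Optional[str] = None
    attempts = 1 + max_retries  # assumption: first attempt plus up to 3 retries
    for _ in range(attempts):
        code = await generate(path, feedback)
        ok, feedback = await review(path, code)
        if ok:
            return code
    raise RuntimeError(f"{path}: rejected after {attempts} attempts")


# Stand-ins for demonstration only.
async def flaky_generate(path: str, feedback: Optional[str]) -> str:
    # Produces a broken file until it receives reviewer feedback.
    return f"class Fixed {{}} // {path}" if feedback else "clas Broken {}"


async def simple_review(path: str, code: str) -> tuple[bool, Optional[str]]:
    ok = code.startswith("class ")
    return ok, (None if ok else "syntax error: expected 'class'")


if __name__ == "__main__":
    print(asyncio.run(
        generate_with_review("lib/main.dart", flaky_generate, simple_review)
    ))
```

Because the loop is scoped to one file, a rejected file costs only its own regeneration, while files that already passed review are left untouched.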

Metrics

Results

01. 27/27 judge points: PASS 100/100, 553 tests, 0 failures
02. 3× faster than sequential generation (measured on 25-40 file Flutter projects)
03. −70% error costs (retry per file, not per project)
04. 3 kill gates across 7 stages: weak ideas killed before codegen
05. 0 manual interventions in a full run

Business Impact

Impact on business

Full cycle from idea to working code in 15–30 min without human involvement. Retrying per file instead of per project saves up to 70% of the tokens spent on errors. Kill gates eliminate weak ideas before the expensive codegen stage, saving $0.50–2.00 per rejected project.

Methods

Algorithms & patterns

  • DAG scheduling
  • Kill Gates (fail-fast)
  • LLM-as-Judge (26 checkpoints)
  • Multi-provider review panel
  • Redis Streams pub/sub
  • Event Sourcing
  • Critic loops (retry ≤3×)

Stack

Technologies

  • Python
  • asyncio
  • Claude CLI
  • PostgreSQL
  • Redis Streams
  • FastAPI
  • SQLAlchemy

Ready to discuss?

If you need an architect who builds autonomous AI systems — reach out.

Serbia-based · CET/CEST timezone · EU-aligned working hours · International contracts experience