Autonomous Code Factory
Multi-agent pipeline with kill gates: generates a Flutter app of 25-40 files in 15-30 min, 3× faster than sequential generation
What doesn't work
Single-call LLM code generation produces files that don't compile and aren't wired together. Each error means regenerating the entire project, which is expensive ($0.50–2.00) and slow (30+ min). Without quality evaluation, bad ideas reach the expensive stages unchecked.
Architectural approach
Multi-agent orchestration with built-in self-evaluation. Generation: PlannerAgent → PlanReviewPanel (4 models, ≥3/4 quorum) → DAG sort → TaskExecutor parallel → TaskReviewer retry ≤3×. Evaluation: Critic (0-10, 5 criteria) + 3 kill gates (min_critic_score, compliance_risk, cannibalization) kill weak ideas before expensive stages. Judge (26 checkpoints, 100 points) verifies final quality.
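The quorum rule and the three kill gates can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Evaluation` fields, the 0.0-1.0 risk scales, and every threshold value are assumptions, since the description names the gates but not their limits.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    critic_score: float     # Critic output on the 0-10 scale
    compliance_risk: float  # assumed scale 0.0-1.0, higher is riskier
    cannibalization: float  # assumed scale 0.0-1.0, overlap with existing products


# Hypothetical thresholds, chosen only for illustration.
MIN_CRITIC_SCORE = 6.0
MAX_COMPLIANCE_RISK = 0.5
MAX_CANNIBALIZATION = 0.7


def plan_approved(votes: list[bool], quorum: int = 3) -> bool:
    """Panel rule: the plan passes when at least `quorum` reviewers approve."""
    return sum(votes) >= quorum


def failed_gates(ev: Evaluation) -> list[str]:
    """Return the names of the kill gates the idea fails (empty list = proceed)."""
    failed = []
    if ev.critic_score < MIN_CRITIC_SCORE:
        failed.append("min_critic_score")
    if ev.compliance_risk > MAX_COMPLIANCE_RISK:
        failed.append("compliance_risk")
    if ev.cannibalization > MAX_CANNIBALIZATION:
        failed.append("cannibalization")
    return failed
```

Returning the list of failed gates rather than a bare boolean keeps the reason for a kill available for logging before any codegen budget is spent.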
What made it hard
The main challenge was debugging a multi-agent system of 11 microservices, where a bug could sit in any single agent or in the interactions between them. The first version (a single LLM call) produced non-working code 60% of the time, so the architecture had to be completely rewritten as a DAG with per-file retry. The 553 tests were written in parallel with the code, because refactoring a system of this scale solo is impossible without them.
My role & contribution
Architect & sole developer
Designed and built from scratch: PlannerAgent, DAG scheduler with topological sort, TaskExecutor/TaskReviewer, PlanReviewPanel (4 models), kill gates in the orchestrator, Redis pub/sub for real-time progress. In total: 11 microservices and 553 tests. The entire codebase is my own work.
System architecture
How it works
Topological sort of tasks by dependencies → parallel execution per DAG level via asyncio.gather(). Claude CLI as agent engine. Redis pub/sub for real-time progress. PostgreSQL for plan and task persistence. Kill gates built into orchestrator router — decision made before launching expensive stages.
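A minimal sketch of level-by-level scheduling, assuming the plan is available as a mapping from each file to the set of files it depends on. The names `dag_levels`, `generate_file`, and `run_plan` are hypothetical, and `generate_file` is a stand-in for the real agent call.

```python
import asyncio


def dag_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Kahn-style layering: a task lands in the first level after all its dependencies."""
    remaining = {task: set(d) for task, d in deps.items()}
    levels = []
    while remaining:
        # Tasks with no unmet dependencies can run together in this level.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


async def generate_file(path: str) -> str:
    """Placeholder for the agent call that would generate one file."""
    await asyncio.sleep(0)
    return path


async def run_plan(deps: dict[str, set[str]]) -> list[str]:
    """Run levels in order; tasks inside one level run concurrently."""
    done = []
    for level in dag_levels(deps):
        done.extend(await asyncio.gather(*(generate_file(p) for p in level)))
    return done
```

For example, with `{"a": set(), "b": set(), "c": {"a"}, "d": {"b", "c"}}` the levels are `[["a", "b"], ["c"], ["d"]]`, so `a` and `b` are generated concurrently while `d` waits for both of its dependencies.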
Why this way
DAG scheduler instead of queue
Alternatives considered: sequential file generation, or a single batch prompt for the entire project
Sequential: independent files wait on each other for no reason. Batch: the context window overflows. DAG: topological sort, parallelism within each level, and per-file retry.
3× faster code generation. Retry per file, not entire project
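Per-file retry could look roughly like the loop below. It is a sketch under assumptions: `generate` and `review` are hypothetical callables standing in for TaskExecutor and TaskReviewer, the reviewer is assumed to return `None` on acceptance and feedback text on rejection, and "retry ≤3×" is read here as at most three attempts.

```python
import asyncio
from typing import Awaitable, Callable, Optional

MAX_ATTEMPTS = 3  # assumed reading of "retry ≤3×"

Generate = Callable[[str, Optional[str]], Awaitable[str]]
Review = Callable[[str, str], Awaitable[Optional[str]]]


async def generate_with_review(
    path: str, generate: Generate, review: Review
) -> tuple[str, int]:
    """Regenerate a single file until the reviewer accepts it or attempts run out."""
    feedback: Optional[str] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # The previous rejection reason is fed back into the next attempt.
        code = await generate(path, feedback)
        feedback = await review(path, code)
        if feedback is None:
            return code, attempt
    raise RuntimeError(f"{path}: rejected after {MAX_ATTEMPTS} attempts")
```

Because the loop is scoped to one file, a failed review costs only that file's tokens, and the other files in the same DAG level are unaffected.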
Results
- 27/27 judge points (PASS, 100/100); 553 tests, 0 failures
- 3× faster vs sequential generation (measured on 25-40 file Flutter projects)
- −70% error costs (retry per file, not project)
- 3 kill gates across 7 stages: weak ideas killed before codegen
- 0 manual interventions in full run
Impact on business
Full cycle from idea to working code in 15–30 min without human involvement. Retry per file instead of entire project — up to 70% token savings on errors. Kill gates eliminate weak ideas before expensive codegen stage, saving $0.50–2.00 per rejected project.
Algorithms & patterns
Technologies
- Python
- asyncio
- Claude CLI
- PostgreSQL
- Redis (pub/sub)
- FastAPI
- SQLAlchemy