Graph Analysis for Due Diligence
Neo4j + 3-model AI panel + PageRank: an automated relationship graph of ~5K nodes for counterparty verification
What doesn't work
Due diligence for counterparty verification requires analyzing hundreds of connections between companies, people, and addresses. A manual check of one counterparty takes 3-5 analyst days. Hidden connections (nominee directors, mass registration addresses) are invisible in tabular data.
Architectural approach
Neo4j graph DB (~5K nodes, ~12K edges) with Cypher queries for pattern detection: nominee directors, mass addresses, ownership chains. Confidence scoring based on PageRank + degree centrality. 3-model AI panel analyzes findings. Human-in-the-loop with 3 confidence levels (high/medium/low).
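The three pattern checks can be sketched without a database. This is a minimal illustration, assuming a hypothetical edge list of `(source, relationship, target)` triples with illustrative relationship types `REGISTERED_AT`, `DIRECTOR_OF` and `OWNS`; the real Neo4j schema and thresholds are not described here, and the threshold of 3 is a placeholder. A mass registration address is an address many companies point at, a nominee director is a person who directs many companies, and an ownership chain is a path of `OWNS` edges. The commented Cypher shows roughly how the first check could be phrased against Neo4j under the same assumed labels.

```python
from collections import defaultdict

# Hypothetical edge format: (source_id, relationship_type, target_id).
# Roughly how the mass-address check might read in Cypher (labels are assumptions):
#   MATCH (c:Company)-[:REGISTERED_AT]->(a:Address)
#   WITH a, count(DISTINCT c) AS n
#   WHERE n >= $threshold
#   RETURN a.id AS address, n


def fan_in(edges, rel_type, threshold):
    """Targets reached via `rel_type` from at least `threshold` distinct sources."""
    sources = defaultdict(set)
    for src, rel, dst in edges:
        if rel == rel_type:
            sources[dst].add(src)
    return {dst: len(s) for dst, s in sources.items() if len(s) >= threshold}


def fan_out(edges, rel_type, threshold):
    """Sources linked via `rel_type` to at least `threshold` distinct targets."""
    targets = defaultdict(set)
    for src, rel, dst in edges:
        if rel == rel_type:
            targets[src].add(dst)
    return {src: len(t) for src, t in targets.items() if len(t) >= threshold}


def ownership_chains(edges, start, max_depth=5):
    """All simple OWNS paths that start at `start` and have 1..max_depth hops."""
    owns = defaultdict(list)
    for src, rel, dst in edges:
        if rel == "OWNS":
            owns[src].append(dst)
    chains = []

    def walk(path):
        if len(path) > 1:
            chains.append(list(path))
        if len(path) - 1 == max_depth:
            return
        for nxt in owns[path[-1]]:
            if nxt not in path:  # skip cycles such as circular ownership
                path.append(nxt)
                walk(path)
                path.pop()

    walk([start])
    return chains


edges = [
    ("c1", "REGISTERED_AT", "addr1"),
    ("c2", "REGISTERED_AT", "addr1"),
    ("c3", "REGISTERED_AT", "addr1"),
    ("p1", "DIRECTOR_OF", "c1"),
    ("p1", "DIRECTOR_OF", "c2"),
    ("p1", "DIRECTOR_OF", "c3"),
    ("c1", "OWNS", "c2"),
    ("c2", "OWNS", "c3"),
]
print(fan_in(edges, "REGISTERED_AT", threshold=3))  # mass registration addresses
print(fan_out(edges, "DIRECTOR_OF", threshold=3))   # candidate nominee directors
print(ownership_chains(edges, "c1"))
```

In the toy data, `addr1` and `p1` each cross the threshold, and `c1` heads a two-hop ownership chain.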
What made it hard
Public registry data is dirty: inconsistent tax IDs, addresses with typos, duplicate individuals with different name spellings. PageRank on a ~5K-node graph is sensitive to edge errors: one false connection can push an innocent node to the top. Legal liability cuts both ways (a false positive means a lost business partner, a false negative means a missed risk), so the confidence threshold was tuned together with lawyers.
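Cleaning registry records has to happen before any graph metric, because a duplicate or mismatched entity becomes a missing or false edge. The source does not say how records were matched, so the sketch below is one conservative approach under that caveat: tax IDs are reduced to digits and decide the match whenever both records have one, and only otherwise are names compared after normalization with the standard library's `difflib.SequenceMatcher`. The record shape (`{"name": ..., "tax_id": ...}`), the helper names and the 0.92 name threshold are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_tax_id(raw):
    """Keep digits only; return None when nothing usable is left."""
    digits = re.sub(r"\D", "", raw or "")
    return digits or None


def normalize_name(raw):
    """Casefold, drop diacritics and punctuation, and sort tokens so word order does not matter."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(sorted(text.split()))


def same_entity(a, b, name_threshold=0.92):
    """Conservative match: tax IDs decide when both exist; otherwise names must be near-identical."""
    tax_a = normalize_tax_id(a.get("tax_id"))
    tax_b = normalize_tax_id(b.get("tax_id"))
    if tax_a and tax_b:
        return tax_a == tax_b
    ratio = SequenceMatcher(None, normalize_name(a["name"]), normalize_name(b["name"])).ratio()
    return ratio >= name_threshold


print(same_entity({"name": "Ivanov, Ivan"}, {"name": "Ivan  Ivanov"}))
print(same_entity({"name": "ACME Ltd", "tax_id": "12-34 567890"},
                  {"name": "Acme Limited", "tax_id": "1234567890"}))
print(same_entity({"name": "Ivan Ivanov", "tax_id": "111"},
                  {"name": "Ivan Ivanov", "tax_id": "222"}))
```

Letting tax IDs override names keeps two different people who share a common name from collapsing into one node, which is exactly the kind of false connection that would distort PageRank.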
My role & contribution
Architect & sole developer
Designed and built from scratch: Neo4j graph DB (~5K nodes, ~12K edges), pattern detection (Cypher queries for nominee directors, mass addresses, ownership chains), confidence scoring based on PageRank + degree centrality, 3-model AI panel for finding analysis, human-in-the-loop workflow.
How it looks
System architecture
How it works
Neo4j stores entities (companies, people, addresses) and relationships. Cypher queries detect suspicious patterns. PageRank + degree centrality calculate confidence scores. 3-model AI panel (independent analysis + majority vote) evaluates each finding. Human-in-the-loop: 3 levels — high confidence (auto-flag), medium (review recommended), low (manual check required).
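Confidence scoring can be illustrated in plain Python. This is a minimal sketch, assuming the relationship graph is treated as undirected (each edge added in both directions for PageRank), that PageRank is normalized by its maximum, and that the two signals are blended with equal weights. The weights, the 0.7/0.4 cut-offs for high/medium/low, the node names and the function names are hypothetical placeholders, not the thresholds tuned with lawyers. It scores individual nodes; how node scores roll up into a finding's score is not described in the source.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=200, tol=1e-12):
    """Power-iteration PageRank over directed (src, dst) pairs; ranks sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-edges) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def degree_centrality(undirected_edges):
    """Degree divided by (n - 1), the usual normalization."""
    degree = defaultdict(int)
    for a, b in undirected_edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    return {v: d / (n - 1) if n > 1 else 0.0 for v, d in degree.items()}


def confidence_levels(undirected_edges, w_pr=0.5, w_deg=0.5, high=0.7, medium=0.4):
    """Blend normalized PageRank and degree centrality, then bucket into high/medium/low."""
    # Treat each relationship as bidirectional before running PageRank.
    directed = [(a, b) for a, b in undirected_edges] + [(b, a) for a, b in undirected_edges]
    pr = pagerank(directed)
    top = max(pr.values())
    deg = degree_centrality(undirected_edges)
    result = {}
    for v in pr:
        score = w_pr * pr[v] / top + w_deg * deg[v]
        level = "high" if score >= high else "medium" if score >= medium else "low"
        result[v] = (round(score, 3), level)
    return result


edges = [("p1", "c1"), ("p1", "c2"), ("p1", "c3"), ("p2", "c4")]
for node, (score, level) in sorted(confidence_levels(edges).items()):
    print(node, score, level)
```

In practice PageRank would more likely be computed inside Neo4j (for example with the Graph Data Science library) or with networkx's `pagerank`; the hand-written power iteration only keeps the example dependency-free. Because every edge feeds rank to its neighbours, a single spurious edge to a hub can move a node across a cut-off, which is why entity cleaning comes first.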
Why this way
Human-in-the-loop instead of full automation
Rejected alternative: fully automated reports without human confirmation
Due diligence has legal consequences. False accusations → reputational damage. With human-in-the-loop, the system finds patterns and assigns confidence scores, and a human confirms them, balancing speed and accuracy.
Counterparty check accelerated from days to hours without losing legal significance
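The review workflow can be sketched as a small routing function. This is a minimal illustration under stated assumptions: each of the three models returns a verdict label, the panel result is a strict majority (or nothing, if the models split three ways), and a split panel is escalated to a manual check. That escalation rule, the verdict labels and the `Finding` fields are hypothetical; the source only specifies independent analysis plus majority vote and the three confidence levels.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Finding:
    finding_id: str
    confidence: str                                    # "high" | "medium" | "low" from graph scoring
    verdicts: list[str] = field(default_factory=list)  # one verdict per model (hypothetical labels)


ROUTES = {
    "high": "auto-flag",
    "medium": "review recommended",
    "low": "manual check required",
}


def panel_majority(verdicts):
    """Return the label chosen by a strict majority of models, or None if there is none."""
    if not verdicts:
        return None
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count > len(verdicts) / 2 else None


def route(finding):
    """Return (queue, panel_verdict); every queue still ends with a human decision."""
    majority = panel_majority(finding.verdicts)
    if majority is None:
        # Assumption: a split panel always goes to a full manual check.
        return "manual check required", None
    return ROUTES[finding.confidence], majority


findings = [
    Finding("f1", "high", ["risk", "risk", "no_risk"]),
    Finding("f2", "medium", ["risk", "no_risk", "unclear"]),
    Finding("f3", "low", ["no_risk", "no_risk", "no_risk"]),
]
for f in findings:
    print(f.finding_id, route(f))
```

Even the `auto-flag` queue only flags: a person still confirms every finding before it reaches a report.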
Results
- Counterparty check: 3-5 days → 2-3 hours
- Graph: ~5K nodes, ~12K edges (companies, people, addresses)
- Pattern detection: nominee directors, mass addresses, ownership chains
- Confidence scoring: PageRank + degree centrality
- Human-in-the-loop: 3 levels (high/medium/low)
Impact on business
Counterparty verification time reduced from 3-5 analyst days to 2-3 hours. Hidden connections (nominee directors, mass registration addresses, ownership chains) detected automatically via Cypher queries. Confidence scoring based on PageRank + degree centrality reduces false positives to compliance-acceptable levels.
Algorithms & patterns
Technologies
- Python
- Neo4j
- FastAPI
- PageRank
- DeepSeek API
- Anthropic API