Graph Analysis for Due Diligence

Neo4j + 3-model AI panel + PageRank: an automated relationship graph of ~5K nodes for counterparty verification

Problem

What doesn't work

Counterparty due diligence requires analyzing hundreds of connections between companies, people, and addresses. A manual check of one counterparty takes 3-5 analyst days. Hidden connections (nominee directors, mass registration addresses) are invisible in tabular data.

Solution

Architectural approach

Neo4j graph DB (~5K nodes, ~12K edges) with Cypher queries for pattern detection: nominee directors, mass addresses, ownership chains. Confidence scoring based on PageRank + degree centrality. 3-model AI panel analyzes findings. Human-in-the-loop with 3 confidence levels (high/medium/low).
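To make the pattern-detection idea concrete, here is a minimal sketch in plain Python rather than Cypher. It groups edges by address and by person to surface mass registration addresses and possible nominee directors. The edge lists, function names, and the threshold of three companies are all illustrative assumptions, not the project's actual queries or tuned values.

```python
from collections import defaultdict

# Hypothetical sample edges; the real system keeps these as Neo4j relationships.
REGISTERED_AT = [  # (company, address)
    ("Acme LLC", "1 Main St"),
    ("Birch LLC", "1 Main St"),
    ("Cedar LLC", "1 Main St"),
    ("Delta LLC", "9 Oak Ave"),
]
DIRECTOR_OF = [  # (person, company)
    ("J. Doe", "Acme LLC"),
    ("J. Doe", "Birch LLC"),
    ("J. Doe", "Cedar LLC"),
    ("A. Roe", "Delta LLC"),
]


def group_targets(edges, key_index):
    """Group the opposite end of each two-element edge under edge[key_index]."""
    groups = defaultdict(set)
    for edge in edges:
        groups[edge[key_index]].add(edge[1 - key_index])
    return groups


def mass_addresses(edges, min_companies=3):
    """Addresses shared by at least min_companies companies (assumed threshold)."""
    by_address = group_targets(edges, key_index=1)
    return {a: sorted(c) for a, c in by_address.items() if len(c) >= min_companies}


def nominee_candidates(edges, min_companies=3):
    """People who direct at least min_companies companies (assumed threshold)."""
    by_person = group_targets(edges, key_index=0)
    return {p: sorted(c) for p, c in by_person.items() if len(c) >= min_companies}
```

Calling `mass_addresses(REGISTERED_AT)` on the sample data returns only "1 Main St", and `nominee_candidates(DIRECTOR_OF)` returns only "J. Doe". A hit here is a candidate for review, not proof of wrongdoing.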

Challenges

What made it hard

Public registry data is dirty: inconsistent tax IDs, addresses with typos, and duplicate individuals with different name spellings. PageRank on a ~5K-node graph is sensitive to edge errors: one false connection can elevate an innocent node to the top. Legal liability cuts both ways: a false positive means a lost business partner, a false negative means a missed risk. For that reason the confidence threshold was tuned with lawyers.
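One way to reduce duplicate nodes before they reach the graph is to normalize identifiers and compare names fuzzily. The sketch below uses only the Python standard library. The function names and the 0.9 similarity threshold are hypothetical, and the project's actual entity-resolution rules are not described in this write-up.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_tax_id(raw):
    """Keep digits only, so differently formatted IDs compare equal."""
    return re.sub(r"\D", "", raw)


def normalize_name(raw):
    """Lowercase, strip accents and punctuation, and sort tokens to ignore word order."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", without_accents.lower())
    return " ".join(sorted(tokens))


def same_person(name_a, name_b, threshold=0.9):
    """Treat two spellings as one person when normalized similarity reaches threshold."""
    ratio = SequenceMatcher(None, normalize_name(name_a), normalize_name(name_b)).ratio()
    return ratio >= threshold
```

For example, `same_person("Ivanov, Ivan", "Ivan Ivanov")` is true because both normalize to the same string. Name similarity alone can merge two different people, so a real pipeline would likely require a second matching attribute (such as a tax ID or address) before merging nodes.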

Role

My role & contribution

Architect & sole developer

Designed and built from scratch: Neo4j graph DB (~5K nodes, ~12K edges), pattern detection (Cypher queries for nominee directors, mass addresses, ownership chains), confidence scoring based on PageRank + degree centrality, 3-model AI panel for finding analysis, human-in-the-loop workflow.

Demo

How it looks

Architecture

System architecture

Implementation

How it works

Neo4j stores entities (companies, people, addresses) and the relationships between them. Cypher queries detect suspicious patterns. Confidence scores are calculated from PageRank and degree centrality. A 3-model AI panel evaluates each finding: every model analyzes it independently, and the majority vote decides. Human-in-the-loop review has 3 levels: high confidence (auto-flag), medium (review recommended), low (manual check required).
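The scoring step can be illustrated with a small, dependency-free sketch: PageRank by power iteration, degree centrality, and a blend of the two. How the project actually combines the two measures is not stated here, so the equal weighting and the scaling by the top value are assumptions, as are all function names.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def degree_centrality(edges):
    """Number of incident edges divided by the maximum possible (n - 1)."""
    nodes = {n for edge in edges for n in edge}
    degree = {n: 0 for n in nodes}
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return {n: d / max(len(nodes) - 1, 1) for n, d in degree.items()}


def combined_score(edges, pagerank_weight=0.5):
    """Blend both measures, each scaled by its top value (assumed weighting)."""
    pr = pagerank(edges)
    dc = degree_centrality(edges)
    top_pr, top_dc = max(pr.values()), max(dc.values())
    return {
        n: pagerank_weight * pr[n] / top_pr + (1 - pagerank_weight) * dc[n] / top_dc
        for n in pr
    }
```

On a star-shaped graph where three nodes point at one hub, the hub gets the highest combined score. The same sketch also shows why edge errors matter: adding a single wrong edge into a node raises both its PageRank and its degree.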

Architecture Decision

Why this way

Human-in-the-loop instead of full automation

Alternative

Fully automated reports without human confirmation

Why it didn't fit

Due diligence has legal consequences, and a false accusation causes reputational damage. With human-in-the-loop, the system finds patterns and assigns confidence scores, and a human confirms them. This balances speed against accuracy.
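The hand-off between the AI panel and the human reviewer could be sketched as follows. The verdict labels, the function names, and the 0.8 and 0.5 cut-offs are hypothetical; the real thresholds were tuned with lawyers and are not given in this write-up.

```python
from collections import Counter


def majority_vote(verdicts):
    """Return the label chosen by most models, or None when the top labels tie."""
    counts = Counter(verdicts).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def route_finding(score, verdicts, high=0.8, medium=0.5):
    """Map a score in [0, 1] plus the panel vote onto the three review levels."""
    vote = majority_vote(verdicts)
    if vote == "suspicious" and score >= high:
        return "high: auto-flag"
    if vote == "suspicious" and score >= medium:
        return "medium: review recommended"
    # Weak score, a tie, or a panel that disagrees with the pattern match.
    return "low: manual check required"
```

For instance, `route_finding(0.9, ["suspicious", "suspicious", "benign"])` lands in the high level, while the same score with two "benign" votes falls back to a manual check. In every branch the output is a queue for a person, not a final verdict.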

Result

Counterparty check accelerated from days to hours without losing legal significance

Metrics

Results

01: Counterparty check: 3-5 days → 2-3 hours
02: Graph: ~5K nodes, ~12K edges (companies, people, addresses)
03: Pattern detection: nominee directors, mass addresses, ownership chains
04: Confidence scoring: PageRank + degree centrality
05: Human-in-the-loop: 3 levels (high/medium/low)

Business Impact

Impact on business

Counterparty verification time was reduced from 3-5 analyst days to 2-3 hours. Hidden connections (nominee directors, mass registration addresses, ownership chains) are detected automatically via Cypher queries. Confidence scoring based on PageRank + degree centrality reduces false positives to compliance-acceptable levels.
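Ownership chains are the one pattern above that needs multi-hop traversal. In Cypher this kind of query is typically written as a variable-length path pattern; the sketch below shows the same idea as a breadth-first walk in plain Python. The sample data, the function name, and the depth limit of 5 are illustrative assumptions.

```python
from collections import deque

# Hypothetical (owner, owned) pairs.
OWNS = [
    ("Holding A", "Shell B"),
    ("Shell B", "Shell C"),
    ("Shell C", "Target Ltd"),
    ("Investor X", "Target Ltd"),
]


def ownership_chains(owns, target, max_depth=5):
    """Walk upward from target and return each chain, top owner first."""
    owners_of = {}
    for owner, owned in owns:
        owners_of.setdefault(owned, []).append(owner)
    chains = []
    queue = deque([[target]])
    while queue:
        path = queue.popleft()
        # Skip owners already on the path so circular ownership cannot loop forever.
        parents = [p for p in owners_of.get(path[0], []) if p not in path]
        if not parents or len(path) - 1 >= max_depth:
            if len(path) > 1:
                chains.append(path)
            continue
        for parent in parents:
            queue.append([parent] + path)
    return chains
```

On the sample data, `ownership_chains(OWNS, "Target Ltd")` yields two chains: the direct investor, and the three-step chain from "Holding A" through both shells.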

Methods

Algorithms & patterns

Graph traversal (Cypher)
PageRank
Confidence scoring
Entity Resolution
3-model AI panel
Human-in-the-loop
Neo4j

Stack

Technologies

Python
Neo4j
FastAPI
PageRank
DeepSeek API
Anthropic API