Hybrid RAG + Knowledge Graph
3 search methods × RRF fusion — RAGAS faithfulness 0.91 on 200 questions over corporate documentation (3K+ pages)
What doesn't work
Vector search finds semantically similar documents but misses exact matches and entity relationships. Vector-only RAG over the 3K+ page corporate documentation scores a RAGAS faithfulness of 0.67, meaning roughly a third of the claims in generated answers are not supported by the retrieved context.
Architectural approach
Three search methods run in parallel: vector search (Qdrant, HNSW index), BM25 (exact keyword matches), and a knowledge graph (Neo4j, entity relationships). Their results are combined via Reciprocal Rank Fusion (RRF). Quality is measured with a RAGAS evaluation on 200 control questions.
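The fusion step can be sketched as a weighted variant of RRF. This is a minimal illustration under stated assumptions, not the production code: the function name `weighted_rrf`, the document IDs, and the smoothing constant k = 60 (a commonly used default) are hypothetical, while the weights are the ones reported in this write-up.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists: score(d) = sum over methods of w / (k + rank).

    rankings: dict mapping method name -> list of doc IDs, best first.
    weights:  dict mapping method name -> weight for that method.
    k:        smoothing constant (60 is an assumed, commonly used default).
    """
    scores = defaultdict(float)
    for method, ranked_ids in rankings.items():
        w = weights[method]
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the three retrievers.
rankings = {
    "vector": ["doc_a", "doc_b", "doc_c"],
    "bm25": ["doc_b", "doc_d"],
    "graph": ["doc_d", "doc_b"],
}
weights = {"vector": 0.4, "bm25": 0.3, "graph": 0.3}

fused = weighted_rrf(rankings, weights)
print(fused)
```

A document that appears in several lists (here `doc_b`) accumulates score from each, so agreement between methods is rewarded even when no single method ranks it first.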
What made it hard
- Entity resolution in the knowledge graph: an NLP task with no perfect solution. 'PGK', 'First Freight', and 'ПГК' must resolve to the same node.
- RRF weight tuning (vector 0.4, BM25 0.3, graph 0.3): an iterative process on the 200 control questions, with each iteration requiring a full recalculation.
- Semantic chunking for code vs. text: the two need fundamentally different algorithms, and AST-based chunking breaks on syntactically invalid fragments.
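To make the entity-resolution problem concrete, here is a deliberately simple sketch that maps surface forms to one canonical node ID through a normalized alias table. The alias table, the ID `org:first-freight`, and the function names are hypothetical; a real resolver would presumably add fuzzy matching or embeddings on top, which this sketch does not attempt.

```python
import unicodedata

# Hypothetical alias table: every known surface form points at one node ID.
ALIASES = {
    "PGK": "org:first-freight",
    "First Freight": "org:first-freight",
    "ПГК": "org:first-freight",
}


def normalize(mention: str) -> str:
    """Unicode-normalize, case-fold, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", mention)
    return " ".join(text.casefold().split())


# Index the aliases by their normalized form once, up front.
ALIAS_INDEX = {normalize(alias): node_id for alias, node_id in ALIASES.items()}


def resolve(mention: str):
    """Return the canonical node ID for a mention, or None if unknown."""
    return ALIAS_INDEX.get(normalize(mention))


print(resolve("pgk"), resolve(" ПГК "), resolve("FIRST  FREIGHT"))
```

Exact lookup after normalization handles case, spacing, and script variants that are already listed, but it cannot discover new aliases, which is where the "no perfect solution" part begins.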
My role & contribution
Architect & sole developer
Designed and built from scratch: Qdrant HNSW setup, Neo4j knowledge graph with entity resolution, BM25 index, and RRF fusion with weight tuning. Ran the RAGAS evaluation on 200 control questions. Deployed as an internal tool for corporate documentation.
System architecture
How it works
Code is chunked semantically along AST boundaries, and text is chunked by paragraph. Entity extraction and resolution build the knowledge graph in Neo4j. A BM25 index serves keyword search. RRF aggregates the rankings from the three sources. The RAGAS evaluation tracks faithfulness, relevancy, and context precision.
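The two chunking strategies, and a fallback for fragments that do not parse, might look like the sketch below. It assumes Python source as the code being indexed and uses the standard-library `ast` module; the function names are hypothetical, and the real pipeline may use a different parser or different chunk boundaries.

```python
import ast


def chunk_paragraphs(text: str) -> list[str]:
    """Paragraph-based chunking: split on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_python(source: str) -> list[str]:
    """AST-based chunking: one chunk per top-level statement.

    Falls back to paragraph chunking when the fragment is not valid
    Python, so invalid snippets are still indexed instead of dropped.
    Comments between top-level statements are not part of any AST node
    and are therefore omitted in the AST path.
    """
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return chunk_paragraphs(source)
    chunks = []
    for node in tree.body:
        segment = ast.get_source_segment(source, node)
        if segment:
            chunks.append(segment)
    return chunks


valid = "import os\n\ndef f(x):\n    return x + 1\n\nclass A:\n    pass\n"
invalid = "def broken(:\n    pass\n\nprint('still indexed')"

print(chunk_python(valid))
print(chunk_python(invalid))
```

The fallback trades chunk quality for coverage: an invalid fragment gets cruder boundaries, but it stays searchable.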
Why this way
RRF fusion instead of cascade
Alternative considered: cascade search (vector first, then BM25 on those results, then graph).
A cascade loses every document that does not pass the first stage. With RRF, each method searches independently and the rankings are merged afterwards, so a document found by only one method still reaches the final ranking.
+25% accuracy compared to the cascade approach
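The difference in candidate coverage can be shown with a toy model. It assumes a simplified cascade in which the first stage fixes the candidate pool and later stages only re-rank it; the function names and document IDs are hypothetical, and the actual cascade that was compared against may have differed in detail.

```python
def cascade_search(vector_hits, bm25_hits, graph_hits):
    """Simplified cascade: stage 1 fixes the pool, later stages re-rank it."""
    pool = list(vector_hits)
    for stage_hits in (bm25_hits, graph_hits):
        position = {doc: i for i, doc in enumerate(stage_hits)}
        # Documents unknown to this stage sink to the end; none are added.
        pool.sort(key=lambda doc: position.get(doc, len(stage_hits)))
    return pool


def independent_search(*hit_lists):
    """Independent retrieval: every method contributes its own candidates."""
    merged = []
    for hits in hit_lists:
        for doc in hits:
            if doc not in merged:
                merged.append(doc)
    return merged


# 'contract_42' is an exact-match hit that only BM25 finds.
vector_hits = ["overview", "faq"]
bm25_hits = ["contract_42", "faq"]
graph_hits = ["org_chart", "overview"]

cascade = cascade_search(vector_hits, bm25_hits, graph_hits)
independent = independent_search(vector_hits, bm25_hits, graph_hits)
print("contract_42" in cascade, "contract_42" in independent)
```

In the cascade, `contract_42` never enters the pool because the vector stage missed it. With independent retrieval it remains a candidate for the fusion step to rank.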
Results
- 01. RAGAS faithfulness: 0.67 → 0.91 (on 200 control questions)
- 02. MRR (Mean Reciprocal Rank): 0.48 → 0.79
- 03. 3 parallel search methods, RRF fusion with tuned weights
- 04. Knowledge Graph: Neo4j, ~2K entities, ~8K relationships
- 05. Corporate documentation: 3K+ pages
Impact on business
RAGAS faithfulness improved from 0.67 to 0.91 on 200 control questions over the 3K+ page corporate documentation. MRR rose from 0.48 to 0.79, so relevant documents consistently rank in the top positions. Deployed as an internal tool for corporate documentation search.
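For reference, MRR averages the reciprocal rank of the first relevant document across all queries. The sketch below shows the standard computation on made-up data; the query IDs, document IDs, and function name are hypothetical and are not taken from the 200-question evaluation set.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """MRR = mean over queries of 1 / rank of the first relevant document.

    ranked_results: dict query -> list of doc IDs, best first.
    relevant:       dict query -> set of relevant doc IDs.
    A query with no relevant document in its list contributes 0.
    """
    total = 0.0
    for query, ranked in ranked_results.items():
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant[query]:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


ranked_results = {
    "q1": ["d1", "d2", "d3"],  # relevant doc at rank 1 -> 1.0
    "q2": ["d4", "d5", "d6"],  # relevant doc at rank 2 -> 0.5
    "q3": ["d7", "d8", "d9"],  # no relevant doc found  -> 0.0
}
relevant = {"q1": {"d1"}, "q2": {"d5"}, "q3": {"d0"}}

mrr = mean_reciprocal_rank(ranked_results, relevant)
print(mrr)
```

An MRR of 0.79 therefore means that, on average, the first relevant document sits between rank 1 and rank 2.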
Algorithms & patterns
Technologies
- Python
- Qdrant
- Neo4j
- Sentence-Transformers
- RAGAS
- FastAPI