Core Problem
- Most teams focus on retrieval algorithms first (vector search, rerank, graph traversal)
- They build complex architectures combining 3 papers (HippoRAG, A-MEM, CatRAG)
- But benchmarks show that graph retrieval gains only ~3% over plain vector search on single-hop queries; multi-hop queries gain 34-53%, yet most real use cases are single-hop
- Good schema + query decomposition provides higher ROI than complex engines
Solution: Write → Index → Read Pipeline
Three design philosophies guide the pipeline:
- Schema-First: schema quality determines retrieval quality, not engine complexity
- Anti-Complexity: knowing when "good enough" is enough
- LLM-as-Judge: delegate contextual judgment to LLM instead of rule-based filtering
Write Stage: Atomic Decomposition + Confidence
Decompose responses into atomic knowledge units:
"Server migration in March, lead is Kim"
→ { migration_date: "March" }
+ { migration_lead: "Kim" }
Generate synthetic queries for each atom for better embedding precision.
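The write stage can be sketched as follows. In practice an LLM performs the decomposition; the stub below (`decompose`, a hypothetical helper) only shows the target data shape, including the synthetic queries attached to each atom for embedding:

```python
from dataclasses import dataclass

@dataclass
class Atom:
    """One atomic knowledge unit extracted from a response."""
    fact: dict                 # e.g. {"migration_date": "March"}
    synthetic_queries: list    # queries this atom should answer
    confidence: float = 1.0    # initial trust at write time

def decompose(response: str) -> list[Atom]:
    # An LLM would do this; hard-coded here to show the shape
    # for the example from the text.
    if "migration" in response.lower():
        return [
            Atom({"migration_date": "March"},
                 ["When is the server migration?"]),
            Atom({"migration_lead": "Kim"},
                 ["Who leads the server migration?"]),
        ]
    return []

atoms = decompose("Server migration in March, lead is Kim")
```

Embedding the synthetic queries (rather than only the raw fact text) is what improves retrieval precision: the stored vector sits closer to the questions users actually ask.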
Confidence Score
A single float in [0.0, 1.0] serves four roles:
| Role | Description |
|---|---|
| Initial Trust | How reliable the source was at write time |
| Time Decay | Confidence decreases as information ages |
| Conflict Resolution | Higher confidence wins when memories conflict |
| Explicit Correction | Manual overrides adjust confidence directly |
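A minimal sketch of how one float can cover the first three roles. The 90-day half-life is an assumed decay rate, not from the text; exponential decay is one common choice:

```python
import math, time

HALF_LIFE_DAYS = 90  # assumed decay rate, not specified in the text

def effective_confidence(initial: float, written_at: float, now: float) -> float:
    """Initial trust (role 1) decayed exponentially with age (role 2)."""
    age_days = (now - written_at) / 86400
    return initial * 0.5 ** (age_days / HALF_LIFE_DAYS)

def resolve_conflict(a: dict, b: dict, now: float) -> dict:
    """Role 3: when two memories conflict, higher effective confidence wins."""
    return max(a, b, key=lambda m: effective_confidence(
        m["confidence"], m["written_at"], now))

now = time.time()
old = {"value": "March", "confidence": 0.9,
       "written_at": now - 180 * 86400}          # 180 days old -> 0.9 * 0.25
new = {"value": "April", "confidence": 0.7, "written_at": now}
winner = resolve_conflict(old, new, now)
```

Explicit correction (role 4) is then just a direct write to the `confidence` field, e.g. setting it to 1.0 on a manual override or 0.0 to retract.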
Write-time Conflict Detection
- Search similar existing memories at write time
- LLM judges whether new information is a complement, conflict, or update
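A sketch of the write path, assuming hypothetical `search_similar` (vector lookup) and `llm_judge` (an LLM call returning one of the three labels) helpers:

```python
# Hypothetical prompt; the real one would carry more instructions.
JUDGE_PROMPT = """Existing memory: {old}
New information: {new}
Is the new information a COMPLEMENT, CONFLICT, or UPDATE of the existing
memory? Answer with one word."""

def handle_write(new_atom, store, llm_judge, search_similar):
    """Check new information against similar memories before storing it."""
    for old_atom in search_similar(new_atom, store, top_k=5):
        label = llm_judge(JUDGE_PROMPT.format(old=old_atom, new=new_atom))
        if label == "UPDATE":
            store.remove(old_atom)   # new version supersedes the old one
        elif label == "CONFLICT":
            pass  # keep both; confidence resolves the winner at read time
        # COMPLEMENT: nothing to do, the atoms coexist
    store.append(new_atom)
```

Note that CONFLICT deliberately does not delete anything at write time: both atoms survive, and the confidence mechanism above decides which one the reader trusts.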
Index Stage: Graph as Organization
Graph is for organization, NOT retrieval. Role separation is key.
- Retrieval = vector + metadata filtering
- Organization = graph
Simplification principle: don't combine 3 papers; patch one (HippoRAG) with ideas from another (CatRAG).
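The role separation can be made concrete: the retrieval path below touches only vectors and metadata, never the graph. Toy 2-d vectors and a plain cosine stand in for a real embedding store:

```python
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, filters, memories, top_k=3):
    """Retrieval = vector similarity + metadata filtering.
    The graph is never traversed here; it only organizes memories
    (grouping, summarization, navigation) outside the hot path."""
    candidates = [m for m in memories
                  if all(m["meta"].get(k) == v for k, v in filters.items())]
    return sorted(candidates,
                  key=lambda m: cosine(query_vec, m["vec"]),
                  reverse=True)[:top_k]
```

Keeping the graph out of the retrieval path is what makes the anti-complexity stance workable: the engine stays a well-understood vector + filter query, and the graph can evolve independently.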
Read Stage: Query Decomposition + Scoring as Context
Pre-retrieval: Query Decomposition
LLM decomposes query into search terms + filter conditions:
"Recent meeting decision about deployment"
→ search: "deployment schedule"
+ filter: type=meeting, recency=recent
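A sketch of the decomposition step, assuming the LLM is prompted to emit JSON. `fake_llm` below is a stub that just reproduces the example from the text:

```python
import json

# Hypothetical prompt; a production version would enumerate the
# allowed filter keys and give few-shot examples.
DECOMPOSE_PROMPT = """Decompose the user query into a JSON object with
"search" (a dense-retrieval search string) and "filter" (metadata
conditions). Query: {query}"""

def decompose_query(query: str, llm) -> dict:
    """`llm` is any callable that takes a prompt and returns JSON text."""
    return json.loads(llm(DECOMPOSE_PROMPT.format(query=query)))

fake_llm = lambda prompt: json.dumps({
    "search": "deployment schedule",
    "filter": {"type": "meeting", "recency": "recent"},
})
plan = decompose_query("Recent meeting decision about deployment", fake_llm)
```

The resulting `plan["search"]` feeds the vector query and `plan["filter"]` feeds the metadata filter, i.e. the two inputs of the retrieval function above.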
This is the highest ROI improvement point in any retrieval pipeline.
Scoring as Context, Not Filter
Pass confidence and recency scores to LLM context instead of hard filtering:
- A confidence-0.4 piece of information is more useful presented as "not certain, but..." than being filtered out entirely
- Let the LLM decide how to weigh uncertain information in its response
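A minimal sketch of this rendering step, assuming a 0.5 hedging threshold (an illustrative choice, not from the text):

```python
HEDGE_THRESHOLD = 0.5  # assumed cutoff for hedged phrasing

def format_with_scores(atoms: list) -> str:
    """Render retrieved atoms for the LLM context, hedging
    low-confidence items instead of dropping them."""
    lines = []
    for a in atoms:
        tag = f"[confidence={a['confidence']:.1f}]"
        if a["confidence"] < HEDGE_THRESHOLD:
            lines.append(f"(not certain, but) {a['text']} {tag}")
        else:
            lines.append(f"{a['text']} {tag}")
    return "\n".join(lines)
```

The LLM sees every retrieved atom with its score attached and can caveat or discount the shaky ones in its answer, rather than the pipeline silently discarding them.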