Core Problem
- Most teams focus on retrieval algorithms first (vector search, rerank, graph traversal)
- They build complex architectures combining 3 papers (HippoRAG, A-MEM, CatRAG)
- But benchmarks show that graph retrieval gains only ~3% over plain vector search on single-hop queries; multi-hop queries gain 34-53%, yet most real use cases are single-hop
- Good schema + query decomposition provides higher ROI than complex engines
Solution: Write → Index → Read Pipeline
Three design philosophies guide the pipeline:
- Schema-First: schema quality determines retrieval quality, not engine complexity
- Anti-Complexity: knowing when "good enough" is enough
- LLM-as-Judge: delegate contextual judgment to LLM instead of rule-based filtering
Write Stage: Atomic Decomposition + Confidence
Decompose responses into atomic knowledge units:
"Server migration in March, lead is Kim"
→ { migration_date: "March" }
+ { migration_lead: "Kim" }
Generate synthetic queries for each atom for better embedding precision.
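The write stage can be sketched as follows. In practice an LLM performs the decomposition; the stub below (`decompose`, a hypothetical helper) only shows the target data shape, including the synthetic queries attached to each atom for embedding:

```python
from dataclasses import dataclass

@dataclass
class Atom:
    """One atomic knowledge unit extracted from a response."""
    fact: dict                 # e.g. {"migration_date": "March"}
    synthetic_queries: list    # queries this atom should answer
    confidence: float = 1.0    # initial trust at write time

def decompose(response: str) -> list[Atom]:
    # An LLM would do this; hard-coded here to show the shape
    # for the example from the text.
    if "migration" in response.lower():
        return [
            Atom({"migration_date": "March"},
                 ["When is the server migration?"]),
            Atom({"migration_lead": "Kim"},
                 ["Who leads the server migration?"]),
        ]
    return []

atoms = decompose("Server migration in March, lead is Kim")
```

Embedding the synthetic queries (rather than only the raw fact text) is what improves retrieval precision: the stored vector sits closer to the questions users actually ask.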
Confidence Score
A single float in [0.0, 1.0] serves four roles:
| Role | Description |
|---|---|
| Initial Trust | How reliable the source was at write time |
| Time Decay | Confidence decreases as information ages |
| Conflict Resolution | Higher confidence wins when memories conflict |
| Explicit Correction | Manual overrides adjust confidence directly |
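A minimal sketch of how one float can cover the first three roles. The 90-day half-life is an assumed decay rate, not from the text; exponential decay is one common choice:

```python
import math, time

HALF_LIFE_DAYS = 90  # assumed decay rate, not specified in the text

def effective_confidence(initial: float, written_at: float, now: float) -> float:
    """Initial trust (role 1) decayed exponentially with age (role 2)."""
    age_days = (now - written_at) / 86400
    return initial * 0.5 ** (age_days / HALF_LIFE_DAYS)

def resolve_conflict(a: dict, b: dict, now: float) -> dict:
    """Role 3: when two memories conflict, higher effective confidence wins."""
    return max(a, b, key=lambda m: effective_confidence(
        m["confidence"], m["written_at"], now))

now = time.time()
old = {"value": "March", "confidence": 0.9,
       "written_at": now - 180 * 86400}          # 180 days old -> 0.9 * 0.25
new = {"value": "April", "confidence": 0.7, "written_at": now}
winner = resolve_conflict(old, new, now)
```

Explicit correction (role 4) is then just a direct write to the `confidence` field, e.g. setting it to 1.0 on a manual override or 0.0 to retract.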
Write-time Conflict Detection
- Search similar existing memories at write time
- LLM judges whether new information is a complement, conflict, or update
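A sketch of the write path, assuming hypothetical `search_similar` (vector lookup) and `llm_judge` (an LLM call returning one of the three labels) helpers:

```python
# Hypothetical prompt; the real one would carry more instructions.
JUDGE_PROMPT = """Existing memory: {old}
New information: {new}
Is the new information a COMPLEMENT, CONFLICT, or UPDATE of the existing
memory? Answer with one word."""

def handle_write(new_atom, store, llm_judge, search_similar):
    """Check new information against similar memories before storing it."""
    for old_atom in search_similar(new_atom, store, top_k=5):
        label = llm_judge(JUDGE_PROMPT.format(old=old_atom, new=new_atom))
        if label == "UPDATE":
            store.remove(old_atom)   # new version supersedes the old one
        elif label == "CONFLICT":
            pass  # keep both; confidence resolves the winner at read time
        # COMPLEMENT: nothing to do, the atoms coexist
    store.append(new_atom)
```

Note that CONFLICT deliberately does not delete anything at write time: both atoms survive, and the confidence mechanism above decides which one the reader trusts.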
Index Stage: Graph as Organization
Graph is for organization, NOT retrieval. Role separation is key.
- Retrieval = vector + metadata filtering
- Organization = graph
Simplification principle: don't combine 3 papers; patch one (HippoRAG) with ideas from another (CatRAG).
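The role separation can be made concrete: the retrieval path below touches only vectors and metadata, never the graph. Toy 2-d vectors and a plain cosine stand in for a real embedding store:

```python
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, filters, memories, top_k=3):
    """Retrieval = vector similarity + metadata filtering.
    The graph is never traversed here; it only organizes memories
    (grouping, summarization, navigation) outside the hot path."""
    candidates = [m for m in memories
                  if all(m["meta"].get(k) == v for k, v in filters.items())]
    return sorted(candidates,
                  key=lambda m: cosine(query_vec, m["vec"]),
                  reverse=True)[:top_k]
```

Keeping the graph out of the retrieval path is what makes the anti-complexity stance workable: the engine stays a well-understood vector + filter query, and the graph can evolve independently.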
Read Stage: Query Decomposition + Scoring as Context
Pre-retrieval: Query Decomposition
LLM decomposes query into search terms + filter conditions:
"Recent meeting decision about deployment"
→ search: "deployment schedule"
+ filter: type=meeting, recency=recent
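A sketch of the decomposition step, assuming the LLM is prompted to emit JSON. `fake_llm` below is a stub that just reproduces the example from the text:

```python
import json

# Hypothetical prompt; a production version would enumerate the
# allowed filter keys and give few-shot examples.
DECOMPOSE_PROMPT = """Decompose the user query into a JSON object with
"search" (a dense-retrieval search string) and "filter" (metadata
conditions). Query: {query}"""

def decompose_query(query: str, llm) -> dict:
    """`llm` is any callable that takes a prompt and returns JSON text."""
    return json.loads(llm(DECOMPOSE_PROMPT.format(query=query)))

fake_llm = lambda prompt: json.dumps({
    "search": "deployment schedule",
    "filter": {"type": "meeting", "recency": "recent"},
})
plan = decompose_query("Recent meeting decision about deployment", fake_llm)
```

The resulting `plan["search"]` feeds the vector query and `plan["filter"]` feeds the metadata filter, i.e. the two inputs of the retrieval function above.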
This is the highest ROI improvement point in any retrieval pipeline.
Scoring as Context, Not Filter
Pass confidence and recency scores to LLM context instead of hard filtering:
- A confidence-0.4 piece of information is more useful presented as "not certain, but..." than being filtered out entirely
- Let the LLM decide how to weigh uncertain information in its response
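A minimal sketch of this rendering step, assuming a 0.5 hedging threshold (an illustrative choice, not from the text):

```python
HEDGE_THRESHOLD = 0.5  # assumed cutoff for hedged phrasing

def format_with_scores(atoms: list) -> str:
    """Render retrieved atoms for the LLM context, hedging
    low-confidence items instead of dropping them."""
    lines = []
    for a in atoms:
        tag = f"[confidence={a['confidence']:.1f}]"
        if a["confidence"] < HEDGE_THRESHOLD:
            lines.append(f"(not certain, but) {a['text']} {tag}")
        else:
            lines.append(f"{a['text']} {tag}")
    return "\n".join(lines)
```

The LLM sees every retrieved atom with its score attached and can caveat or discount the shaky ones in its answer, rather than the pipeline silently discarding them.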