← Back to patterns
build

The Pre-computed Index Trap

TRIGGER

Agents operating over large codebases or data needed relevant context but pre-computing embeddings created stale indexes, while loading everything into context caused attention degradation and context pollution.

APPROACH

Anthropic's Claude Code maintains lightweight identifiers (file paths, stored queries, web links) instead of pre-embedded content, then uses tools like glob and grep to dynamically load data at runtime. The agent navigates file hierarchies and naming conventions as implicit metadata—a file named test_utils.py in tests/ implies different purpose than the same name in src/core_logic/. CLAUDE.md files are loaded upfront for speed, while detailed exploration happens just-in-time. The agent writes targeted queries, stores results, and uses Bash commands like head and tail to analyze large volumes without loading full data objects into context.

PATTERN

Stale embeddings returning wrong context weeks after the code changed—the "pre-computed index trap" where your RAG pipeline can't keep up with a living codebase. File system metadata IS the retrieval index: folder hierarchies and naming conventions provide semantic signals that stay current without reindexing and cost zero tokens until accessed.

WORKS WHEN

  • Data has meaningful organizational structure (codebases, document repositories with naming conventions)
  • Content changes frequently enough that pre-computed embeddings would go stale
  • Agent has tools to efficiently navigate and sample data (glob, grep, head, tail)
  • Task allows for exploration latency—not sub-second response requirements
  • Naming conventions and folder structures encode semantic information about content purpose

FAILS WHEN

  • Data lacks organizational structure (flat dumps, inconsistently named files)
  • Sub-second retrieval latency is required and exploration overhead is unacceptable
  • Content is opaque to metadata inspection (binary files, encoded data)
  • Agent lacks primitives for efficient partial data access (must load entire files)
  • Task requires exhaustive search where exploration would miss edge cases that embedding similarity would catch

Stage

build

From

September 2025

Want patterns like this in your inbox?

3 patterns weekly. No fluff.