Anthropic's Contextual Chunk Annotation Pattern

TRIGGER

RAG systems were failing to retrieve relevant information because chunks lost context when split from their source documents. A chunk saying 'revenue grew 3%' doesn't specify which company or time period, so it cannot be matched against a query like 'What was ACME Corp's Q2 2023 revenue growth?'

APPROACH

Anthropic's team added a preprocessing step: before embedding each chunk, they passed the full document plus the individual chunk to Claude Haiku, asking for a 'short succinct context to situate this chunk within the overall document.' The output is the chunk with 50-100 tokens of context prepended (e.g., 'This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million.'). They applied this to both the embedding and BM25 indexes, using prompt caching to amortize the cost of repeatedly sending the full document ($1.02 per million document tokens). Results: contextual embeddings alone cut the top-20 retrieval failure rate by 35% (5.7% → 3.7%); combined with contextual BM25, by 49% (5.7% → 2.9%); with reranking added, by 67% (5.7% → 1.9%).
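The annotation step can be sketched as follows. This is a minimal illustration, not Anthropic's implementation: the prompt paraphrases the 'short succinct context' instruction quoted above, and `generate_context` is a hypothetical stand-in for a call to Claude Haiku (with the document portion placed first so prompt caching can reuse it across all chunks of the same document).

```python
from typing import Callable

# Paraphrase of the situating prompt described above. The document comes
# first so a prompt cache can reuse it across every chunk of that document.
PROMPT_TEMPLATE = (
    "<document>\n{document}\n</document>\n"
    "Here is the chunk we want to situate within the whole document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Give a short succinct context to situate this chunk within the overall "
    "document for the purposes of improving search retrieval of the chunk. "
    "Answer only with the succinct context and nothing else."
)

def annotate_chunk(document: str, chunk: str,
                   generate_context: Callable[[str], str]) -> str:
    """Prepend model-generated situating context to a chunk before indexing.

    `generate_context` stands in for an LLM call (e.g. Claude Haiku);
    injecting it keeps this sketch runnable without network access.
    """
    prompt = PROMPT_TEMPLATE.format(document=document, chunk=chunk)
    context = generate_context(prompt).strip()
    # The annotated text is what gets embedded and BM25-indexed;
    # the raw chunk can still be what is shown to the user.
    return f"{context}\n\n{chunk}"
```

In production the annotated text, not the raw chunk, feeds both the embedding model and the BM25 index.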

PATTERN

The context that "revenue grew 3%" refers to (which company, which quarter) is lost during splitting, not during retrieval. Prepending 50-100 tokens of source context at index time eliminated 35-67% of retrieval failures.
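A toy lexical-retrieval demo makes the mechanism concrete. This is a from-scratch Okapi BM25 scorer over a two-chunk corpus (not the article's code, and the example strings are illustrative): the entity and period terms that the query needs exist only in the annotated chunk, so it outscores the raw one.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude lowercase word tokenizer; no stemming, so "grew" won't match "growth".
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each doc for the query (toy implementation)."""
    corpus = [tokenize(d) for d in docs]
    n = len(corpus)
    avgdl = sum(len(toks) for toks in corpus) / n
    q_terms = tokenize(query)
    # Document frequency per query term across the corpus.
    df = {t: sum(1 for toks in corpus if t in toks) for t in q_terms}
    scores = []
    for toks in corpus:
        tf = Counter(toks)
        score = 0.0
        for t in q_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

raw = "revenue grew 3% over the previous quarter"
annotated = ("This chunk is from an SEC filing on ACME Corp's Q2 2023 "
             "performance. " + raw)

raw_score, annotated_score = bm25_scores(
    "ACME Corp Q2 2023 revenue growth", [raw, annotated])
```

The raw chunk matches only "revenue"; the annotated chunk also matches "ACME", "Corp", "Q2", and "2023", so it ranks above it for the entity-specific query.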

WORKS WHEN

  • Chunks frequently reference entities or timeframes defined elsewhere in the document (financial filings, technical documentation, legal contracts)
  • Knowledge base exceeds 200k tokens (below this threshold, include entire knowledge base in prompt instead)
  • Documents have coherent structure where surrounding context changes meaning of individual chunks
  • Prompt caching is available to amortize the cost of passing full documents repeatedly
  • Documents fit in context window for annotation (article used 8k token documents with 800 token chunks)

FAILS WHEN

  • Chunks are already self-contained (FAQ entries, dictionary definitions, standalone articles with no cross-references)
  • Knowledge base is under 200k tokens—just include the entire knowledge base in the prompt with caching
  • Real-time indexing is required and LLM latency per chunk is unacceptable
  • Key terms and entities are defined in other documents rather than the source document being chunked
  • Documents lack coherent structure that would provide useful disambiguation context

Stage

build

From

September 2024
