Anthropic's Contextual Chunk Annotation Pattern
TRIGGER
RAG systems were failing to retrieve relevant information because chunks lost context when split from their source documents. A chunk saying "revenue grew 3%" doesn't specify which company or time period, making it impossible to match against queries like "What was ACME Corp's Q2 2023 revenue growth?"
APPROACH
Anthropic's team added a preprocessing step: before embedding each chunk, they passed the full document plus the chunk to Claude Haiku, asking for a "short succinct context to situate this chunk within the overall document." Input: full document + individual chunk. Output: the chunk with 50-100 tokens of context prepended (e.g., "This chunk is from an SEC filing on ACME Corp's performance in Q2 2023; the previous quarter's revenue was $314 million."). They applied this to both the embedding and BM25 indexes, using prompt caching to amortize the cost ($1.02 per million document tokens). Results:
- 35% reduction in top-20 retrieval failure rate with contextual embeddings alone (5.7% → 3.7%)
- 49% reduction with contextual BM25 added (5.7% → 2.9%)
- 67% reduction with reranking added (5.7% → 1.9%)
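A minimal sketch of the preprocessing step described above. The prompt wording follows the article's quoted instruction; `generate_context` is a hypothetical stand-in for the actual Claude Haiku call (a real implementation would send the formatted prompt to the API with prompt caching on the document portion, so the full document is billed once per document rather than once per chunk):

```python
# Sketch of contextual chunk annotation. The LLM call is stubbed out;
# in practice it would hit Claude Haiku with prompt caching enabled.

CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is the chunk we want to situate within the whole document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Please give a short succinct context to situate this chunk within "
    "the overall document for the purposes of improving search retrieval "
    "of the chunk. Answer only with the succinct context and nothing else."
)

def generate_context(document: str, chunk: str) -> str:
    """Hypothetical stand-in for the Claude Haiku call.

    A real implementation would send CONTEXT_PROMPT.format(document=...,
    chunk=...) to the model, caching the document prefix across chunks.
    """
    return f"This chunk is from a document beginning: {document[:40]!r}."

def annotate_chunks(document: str, chunks: list[str]) -> list[str]:
    """Prepend situating context to each chunk before indexing it in
    BOTH the embedding index and the BM25 index."""
    return [f"{generate_context(document, c)}\n\n{c}" for c in chunks]
```

Both indexes are then built over the annotated text rather than the raw chunks, so lexical and semantic matching each see the disambiguating context.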
PATTERN
"The chunk loses what 'revenue grew 3%' actually refers to (company, quarter, context) during splitting, not retrieval. Prepending 50-100 tokens of source context at index time eliminated 35-67% of top-20 retrieval failures."
✓ WORKS WHEN
- Chunks frequently reference entities or timeframes defined elsewhere in the document (financial filings, technical documentation, legal contracts)
- Knowledge base exceeds 200k tokens (below this threshold, include entire knowledge base in prompt instead)
- Documents have coherent structure where surrounding context changes meaning of individual chunks
- Prompt caching is available to amortize the cost of passing full documents repeatedly
- Documents fit in context window for annotation (article used 8k token documents with 800 token chunks)
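The 200k-token threshold above amounts to a simple routing rule. A sketch, with token counting approximated by a crude words-per-token heuristic (a real system would use the model's tokenizer; `choose_strategy` and the heuristic factor are illustrative assumptions, not from the article):

```python
def choose_strategy(knowledge_base: str, threshold_tokens: int = 200_000) -> str:
    """Routing rule from the criteria above: small knowledge bases go
    straight into the prompt (with caching); larger ones get contextual
    chunk annotation before indexing."""
    # Crude token estimate (~1.3 tokens per word); swap in the model's
    # tokenizer for real counts.
    approx_tokens = int(len(knowledge_base.split()) * 1.3)
    if approx_tokens <= threshold_tokens:
        return "prompt-stuffing"       # include the entire KB in the prompt
    return "contextual-retrieval"      # annotate chunks, index, retrieve
```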
✗ FAILS WHEN
- Chunks are already self-contained (FAQ entries, dictionary definitions, standalone articles with no cross-references)
- Knowledge base is under 200k tokens—just include the entire knowledge base in the prompt with caching
- Real-time indexing is required and LLM latency per chunk is unacceptable
- Key terms and entities are defined in other documents rather than the source document being chunked
- Documents lack coherent structure that would provide useful disambiguation context