
Notion's Task-Based Model Routing Architecture

TRIGGER

AI features span different task types with conflicting requirements—some need deep reasoning and long-form coherence, others need fast responses at high volume with simpler outputs. Using a single model for all tasks means either overpaying for simple tasks or underperforming on complex ones.

APPROACH

Notion's AI team built a task-based routing layer that classifies each user request and dispatches it to the optimal model. Input: user request plus inferred task category. Output: a response from the category-optimal model, backed by ongoing regression testing across dozens of models and hundreds of prompts. Writing product specs routes to high-reasoning models (e.g., Claude, GPT-4) for fluency and long-form coherence. Question-answering about workspace history routes to models with large context windows and exhaustive reasoning for citation accuracy. High-volume structured tasks like auto-filling database fields route to specialized fine-tuned models, cutting latency by 50% while also improving output quality. The system is validated by AI Data Specialists (a hybrid QA/prompt-engineering role) using an LLM-as-a-judge evaluation framework with custom criteria per feature.
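A minimal sketch of such a routing layer, assuming a trivial keyword classifier stands in for the real task classifier; the category names and model identifiers are illustrative, not Notion's actual configuration:

```python
# Per-category model choice: each task type has its own optimization target.
# Model names are placeholders, not real endpoints.
ROUTES = {
    "spec_writing":   {"model": "high-reasoning-llm",    "target": "coherence"},
    "workspace_qa":   {"model": "long-context-llm",      "target": "citation accuracy"},
    "field_autofill": {"model": "fine-tuned-specialist", "target": "latency"},
}

def classify(request: str) -> str:
    """Stub classifier: infer the task category from the request text."""
    text = request.lower()
    if "spec" in text or "draft" in text:
        return "spec_writing"
    if "?" in text:
        return "workspace_qa"
    return "field_autofill"

def route(request: str) -> dict:
    """Dispatch the request to the category-optimal model."""
    category = classify(request)
    return {"category": category, "model": ROUTES[category]["model"]}

if __name__ == "__main__":
    print(route("Draft a product spec for comments v2"))
    # → {'category': 'spec_writing', 'model': 'high-reasoning-llm'}
```

In production the stub classifier would itself be a small model or heuristic pipeline; the point is that the routing table, not the caller, owns the model decision.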

PATTERN

A fine-tuned specialist beats your expensive generalist on both speed AND quality for narrow tasks—so 'which LLM' is the wrong question. Model selection is a per-task architecture decision. The trap is treating it as global configuration when your product's tasks have fundamentally different optimization targets.
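The per-feature validation this depends on (LLM-as-a-judge with custom criteria) could be sketched as below; the judge here is a deterministic stub standing in for a real model call, and the criteria strings are invented for illustration:

```python
from typing import Callable

# Per-feature evaluation criteria (illustrative, not Notion's actual rubric).
CRITERIA = {
    "spec_writing": ["long-form coherence", "fluency"],
    "workspace_qa": ["citation accuracy", "grounding in workspace history"],
}

def evaluate(feature: str, response: str,
             judge: Callable[[str, str], float]) -> dict:
    """Score a model response against each of the feature's criteria.

    `judge` stands in for an LLM call returning a 0-1 score for a
    (criterion, response) pair; any scorer with that signature works.
    """
    scores = {c: judge(c, response) for c in CRITERIA[feature]}
    return {"feature": feature, "scores": scores,
            "passed": all(s >= 0.5 for s in scores.values())}

# Deterministic stub judge for demonstration; real use would call a model.
def stub_judge(criterion: str, response: str) -> float:
    return 1.0 if response else 0.0

result = evaluate("workspace_qa",
                  "Edited by Ana on May 2 [source: page history]",
                  stub_judge)
```

Keeping criteria per feature is what lets the same harness regression-test a fine-tuned specialist and a frontier generalist on their own terms.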

WORKS WHEN

  • Product has distinct AI task categories with different quality/latency/cost priorities
  • High-volume simple tasks (field extraction, classification) coexist with low-volume complex tasks (generation, reasoning)
  • You have enough task-specific training data to fine-tune specialist models for high-volume categories
  • Latency requirements vary significantly across features (sub-second for autocomplete, multi-second acceptable for document generation)
  • Cost scales with volume and some task categories dominate inference spend
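Checking the last condition concretely takes only a back-of-envelope spend breakdown per task category; the volumes and per-call prices below are made up for illustration:

```python
# Hypothetical per-category traffic and per-call cost (illustrative numbers).
categories = {
    "field_autofill": {"calls_per_day": 1_000_000, "cost_per_call": 0.002},
    "workspace_qa":   {"calls_per_day":    50_000, "cost_per_call": 0.01},
    "spec_writing":   {"calls_per_day":     5_000, "cost_per_call": 0.05},
}

spend = {name: c["calls_per_day"] * c["cost_per_call"]
         for name, c in categories.items()}
total = sum(spend.values())                      # 2000 + 500 + 250 = 2750/day
shares = {name: s / total for name, s in spend.items()}
# field_autofill dominates spend (~73%), so a cheap fine-tuned
# specialist for that one category moves the whole cost curve.
```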

FAILS WHEN

  • All tasks have similar complexity and latency requirements
  • Volume is too low to justify fine-tuning costs or maintaining multiple model integrations
  • Tasks are highly interdependent and need consistent reasoning across a single context
  • Routing and model-switching overhead exceeds the gains from specialization
  • Team lacks infrastructure to manage routing logic and multiple model deployments
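The routing-overhead failure mode is a simple break-even check; the latencies below are hypothetical:

```python
# Hypothetical latencies in milliseconds.
generalist_latency = 800    # single large model, no routing step
routing_overhead   = 120    # classification + dispatch
specialist_latency = 400    # fine-tuned model for this task (50% cut)

routed_total = routing_overhead + specialist_latency   # 520 ms
specialization_pays_off = routed_total < generalist_latency
# 520 < 800, so routing wins here; if the overhead grew past
# 400 ms, the specialist's latency advantage would be erased.
```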

Stage: build

From: May 2025
