What Linear Learned About Existing Data as Fine-Tuning

TRIGGER

Off-the-shelf AI models don't know how a specific team organizes its work: what its labels mean, who handles what, or how it has classified similar issues in the past. Without that context, suggestions stay generic and need constant correction.

APPROACH

Linear's team (engineer Yann-Edern Gillet) rebuilt its search infrastructure, moving from basic keyword matching to a unified semantic backend built on vector search. The backend takes the text of a new issue as input and returns a ranked list of semantically similar historical issues. For Triage Intelligence, each incoming issue is embedded and matched against the existing backlog to surface candidate similar issues. Those candidates become few-shot context for LLMs, which evaluate duplicates, related issues, and property suggestions such as labels and assignees. The models were initially GPT-4o mini and Gemini 2.0 Flash, later upgraded to GPT-5 and Gemini 2.5 Pro for more nuanced reasoning. The backlog thus acts as an implicit training set that improves with each issue the team organizes.
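
The retrieval step can be sketched roughly as follows. This is a minimal illustration, not Linear's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the `BACKLOG` records, their field names, and the `similar_issues` helper are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_issues(new_text: str, backlog: list, k: int = 3) -> list:
    """Return the k backlog issues most similar to new_text, best first."""
    query = embed(new_text)
    scored = [(cosine(query, embed(issue["title"])), issue) for issue in backlog]
    # Sort on the score only, so ties never fall back to comparing dicts.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical backlog records, for illustration only.
BACKLOG = [
    {"id": "ENG-101", "title": "Login page crashes on Safari", "label": "bug", "assignee": "ana"},
    {"id": "ENG-102", "title": "Add dark mode to settings", "label": "feature", "assignee": "raj"},
    {"id": "ENG-103", "title": "Safari login button unresponsive", "label": "bug", "assignee": "ana"},
]

top = similar_issues("Cannot login on Safari", BACKLOG, k=2)
for score, issue in top:
    print(f"{issue['id']}  {score:.2f}  {issue['title']}")
```

In a production setting the count vectors would be replaced by dense embeddings stored in a vector index, but the shape of the step stays the same: embed the new issue, rank the backlog by similarity, keep the top few.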

PATTERN

“Skip the fine-tuning project—your backlog is already training data. Instead of training custom models or writing elaborate prompts describing team conventions, retrieve examples of how this team has already solved similar problems and let the model infer the pattern. Every organized issue becomes implicit few-shot context.”
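
One way to turn retrieved issues into few-shot context is sketched below. The prompt wording, the `build_triage_prompt` helper, and the example records are assumptions made for illustration; the source does not describe the prompts Linear actually uses.

```python
def build_triage_prompt(new_issue: str, examples: list) -> str:
    """Format past team decisions as few-shot examples, ending on the new issue."""
    lines = ["You triage issues for this team. Past decisions by the team:", ""]
    for ex in examples:
        lines.append(f"Issue: {ex['title']}")
        lines.append(f"Label: {ex['label']}")
        lines.append(f"Assignee: {ex['assignee']}")
        lines.append("")
    # Leave the final label open so the model completes it from the pattern.
    lines.append(f"Issue: {new_issue}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical retrieved examples (in practice, the output of similarity search).
retrieved = [
    {"title": "Login page crashes on Safari", "label": "bug", "assignee": "ana"},
    {"title": "Safari login button unresponsive", "label": "bug", "assignee": "ana"},
]

prompt = build_triage_prompt("Cannot login on Safari", retrieved)
print(prompt)
```

The prompt contains no description of what "bug" means for this team; the examples carry that meaning, which is the point of the pattern.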

✓ WORKS WHEN

Teams have an existing backlog with consistent organization patterns (100+ well-labeled issues)
Classification rules are implicit in historical decisions rather than explicit documentation
Search infrastructure can surface semantically similar items, not just keyword matches
Teams want personalized suggestions without maintaining explicit configuration
The domain has natural clustering where similar issues should be handled similarly

✗ FAILS WHEN

Backlog is empty, inconsistent, or full of misclassified issues (garbage in, garbage out)
Organization rules are explicit and rule-based rather than pattern-based (use deterministic automation instead)
Privacy/security constraints prevent using historical data as context
Classification categories are new or changing rapidly—no historical precedent exists
Team wants to break from historical patterns rather than reinforce them
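
Several of these failure modes can be partly guarded against by declining to suggest anything when the retrieved evidence is thin or contradictory. The sketch below is one possible guard, not something the source describes; the `suggest_label` helper and its thresholds are hypothetical and would need tuning against real data.

```python
from collections import Counter
from typing import Optional


def suggest_label(
    neighbors: list,
    min_similarity: float = 0.5,  # illustrative threshold, not from the source
    min_agreement: float = 0.6,   # illustrative threshold, not from the source
    min_neighbors: int = 3,       # illustrative threshold, not from the source
) -> Optional[str]:
    """Suggest a label only when enough close neighbors agree on it.

    neighbors is a list of (similarity, label) pairs from the backlog.
    Returns None when precedent is missing or inconsistent.
    """
    confident = [label for score, label in neighbors if score >= min_similarity]
    if len(confident) < min_neighbors:
        return None  # too little precedent: empty backlog or a new category
    label, votes = Counter(confident).most_common(1)[0]
    if votes / len(confident) < min_agreement:
        return None  # neighbors disagree: the backlog is inconsistent here
    return label


print(suggest_label([(0.9, "bug"), (0.8, "bug"), (0.7, "feature")]))
print(suggest_label([(0.9, "bug"), (0.2, "bug")]))
print(suggest_label([(0.9, "bug"), (0.8, "feature"), (0.7, "chore")]))
```

Returning no suggestion in these cases leaves the decision with a human or with deterministic automation, rather than reinforcing a weak or noisy pattern.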

Stage

build

Source

Linear

From

September 2025