Why Linear Chose Blob Storage Over Vector Databases for Prototyping
TRIGGER
The team needed to validate whether vector embeddings would solve their duplicate detection problem before committing to a specialized vector database. Evaluating several vector-specific databases had revealed drawbacks for a small engineering team: downtime to scale, high cost, and increased ops complexity.
APPROACH
During proof-of-concept work on their own data, Linear stored vector embeddings in a blob column in their primary data store, which allowed rapid iteration while they evaluated long-term storage options. Because vectors are large compared to typical row data, they made sure queries did not select the blob column unless it was needed. During the evaluation the team moved their development embeddings between providers several times. After validating that the approach worked, they migrated to PostgreSQL with pgvector on Google Cloud.
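The storage pattern described above can be sketched as follows. This is a minimal illustration, not Linear's actual schema or code: the `issues` table, its column names, the 4-dimensional vectors, and the use of SQLite as a stand-in for the primary data store are all assumptions made for the example.

```python
import sqlite3
import struct


def pack_embedding(vector):
    """Serialize a list of floats into a compact little-endian float32 blob."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Decode a float32 blob back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# In-memory SQLite stands in for the primary data store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT NOT NULL, embedding BLOB)"
)
conn.executemany(
    "INSERT INTO issues (id, title, embedding) VALUES (?, ?, ?)",
    [
        # Hypothetical 4-dimensional vectors; real models emit far more dimensions.
        (1, "Login button unresponsive", pack_embedding([0.5, 0.25, 0.0, 1.0])),
        (2, "Cannot sign in", pack_embedding([0.5, 0.5, 0.0, 1.0])),
    ],
)

# Routine queries name their columns explicitly, so the large blob is never loaded.
titles = conn.execute("SELECT id, title FROM issues ORDER BY id").fetchall()

# Only the similarity code path asks for the blob column.
(blob,) = conn.execute("SELECT embedding FROM issues WHERE id = ?", (1,)).fetchone()
restored = unpack_embedding(blob)
```

Avoiding `SELECT *` is the main point here: a float32 vector costs 4 bytes per dimension, so pulling it into every list or detail query adds payload that those queries never use.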
PATTERN
“Weeks lost evaluating Pinecone vs. Weaviate vs. pgvector before proving embeddings even solve your problem. A blob column in your existing database validates the feature hypothesis—specialized infrastructure decisions belong after you know what query patterns you actually need.”
✓ WORKS WHEN
- Validating whether embedding-based features solve the user problem before infrastructure investment
- Team is small and ops capacity for new infrastructure is limited
- Embedding query volume during the prototype is low enough that retrieving blobs and computing similarity in the application is acceptable
- Primary data store has blob/JSONB support and sufficient storage headroom
- Feature requirements are still evolving and optimal vector dimensions/models are undecided
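As an illustration of the application-side similarity mentioned in the list above, the following sketch scores every stored blob against a query vector with cosine similarity. The function names, the float32 blob encoding, and the 3-dimensional sample vectors are hypothetical; the source does not say which similarity measure or language Linear used.

```python
import math
import struct


def unpack_embedding(blob):
    """Decode a little-endian float32 blob into a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, rows, top_k=3):
    """Brute-force scan over (id, blob) rows; returns [(id, score)], best first."""
    scored = [
        (row_id, cosine_similarity(query, unpack_embedding(blob)))
        for row_id, blob in rows
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical rows as they might come back from a "SELECT id, embedding" query.
rows = [
    (1, struct.pack("<3f", 1.0, 0.0, 0.0)),
    (2, struct.pack("<3f", 0.0, 1.0, 0.0)),
    (3, struct.pack("<3f", 1.0, 1.0, 0.0)),
]
results = most_similar([1.0, 0.0, 0.0], rows, top_k=2)
```

Every query touches every row, so the cost grows linearly with the number of stored embeddings. That is tolerable at prototype scale and is the reason the pattern depends on low query volume.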
✗ FAILS WHEN
- Prototype needs to demonstrate production-realistic latency to stakeholders
- Dataset is large enough that application-side similarity computation is infeasible (>100k embeddings)
- Team already has vector database expertise and infrastructure
- Compliance requirements mandate specific data storage patterns from day one
- Multiple features will share the embedding infrastructure, justifying upfront investment
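A rough sizing calculation shows why dataset size appears in the list above. The sketch below estimates how many bytes one brute-force scan must read and decode. The 1536-dimension figure and float32 encoding are assumptions chosen for illustration, not values reported for Linear's system.

```python
def scan_bytes(num_embeddings, dims, bytes_per_value=4):
    """Bytes read and decoded for one full scan of fixed-width numeric blobs."""
    return num_embeddings * dims * bytes_per_value


# Hypothetical workload: 100,000 embeddings of 1536 float32 values each.
per_query = scan_bytes(100_000, 1536)
per_query_mib = per_query / (1024 * 1024)
```

Under these assumptions each uncached query moves roughly 586 MiB out of the database, which is where an index-backed store such as pgvector starts to pay for itself.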