Why Canva Chose External Vector DB Over In-Memory Search
TRIGGER
The team needed to store and query 150M+ image embeddings, apply updates in real time as the media library changed, and filter on metadata fields such as aspect ratio. An in-memory approach would have required dedicated high-RAM machines that do not scale with fluctuations in library size.
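To give a feel for the RAM pressure described above, here is a rough back-of-the-envelope sketch. The source does not state the embedding dimension or storage format, so the dimensions tried below and the uncompressed float32 assumption are illustrative guesses, and `index_ram_gib` is a hypothetical helper name.

```python
def index_ram_gib(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage for a flat, uncompressed index, in GiB.

    Ignores index overhead, metadata, and any headroom for growth.
    """
    return num_vectors * dim * bytes_per_value / 2**30


NUM_VECTORS = 150_000_000  # from the source: 150M+ image embeddings

# Assumed dimensions; the real embedding size is not given in the source.
for dim in (256, 512, 768):
    print(f"dim={dim}: {index_ram_gib(NUM_VECTORS, dim):.1f} GiB")
```

Under these assumptions, 512-dimensional float32 vectors alone need roughly 286 GiB before any index overhead, which is why a flat in-memory index implies a dedicated high-RAM machine. Compression or quantization would lower the figure at some cost in recall.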
APPROACH
Canva evaluated two options for its 150M+ image embeddings: in-memory search (Faiss on a dedicated server) and an external vector database. The in-memory option required a costly dedicated machine with a very large amount of RAM, did not align with existing infrastructure patterns, and would have needed migrations whenever the library size fluctuated. The external database offered real-time updates that track media library changes, filtering on metadata fields (crucial for aspect ratio matching), and standard persistence patterns. Canva chose a third-party external vector database. Input: image embeddings plus metadata (aspect ratio). Output: the top-N similar images that satisfy the metadata constraints.
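The input/output contract above (embeddings plus aspect ratio in, top-N filtered matches out) can be illustrated with a small brute-force sketch in NumPy. This is not Canva's implementation or any vendor's API: the function name `top_n_similar`, the `tolerance` parameter, the choice of cosine similarity, and the filter-then-rank order are all assumptions made for illustration.

```python
import numpy as np


def top_n_similar(query, embeddings, aspect_ratios, target_ratio,
                  n=5, tolerance=0.05):
    """Return up to n (index, cosine similarity) pairs, best first,
    considering only items whose aspect ratio is within tolerance."""
    # Metadata filter: keep only rows that satisfy the aspect-ratio constraint.
    candidates = np.flatnonzero(np.abs(aspect_ratios - target_ratio) <= tolerance)
    if candidates.size == 0:
        return []

    # Cosine similarity between the query and each surviving embedding.
    # Assumes no zero-length vectors.
    subset = embeddings[candidates]
    q = query / np.linalg.norm(query)
    scores = (subset @ q) / np.linalg.norm(subset, axis=1)

    # Rank survivors by descending similarity and keep the first n.
    order = np.argsort(-scores)[:n]
    return [(int(candidates[i]), float(scores[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(1000, 64)).astype(np.float32)
    aspect_ratios = rng.choice([1.0, 4 / 3, 16 / 9], size=1000)
    hits = top_n_similar(embeddings[0], embeddings, aspect_ratios,
                         target_ratio=aspect_ratios[0], n=3)
    print(hits)
```

A vector database performs the same two steps with an approximate index instead of a full scan. Filtering before ranking, as done here, guarantees every returned item meets the constraint; filtering after an unfiltered top-N search can return fewer than N results.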
PATTERN
“RAM-based vector search hits scaling cliffs when library size fluctuates. External vector DB trades milliseconds for metadata filtering and standard persistence patterns.”
✓ WORKS WHEN
- Library size fluctuates significantly (seasonal content, partnership changes)
- Metadata filtering is required (aspect ratio, content type, date ranges)
- Team prefers standard persistence patterns over custom infrastructure
- Latency requirements are relaxed (>100ms acceptable)
- Real-time index updates needed as content library changes
✗ FAILS WHEN
- Library size is stable and predictable
- Sub-10ms latency required for user-facing search
- No metadata filtering needed—pure similarity only
- Cost sensitivity favors one-time RAM investment over ongoing DB costs
- Air-gapped or offline deployment required