Why Canva Chose External Vector DB Over In-Memory Search

TRIGGER

The team needed to store and query more than 150 million image embeddings, apply real-time updates as the media library changed, and filter on metadata fields such as aspect ratio. An in-memory approach would have required dedicated high-RAM machines that do not scale as the library size fluctuates.
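To give a feel for why the in-memory option implies a high-RAM machine, here is a rough back-of-the-envelope sketch. The source does not state the embedding dimension or numeric precision, so the dimensions below and the 4-byte float32 storage are assumptions for illustration only. The estimate covers raw vectors and ignores any index overhead.

```python
# Rough RAM estimate for holding raw embeddings in memory.
# ASSUMPTIONS (not from the source): float32 storage (4 bytes per value)
# and the candidate dimensions listed below.

def flat_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store num_vectors uncompressed vectors of size dim."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


NUM_VECTORS = 150_000_000  # "150M+" from the source, used as a lower bound

if __name__ == "__main__":
    for dim in (256, 512, 768):  # hypothetical embedding sizes
        gib = to_gib(flat_index_bytes(NUM_VECTORS, dim))
        print(f"dim={dim}: about {gib:.0f} GiB of raw vectors")
```

Under these assumptions the raw vectors alone reach hundreds of GiB, and the figure moves with the library size, which is the scaling concern described above.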

APPROACH

Canva compared in-memory search (Faiss on a dedicated server) with an external vector database for its 150M+ image embeddings. The in-memory option required a costly dedicated machine with a very large amount of RAM, did not align with Canva's existing infrastructure patterns, and would have needed migrations whenever the library size fluctuated. The external database offered real-time updates that track media library changes, filtering on metadata fields (crucial for aspect ratio matching), and standard persistence patterns. Canva chose a third-party external vector database.

Input: image embeddings plus metadata (aspect ratio).
Output: top-N similar images filtered by metadata constraints.
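The input/output contract described here (embeddings plus metadata in, top-N similar images matching a metadata constraint out) can be sketched in a few lines. This is a brute-force illustration of filtered similarity search, not the vendor's implementation: the source does not name the database or its API, so the record layout, the `top_n_filtered` function, the cosine metric, and the tolerance parameter are all hypothetical.

```python
import math

# Hypothetical records: each image has an id, an embedding, and an aspect ratio.
ITEMS = [
    {"id": "img-a", "embedding": [1.0, 0.0], "aspect_ratio": 1.0},
    {"id": "img-b", "embedding": [0.0, 1.0], "aspect_ratio": 1.0},
    {"id": "img-c", "embedding": [1.0, 1.0], "aspect_ratio": 16 / 9},
    {"id": "img-d", "embedding": [1.0, 0.1], "aspect_ratio": 1.0},
]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n_filtered(query, items, wanted_ratio, n=5, tol=0.01):
    """Return up to n (id, score) pairs, best first, restricted to items
    whose aspect ratio is within tol of wanted_ratio."""
    # Step 1: apply the metadata constraint before any similarity scoring.
    candidates = [it for it in items if abs(it["aspect_ratio"] - wanted_ratio) <= tol]
    # Step 2: score only the surviving candidates against the query.
    scored = [(it["id"], cosine(query, it["embedding"])) for it in candidates]
    # Step 3: sort by similarity, highest first, and keep the top n.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    for image_id, score in top_n_filtered([1.0, 0.0], ITEMS, wanted_ratio=1.0, n=2):
        print(image_id, round(score, 3))
```

In this toy data, `img-c` is excluded by the aspect ratio filter even though it is more similar to the query than `img-b`. A production vector database would typically replace the linear scan with an approximate nearest-neighbor index, but the contract stays the same.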

PATTERN

“RAM-based vector search hits scaling cliffs when library size fluctuates. An external vector DB trades milliseconds of latency for metadata filtering and standard persistence patterns.”

✓ WORKS WHEN

Library size fluctuates significantly (seasonal content, partnership changes)
Metadata filtering is required (aspect ratio, content type, date ranges)
Team prefers standard persistence patterns over custom infrastructure
Latency requirements are relaxed (>100 ms acceptable)
Real-time index updates are needed as the content library changes

✗ FAILS WHEN

Library size is stable and predictable
Sub-10 ms latency is required for user-facing search
No metadata filtering is needed (pure similarity only)
Cost sensitivity favors one-time RAM investment over ongoing DB costs
Air-gapped or offline deployment required

Stage

build

Source

Canva

From

January 2025