build

Why Canva Chose External Vector DB Over In-Memory Search

TRIGGER

The team needed to store and query 150M+ image embeddings, keep the index updated in real time as the media library changed, and filter on metadata fields such as aspect ratio. An in-memory approach would have required dedicated high-RAM machines that do not scale with fluctuations in library size.

APPROACH

Canva compared an in-memory index (Faiss on a dedicated server) with an external vector database for 150M+ image embeddings. The in-memory option required a costly dedicated machine with a very large amount of RAM, did not align with Canva's infrastructure patterns, and would have needed migrations whenever library size fluctuated. The external database offered real-time updates that track media library changes, filtering on metadata fields (crucial for aspect ratio matching), and standard persistence patterns. Canva chose a third-party external vector database. Input: image embeddings plus metadata (aspect ratio). Output: top-N similar images, filtered by metadata constraints.
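The input and output described above can be sketched as a metadata-filtered top-N query. This is a minimal, brute-force illustration in plain Python, not Canva's implementation or any vendor's API: the names `ImageRecord`, `top_n_similar`, and `tolerance` are hypothetical, and cosine similarity is an assumed metric that the source does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageRecord:
    image_id: str
    embedding: list[float]
    aspect_ratio: float  # width / height; the metadata field used for filtering


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(records, query_embedding, target_aspect_ratio, n=3, tolerance=0.05):
    """Return the ids of the n most similar images whose aspect ratio matches.

    Hypothetical helper: filters on metadata first, then ranks by similarity.
    """
    candidates = [
        r for r in records
        if abs(r.aspect_ratio - target_aspect_ratio) <= tolerance
    ]
    scored = [
        (cosine_similarity(r.embedding, query_embedding), r.image_id)
        for r in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:n]]


# Toy data: two landscape images and one square image.
records = [
    ImageRecord("landscape_a", [1.0, 0.0], 16 / 9),
    ImageRecord("landscape_b", [0.8, 0.6], 16 / 9),
    ImageRecord("square_a", [1.0, 0.0], 1.0),
]

result = top_n_similar(records, [1.0, 0.0], 16 / 9, n=2)
print(result)
```

In this toy data, `square_a` has the same embedding as the query but is excluded by the aspect ratio filter. A vector database performs the same filter-then-rank step against an approximate index rather than a full scan.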

PATTERN

RAM-based vector search hits scaling cliffs when library size fluctuates. An external vector DB trades milliseconds of added query latency for metadata filtering and standard persistence patterns.
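A back-of-envelope estimate shows why RAM becomes the constraint at this scale. The sketch below multiplies vector count by dimensions by bytes per value. Only the 150M count comes from the source; the embedding dimensions and float32 storage are assumptions, and the result excludes index overhead and metadata.

```python
def raw_index_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed to hold the raw vectors alone (float32 assumed by default)."""
    return num_vectors * dimensions * bytes_per_value


def to_gib(num_bytes):
    """Convert bytes to gibibytes."""
    return num_bytes / (1024 ** 3)


NUM_VECTORS = 150_000_000  # from the source: 150M+ image embeddings

# Hypothetical embedding sizes; the source does not state the real dimension.
for dims in (256, 512, 768):
    size = raw_index_bytes(NUM_VECTORS, dims)
    print(f"{dims} dims: {to_gib(size):.1f} GiB")
```

At an assumed 512 dimensions, the raw vectors alone come to roughly 286 GiB, and the requirement grows linearly as the library grows.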

WORKS WHEN

  • Library size fluctuates significantly (seasonal content, partnership changes)
  • Metadata filtering is required (aspect ratio, content type, date ranges)
  • Team prefers standard persistence patterns over custom infrastructure
  • Latency requirements are relaxed (>100ms acceptable)
  • Real-time index updates needed as content library changes

FAILS WHEN

  • Library size is stable and predictable
  • Sub-10ms latency required for user-facing search
  • No metadata filtering is needed (pure similarity search only)
  • Cost sensitivity favors one-time RAM investment over ongoing DB costs
  • Air-gapped or offline deployment required

Stage

build

Source

Canva

From

January 2025
