Sub-millisecond p99 latency across 100M+ vectors. Self-host on your own infrastructure with a single binary, or run on our managed cloud across 47 regions. Apache-2.0 licensed, no vendor lock-in.
Idiomatic clients for every major language. Drop-in compatibility with embeddings from OpenAI, Cohere, Voyage, and your own models. No specialized infra knowledge required.
Vector search vendor claims are notoriously hand-wavy. We run the standard ANN-Benchmarks suite on our own infrastructure and publish the methodology, datasets, hardware specs, and raw results. Reproduce them yourself.
p99 query latency, measured against 100M 1024-dimensional vectors at recall ≥ 99%. Single-node configuration, AWS us-east-1.
Dataset: LAION-100M
Hardware: c7g.8xlarge (32 vCPU, 64 GB)
Index: HNSW (M=32, ef=200)
Concurrent queries: 64
Replication: 3x
Last run: 2025-11-08
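If you reproduce the run, the two headline numbers (p99 latency and recall) can be computed from raw results along these lines. This is a minimal sketch, not our benchmark harness: the function names, the nearest-rank percentile method, and the sample data are illustrative assumptions.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical per-query latencies in milliseconds (100 queries).
latencies_ms = [0.4] * 98 + [0.9, 3.0]
p99 = percentile(latencies_ms, 99)
print(p99)  # 0.9

# Hypothetical result IDs: ANN search vs. exhaustive (ground-truth) search.
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(recall_at_k(approx, exact))  # 0.9
```

Note how the single 3.0 ms outlier does not move p99 with 100 samples; it would show up at p100 (max), which is why the percentile and the sample count should always be reported together.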
Vector storage is the easy part. The hard parts — filtering, hybrid search, reranking, multi-tenancy, schema evolution — are first-class primitives, not bolt-ons.
BM25 sparse scoring and dense vector similarity, combined natively with tunable weighting. Beats either approach alone on every retrieval benchmark we've tested.
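To make "tunable weighting" concrete, here is one common way to fuse the two score types: min-max normalize each, then take a weighted sum. This is a sketch under assumptions, not a description of the engine's internal fusion method; `alpha`, the function names, and the scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1] so both score scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(dense, sparse, alpha=0.5, k=3):
    """Rank by alpha * dense + (1 - alpha) * sparse; a doc missing from one side scores 0 there."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in d.keys() | s.keys()
    }
    # Sort by descending fused score; break ties by doc id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))[:k]

dense_scores = {"a": 0.92, "b": 0.80, "c": 0.45}  # hypothetical cosine similarities
sparse_scores = {"b": 12.0, "c": 9.0, "d": 3.0}   # hypothetical BM25 scores

print(hybrid_rank(dense_scores, sparse_scores, alpha=0.5))  # ['b', 'a', 'c']
```

With `alpha=1.0` the ranking is purely dense, with `alpha=0.0` purely BM25. Document "b" wins at 0.5 because it scores well on both sides, which is the behavior hybrid search is meant to reward.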
Pre-filtered HNSW, without the correctness compromises of post-filtering. Filter on 100+ fields without a latency tax, even at multi-billion-vector scale.
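The correctness problem with post-filtering is easy to see on a toy dataset. The sketch below uses an exhaustive scan rather than HNSW, purely to illustrate the difference in semantics; the item layout and function names are hypothetical.

```python
import math

def top_k(query, items, k):
    """Exact k nearest items by Euclidean distance (brute force, for illustration only)."""
    return sorted(items, key=lambda it: math.dist(query, it["vec"]))[:k]

def post_filter(query, items, k, pred):
    """Search first, filter afterwards: may return fewer than k results."""
    return [it for it in top_k(query, items, k) if pred(it)]

def pre_filter(query, items, k, pred):
    """Restrict candidates first, then search: returns k results whenever k matches exist."""
    return top_k(query, [it for it in items if pred(it)], k)

items = [
    {"id": 1, "vec": (0.0, 0.1), "lang": "de"},
    {"id": 2, "vec": (0.1, 0.0), "lang": "de"},
    {"id": 3, "vec": (0.5, 0.5), "lang": "en"},
    {"id": 4, "vec": (0.9, 0.9), "lang": "en"},
]
is_en = lambda it: it["lang"] == "en"
query = (0.0, 0.0)

print([it["id"] for it in post_filter(query, items, 2, is_en)])  # []
print([it["id"] for it in pre_filter(query, items, 2, is_en)])   # [3, 4]
```

Post-filtering returns nothing here because the two nearest vectors both fail the filter, even though two matching vectors exist.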
Run thousands of customer-scoped namespaces in a single index without storage or compute overhead. Built for AI products with many small users.
Upserts visible to search within 250ms. Live deletes without index rebuilds. Designed for workloads where data changes every second.
First-class reranker integration. Use ours, or bring your own — Cohere, Voyage, Jina, any cross-encoder you trust. Final top-k in one round-trip.
Full SQL interface alongside the native API. JOIN vectors against your existing tables. Plays nicely with the BI tools your team already uses.
Most vector databases bolt an ANN library such as FAISS onto a general-purpose storage engine such as RocksDB. We wrote our own storage engine in Rust, optimized end-to-end for the read and write patterns vector search actually exhibits.
Memory-mapped indexes with adaptive paging. SIMD-vectorized distance calculations on AVX-512 and ARM NEON. Zero-copy reads from disk to wire when the working set spills.
Lattice-PQ-v2 achieves 32× compression at >99% recall — without the typical accuracy degradation. Working sets that needed 96 GB of RAM now fit in 3 GB.
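The arithmetic behind the compression claim is straightforward: 96 GB / 32 = 3 GB. The sketch below applies the same ratio to a 100M × 1024-dimension dataset like the one in the benchmark description. It assumes uncompressed vectors are stored as 4-byte floats (the text does not state the raw format) and counts vector payload only, ignoring graph links, IDs, and metadata.

```python
def footprint_gb(num_vectors, dims, bytes_per_component=4, compression=1):
    """Vector payload size in decimal GB; excludes index graph, IDs, and metadata."""
    return num_vectors * dims * bytes_per_component / compression / 1e9

raw = footprint_gb(100_000_000, 1024)                         # assumed float32
compressed = footprint_gb(100_000_000, 1024, compression=32)  # 32x ratio from the text

print(raw)         # 409.6
print(compressed)  # 12.8
print(96 / 32)     # 3.0
```

At 32×, each 1024-dimension vector shrinks from 4,096 bytes to 128 bytes under these assumptions.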
A cost-based query planner in the style of Postgres, adapted for vector workloads. It picks the cheapest path through hybrid, filtered, and reranked queries automatically.
Recent vectors stay in RAM. Older vectors page to local NVMe. Archival vectors live in object storage. The query planner makes the tradeoff transparently.
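As a rough mental model, tier placement can be thought of as a function of how recently a vector was written. The sketch below is illustrative only: the 7-day and 90-day windows, the tier names, and the age-only policy are assumptions, not the engine's actual placement rules.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)    # hypothetical threshold for RAM residency
WARM_WINDOW = timedelta(days=90)  # hypothetical threshold for local NVMe

def pick_tier(last_written, now):
    """Map a vector's age to a storage tier: RAM, then NVMe, then object storage."""
    age = now - last_written
    if age <= HOT_WINDOW:
        return "ram"
    if age <= WARM_WINDOW:
        return "nvme"
    return "object_storage"

now = datetime(2025, 11, 8, tzinfo=timezone.utc)
print(pick_tier(now - timedelta(days=1), now))    # ram
print(pick_tier(now - timedelta(days=30), now))   # nvme
print(pick_tier(now - timedelta(days=365), now))  # object_storage
```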
Open core. The product is Apache-2.0 forever: run it yourself, fork it, ship it in your own infra. We make money when you'd rather not run it yourself.
Start with our free tier — 1 million vectors, 5 million queries per month. No credit card. Production-grade infrastructure from day one.