Overview
Vector databases are purpose-built systems for storing, indexing, and querying high-dimensional vector embeddings at scale. As AI applications — particularly retrieval-augmented generation (RAG), semantic search, and recommendation systems — have moved into production, choosing the right vector database has become a critical infrastructure decision. [s1]
The landscape in 2026 spans fully managed services like Pinecone, open-source solutions like Milvus, Weaviate, and Qdrant, lightweight embedded options like Chroma, and PostgreSQL extensions like pgvector. Each occupies a distinct niche: Pinecone optimizes for operational simplicity, Milvus for billion-scale throughput, Qdrant for raw query performance, Weaviate for hybrid search, Chroma for prototyping speed, and pgvector for teams already committed to PostgreSQL. [s2] [s3]
This comparison evaluates these vendors across features, performance benchmarks, pricing, and production readiness to help ML engineers, backend developers, and architects make an informed choice.
Vendor Profiles
Pinecone
Pinecone is a fully managed, serverless vector database that abstracts away all infrastructure management. It offers automatic scaling, real-time indexing, and strong SLA guarantees, making it the go-to choice for teams that want production vector search without operational overhead. Pinecone supports metadata filtering with operators like $eq, $lte, and others, and integrates natively with LangChain and LlamaIndex. [s1] [s4]
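Pinecone's filter operators follow a MongoDB-style syntax, passed server-side via the SDK's `index.query(..., filter={...})`. As a rough illustration of the operator semantics only (a local toy evaluator, not Pinecone's implementation), the `$eq`/`$lte` family behaves like this:

```python
# Minimal local evaluator for Pinecone-style metadata filters.
# Illustrates operator semantics only -- real filtering happens
# server-side via index.query(..., filter={...}) in the Pinecone SDK.

def matches(metadata: dict, flt: dict) -> bool:
    """Return True if a record's metadata satisfies the filter."""
    for field, cond in flt.items():
        value = metadata.get(field)
        # Shorthand {"genre": "news"} means {"genre": {"$eq": "news"}}
        if not isinstance(cond, dict):
            cond = {"$eq": cond}
        for op, operand in cond.items():
            if op == "$eq" and value != operand:
                return False
            if op == "$ne" and value == operand:
                return False
            if op == "$lte" and not (value is not None and value <= operand):
                return False
            if op == "$gte" and not (value is not None and value >= operand):
                return False
            if op == "$in" and value not in operand:
                return False
    return True

records = [
    {"id": "a", "meta": {"genre": "news", "year": 2024}},
    {"id": "b", "meta": {"genre": "blog", "year": 2026}},
    {"id": "c", "meta": {"genre": "news", "year": 2026}},
]
flt = {"genre": {"$eq": "news"}, "year": {"$lte": 2025}}
hits = [r["id"] for r in records if matches(r["meta"], flt)]
print(hits)  # ['a']
```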
Pinecone reports 7ms p99 latency and has been proven at billions of vectors in production workloads. Its serverless architecture eliminates capacity planning, but the trade-off is vendor lock-in and higher costs at scale compared to self-hosted alternatives. In late 2025, Pinecone introduced Dedicated Read Nodes (DRN) for predictable performance at high throughput, a "cascading search" pipeline with re-ranking and a proprietary sparse vector embedding model, and enterprise security features including RBAC, audit logs, and AWS PrivateLink. [s5] [s13]
Weaviate
Weaviate is an open-source vector database with a strong focus on hybrid search — combining dense vector similarity with keyword matching and structured metadata filtering. It uses a GraphQL-first API (REST is also available) and supports automatic vectorization through pluggable modules like text2vec-openai. Built-in generative search (RAG) capabilities make it unique among vector databases. In early 2026, Weaviate launched three AI-powered agents — Query Agent (natural language search), Transformation Agent (automated data enrichment), and Personalization Agent (per-user result tailoring) — along with a generally available embedding service. [s2] [s4] [s14]
Weaviate achieves sub-100ms latency on 768-dimensional embeddings and is well-suited for applications that need both semantic and keyword relevance. It runs efficiently below 50 million vectors but requires significantly more memory and compute above that threshold. In late 2025, Weaviate rebranded its cloud tiers — Serverless became "Shared Cloud" and Enterprise became "Dedicated Cloud" — with pricing now starting at $25/month for Standard SLA. [s5] [s6] [s14]
Qdrant
Qdrant is a high-performance, open-source vector database written in Rust, designed for real-time applications with strong filtering requirements. Its advanced query planning engine avoids the common filtered search pitfalls (speed degradation, accuracy collapse) that affect competitors. HNSW indexing is configurable (e.g., m=16, ef_construct=100), and on-disk vector storage is supported for cost-effective large dataset handling. Version 1.16 (early 2026) added the ACORN algorithm for higher-accuracy filtered HNSW queries (97.2% vs 53.3% without it at 4% selectivity) and Inline Storage, which stores quantized vectors directly in HNSW nodes for 10x QPS improvement on disk-based workloads. [s4] [s7] [s15]
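For concreteness, a collection-creation payload with those HNSW parameters looks roughly like the following (field names follow Qdrant's REST API, `PUT /collections/{name}`; the values shown are the ones quoted above, and the exact schema should be checked against the current Qdrant docs):

```python
# Sketch of a Qdrant collection-creation payload with explicit HNSW
# parameters. Field names follow the Qdrant REST API.
import json

payload = {
    # on_disk keeps full vectors memory-mapped on disk for large datasets.
    "vectors": {"size": 768, "distance": "Cosine", "on_disk": True},
    # m: graph links per node (density); ef_construct: candidate list size
    # during index build. Larger values improve recall but cost more RAM
    # and slower builds.
    "hnsw_config": {"m": 16, "ef_construct": 100},
}
print(json.dumps(payload, indent=2))
```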
Qdrant achieves the lowest latencies in most benchmark scenarios according to its published benchmarks, with 1ms p99 on smaller datasets. Its free tier — 1 GB of vector storage forever, no credit card required — is the most generous among managed offerings. The Rust foundation provides memory safety and a compact deployment footprint suitable for edge scenarios. In January 2026, Qdrant added vendor-agnostic GPU-accelerated indexing (via Vulkan API), delivering up to 10x faster index builds than CPU-only methods. Enterprise features now include SSO, RBAC, and tiered multitenancy. [s5] [s7] [s15] [s16]
Milvus
Milvus is a cloud-native, open-source vector database backed by Zilliz, designed from the ground up for billion-scale deployments. It separates storage and compute, supports GPU acceleration, and offers the widest range of indexing algorithms including IVF_FLAT, HNSW, and product quantization (PQ). SDKs are available for Python, Java, Go, and more. [s2] [s4]
With over 42,000 GitHub stars (as of early 2026), Milvus is the most popular open-source vector database, used in production by NVIDIA, Salesforce, eBay, Airbnb, and DoorDash. Milvus 2.6, released in mid-2025, introduced RaBitQ 1-bit quantization that reduces memory usage by 72% while delivering 4x throughput improvements, built-in embedding functions for OpenAI, AWS Bedrock, and Vertex AI, and full-text search with 3–4x higher throughput than Elasticsearch at equivalent recall. The trade-off is operational complexity — production Milvus typically requires Kubernetes expertise. Managed hosting is available through Zilliz Cloud. [s3] [s5] [s8] [s17]
Chroma
Chroma is an open-source (Apache 2.0) embedded vector database purpose-built for developer experience. Its Python-first, NumPy-like API requires zero configuration and runs within your application process, eliminating network latency. A 2025 Rust rewrite delivered 4x performance improvements over the original Python implementation. Chroma 1.5.0 (February 2026) added collection forking, Chroma Sync for real-time replication, and up to 70% higher data throughput through base64 vector encoding and continued Rust optimization. [s5] [s9] [s18]
Chroma supports vector, full-text, regex, and metadata search out of the box. It is ideal for prototyping, MVPs, and applications under 10 million vectors, but is not designed for production scale beyond that. Chroma Cloud offers managed hosting with $5 of free credits to start. [s9] [s10]
pgvector + pgvectorscale
pgvector is a PostgreSQL extension that adds vector similarity search to the world's most popular relational database. Combined with pgvectorscale (which adds DiskANN indexing and Statistical Binary Quantization), it offers competitive performance at a fraction of the cost of dedicated solutions — Instacart migrated from Elasticsearch to pgvector in 2025, achieving 80% cost savings. pgvector 0.8.0 (released late 2025) introduced iterative index scans that solve the "overfiltering" problem in filtered vector queries, delivering up to 9x faster query processing and 100x more relevant results for filtered searches compared to previous versions. [s5] [s11] [s19]
pgvectorscale achieves 471 QPS at 99% recall on 50 million vectors, and benchmarks show 28x lower p95 latency and 16x higher throughput than Pinecone's storage-optimized index at 25% of the cost. The key advantage is unified transactions: vectors and relational data live in one system. Beyond roughly 100 million vectors, however, a single PostgreSQL instance becomes the architectural bottleneck. [s8] [s11]
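The unified-transaction advantage is easiest to see in code. The sketch below uses stdlib sqlite3 purely as a stand-in for PostgreSQL (sqlite has no vector index; with pgvector the embedding column would be a real `vector` type and the search an indexed `ORDER BY embedding <=> $1`): the document row and its embedding commit atomically, so they can never drift out of sync the way a separate relational store plus vector DB can.

```python
# Sketch of the "vectors + relational data in one transaction" pattern.
# stdlib sqlite3 stands in for PostgreSQL + pgvector here.
import math
import sqlite3
import struct

def pack(vec):          # serialize floats to a BLOB column
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding BLOB)")

# One transaction covers both the document and its embedding.
with conn:
    conn.execute("INSERT INTO docs VALUES (1, 'refund policy', ?)", (pack([0.9, 0.1, 0.0]),))
    conn.execute("INSERT INTO docs VALUES (2, 'shipping times', ?)", (pack([0.1, 0.9, 0.0]),))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Brute-force similarity in Python; pgvector would do this in SQL,
# accelerated by an HNSW or DiskANN index.
query = [1.0, 0.0, 0.0]
rows = conn.execute("SELECT body, embedding FROM docs").fetchall()
best = max(rows, key=lambda r: cosine(query, unpack(r[1])))
print(best[0])  # refund policy
```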
Feature Comparison
The following table summarizes key features across all six vendors.
| Feature | Pinecone | Weaviate | Qdrant | Milvus | Chroma | pgvector |
|---|---|---|---|---|---|---|
| License | Proprietary | BSD-3 | Apache 2.0 | Apache 2.0 | Apache 2.0 | PostgreSQL |
| Deployment | Managed only | OSS + Cloud | OSS + Cloud | OSS + Zilliz Cloud | Embedded + Cloud | PG extension |
| Primary Language | N/A (SaaS) | Go | Rust | Go / C++ | Rust (v2) | C |
| Index Types | Proprietary | HNSW | HNSW | IVF, HNSW, PQ, RaBitQ, GPU | HNSW | IVFFlat, HNSW, DiskANN |
| Hybrid Search | Sparse + dense + reranking | Vector + BM25 | Vector + payload | Vector + full-text + scalar | Vector + full-text | Vector + SQL |
| Max Practical Scale | Billions | ~50M efficient | ~50M efficient | Billions | ~10M | ~100M |
| SDK Languages | Python, JS, Java, Go | Python, JS, Java, Go | Python, JS, Rust, Go | Python, Java, Go, JS | Python, JS | Any PG client |
| GPU Support | No | No | Yes (indexing) | Yes | No | No |
Performance Benchmarks
Performance varies significantly with dataset size, dimensionality, hardware, and recall targets. The numbers below aggregate findings from multiple independent benchmarks including Qdrant's benchmark suite (which uses ann-benchmarks datasets), VectorDBBench by Zilliz, and community comparisons. All figures should be treated as relative guides rather than absolutes. [s7] [s8]
1M Vectors, 1536 Dimensions
At a common RAG-scale workload (1 million OpenAI-dimension vectors), the following results have been reported: [s4]
| Metric | Pinecone | Weaviate | Qdrant | Milvus |
|---|---|---|---|---|
| Query Latency (p50) | 20 ms | 15 ms | 8 ms | 12 ms |
| Query Latency (p99) | 50 ms | 40 ms | 25 ms | 35 ms |
| Queries / Second | 500 | 800 | 1,500 | 1,200 |
| Index Build Time | ~5 min | ~8 min | ~4 min | ~6 min |
Large-Scale Observations
At 50 million vectors, pgvectorscale achieves 471 QPS at 99% recall — 11.4x the throughput of Qdrant (41 QPS) at the same recall level on the same benchmark. At this scale, pgvectorscale also delivers 28x lower p95 latency and 16x higher throughput than Pinecone's storage-optimized "s1" index, at roughly 25% of the cost. [s8] [s11]
Milvus dominates at extreme scale. Milvus 2.6's RaBitQ 1-bit quantization compresses indexes to 1/32 of FP32 size; benchmarks on 1M 768-dimensional vectors show 864 QPS with IVF_RABITQ (3x higher than IVF_FLAT) at 94.7% recall. With multi-node configurations and GPU acceleration, Milvus sustains approximately 120,000 inserts per second. Qdrant 1.16's Inline Storage mode achieved 211 QPS vs 20 QPS (10x improvement) on low-RAM disk-based systems with comparable accuracy to in-memory setups. [s7] [s8] [s15] [s17]
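RaBitQ itself is considerably more sophisticated, but the arithmetic behind the 1/32 compression figure, and the basic intuition of 1-bit (sign) quantization with a cheap Hamming-distance proxy, can be sketched in a few lines:

```python
# Intuition for 1-bit quantization (RaBitQ is more elaborate than this):
# keep only the sign of each dimension, so a 768-dim FP32 vector shrinks
# from 768 * 4 bytes to 768 / 8 bytes -- a 32x reduction.
dim = 768
fp32_bytes = dim * 4       # 3072 bytes per vector
one_bit_bytes = dim // 8   # 96 bytes per vector
print(fp32_bytes // one_bit_bytes)  # 32

def quantize(vec):
    """Sign-quantize: 1 if the component is >= 0, else 0."""
    return [1 if x >= 0 else 0 for x in vec]

def hamming(a, b):
    """Bit-disagreement count: a cheap distance proxy on quantized vectors."""
    return sum(x != y for x, y in zip(a, b))

q  = quantize([0.2, -0.7, 0.1, 0.9])
d1 = hamming(q, quantize([0.3, -0.5, 0.2, 0.8]))    # same signs
d2 = hamming(q, quantize([-0.3, 0.5, -0.2, -0.8]))  # opposite signs
print(d1, d2)  # 0 4
```

Production systems typically use the quantized index for a fast candidate pass, then re-rank the survivors against full-precision vectors to recover accuracy.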
What Matters More Than Benchmarks
Tail latency (p99) matters more than median. A system with 10ms median but 500ms p99 feels slower to users than one with 20ms median and 50ms p99. Additionally, filtered search performance — how well the database handles queries with metadata constraints — varies dramatically between vendors and is often poorly represented in standard benchmarks. Qdrant's ACORN algorithm (v1.16) achieves 97.2% accuracy on filtered HNSW queries at 4% selectivity, compared to 53.3% without it. pgvector 0.8.0's iterative index scans solve the "overfiltering" problem that previously caused filtered queries to return incomplete results. [s7] [s8] [s15] [s19]
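The overfiltering failure mode is easy to reproduce. If the index returns its top-k *before* the metadata filter is applied (post-filtering), a selective filter can leave far fewer than k results; filtering first and then ranking returns the full k. A toy brute-force illustration (not any vendor's implementation), using roughly the 4% selectivity quoted for ACORN:

```python
# Toy illustration of "overfiltering" in filtered vector search.
# Post-filtering truncates to top-k first, then filters; pre-filtering
# restricts to matching rows, then ranks.
import random

random.seed(0)
k = 5
# 1000 one-dimensional "embeddings"; every 25th row (~4%) carries the tag.
data = [{"vec": random.random(), "tag": "hit" if i % 25 == 0 else "miss"}
        for i in range(1000)]
query = 0.5

def dist(row):
    return abs(row["vec"] - query)

# Post-filtering: top-k first, then filter -> usually far fewer than k.
post = [r for r in sorted(data, key=dist)[:k] if r["tag"] == "hit"]

# Pre-filtering: filter first, then top-k -> always the full k (if enough
# rows match), at the cost of more work for the index.
pre = sorted((r for r in data if r["tag"] == "hit"), key=dist)[:k]

print(len(post), len(pre))
```

Engines attack this trade-off in different ways: oversampling before the filter, filter-aware graph traversal (Qdrant's ACORN), or iterative index scans that keep fetching until k filtered results are found (pgvector 0.8.0).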
Pricing
Across the category, pricing has shifted from fixed per-pod billing to serverless, consumption-based models. This favors low-traffic workloads but can become expensive at high query volumes. [s12]
| Vendor | Free Tier | Paid Starting At | Pricing Model |
|---|---|---|---|
| Pinecone | Yes (limited) | $0.33/GB storage + read/write units | Consumption-based (serverless) |
| Weaviate Cloud | Free sandbox | $25/month (Shared Cloud) | Dimensions stored + SLA tier |
| Qdrant Cloud | 1 GB forever | $25/month; Hybrid $99/month | Storage + compute |
| Zilliz (Milvus) | 5 GB storage | ~$0.15/CU/hour; Serverless from $89 | Compute Units or serverless |
| Chroma Cloud | $5 free credits | Usage-based after credits | Consumption-based |
| pgvector | Free (extension) | Your PostgreSQL costs only | Infrastructure cost |
For datasets under 50 million vectors, managed SaaS (Pinecone, Weaviate Cloud) is often cheaper than self-hosting when accounting for the hidden cost of DevOps, monitoring, and on-call. At high query volumes (1,000+ QPS), Pinecone's consumption-based read units can scale costs linearly — teams at this scale should evaluate Pinecone's Dedicated Read Nodes (DRN) for predictable pricing, or self-hosted Qdrant or Milvus for cost control. Pinecone now offers annual commit discounts for Standard and Enterprise plans. [s5] [s12] [s13]
pgvector stands out as the most cost-effective option for teams already running PostgreSQL. Instacart's 2025 migration from Elasticsearch to pgvector reportedly achieved 80% cost savings on storage and indexing while simplifying their architecture. [s11]
Use Case Recommendations
The right vector database depends on your team's engineering resources, scale requirements, and existing infrastructure. The following recommendations are based on common deployment patterns.
Prototyping and MVPs
Chroma is the fastest path from idea to working prototype. Its embedded architecture eliminates infrastructure setup, and the Python-first API feels native to ML workflows. For early-stage projects that may need to scale later, starting with Chroma and migrating to a production database is a well-trodden path. [s5] [s9]
PostgreSQL-Native Teams
pgvector + pgvectorscale is the obvious choice when your data already lives in PostgreSQL. You get vector search alongside relational queries, ACID transactions, and your existing backup and monitoring infrastructure. It handles up to ~100 million vectors before hitting architectural limits, a ceiling that covers the large majority of real-world AI workloads. [s5] [s11]
Production RAG Systems
Pinecone is the safe choice for teams that want production vector search with zero operational overhead — particularly when SLA guarantees, automatic scaling, and ecosystem integrations (LangChain, LlamaIndex) matter more than unit cost. Qdrant is the alternative for teams comfortable with self-hosting who want lower latency and lower cost, with the best free tier in the market. [s1] [s5]
Hybrid Search (Semantic + Keyword)
Weaviate leads in hybrid search, combining dense vector similarity with BM25 keyword matching in a single query. Its schema-based data modeling and built-in generative search make it the strongest choice for applications where both meaning and specific keywords determine relevance — such as e-commerce search or document retrieval. [s2] [s5] [s6]
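Weaviate exposes this blend through an `alpha` parameter on hybrid queries (alpha=1 is pure vector, alpha=0 is pure keyword). Real implementations fuse normalized ranked lists rather than raw scores, but the core idea can be sketched as a weighted combination; the scores below are made-up for illustration:

```python
# Toy illustration of hybrid-search score fusion. Weaviate's hybrid
# queries expose a similar `alpha` knob; production systems fuse
# normalized ranked lists, not raw scores as shown here.
def hybrid_score(vec_score, kw_score, alpha=0.5):
    """Blend a dense-vector score with a keyword (BM25-style) score."""
    return alpha * vec_score + (1 - alpha) * kw_score

docs = {
    "doc_a": {"vec": 0.92, "kw": 0.10},  # semantically close, no keyword hit
    "doc_b": {"vec": 0.55, "kw": 0.95},  # exact keyword match
}
for alpha in (1.0, 0.5, 0.0):
    ranked = sorted(docs, key=lambda d: -hybrid_score(docs[d]["vec"], docs[d]["kw"], alpha))
    print(alpha, ranked[0])  # winner flips from doc_a to doc_b as alpha drops
```

Tuning alpha per query class (navigational lookups vs. exploratory questions) is a common pattern in e-commerce and document retrieval.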
Billion-Scale Deployments
Milvus is the only open-source option purpose-built for billion-vector scale with GPU acceleration and distributed compute-storage separation. Teams choosing Milvus should budget for Kubernetes expertise and operational complexity, or use Zilliz Cloud for a managed experience. Pinecone is the managed alternative at this scale, trading cost efficiency for simplicity. [s3] [s5] [s8]
Budget-Conscious and Edge Deployments
Qdrant offers the best balance of performance, cost, and deployment flexibility for datasets under 50 million vectors. Its Rust-based binary is compact enough for edge deployments, the 1 GB free cloud tier is genuinely useful for small projects, and self-hosted Qdrant on modest hardware delivers excellent price-performance. [s5] [s7]
References
- Vector Database Comparison: Pinecone vs Weaviate vs Qdrant vs FAISS vs Milvus vs Chroma (2025) — LiquidMetal AI
- How do I choose between Pinecone, Weaviate, Milvus, and other vector databases? — Milvus
- Best 17 Vector Databases for 2026 — lakeFS
- Vector Database Comparison 2026: Pinecone vs Weaviate vs Qdrant vs Milvus — Jishu Labs
- Best Vector Databases in 2025: A Complete Comparison Guide — Firecrawl
- Top 9 Vector Databases as of February 2026 — Shakudo
- Vector Database Benchmarks — Qdrant
- VectorDBBench — Zilliz (GitHub)
- Chroma — Open-source search and retrieval database for AI — Chroma
- Chroma Pricing — Chroma
- PostgreSQL vs Vector Database: Why PostgreSQL Wins (2025) — DBA Dataverse
- Top 5 Vector Databases for Enterprise RAG: Cost Comparison (2026) — Rahul Kolekar
- 2025 Release Notes: Dedicated Read Nodes, Sparse Vectors, RBAC — Pinecone Docs
- Weaviate Blog: AI Agents (Query, Transformation, Personalization) — Weaviate
- Qdrant 1.16: Tiered Multitenancy, ACORN, Inline Storage — Qdrant
- Qdrant 2025 Recap: GPU Indexing, Enterprise Features — Qdrant
- Milvus 2.6: Affordable Vector Search at Billion Scale — Milvus Blog
- Chroma Changelog: v1.5.0, Collection Forking, Sync — Chroma
- pgvector 0.8.0 Released: Iterative Index Scans — PostgreSQL