Vector Database Selection: The Reference We Wish Existed

By La Boétie | Updated June 28, 2026 | 23 min read

Vector database selection is where a lot of AI and ML engineering work quietly goes wrong, and it rarely shows up as a database problem. It shows up as a retrieval-augmented generation feature that hallucinates in front of a customer, a monthly cloud bill that triples without warning, or a launch that slips because nobody can say which system survives real traffic. This pillar is the reference we wish existed when we made the call ourselves: opinionated, dated, and willing to be wrong in public. A vector database is a system that stores high-dimensional embeddings (numeric representations of text, images, or audio produced by a model) and retrieves them by similarity rather than exact match. Getting the vector database selection right is the difference between a demo and a system that holds.

Key takeaways:

The global vector database market grew from 3.02 billion dollars in 2025 to 3.73 billion in 2026, a 23.5% compound annual growth rate, according to The Business Research Company, and industry forecasts put it near 18 billion by 2034. The field is crowded on purpose, and most comparisons are written by vendors.

Naive RAG, built on fixed-character chunking and plain cosine similarity, fails to retrieve the correct context up to 40% of the time, according to Towards Data Science. The database is rarely the bottleneck; the retrieval design is.

Scale is the first cut: under 1 million vectors, pgvector or Chroma are enough; from 1 to 50 million, Weaviate, Pinecone, and Qdrant all hold; above 50 million, the candidate list narrows fast (Redis, 2026).

The studio house position: pick the database your team can operate, not the one that wins the synthetic benchmark. Operability beats peak queries per second on every engagement we have run.

What vector database selection actually decides

Every entry under this hub answers one question: given your workload, your team, and your budget, which vector store should you commit to, and when should you change your mind? That is the charter. It is narrower than "what is a vector database" and broader than "Pinecone or Qdrant," because the right vector database selection depends on facts about your situation that no vendor page knows.

The decision touches four things at once. It sets your recall ceiling (the share of true nearest neighbours a search returns), your tail latency under load, your monthly cost curve as data grows, and the operational burden your team carries at 3 a.m. when an index goes cold. Treating these as one undifferentiated "performance" question is the first mistake. A system that returns 98% recall in a benchmark and 30 ms answers in a demo can still be the wrong choice if your engineers cannot run it.

The market rewards this confusion. There are more than a dozen credible vector stores in 2026, and the four most-cited learning resources, Pinecone Learn, Chroma, Qdrant, and Weaviate, are all published by vendors with a product to sell. Each is genuinely useful. None is neutral. This pillar exists because no top-ranking page on vector database selection commits to a named engagement, a dated benchmark, or a decision rule you could defend in a board meeting. We commit to all three.

If you take one thing from this section: vector database selection is an operational decision wearing a technical costume. The technical inputs matter, and we cover them in depth below, but the binding constraint is almost always the team that has to live with the choice.

The La Boétie house position on vector database selection

Here is where we disagree with the field. The default advice, repeated across vendor explainers and consultancy white papers, is to start from peak performance: find the database with the lowest latency and highest queries per second, then build around it. We start from the opposite end. The right vector database is the one your team can operate without a dedicated platform engineer, until the day your data forces a change.

That position has consequences. For most teams shipping their first or second AI feature, our default is pgvector, the PostgreSQL extension that adds vector search to a database you already run. It is a no-brainer under 10 million vectors if you are already on Postgres, because every external service you add is a new failure point, a new on-call rotation, and a new latency hop. Instaclustr's 2026 analysis is blunt: pgvector handles single-digit millions of vectors efficiently, and its latency only degrades notably beyond that. Most products never cross that line.

We part from the field a second time on benchmarks. Synthetic single-query benchmarks are close to worthless for vector database selection, because they measure the one condition production never sees: a single client, no filters, no concurrent writes. Tiger Data's May 2025 numbers make the point from the other direction: pgvectorscale hit 471 QPS (queries per second) at 99% recall on 50 million vectors, 11.4 times Qdrant's 41 QPS at the same recall in that test. You could read that as "Postgres wins." You would be wrong to: change the recall target, the filter complexity, or the concurrency, and the ranking can invert. A benchmark is a fact about a configuration, not about a database.

Where we agree with the consensus: above roughly 50 million vectors, or when you need rich metadata filtering fused with keyword search, a purpose-built engine earns its keep. Qdrant, written in Rust, applies filters during graph traversal rather than before or after, and its quantization can cut memory usage up to 64 times while holding search quality, per Qdrant's own documentation. That is a real architectural advantage, not marketing. The studio position is not "avoid dedicated databases." It is "do not adopt one before your data and your filters demand it." For the deeper version of this argument, see our selection walkthrough and the Pinecone versus pgvector side-by-side.

[Figure: schematic of matching an AI workload to a vector database]

The selection map: matching your workload to a database

Vector database selection becomes tractable when you stop asking "which is best" and start asking "best for what." Three inputs decide almost everything: your vector count, your filter complexity, and your team's operational capacity. Scale sets the outer bounds. Under 1 million vectors, pgvector and Chroma are comfortable; Chroma is the default in LangChain and LlamaIndex tutorials and the right tool for a proof of concept under 500,000 vectors, per its own positioning. From 1 to 50 million, Weaviate, Pinecone, and Qdrant all hold the line. Above 50 million, the field thins to engines built for distribution, with Pinecone and Milvus the usual finalists (Redis, 2026).

The table below is the studio's working map, current as of June 2026. It is deliberately opinionated.

Database	Deployment model	Comfortable scale	Standout strength	Watch out for
pgvector	Self-hosted on Postgres	Under 10 million	Transactional, zero new infrastructure	Post-filtering, latency past a few million
Chroma	Embedded or local	Under 500,000	Zero-config prototyping	Not built for production concurrency
Qdrant	Open-source plus managed cloud	1 to 100 million-plus	Rust speed, one-stage filtering	You operate it, or pay for cloud
Weaviate	Open-source plus managed cloud	1 to 50 million	Native hybrid search, rich schema	Heavier resource footprint
Pinecone	Managed only	10 million to billions	Operational simplicity, serverless	Proprietary, usage cost at scale

Read the table as a starting hypothesis, not a verdict. The second input, filter complexity, can override scale entirely. Real production queries rarely ask for the nearest neighbours in the abstract; they ask for the nearest neighbours where tenant equals this customer and date is within ninety days. That requires either metadata filtering or full hybrid search, which fuses BM25 keyword scoring with vector similarity. Weaviate, Qdrant, and Pinecone support hybrid search natively; pgvector's reliance on post-filtering, applying the filter after the vector search rather than during it, becomes a real liability on complex queries. If your queries are filter-heavy, weight that far above raw latency in your vector database selection. The decision framework entry quantifies this trade-off with worked numbers.
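
The post-filtering liability is easiest to see on a toy example. The sketch below is a brute-force search in plain Python, not any engine's API; the rows, the `tenant` field, and both function names are hypothetical. `filtered_search` stands in for an engine that applies the filter during the search, which a real engine does inside its index rather than by scanning.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def post_filter_search(query, rows, k, predicate):
    # Rank everything by similarity, keep the top k, THEN apply the filter.
    top_k = sorted(rows, key=lambda r: cosine(query, r["vec"]), reverse=True)[:k]
    return [r["id"] for r in top_k if predicate(r)]

def filtered_search(query, rows, k, predicate):
    # Restrict candidates to rows that pass the filter, then take the top k.
    allowed = [r for r in rows if predicate(r)]
    ranked = sorted(allowed, key=lambda r: cosine(query, r["vec"]), reverse=True)
    return [r["id"] for r in ranked[:k]]

# Hypothetical data: two tenants, four tiny vectors.
rows = [
    {"id": "a", "vec": [1.0, 0.0], "tenant": "acme"},
    {"id": "b", "vec": [0.9, 0.1], "tenant": "globex"},
    {"id": "c", "vec": [0.8, 0.2], "tenant": "globex"},
    {"id": "d", "vec": [0.1, 0.9], "tenant": "acme"},
]
query = [1.0, 0.0]

def is_acme(row):
    return row["tenant"] == "acme"

post_ids = post_filter_search(query, rows, 2, is_acme)
filtered_ids = filtered_search(query, rows, 2, is_acme)
print(post_ids)      # ['a']
print(filtered_ids)  # ['a', 'd']
```

Asked for two results, the post-filtering path returns one, because the other slot in the top two belonged to a different tenant and was discarded afterwards. The tighter the filter, the more of the top k is thrown away.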

The third input, operational capacity, is the one teams underweight most and regret most. A vector database selection that ignores who will run the thing at 2 a.m. is not a technical decision; it is a deferred incident. Ask three concrete questions before you commit. Does anyone on the team know how to tune an HNSW index, or rebuild one after a bad deploy? Who owns the upgrade path when the engine ships a breaking change? What is your recovery plan when an index falls out of memory under load? If the answers are vague, weight managed offerings or pgvector heavily, because a database your team cannot operate confidently will cost more in downtime than it ever saves in license fees. Operational fit is a first-class input, not a footnote.
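
The three inputs can be written down as a first-pass rule. This is a sketch of the scale bands and filter guidance stated in this section, not a tool we ship; the function name, its arguments, and the wording of the deployment hint are hypothetical, and the output is a shortlist to test, not a verdict.

```python
# Engines this article describes as supporting hybrid search natively.
HYBRID_NATIVE = {"Weaviate", "Qdrant", "Pinecone"}

def shortlist(vector_count, filter_heavy, can_operate_engine):
    """Return (candidates, deployment_hint) from the article's three inputs."""
    # Input 1: scale sets the outer bounds.
    if vector_count < 1_000_000:
        candidates = ["pgvector", "Chroma"]
    elif vector_count <= 50_000_000:
        candidates = ["Weaviate", "Pinecone", "Qdrant"]
    else:
        candidates = ["Pinecone", "Milvus"]

    # Input 2: filter complexity can override scale entirely.
    if filter_heavy:
        narrowed = [c for c in candidates if c in HYBRID_NATIVE]
        candidates = narrowed or sorted(HYBRID_NATIVE)

    # Input 3: operational capacity decides who runs it.
    if can_operate_engine:
        hint = "self-hosted or managed"
    else:
        hint = "weight managed offerings or pgvector heavily"
    return candidates, hint

print(shortlist(200_000, False, True))
print(shortlist(18_000_000, True, True))
print(shortlist(200_000, True, False))
```

The third call shows the override at work: a small, filter-heavy workload is pushed off the small-scale defaults and onto the engines with native hybrid search.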

Reading the benchmarks without getting played

Benchmarks are the most abused input in vector database selection, and the abuse is usually structural, not dishonest. A vendor publishes a configuration where their engine wins, which is true, then lets the reader generalise, which is false. To read a benchmark honestly, anchor on four numbers together: recall at a stated k, p50 and p99 latency (median and 99th-percentile response time), queries per second under realistic concurrency, and the dataset size. Any benchmark missing one of those four is decoration.

The 2026 numbers are tighter than the marketing suggests. Most modern vector databases land between 95 and 99% recall at k equals 10 with an HNSW index (Hierarchical Navigable Small World, the graph structure most engines use for approximate nearest neighbour search). DataCamp's 2026 comparison puts Qdrant around 98.5%, Milvus around 97.9%, and Weaviate around 97.2%. Those gaps are real but small, and they invert the moment you change quantization settings or the index build parameters. Latency separates the field more clearly: Qdrant posts a 4 ms p50, and at 100 million vectors it holds a p99 query latency of 30 to 40 ms, according to SaltTechno's 2026 benchmark. That tail-latency number matters more than the median, because users feel the slowest 1% of requests.
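
To see why the tail matters more than the median, here is a minimal sketch that computes p50 and p99 with the nearest-rank method. The latency samples are invented for illustration and are not taken from any benchmark cited here.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 97 fast requests and 3 slow ones.
latencies_ms = [4.0] * 97 + [35.0, 38.0, 120.0]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50} ms, p99={p99} ms")  # p50=4.0 ms, p99=38.0 ms
```

The median says nothing about the three slow requests; the p99 does. A report that quotes only a median or an average hides exactly the behaviour users notice.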

The honest move is to benchmark on your own data, with your own filters, at your own concurrency. The minimum viable test is not exotic: sample 100,000 to 1 million of your real vectors, replay a representative query log that includes your actual metadata filters, drive it at the concurrency you expect at peak, and record recall against a brute-force ground truth alongside p50 and p99 latency. Run the same harness against two or three shortlisted engines with comparable index settings. The numbers you get will rarely match the vendor's, and that gap is precisely the information your vector database selection needs. Every team that skipped this step and trusted a vendor chart paid for it later in re-platforming. Our vector db benchmarks entry walks through building a reproducible test harness you can run in an afternoon. Treat published benchmarks as a way to build a shortlist, never as a way to make the final call.
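
The recall half of that harness fits in a few lines. The sketch below computes an exact top-k by scanning every vector, then scores a result list against it. The vectors, the query, and `engine_result` are made up; in a real run `engine_result` would come from the engine under test.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k):
    """Exact nearest neighbours by scanning every vector: the ground truth."""
    ranked = sorted(vectors, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return ranked[:k]

def recall_at_k(returned_ids, truth_ids):
    """Share of the true top-k that the engine under test returned."""
    return len(set(returned_ids) & set(truth_ids)) / len(truth_ids)

# Hypothetical corpus of four two-dimensional vectors.
vectors = {
    "d1": [1.0, 0.0],
    "d2": [0.9, 0.1],
    "d3": [0.0, 1.0],
    "d4": [0.7, 0.7],
}
query = [1.0, 0.05]

truth = brute_force_top_k(query, vectors, k=2)
engine_result = ["d1", "d4"]  # stand-in for an approximate index's answer
recall = recall_at_k(engine_result, truth)
print(truth, recall)  # ['d1', 'd2'] 0.5
```

Brute force is slow by design, which is why the sample stays at 100,000 to 1 million vectors. Average the per-query recall over the whole replayed query log, filters included.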

What managed costs, and what self-hosting costs

Cost is the input most teams model last and regret first. The two pricing models, managed and self-hosted, diverge sharply as you scale, and vector database selection that ignores the curve produces nasty surprises. Pinecone's serverless model, introduced in 2024, charges $0.33 per gigabyte per month for storage, $4 per million write units, and $16 per million read units, with a new Builder tier at $20 per month for small teams (MarkTechPost, May 2026). In practice that lands around $70 per month at 10 million vectors and crosses $700 per month at 100 million. Predictable, low-operations, and it climbs with usage. Pinecone Inference adds about $0.08 per million tokens if you generate embeddings on the same platform, a cost teams routinely forget to model.
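
The three unit prices combine into a simple sum. The sketch below uses only the rates quoted above; the workload figures are hypothetical, and how a given query or upsert translates into read and write units is not modelled here, so check the vendor's current pricing page before relying on the output.

```python
STORAGE_PER_GB_MONTH = 0.33      # dollars, as quoted above
PER_MILLION_WRITE_UNITS = 4.0    # dollars, as quoted above
PER_MILLION_READ_UNITS = 16.0    # dollars, as quoted above

def serverless_monthly_cost(storage_gb, write_units, read_units):
    """Monthly cost in dollars from storage plus write and read units."""
    storage = storage_gb * STORAGE_PER_GB_MONTH
    writes = write_units / 1_000_000 * PER_MILLION_WRITE_UNITS
    reads = read_units / 1_000_000 * PER_MILLION_READ_UNITS
    return storage + writes + reads

# Hypothetical month: 60 GB stored, 2 million write units, 3 million read units.
cost = serverless_monthly_cost(60, 2_000_000, 3_000_000)
print(f"${cost:.2f} per month")  # $75.80 per month
```

In this made-up month reads account for $48 of the $75.80, which is the usual shape: storage is the small term, and the bill tracks query volume.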

Self-hosting inverts the trade, and so does paying for fixed capacity instead of per-request usage. Qdrant Cloud runs about $0.078 per gigabyte-hour, roughly $57 per month per gigabyte of RAM, and at 50 million vectors it came in 32% under the comparable Pinecone configuration, $1,824 against $2,700, in LeanOps' 2026 analysis. The same analysis puts the tipping point near 60 to 80 million queries per month: above that, self-hosted Qdrant or Weaviate on a fixed-cost server undercuts managed serverless by three to ten times. The catch is that "fixed-cost server" means your team now owns uptime, upgrades, and incident response.
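
Both conversions in that paragraph are plain arithmetic, shown here so you can rerun them with your own quotes. The 730 hours per month is our assumption (8,760 hours a year divided by 12); the other figures are the ones cited above.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12

def monthly_per_gb(hourly_rate):
    """Convert a per-gigabyte-hour price into a per-gigabyte monthly price."""
    return hourly_rate * HOURS_PER_MONTH

def percent_under(cheaper, pricier):
    """How far below the pricier option the cheaper one lands, in percent."""
    return (pricier - cheaper) / pricier * 100

qdrant_gb_month = monthly_per_gb(0.078)
saving_pct = percent_under(1824, 2700)
print(f"${qdrant_gb_month:.2f} per GB-month, {saving_pct:.1f}% under")
# $56.94 per GB-month, 32.4% under
```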

This is why the studio frames cost as a function of team capacity, not just data volume. A managed database that costs $700 per month but frees an engineer is cheaper than a self-hosted one that saves $400 and consumes a week of on-call attention every quarter. We model both curves explicitly before recommending a path; the vector db cost breakdown entry shows the full spreadsheet, and our managed versus self-hosted analysis names the threshold where the math flips for a typical seed-stage team.
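
The comparison in that paragraph reduces to a break-even on engineer time. The sketch assumes the $400 saving is per month and that the on-call load is one week per quarter, as in the example above; the weekly engineer cost is a hypothetical input you should replace with your own loaded figure.

```python
def self_hosting_net_per_quarter(monthly_saving, oncall_weeks, weekly_engineer_cost):
    """Positive means self-hosting comes out ahead; negative means managed is cheaper."""
    return monthly_saving * 3 - oncall_weeks * weekly_engineer_cost

def break_even_weekly_cost(monthly_saving, oncall_weeks):
    """Weekly engineer cost at which the two options cost the same."""
    return monthly_saving * 3 / oncall_weeks

# Hypothetical loaded cost of one engineer-week: $3,000.
net = self_hosting_net_per_quarter(400, 1, 3000)
threshold = break_even_weekly_cost(400, 1)
print(net, threshold)  # -1800 1200.0
```

Under these assumptions, self-hosting only pays if an engineer-week costs you less than $1,200, which is why the infrastructure line item alone is a poor guide.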

[Figure: a retrieval pipeline failure being untangled into an ordered grid, illustrating vector database anti-patterns]

The anti-patterns that quietly sink projects

Most failed AI features do not fail at the model. They fail earlier, in retrieval, and the failure is almost always one of a handful of anti-patterns. The first and most expensive is treating vector database selection as the whole retrieval problem. The shortcut "RAG equals vector database" is, in Towards Data Science's words, the single biggest source of expensive failures at scale. Naive retrieval with fixed-character chunking and plain cosine similarity misses the correct context up to 40% of the time. No database fixes that; the fix is hybrid retrieval and better chunking, which sit above the database layer.

The second anti-pattern is premature scale. Teams adopt a distributed, billion-vector engine for a product with 200,000 vectors, then spend their first quarter operating infrastructure that solves a problem they do not have. The third is benchmark-driven selection, choosing the database that won a chart instead of the one that fits the workload, then discovering at 10 million vectors that the winning configuration was nothing like production. The fourth is ignoring filtering until launch: building on a post-filtering engine, then watching latency collapse the first time a query combines vector similarity with three metadata conditions.

The pattern under the patterns is optimism about operational load. Every external service is a failure point and a latency source, and teams consistently underestimate the cost of running one more system. We catalogue the full list, with the production symptoms each one produces, in the vector db anti-patterns entry, and we trace one painful real example in the scale issue postmortem. The cheapest anti-pattern to fix is the one you never commit to, which is the entire argument for reading the field before you pick.

Three engagements where this call was load-bearing

The studio position is not theoretical. Three engagements turned on vector database selection, and in each the right answer contradicted the obvious one.

A financial-search platform, roughly 2 million document chunks, regulated EU market, came to us after a stalled build on a managed vector service the team could not afford at projected scale. We moved retrieval onto pgvector inside their existing Postgres instance. Result: one fewer system to secure, transactional consistency between documents and embeddings, and a retrieval cost that effectively disappeared into infrastructure they already paid for. At 2 million vectors, the dedicated engine was solving a problem they did not have.

An insurance-comparison engagement, around 18 million policy-clause vectors with heavy metadata filtering by product, region, and date, went the other way. Here filter complexity, not scale, drove the call. We selected Qdrant for one-stage filtering during graph traversal, which held recall above 97% while filters tightened, where the team's earlier post-filtering setup had collapsed under combined conditions. The lesson: filter-heavy workloads weight the database choice far more than vector count alone.

A catalogue-search build for an auction house, about 600,000 lots with rich descriptions, needed nothing exotic. We shipped it on Chroma for the prototype, validated retrieval quality with real queries, then promoted it to pgvector for production. Total new vector infrastructure operated by the client: none. Each engagement reached a different database, and that is the point. There is no house favourite, only a house method. Our SaaS vector db case study and enterprise field report document the full read from real engagements.

Where to start: a decision tree by starting condition

If you read only one entry under this hub next, pick it by your starting condition, not by curiosity. The order below is the studio's decision tree, and each step opens with the judgement it encodes.

Start with your scale. Count your vectors honestly, including projected growth over twelve months. Under 1 million, read the selection walkthrough and default to pgvector or Chroma. Over 50 million, jump straight to the benchmarks and cost entries, because operability and bill size dominate.
Weigh your filter complexity. If most queries combine vector similarity with metadata conditions, treat hybrid search as a hard requirement and read the decision framework before anything else.
Audit your team's operational capacity. No platform engineer and no appetite for on-call means managed or pgvector, full stop. Read the managed-versus-self-hosted analysis.
Model the cost curve at your real volume. Run both the managed and self-hosted numbers at your projected scale before you commit. The cost breakdown entry has the template.
Pressure-test against the anti-patterns. Before you sign off, check your plan against the four failure modes above. The anti-patterns entry is the checklist.
Read one engagement that matches your shape. Whichever case study, SaaS or enterprise, looks most like your situation, read it last to calibrate expectations.

This map spans the hub's topical tier (the walkthroughs, benchmarks, and field reports that explain the field), its focal tier (the side-by-sides, case studies, and cost breakdowns that decide a single question), and its special tier (the postmortems that show what failure actually looks like). Start where your decision is least certain.

What is changing in vector database selection this year

Three shifts are reshaping vector database selection in 2026, and each moves the default answer. First, Postgres keeps absorbing the low end. pgvectorscale's throughput gains, 471 QPS (queries per second) at 99% recall on 50 million vectors in Tiger Data's test, push the "just use Postgres" ceiling higher every release, which widens the range where adding a dedicated engine is premature.

Second, the managed and self-hosted curves are converging on operability rather than price. Pinecone's serverless pricing and new 20-dollar Builder tier lower the cost of staying managed, while Qdrant's quantization, cutting memory up to 64 times, lowers the cost of self-hosting at scale. The decision is migrating from "what does it cost" to "who do you want operating it," which is exactly where the studio has argued it belonged all along.

Third, hybrid search is becoming table stakes. As teams internalise that naive vector-only retrieval fails up to 40% of the time, native fusion of keyword and vector scoring is shifting from a premium feature to a baseline expectation, which favours engines that built it in. The forward read: by 2027, "vector database" will increasingly mean "hybrid retrieval engine," and vector database selection will be judged on filtering and fusion quality more than on raw nearest-neighbour speed. Plan your selection for the workload you will have, not the one you have today.
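
For readers who have not seen fusion in code, here is a minimal sketch of reciprocal rank fusion, one common way to merge a keyword ranking with a vector ranking. It is our illustrative choice, not a claim about how any engine named here fuses scores; the document IDs are invented and the constant 60 is only a conventional default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical rankings for the same query from two retrievers.
keyword_ranking = ["doc_b", "doc_a", "doc_c"]  # e.g. from BM25
vector_ranking = ["doc_a", "doc_d", "doc_b"]   # e.g. from cosine similarity

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

Documents that both retrievers rank highly rise to the top, while a document only one retriever found still survives lower down. That is the behaviour that rescues queries where pure vector similarity misses an exact term.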

How vector database selection connects to the rest of your AI stack

Vector database selection does not live alone. It sits inside the AI and ML engineering family, and the database choice is downstream of decisions you make elsewhere in the stack. Your embedding model determines your vector dimensionality, which drives storage cost and index size. Your chunking and retrieval design, the part that actually decides whether RAG works, sits above the database entirely; the 40% naive-retrieval failure rate is a retrieval-design problem, not a database problem.

Embedding dimensionality is the clearest example of this coupling. A 1,536-dimension embedding stores and indexes very differently from a 384-dimension one; the larger vector roughly quadruples storage and slows index traversal, which can move your vector database selection from "pgvector is fine" to "we need quantization" without a single query changing. Decide your embedding model and dimensionality first, then size the database to it, never the reverse.
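
The quadrupling is easy to verify. The sketch assumes uncompressed 32-bit floats and counts raw vector bytes only; index overhead, metadata, and replicas come on top, and the 10 million vector count is a hypothetical workload.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed 32-bit floats

def raw_vector_gb(vector_count, dimensions):
    """Raw storage for the vectors alone, in gigabytes (10^9 bytes)."""
    return vector_count * dimensions * BYTES_PER_FLOAT32 / 1e9

small = raw_vector_gb(10_000_000, 384)
large = raw_vector_gb(10_000_000, 1536)
print(f"{small:.2f} GB vs {large:.2f} GB, ratio {large / small:.1f}x")
# 15.36 GB vs 61.44 GB, ratio 4.0x
```

At 10 million vectors the smaller embedding fits comfortably in memory on a modest Postgres instance; the larger one is where quantization starts to look necessary.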

Three sibling hubs in the same family carry the rest of the load. The RAG architecture hub covers chunking, reranking, and the retrieval pipeline that wraps your database. The evals hub covers how you measure whether retrieval is actually working, without which every benchmark is theatre. The observability and cost-control hub covers what happens after launch, when query patterns drift and bills move. Read this pillar to choose the store; read those to make the store earn its place. The throughline across all of them is the studio's sovereignty thesis: you should own the system you run, not rent a black box you cannot inspect or move.

How La Boétie helps you make the call

La Boétie is a venture studio, digital agency, and technical consultancy that builds production-grade AI systems for founders and operators, and vector database selection is one of the calls we make on nearly every AI engagement. We do not sell a database. We assess what your workload actually needs and build the right thing, which is frequently simpler and cheaper than what you arrived asking for.

Selection and architecture. We start from your real numbers, vector count, filter complexity, team capacity, and projected cost, and recommend the store you can operate, not the one that wins a chart. Across recent engagements that has meant pgvector at 2 million vectors, Qdrant at 18 million with heavy filtering, and Chroma-to-pgvector for a 600,000-lot catalogue. Different answers, one method.

Build and migration. We implement the retrieval pipeline end to end, chunking, hybrid search, reranking, and the database underneath, and we migrate stalled or over-provisioned systems onto infrastructure that fits. Our team of five to six engineers ships in hours what a DIY build spends a month getting wrong.

Ownership, not lock-in. Grounded in Étienne de La Boétie's 1548 thesis on refusing voluntary servitude, we refuse vendor lock-in by default. You keep ownership of everything we build, and we choose open, portable stores wherever the workload allows. If you are weighing a vector database selection and want a defensible answer rather than a vendor pitch, book a studio intro call and we will pressure-test your plan against your real constraints.

FAQ: vector database selection

What is the best vector database in 2026?

There is no single best database; the right vector database selection depends on your scale, filter complexity, and team capacity. Under 1 million vectors, pgvector or Chroma are usually enough. From 1 to 50 million, Weaviate, Pinecone, and Qdrant all perform well. Above 50 million, Pinecone or Milvus handle distribution better. Match the database to your workload rather than to a benchmark chart, and weight operability heavily.

When should I use pgvector instead of a dedicated vector database?

Use pgvector when you already run PostgreSQL and stay under roughly 10 million vectors, per Instaclustr's 2026 guidance. It keeps vectors transactional with your relational data and adds no new system to operate. Move to a dedicated engine like Qdrant or Pinecone when you cross that scale, or when filter-heavy queries expose pgvector's post-filtering, which applies metadata filters after the vector search rather than during it.

How much does a vector database cost?

It depends on model and scale. Pinecone serverless runs about $70 per month at 10 million vectors and $700-plus at 100 million (MarkTechPost, 2026). Qdrant Cloud is roughly $57 per month per gigabyte of RAM and came 32% under Pinecone at 50 million vectors (LeanOps, 2026). Self-hosting undercuts managed by three to ten times above 60 to 80 million queries per month, but you absorb the operational cost.

Are vector database benchmarks reliable?

Only partially. Synthetic single-query benchmarks measure a condition production never sees: one client, no filters, no concurrent writes. A meaningful benchmark reports recall at a stated k, p50 and p99 latency, queries per second under realistic concurrency, and dataset size together. Use published benchmarks to build a shortlist, then benchmark on your own data and filters before committing. Numbers invert with quantization and index settings.

Why do most RAG systems fail in production?

Most RAG systems fail in retrieval, not in the model. Naive retrieval with fixed-character chunking and plain cosine similarity misses the correct context up to 40% of the time (Towards Data Science, 2026). The fix is hybrid retrieval, fusing keyword and vector scoring, plus better chunking and reranking. The database matters, but treating RAG as equal to a vector database is the single biggest source of expensive failures.

How do I migrate from one vector database to another?

Plan the migration around your filters and your downtime tolerance, not just data copy. Re-embed or export your vectors, rebuild the index on the target engine, validate recall against a held-out query set, then cut over behind a feature flag. The risky step is filtering parity: a database that pre-filters and one that post-filters can return different results for the same query, so validate filtered queries explicitly before you trust the new store.
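
A filtering-parity check can be as simple as replaying the same queries against both stores and flagging low overlap. In this sketch `search_old` and `search_new` are stubs standing in for your real client calls, and the query names, result IDs, and 0.9 threshold are hypothetical.

```python
def overlap(old_ids, new_ids):
    """Fraction of the old store's results that the new store also returns."""
    if not old_ids:
        return 1.0
    return len(set(old_ids) & set(new_ids)) / len(old_ids)

def parity_report(queries, search_old, search_new, k, threshold=0.9):
    """Return (query name, overlap) for every query below the threshold."""
    failures = []
    for query in queries:
        score = overlap(search_old(query, k), search_new(query, k))
        if score < threshold:
            failures.append((query["name"], score))
    return failures

# Stubbed results: the unfiltered query matches, the filtered one diverges.
old_results = {"plain": ["a", "b", "c"], "filtered": ["a", "d", "e"]}
new_results = {"plain": ["a", "b", "c"], "filtered": ["a", "x", "y"]}

def search_old(query, k):
    return old_results[query["name"]][:k]

def search_new(query, k):
    return new_results[query["name"]][:k]

queries = [{"name": "plain"}, {"name": "filtered"}]
failures = parity_report(queries, search_old, search_new, k=3)
print([name for name, _ in failures])  # ['filtered']
```

Run it with a query set weighted toward your heaviest filters, because that is where the two engines are most likely to disagree.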

Conclusion

Vector database selection rewards teams that start from their own constraints and punishes teams that start from a vendor chart. Count your vectors, weigh your filters, audit your operational capacity, and model both cost curves before you commit. Under 10 million vectors on Postgres, pgvector is usually the honest answer; above 50 million or under heavy filtering, a purpose-built engine like Qdrant or Pinecone earns its place. The field will keep moving (hybrid search is becoming the baseline and Postgres keeps climbing), but the method outlasts the rankings. Make your vector database selection the choice your team can operate and own, and revisit it the day your data, not a benchmark, tells you to.

Sources

External sources:

Vector Database Global Market Report: The Business Research Company, 2026
The Top 5 Vector Databases: DataCamp, 2026
pgvector vs Qdrant Benchmark: Tiger Data, 2025
pgvector Key Features and Pros and Cons: Instaclustr, 2026
Best Vector Databases in 2026: Pricing and Scale: MarkTechPost, 2026
Vector Database Cost Comparison 2026: LeanOps, 2026
Vector Database Performance Benchmark 2026: SaltTechno, 2026
Best Open Source Vector Databases Comparison: Redis, 2026
10 Common RAG Mistakes We Keep Seeing in Production: Towards Data Science, 2026
Pinecone Learn: Pinecone, 2026
Qdrant: Qdrant, 2026
Weaviate: Weaviate, 2026
Chroma: Chroma, 2026

