Most Teams Don’t Need a Vector Database

The word “embeddings” shows up in a planning doc, and within a sentence or two someone has added a vector database to the architecture. I’ve watched it happen on calls more than once. RAG gets mentioned, a box labeled Pinecone or Chroma appears on the diagram, and nobody asks whether it needs to be there. It just arrives, the way a queue or a search cluster used to.

The reflex makes sense on the surface. Vectors feel new and specialized, so they seem to want a new and specialized home. Every RAG tutorial reaches for a dedicated store in step one, which quietly teaches a generation of engineers that this is simply how it’s done. And the dedicated tools are good. Pinecone, Qdrant, Weaviate, and Milvus are real, well-built systems that earn their place at the scale they’re designed for.

Here’s the narrower claim I’d actually defend: most teams adding a vector database don’t have a vector problem yet. They have a few hundred thousand embeddings, a Postgres instance already running, and a reflex. Is that a vector problem, or a habit? The database you already operate can almost certainly do this job, and the point where it stops being enough sits much further out than the instinct to add a second system suggests. This isn’t a “you never need a vector database” argument. It’s a “check first” argument, and the check is cheap. The same was true for search, queues, and document storage, where Postgres full-text search, SKIP LOCKED job queues, and JSONB turned out to be enough for most teams that checked.

Why does every team suddenly want a vector database?

Mostly because the tutorials told them to, not because their workload demands it. The reflex is narrative-driven. RAG walkthroughs default to a dedicated store on the first page, the reference architectures from model vendors include a vector DB box, and “vector” sounds different enough from “database” that reusing the one you already have feels like a hack rather than the obvious move.

In the systems I’ve reviewed, the embedding count at launch is usually in the low hundreds of thousands. That’s a rounding error for a competent index. The dedicated store was sized for a scale the product hadn’t reached and, in a few cases, never would. The tool wasn’t bad. The timing was.

There’s nothing wrong with the dedicated systems. The problem is sequencing. Reaching for one in step one means you take on a second system before you’ve measured whether the first one falls short. What would it cost to check before committing? Almost nothing, which is the whole point.

Can Postgres be your vector database?

Yes, for most teams, through pgvector. pgvector is an open-source Postgres extension that adds a vector type and approximate nearest-neighbor search. It ships distance operators for L2, inner product, and cosine (recent versions add L1, Hamming, and Jaccard) and two mature index types: IVFFlat and HNSW, which is now the usual choice. It turns the database you already run into a capable vector store without adding a service to operate.

The “Postgres is slow at vectors” line you may have heard is dated. It comes from the IVFFlat era, before HNSW indexing landed in pgvector 0.5.0 and closed most of the gap. With an HNSW index, similarity search at moderate scale commonly returns in single-digit milliseconds.

```sql
-- Enable the extension that ships vector types and ANN indexes
CREATE EXTENSION IF NOT EXISTS vector;

-- Store embeddings next to the rows they describe
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Build an HNSW index for the distance you actually query with
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Nearest-neighbor search, filtered by ordinary SQL, in one query
SELECT id, title
FROM documents
WHERE tenant_id = 42          -- a plain WHERE clause, not a bolt-on filter
ORDER BY embedding <=> $1     -- cosine distance to the query embedding
LIMIT 5;
```

The quiet advantage is in that last query. Your embeddings live in the same place as the rows they belong to, so a similarity search and an ordinary WHERE clause run together, in one round trip, inside one transaction. There’s no copy of the data in a second system and no tenant filter bolted on after the fact. On modern hardware a single well-tuned instance handles low millions of vectors comfortably, and with the pgvectorscale extension’s StreamingDiskANN index, Postgres stays competitive into the tens of millions.
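The sketch below makes the semantics of that query concrete by computing the exact answer with brute force: filter by tenant, rank by cosine distance, take the top k. An HNSW index returns an approximation of this result. Running the brute-force version on a sample of queries and comparing the two is one simple way to check recall. Row, exact_filtered_knn, and recall_at_k are hypothetical names, and the tiny two-dimensional vectors stand in for real 1536-dimension embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    """A stand-in for one row of the hypothetical documents table."""
    id: int
    tenant_id: int
    embedding: list[float]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity pgvector's <=> operator returns: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def exact_filtered_knn(rows: list[Row], query: list[float], tenant_id: int, k: int) -> list[int]:
    """Brute-force answer to: WHERE tenant_id = ? ORDER BY embedding <=> ? LIMIT k."""
    candidates = [r for r in rows if r.tenant_id == tenant_id]          # the WHERE clause
    candidates.sort(key=lambda r: cosine_distance(r.embedding, query))  # the ORDER BY
    return [r.id for r in candidates[:k]]                               # the LIMIT


def recall_at_k(approx_ids: list[int], exact_ids: list[int]) -> float:
    """Share of the exact top-k that an approximate (e.g. HNSW) search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    rows = [
        Row(1, 42, [1.0, 0.0]),
        Row(2, 42, [0.0, 1.0]),
        Row(3, 7, [1.0, 0.0]),   # identical vector, but another tenant: filtered out
        Row(4, 42, [0.7, 0.7]),
    ]
    exact = exact_filtered_knn(rows, [1.0, 0.1], tenant_id=42, k=2)
    print(exact)                       # [1, 4]
    print(recall_at_k([1, 2], exact))  # 0.5
```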

Where’s the line? HNSW wants its graph in memory, and that appetite grows with both vector count and dimensions. Ten million 1536-dimension float32 vectors are roughly 60 GB of raw data before the index graph sits on top, which is more RAM than most single instances carry. Dimensions have a ceiling too. HNSW and IVFFlat indexes on the standard vector type top out at 2,000 dimensions. Past that, you switch to the half-precision halfvec type (indexable up to 4,000) or reduce dimensions. When the working set stops fitting in RAM on a box you’re willing to pay for, that’s the real signal, not a number from a tutorial.
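The arithmetic behind that estimate is worth having as a function you can rerun with your own numbers. This sketch assumes 4 bytes per element for vector and 2 for halfvec. It deliberately leaves out per-vector headers, row overhead, and the HNSW graph, so treat its output as a floor. The “use at most half of RAM for raw vectors” threshold is a hypothetical rule of thumb, not a pgvector recommendation.

```python
# Back-of-the-envelope sizing for raw embedding storage.
# Assumes 4-byte elements for pgvector's vector type and 2-byte elements for halfvec;
# the per-vector header, Postgres row overhead, and the HNSW graph are NOT counted.

BYTES_PER_ELEMENT = {"vector": 4, "halfvec": 2}


def raw_embedding_bytes(count: int, dimensions: int, vector_type: str = "vector") -> int:
    """Bytes needed just for the vector elements."""
    return count * dimensions * BYTES_PER_ELEMENT[vector_type]


def fits_in_ram(
    count: int,
    dimensions: int,
    ram_gb: float,
    vector_type: str = "vector",
    max_ram_share: float = 0.5,
) -> bool:
    """Hypothetical rule of thumb: raw vectors should use at most max_ram_share of RAM,
    leaving room for the index graph, shared_buffers, and the rest of the workload."""
    return raw_embedding_bytes(count, dimensions, vector_type) <= ram_gb * 1e9 * max_ram_share


if __name__ == "__main__":
    for vtype in ("vector", "halfvec"):
        gb = raw_embedding_bytes(10_000_000, 1536, vtype) / 1e9
        print(f"10M x 1536 as {vtype}: {gb:.2f} GB")  # 61.44 GB, then 30.72 GB
    print(fits_in_ram(1_000_000, 1536, ram_gb=64))   # True
    print(fits_in_ram(10_000_000, 1536, ram_gb=64))  # False
```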

What does a dedicated vector database actually buy you?

At genuine scale it buys tuned ANN performance, strong filtered search, and autoscaling someone else operates. Qdrant is known for fast filtered search and a clean self-hosting story. Pinecone sells zero-ops managed scale. Weaviate leans into native hybrid search, and Milvus targets billion-vector workloads. Those are real advantages, at the scale that needs them.

Each of these wins a specific fight. If you’re serving sustained high query volume against a tight p99 target, a purpose-built engine tuned for exactly that will beat a general database trying to do everything. If your search is mostly filtered (this user, this workspace, this language) at large scale, Qdrant’s filtering is genuinely ahead. If you want to never think about the box it runs on, Pinecone’s managed tier is the path of least resistance.

Where’s the line? The catch is the word “genuine.” Most of these advantages only show up above several million vectors, at high and sustained QPS, or with filtering needs your primary database can’t express. Below that, you’re buying a faster engine for a race you’re not running, and paying for it with an extra system on the diagram and an extra page in the runbook.

Isn’t the vector search the slow part?

Usually not. In a typical RAG request, the approximate nearest-neighbor query often returns in single-digit milliseconds, while embedding the query and waiting on the model call eat the overwhelming majority of the latency budget. The datastore is rarely the bottleneck people assume it is.

This is the part that reframes the whole decision. If the vector query takes five milliseconds and the round trips to the embedding model and the LLM take several hundred, then shaving a millisecond off retrieval by switching databases is invisible to the user. You’d be funding a migration and a permanent second system to optimize the fastest step in the pipeline. So what exactly are you speeding up?

Measure where the time actually goes before you choose a store on speed. More often than not the clock points at the model calls and the network, not the index. Retrieval does become the bottleneck eventually, but mainly at high QPS or across a huge corpus, which loops right back to the same scale threshold.
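A minimal way to get that measurement is to wrap each stage of a request in a timer and look at the shares. The sketch below uses only the standard library. StageTimer and the stage names are hypothetical. The sleep calls are arbitrary placeholders, to be replaced with your real embedding call, database query, and model call.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per pipeline stage so you can see where a request goes."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def shares(self) -> dict[str, float]:
        """Fraction of all measured time spent in each stage."""
        grand_total = sum(self.totals.values())
        if grand_total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / grand_total for name, t in self.totals.items()}


if __name__ == "__main__":
    timer = StageTimer()
    # Placeholder stages: swap each sleep for the real call. The delays are
    # arbitrary and exist only to exercise the timer; they are not measurements.
    with timer.stage("embed_query"):
        time.sleep(0.02)
    with timer.stage("vector_search"):
        time.sleep(0.002)
    with timer.stage("llm_call"):
        time.sleep(0.1)
    for name, share in sorted(timer.shares().items(), key=lambda kv: -kv[1]):
        print(f"{name:>14}: {share:6.1%}")
```

In production you would more likely send these spans to whatever tracing system you already run. The point is to have the numbers before choosing a store on speed.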

The cost nobody puts in the RAG diagram

A separate vector store quietly adds a synchronization problem you didn’t have before. The source rows live in your primary database and their embeddings live somewhere else, so every insert, update, and delete now has to land in two places and stay consistent. That pipeline is invisible on the architecture slide and very visible at 2am.

The failures I’ve seen here are dull and expensive. A backfill job half-finishes and the vector store drifts out of sync with the rows it indexes. A document gets deleted from Postgres but lingers in the vector DB, so search keeps returning a hit that no longer exists. None of it is exotic. It’s just the ordinary tax of keeping two systems agreeing about the same data, and someone has to own that tax. Who owns the pipeline when it drifts?
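If you do run two systems, you can at least detect the drift described above. This sketch assumes you can list each row’s id together with a version marker, such as an updated_at timestamp or counter, from both the primary database and the vector store. How you export those lists is left open, and reconcile is a hypothetical helper. With the embeddings in Postgres there is nothing to reconcile, because deleting the row deletes its embedding in the same transaction.

```python
def reconcile(primary: dict[int, int], store: dict[int, int]) -> dict[str, set[int]]:
    """Compare the primary database with a separate vector store.

    Both arguments map row id -> version of the content (e.g. an updated_at counter);
    for the store, it is the version that was embedded.
    """
    primary_ids, store_ids = primary.keys(), store.keys()
    return {
        # In Postgres but never embedded: e.g. a backfill that died halfway.
        "missing": set(primary_ids - store_ids),
        # Embedded, but the source row is gone: a delete that never propagated.
        "orphaned": set(store_ids - primary_ids),
        # In both, but the row changed after it was embedded.
        "stale": {i for i in primary_ids & store_ids if primary[i] != store[i]},
    }


if __name__ == "__main__":
    postgres_rows = {1: 3, 2: 1, 3: 2}  # id -> current version
    vector_rows = {1: 3, 3: 1, 9: 1}    # id -> version that was embedded
    print(reconcile(postgres_rows, vector_rows))
    # {'missing': {2}, 'orphaned': {9}, 'stale': {3}}
```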

You also lose the easy join. With embeddings in Postgres, “find similar documents this user is allowed to see” is one query. Split across two systems, it becomes fetch from one, filter in the other, or replicate your access rules into a place they were never meant to live. Add the second failure mode and the second on-call surface, and the true price of the dedicated store is a lot more than its monthly bill.
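The permission problem shows up as results that silently come up short. In this sketch, ranked_ids stands in for documents ordered by similarity, and allowed stands in for the set of documents a user may see. Both functions are hypothetical. If you ask a permission-blind store for the top k and filter afterwards, you can get fewer than k hits. Filtering during ranking cannot have that problem.

```python
def two_system_search(ranked_ids: list[int], allowed: set[int], k: int) -> list[int]:
    """Ask the vector store for its top-k, then drop what this user may not see."""
    top_k = ranked_ids[:k]                     # the vector store knows nothing about permissions
    return [i for i in top_k if i in allowed]  # post-filter in the primary database


def one_query_search(ranked_ids: list[int], allowed: set[int], k: int) -> list[int]:
    """Apply the permission filter while ranking, as one SQL query with a JOIN or WHERE can."""
    return [i for i in ranked_ids if i in allowed][:k]


if __name__ == "__main__":
    ranked = [10, 11, 12, 13, 14, 15]  # document ids, most similar first
    visible = {12, 14, 15}             # what this user is allowed to see
    print(two_system_search(ranked, visible, k=3))  # [12]
    print(one_query_search(ranked, visible, k=3))   # [12, 14, 15]
```

The usual workaround with two systems is to over-fetch: ask the store for several times k and hope enough results survive the filter. Inside Postgres the filter and the ranking share one query. An approximate index can still come up short on a very selective filter, though, and recent pgvector releases added iterative index scans for that case.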

Where’s the line? When the embeddings genuinely belong to a different scale or access pattern than your relational data, the sync cost can be worth paying. Until then it’s pure overhead, and it’s the line item teams underprice the most.

So how do you actually decide?

Skip the benchmark for a minute. The decision is mostly four questions, and you can answer them before writing any code.

1. How many vectors will you really have in twelve months? Not the pitch-deck number, the honest one. Under a few million and you’re comfortably in pgvector territory. Tens of millions and pgvectorscale is still in play. Hundreds of millions or billions is where a dedicated system starts to be the reasonable default.

2. What’s your actual latency and throughput target? A handful of queries per second with a relaxed latency budget is easy for Postgres. Sustained high QPS against a tight p99 is where purpose-built engines pull ahead. Write the number down before you shop, not after.

3. Do the embeddings need to live with the rest of your data? If your searches are filtered by tenant, permission, status, or date, that’s a join, and keeping it in one database is a real advantage. If the vectors are an island with no relational neighbors, a separate store costs you less.

4. Is the data already in Postgres? If yes, the burden of proof sits with the new system. You’re not adding vector search to your stack. You’re deciding whether to add a second database, and that should have to justify itself.

If three of the four point at the database you already run, that’s your answer, and you’ve saved yourself a system. If three point the other way, you have a real reason rather than a reflex, and now you can choose between Pinecone, Qdrant, Weaviate, and the rest on their merits.
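The four questions translate almost directly into a checklist. This is a sketch, not a calculator. Workload, votes, and decide are hypothetical names, and the 100-million cut-off is one reading of question 1. The booleans compress judgment calls that deserve more nuance than a yes or no. A two-two result comes back as “split” rather than being settled by an invented tiebreaker.

```python
from dataclasses import dataclass

# Illustrative cut-off read from question 1 ("hundreds of millions or billions");
# an assumption, not a benchmark.
DEDICATED_DEFAULT_AT = 100_000_000


@dataclass
class Workload:
    vectors_in_12_months: int          # question 1: the honest number
    sustained_qps_tight_p99: bool      # question 2
    filtered_by_relational_data: bool  # question 3: tenant, permission, status, date
    data_already_in_postgres: bool     # question 4


def votes(w: Workload) -> list[str]:
    """One vote per question: 'postgres' or 'dedicated'."""
    return [
        "postgres" if w.vectors_in_12_months < DEDICATED_DEFAULT_AT else "dedicated",
        "dedicated" if w.sustained_qps_tight_p99 else "postgres",
        "postgres" if w.filtered_by_relational_data else "dedicated",
        "postgres" if w.data_already_in_postgres else "dedicated",
    ]


def decide(w: Workload) -> str:
    """Three of four votes settle it; anything else is reported as a split."""
    tally = votes(w)
    if tally.count("postgres") >= 3:
        return "postgres"
    if tally.count("dedicated") >= 3:
        return "dedicated"
    return "split"


if __name__ == "__main__":
    print(decide(Workload(300_000, False, True, True)))         # postgres
    print(decide(Workload(2_000_000_000, True, False, False)))  # dedicated
    print(decide(Workload(50_000_000, True, False, True)))      # split
```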

Start with the database you already run

The default that holds for most teams is unglamorous. Turn on pgvector, put the embeddings next to the rows they describe, index them with HNSW, and ship. You get vector search, transactional consistency, your existing backups and access controls, and one system to operate instead of two. When real measured load tells you it isn’t enough, you’ll know, and you’ll know exactly which dedicated store solves the specific limit you hit.

The expensive mistake runs the other way. A team stands up a dedicated vector database for a few hundred thousand embeddings, inherits a sync pipeline and a second on-call surface, and spends the next year maintaining infrastructure sized for a scale they may never reach. The dedicated store wasn’t wrong as a technology. It was just early, and early is its own kind of cost.

So before the vector database goes on the diagram, ask the cheap question first. More often than not, you already own a perfectly good answer. Most teams don’t need a vector database. They need to check whether they do.

Naveen Chandra

Hi, I am Naveen Chandra, a Cloud Engineer and Web Developer. I work with companies that take their technology seriously and want a long-term partner, not a short-term contractor. From AWS infrastructure and DevOps automation to full-stack web platforms and React Native apps, I focus on systems that compound in value over years rather than projects that end in weeks.