Postgres Does More Than You Think

Share This Article

I’ve sat in a lot of architecture conversations, and they tend to share the same moment. Someone says “we’ll need a search engine for that,” or “let’s put a queue in front of it,” or “we should stand up a vector database.” Everyone nods. A second system quietly enters the diagram. Nobody argues, because each of those tools is genuinely good at what it does. The reflex is reasonable. It’s also, more often than I’d like, unnecessary.

The data is already in Postgres. The team already knows how to operate Postgres. And Postgres, the boring dependable thing already sitting in the middle of the system, can probably do the job you’re about to bring in a new dependency for. Not always. But often enough that “can Postgres do this?” deserves to be the first question, not the fallback you reach for after the dedicated tool turns out to be more trouble than it’s worth.

This isn’t a “you never need anything but Postgres” argument. Specialized systems exist for real reasons, and there are clear points where reaching for one is the correct call. My point is narrower and more useful. The line where Postgres stops being enough usually sits much further out than the instinct to add a second system suggests, and crossing it early costs more than people account for.

The part nobody prices in

Adding a datastore feels free in the design meeting. It isn’t. The cost shows up later, spread across months, which is exactly why it’s so easy to underweight while you’re still sketching boxes on a whiteboard.

A second system is a second thing to deploy, version, patch, and back up. It’s a second failure mode in every incident. Now when something is slow, you’re checking two systems and the network between them. It’s a second consistency boundary too. The moment your data lives in both Postgres and the new store, you own the synchronization problem, and dual-write bugs are some of the most miserable to track down, because they only surface under specific orderings of events. It’s a second access-control surface, a second set of metrics to learn, and a second body of operational knowledge that has to live in more than one person’s head or it becomes a single point of human failure.

None of these is catastrophic on its own. Together, they’re the reason small teams with five backing services spend more time keeping the plumbing alive than building anything. The question I keep coming back to isn’t “is the dedicated tool better at this one task than Postgres?” It frequently is. The real question is whether it’s enough better, at your scale, to justify permanently carrying a second system. That’s a much higher bar, and plenty of architecture decisions don’t clear it once you price the whole thing honestly.

With that framing, here’s the tour. For each capability I’ll cover what people reach for, what Postgres actually does, and the part that matters most: where the line is.

Can Postgres handle full-text search instead of Elasticsearch?

For the search requirement most applications actually have, yes. And it stays consistent with your data, which a separate index never quite manages. Postgres has had full-text search built in for over a decade. You get a tsvector type that holds the parsed, normalized, stemmed tokens of your text, a tsquery type for the search expression, a @@ match operator between them, and a GIN index to make it fast.

sql

ALTER TABLE articles
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

SELECT title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgres replication')
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', 'postgres replication')) DESC
LIMIT 20;

That generated column keeps the index current automatically, so you never have to remember to update it. websearch_to_tsquery accepts the kind of query syntax users already type, like “quoted phrases” and -exclusions, and ts_rank gives you relevance ordering. Stemming, stop words, and language configuration are handled for you. For typo tolerance you add the pg_trgm extension and trigram matching, and for prefix or autocomplete you can lean on trigram indexes too.

Where’s the line? When search is the product. Faceted filtering across many dimensions, aggregations over hits, sophisticated relevance tuning, multi-language analyzers, search-as-you-type over millions of documents with sub-50ms expectations. That’s when a dedicated engine earns its place. But for “let users find a record by typing words that appear in it,” which is the actual requirement behind most search tickets, Postgres isn’t a compromise. It’s the right tool, and it keeps your search results transactionally consistent with the data instead of eventually consistent through an indexing pipeline you now have to babysit.

Is JSONB a real alternative to a document database?

For schema-flexible data inside an otherwise relational application, JSONB is a genuine alternative, not a stopgap. The reflex here is to reach for MongoDB or another document database the moment the data is “semi-structured” or the shape isn’t nailed down yet. But the jsonb type stores binary JSON, indexes it, and queries into it. You get schema flexibility where you want it and relational guarantees where you need them, in the same table, inside the same transaction.

sql

CREATE TABLE events (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id     bigint NOT NULL REFERENCES users(id),
  payload     jsonb  NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX events_payload_idx ON events USING GIN (payload);

-- Documents containing this structure (containment)
SELECT id
FROM events
WHERE payload @> '{"type": "signup", "plan": "pro"}';

-- Index a hot path for equality / range queries
CREATE INDEX events_source_idx ON events ((payload->>'source'));

The @> containment operator backed by a GIN index is the workhorse for “find documents that look like this.” For specific fields you filter on constantly, an expression index on payload->>'field' gives you targeted speed. And because it’s still Postgres, the flexible document column sits right next to your foreign keys, your NOT NULL constraints, and your joins. You get the loose part and the strict part without running two databases with two consistency models.

Where’s the line? If your access pattern is genuinely document-first at large scale, sharded across many nodes, with write throughput beyond what a single primary handles and a data model that never touches relational structure, then a document database built for that topology will serve you better. But “we’re not sure of the schema yet” and “some fields vary by record” aren’t that. They’re jsonb. Reaching past it usually means taking on a distributed system’s operational weight to solve a problem you don’t have yet.

Can Postgres do vector search without a dedicated vector database?

Yes, and for most retrieval features it’s the better default, because your embeddings stay in the same row as the content they describe and the metadata you filter on. The pgvector extension adds a vector type, distance operators, and approximate-nearest-neighbor indexes.

sql

CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE documents ADD COLUMN embedding vector(1536);

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops);

-- Nearest neighbors to a query embedding, with a plain SQL filter
SELECT id, content
FROM documents
WHERE workspace_id = $2
ORDER BY embedding <=> $1
LIMIT 10;

The <=> operator is cosine distance, and HNSW gives you fast approximate search. The detail I think most “do we need a vector DB” debates skip right past is that WHERE workspace_id = $2 sitting quietly in the query. Metadata filtering and vector similarity in one statement, against data that’s transactionally consistent with the rest of your application. With a standalone vector store you’re keeping vectors in one system and their source records in another, which buys you a sync pipeline, a staleness window, and the eventual 2am question of why a search result points at a document that no longer exists.

Where’s the line? Billions of vectors, or query volume that needs you to shard the index itself horizontally. That’s where purpose-built vector infrastructure pulls ahead. Most teams shipping a retrieval feature are working with thousands to low millions of vectors, which is comfortably within what Postgres handles on hardware you’re already paying for. This one is enough of its own topic that it’s the subject of the next post: why most teams reaching for a vector database don’t actually need one.

Should you use Postgres as a job queue?

For the background-job workload most applications actually have, a Postgres queue is reliable and quietly removes a moving part. The reflex is to add Redis, RabbitMQ, or SQS the moment work needs to happen asynchronously. The feature that makes the Postgres version work is FOR UPDATE SKIP LOCKED. It lets multiple workers pull from the same table at once without stepping on each other and without blocking. Each worker locks the row it claims and skips the rows other workers have already locked.

sql

WITH next_job AS (
  SELECT id
  FROM jobs
  WHERE status = 'pending'
    AND run_at <= now()
  ORDER BY run_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE jobs
SET status = 'running', started_at = now()
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload;

That gives you exactly-once claiming under concurrency, in a single round trip. And because the queue is just a table, you get things for free that are awkward or impossible in a lightweight broker. Jobs are enqueued in the same transaction as the data change that triggered them, so there’s no “I committed the row but the enqueue failed” gap. Failed jobs are inspectable with a SELECT. Retries and scheduling are columns you can query and reason about. Your whole job history is data you can audit. Mature libraries in most ecosystems are built on exactly this pattern, so you’re not hand-rolling any of it.

Where’s the line? Very high throughput, the sustained tens-of-thousands-of-messages-per-second kind. Or fan-out patterns, streaming semantics, and topic-based routing. Those are where a real message broker is the right tool. For the “run this work in the background, reliably, with retries” requirement most applications have, a Postgres-backed queue gives you transactional enqueue, and that’s worth more than it sounds. You’ll appreciate it the first time a dual-system setup loses a job in the gap between the commit and the publish.

Lightweight pub/sub and cache

Two smaller ones, because the reflex to add Redis often bundles them in. Neither replaces Redis at the high end. But a surprising number of Redis deployments exist to do exactly these two jobs.

For event notification, Postgres has LISTEN / NOTIFY. A session subscribes to a channel, any transaction can publish to it, and the notification fires when that transaction commits.

sql

-- In the transaction that created the work
NOTIFY job_created, '{"id": 42}';

-- In a worker session
LISTEN job_created;

This is ideal for “wake up, there’s work to do” signaling. Pair it with the queue table above and your workers react immediately instead of polling on a timer. The commit-coupling matters here: listeners only hear about work that actually committed.

For caching and ephemeral key-value, an UNLOGGED table skips the write-ahead log, trading crash-durability for speed. That’s exactly the right trade for derived data you can regenerate. It won’t replace Redis as a dedicated high-throughput cache, and it isn’t trying to. But for moderate caching needs it can take a whole service out of your stack.

Where’s the line? These are deliberately the modest end of the spectrum. LISTEN/NOTIFY is signaling, not a durable event log, so don’t go building Kafka semantics on it. And once caching is genuinely hot-path and high-volume, Redis is the answer. The point was never that Postgres replaces Redis. It’s that plenty of Redis instances are out there doing jobs Postgres was already positioned to handle.

What about time-series and geospatial data?

Both have strong answers inside Postgres, and each tends to summon a dedicated system before it needs to. For time-series data, meaning metrics, readings, logs, anything append-mostly and time-ordered, native range partitioning plus a BRIN index goes a long way. BRIN indexes stay tiny because they store only the min and max per block range, and on naturally time-ordered data they’re remarkably effective.

sql

CREATE INDEX readings_time_brin ON readings USING BRIN (recorded_at);

Partition by month, drop old partitions instead of running expensive deletes, and you’ve covered a large share of time-series needs. If you need more, TimescaleDB is a Postgres extension, so you get hypertables and continuous aggregates without leaving Postgres or operating a separate system.

For geospatial, PostGIS isn’t a lightweight approximation of a GIS database. It is one of the most capable spatial engines available, and it runs inside your Postgres instance. Find everything within this radius, polygon containment, spatial joins, distance ordering: all of it, indexed, sitting right alongside your relational data.

So how do you actually decide?

The tour can read like a list of clever tricks. It’s more useful as a habit of mind. Before any second datastore enters the diagram, three questions settle most cases for me.

1. What’s the real scale, today and on a realistic 12-month curve? Not the scale you picture at a hypothetical Series C. The honest numbers. A lot of specialized infrastructure gets provisioned for load that never arrives, and the dedicated tool’s advantages mostly show up at scales most systems never reach. If a single well-indexed Postgres instance handles your projected load with headroom, the second system is solving a problem you don’t have.

2. Is this capability the core of the product, or a supporting feature? When search, or vectors, or time-series is the thing you sell, the specialized tool’s depth is a competitive necessity, so invest there. When it’s a feature inside a broader application, that depth is usually overkill, and the integration cost outweighs the marginal capability.

3. What does the second system actually cost you, fully loaded? Deployment, monitoring, backups, the new failure mode, the consistency boundary, the on-call knowledge, the synchronization code. Write it all down. Often the act of listing it honestly is enough to settle the question, because the dedicated tool now has to be enough better to justify all of that, not merely better at the one task.

If the answers come back “modest scale, supporting feature, real operational cost,” the default should be Postgres. If they come back “large or fast-growing scale, core product capability, and the cost is justified by the advantage,” then reach for the specialized tool with confidence. That’s the case it was built for.

What to build instead

The move I’d argue for is consolidation first, separation when the data tells you to. Start with the capability inside Postgres. Index it properly. Measure it under realistic load. Let it run. If and when you hit a real wall, a measured one rather than an imagined one, you’ll know exactly which capability needs to move out, why, and what “better” has to mean to be worth the second system. That’s a far stronger position than provisioning three datastores up front because each one might be needed, then operating all of them forever whether they earn their keep or not.

The version of this that goes wrong is the team carrying a search cluster for a thousand documents, a message broker pushing a few hundred jobs a day, and a vector database holding ten thousand embeddings. Three systems, three on-call surfaces, three sync pipelines, all doing work one Postgres instance would have absorbed without noticing. Every one of those decisions was defensible in the meeting where it got made. The aggregate is an operational burden nobody chose on purpose.

Postgres does more than you think. Not everything, not forever, not at every scale. But far more than the reflex to add a system assumes. And the discipline of asking first is most of the value, because it forces the second system to justify itself before it joins the diagram, instead of after it’s too entrenched to pull back out.

February 20, 2026

Postgres Does More Than You Think

Share This Article

The part nobody prices in

Can Postgres handle full-text search instead of Elasticsearch?

Is JSONB a real alternative to a document database?

Can Postgres do vector search without a dedicated vector database?

Should you use Postgres as a job queue?

Lightweight pub/sub and cache

What about time-series and geospatial data?

So how do you actually decide?

What to build instead

Naveen Chandra

Cloud, Databases & AI - explained for engineers.

Practical articles on Cloud & DevOps, Database engineering, and AI systems, written with clarity and depth. Published regularly. No noise.

No spam. Unsubscribe anytime.

Work With a Cloud Engineer Who Sticks Around

I take on a small number of engagements at any one time so I can be genuinely involved in the work. If you are looking for a long-term tech partner, let's talk.