Pinecone is exploring a potential sale. The company that raised $138 million in total funding at a $750 million valuation in 2023 — the poster child of the vector database boom — reportedly lost Notion as a customer and is now shopping for an exit. That should tell you something about where this market is actually heading.
I've been building data pipelines and production ML systems for years. I watched the vector database hype wave build in 2023, peak in 2024, and start crashing into reality in 2025. The pattern is always the same: new technology appears, VCs pour money in, startups multiply, everyone says you need it, and then the boring incumbent quietly adds the feature and most of the startups die.
That's exactly what's happening with vector databases right now.
This isn't a hit piece. Vector databases solve real problems for specific workloads. But the gap between "who actually needs one" and "who's paying for one" is enormous. I'm going to lay out the market data, show you head-to-head benchmarks, walk through real code, and give you a decision framework for figuring out whether you need a dedicated vector database or whether PostgreSQL will do the job just fine.
Spoiler: for most of you, it's PostgreSQL.
The Money Tells the Story
The vector database market was valued at $2.55 billion in 2025 and is projected to reach $8.9 billion by 2030, growing at a 27.5% CAGR. That sounds like a booming industry. And it is — sort of.
But look at who's actually capturing that value.
Pinecone raised $138 million in total funding at a $750 million valuation. Weaviate raised $135.8 million. Chroma, Qdrant, Milvus — the list goes on. VC money flooded in because every AI startup needed embeddings and every embedding needed somewhere to live. The logic was simple: vectors are a new data type, new data types need new databases, new databases mean new billion-dollar companies.
That logic had a fatal flaw.
Here's the problem: while these startups were raising capital, every major database company was quietly adding vector support. PostgreSQL got pgvector. MongoDB added Atlas Vector Search. Elasticsearch launched native vector capabilities. Redis added vector similarity. DuckDB got vector extensions. Even SQLite has them now. In the span of about 18 months, vector search went from "you need a specialized database for this" to "every database you already use supports this."
As VentureBeat put it, vectors went "from a database category to a data type". Vector search became a "checkbox feature in cloud data platforms". Not a product category — a feature. That's the worst possible outcome if you're a pure-play vector database company trying to justify a $750 million valuation.
The same VentureBeat analysis noted that "very few vector DB startups are breaking out" because the market is fragmented, commoditized, and getting swallowed by incumbents. When your differentiator becomes a built-in feature of the database everyone already uses, your business model has a problem.
I've seen this pattern before. Remember when time-series databases were the hot thing? InfluxDB, TimescaleDB, QuestDB — everyone needed a specialized time-series database. Then PostgreSQL got good enough at time-series workloads for 90% of use cases, and most of those startups either pivoted or shrank. The ones that survived found niches where PostgreSQL genuinely couldn't compete. The same thing is happening with vector databases, and the timeline is even faster because the underlying technology is simpler.
pgvector Is Better Than You Think
Most people who dismiss pgvector haven't looked at the benchmarks recently. I hear "pgvector is slow" from engineers who tested it in 2023 when the extension was still in its early days. The numbers from 2025 benchmarks are frankly embarrassing for the dedicated vector database companies.
Let me start with the headline stat: pgvectorscale achieves 471 queries per second at 99% recall on 50 million vectors. That's 11.4x better throughput than Qdrant, which managed just 41 QPS at the same recall level. Not 11.4% better. Not twice as fast. Eleven point four times the throughput.
Against Pinecone, the numbers are even more stark. pgvectorscale delivers 28x lower p95 latency than Pinecone's storage-optimized index (s1) and 16x higher query throughput. And here's the kicker: it costs 75% less when self-hosted on AWS compared to Pinecone's managed service. You're paying four times more for a service that's 16 to 28 times slower.
For datasets under 10 million vectors, pgvector "matches or beats" dedicated vector databases in most benchmarks, with sub-50ms query latency at moderate scale. Most RAG applications — the ones powering chatbots, internal search tools, documentation assistants — have between 10,000 and 1 million vectors. At that scale, pgvector doesn't just compete with dedicated databases. It wins.
Here's a comparison table with real benchmark data:
| Metric | pgvectorscale | Pinecone (s1) | Qdrant |
|---|---|---|---|
| QPS (50M vectors, 99% recall) | 471 | ~30 | 41 |
| p95 latency | ~5ms | ~140ms | ~24ms |
| Cost (self-hosted, monthly) | ~$200 | ~$800 | ~$300 |
| Recall at target QPS | 99% | 99% | 99% |
| Max comfortable scale | ~50M vectors | Billions | ~100M vectors |
| Requires separate service | No | Yes | Yes |
| ACID transactions | Yes | No | No |
| SQL joins with relational data | Yes | No | No |
Those numbers aren't theoretical. They come from standardized benchmarks on comparable hardware and dataset configurations. The pgvectorscale extension, built by Timescale, adds a StreamingDiskANN index type alongside pgvector's HNSW. That's what gets you the performance at scale — disk-based approximate nearest neighbor search that doesn't require your entire index to fit in RAM.
I want to be fair here. These benchmarks have caveats. The Pinecone s1 tier is their cheapest option, not their highest-performance tier. Qdrant's performance varies significantly based on configuration. And pgvectorscale requires some tuning to hit those peak numbers. But even with generous adjustments, the story stays the same: pgvector in 2025 is not the pgvector of 2023. It's a different beast entirely.
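If you want to chase the pgvectorscale numbers yourself, the index is close to a drop-in swap. A sketch, assuming the extension is available on your instance and a `documents` table with an `embedding` vector column:

```sql
CREATE EXTENSION IF NOT EXISTS vectorscale;

-- StreamingDiskANN index in place of HNSW; your queries stay the same
CREATE INDEX idx_documents_embedding_diskann
    ON documents USING diskann (embedding vector_cosine_ops);
```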
What You Get by Staying in PostgreSQL
Performance is only half the argument. The operational benefits of keeping vectors in your existing PostgreSQL database are massive, and this is where I think most technical comparisons get the analysis wrong. They focus on QPS and latency benchmarks but ignore the engineering cost of running two databases instead of one.
One database to manage. One connection string. One backup strategy. One monitoring dashboard. One set of credentials. One billing page. One point of failure instead of two. When something breaks at 2 AM — and it will, eventually — you're debugging one system, not trying to figure out whether the problem is in your relational database, your vector database, or the sync layer between them.
ACID transactions. This one is huge and gets overlooked constantly. When you delete a user, you delete their embeddings in the same transaction. When you update a document, you update its vectors in the same commit. No sync jobs. No eventual consistency. No "the document was deleted but the vectors are still there" bugs that silently corrupt your search results for hours or days before someone notices.
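The transactional point is easy to show. A minimal sketch, assuming a `users` table and a `user_id` column on `documents` (the schema is illustrative):

```sql
-- Remove a user and their embeddings atomically
BEGIN;
DELETE FROM documents WHERE user_id = 42;
DELETE FROM users WHERE id = 42;
COMMIT;
-- Either both deletes happen or neither does
```

With `ON DELETE CASCADE` on the foreign key, the first `DELETE` isn't even necessary — the database enforces the invariant for you.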
I've personally debugged a system where a separate vector store was serving stale results because the sync job between PostgreSQL and Pinecone had silently failed. The main database showed 5,000 products. Pinecone had embeddings for 7,200 — including 2,200 discontinued products. Customers were getting search results for things they couldn't buy. The fix was straightforward, but the bug lived in production for three weeks before anyone caught it. That doesn't happen when everything is in one database.
SQL for everything. You can join your vector search results with relational data in a single query. Find me the 10 most similar products that are also in stock, priced under $50, and have a rating above 4.0. That's one SQL query with pgvector. With a separate vector database, that's a vector search, then a filter on your main database, then an intersection, then a re-sort. Four operations instead of one, with network hops in between each.
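Here's what that query looks like spelled out, assuming a `products` table with `in_stock`, `price`, and `rating` columns next to the embedding (the schema is illustrative):

```sql
SELECT id, name, price, rating
FROM products
WHERE in_stock
  AND price < 50
  AND rating > 4.0
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
```

One query, one round trip.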
Existing tooling works. Your ORMs, your migration tools, your monitoring, your CI/CD pipelines, your connection poolers — they all work with PostgreSQL already. Adding vector columns doesn't change any of that. Adding Pinecone means new SDKs, new API clients, new error handling patterns, new retry logic, new authentication secrets to manage, and new deployment configurations to maintain.
Team knowledge transfers. Everyone on your team knows SQL. Not everyone knows the Pinecone API, the Weaviate GraphQL syntax, or the Qdrant REST endpoints. When someone new joins the team, they can understand vector queries immediately because they're just SQL with a distance operator. There's no separate query language to learn.
Setting Up pgvector: It Takes 5 Minutes
If you're on a managed PostgreSQL provider like Neon, Supabase, or RDS, pgvector is already installed. You just enable it:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
Create a table with a vector column:
```sql
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    embedding VECTOR(1536),
    created_at TIMESTAMPTZ DEFAULT now()
);
```
The 1536 matches OpenAI's text-embedding-3-small output dimensions. If you use Cohere's embed-v3, that's 1024 dimensions. If you use text-embedding-3-large, that's 3072. Match this number to your model.
Add an HNSW index for fast approximate nearest neighbor search:
```sql
CREATE INDEX idx_documents_embedding
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```
The m parameter controls the number of bidirectional links per node in the HNSW graph. Higher values improve recall but increase index size and build time. 16 is a solid default. ef_construction controls the search depth during index building. 64 gives you good recall without excessive build times. For production workloads with millions of vectors, you might bump these to m = 32 and ef_construction = 128.
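There's also a query-time knob worth knowing: `hnsw.ef_search` (default 40) controls how many candidates the index examines per query, trading latency for recall without rebuilding anything:

```sql
-- Session-wide: examine more candidates for higher recall
SET hnsw.ef_search = 100;

-- Or scope it to a single transaction
BEGIN;
SET LOCAL hnsw.ef_search = 200;
SELECT id FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
COMMIT;
```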
Now run a similarity search:
```sql
SELECT id, content, metadata,
       1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
```
The <=> operator is cosine distance. pgvector also supports L2 distance (<->) and negative inner product (<#>) — negative so that ascending ORDER BY still puts the best matches first. For normalized embeddings like OpenAI's, cosine distance is the standard choice; and because inner product ranks normalized vectors identically to cosine similarity, <#> can be marginally faster since it skips the normalization step.
You can also combine vector search with traditional SQL filters:
```sql
SELECT id, content,
       1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'engineering'
  AND created_at > now() - interval '30 days'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
```
That's a semantic search filtered to engineering documents from the last 30 days. Try doing that in one API call with Pinecone.
Four SQL statements to enable the extension, create a table, add an index, and run a query. That's your entire vector database setup. No new service to deploy. No new SDK to install. No new credentials to manage.
Embedding and Inserting with Python
Here's a complete Python script that embeds text and stores it in PostgreSQL with pgvector:
```python
import os
import json

import psycopg2
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Connect to PostgreSQL
conn = psycopg2.connect(os.getenv("DATABASE_URL"))
cur = conn.cursor()


def embed_text(text: str) -> list[float]:
    """Generate an embedding for a text string."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


def store_document(text: str, metadata: dict | None = None):
    """Embed text and store it in PostgreSQL."""
    embedding = embed_text(text)
    cur.execute(
        """
        INSERT INTO documents (content, metadata, embedding)
        VALUES (%s, %s::jsonb, %s::vector)
        """,
        (text, json.dumps(metadata or {}), str(embedding)),
    )
    conn.commit()


def search(query: str, limit: int = 5, min_similarity: float = 0.7):
    """Search for similar documents with a similarity threshold."""
    query_embedding = embed_text(query)
    cur.execute(
        """
        SELECT content, metadata,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        WHERE 1 - (embedding <=> %s::vector) > %s
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (
            str(query_embedding),
            str(query_embedding),
            min_similarity,
            str(query_embedding),
            limit,
        ),
    )
    return cur.fetchall()


# --- Usage ---
# Store some documents
store_document(
    "PostgreSQL 17 introduced incremental backup support and new JSON functions.",
    {"source": "pg-release-notes", "version": 17},
)
store_document(
    "The HNSW index in pgvector provides fast approximate nearest neighbor search.",
    {"source": "pgvector-docs", "topic": "indexing"},
)
store_document(
    "Connection pooling with PgBouncer reduces overhead for high-concurrency apps.",
    {"source": "pg-best-practices", "topic": "performance"},
)

# Search
results = search("What indexing options does pgvector support?")
for content, metadata, similarity in results:
    print(f"[{similarity:.3f}] {content}")
    print(f"    Source: {metadata}")
```
No LangChain. No vector database SDK. No orchestration framework. Just psycopg2 and the OpenAI client. The entire embedding, storage, and retrieval layer is plain Python and SQL. Anyone on your team who can read Python and SQL can understand, debug, and modify this code. Try saying the same about a LangChain chain with a Pinecone retriever and three layers of abstraction.
For production use, you'd want to add batch embedding (OpenAI supports sending multiple texts in a single API call), connection pooling, error handling, and maybe an async version. But the core pattern stays exactly the same: embed, insert, query.
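The batch-embedding piece is the easiest win. Here's a sketch (the helper names are mine, not from any library) that makes one API call per batch of texts instead of one per text — the OpenAI embeddings endpoint accepts a list as `input`:

```python
from typing import Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_batch(client, texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Embed many texts with one API call per batch instead of one per text."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch,  # a list of strings, embedded in a single call
        )
        vectors.extend(item.embedding for item in response.data)
    return vectors


# The pure batching logic, independent of any API:
chunks = list(batched([f"doc {i}" for i in range(7)], 3))
print([len(c) for c in chunks])  # [3, 3, 1]
```

The right `batch_size` depends on your documents' token counts and the API's per-request limits, so treat 100 as a placeholder, not a recommendation.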
When You Actually Need a Dedicated Vector Database
I've spent most of this article arguing that most people don't need Pinecone or Weaviate or Qdrant. But intellectual honesty requires me to tell you when they're genuinely the right choice. Here's the breakdown.
You have hundreds of millions of vectors as your primary workload. If vectors are the main thing your database does — not a side feature of a relational app — and you're operating at hundreds of millions to billions of vectors, a purpose-built system will give you better resource utilization. pgvector at 50 million vectors is great. pgvector at 500 million vectors starts to strain because the HNSW index gets enormous. At a billion vectors, you need something built specifically for that problem, with sharding and distributed query execution baked into the architecture.
You need complex filtered vector search with adaptive query strategies. Simple metadata filters work fine in pgvector, but if you need the database to dynamically choose between pre-filtering, post-filtering, and single-stage filtering based on filter selectivity — that's where dedicated systems like Weaviate and Qdrant have invested heavily. PostgreSQL's query planner wasn't designed to reason about the interaction between B-tree filters and HNSW traversal. Dedicated vector databases have built optimizers specifically for this problem.
You have high-velocity real-time ingestion. If you're ingesting millions of new vectors per hour while simultaneously serving queries, PostgreSQL's MVCC overhead becomes a real factor. Every update creates a new tuple version. Every query needs to check visibility. Dedicated vector databases that use LSM-tree or append-only storage patterns handle concurrent writes and reads with less contention. If your write throughput matters as much as your read latency, this is where pgvector shows its age as a general-purpose database.
You're serving thousands of concurrent queries with strict latency SLAs. At extreme concurrency levels — thousands of simultaneous similarity searches — purpose-built systems have an edge. They can partition vectors across nodes, route queries to the optimal shard, and parallelize the search in ways that PostgreSQL's single-node architecture can't match. If you're building a consumer search product with millions of daily users, this matters.
Your team has zero database management expertise and doesn't want to learn. This one is pragmatic, not technical. Managed vector databases like Pinecone require zero operational overhead. You don't tune HNSW parameters, you don't worry about vacuuming bloated indexes, you don't manage connection pools, you don't size your shared_buffers. If your team is four ML engineers who've never touched a PostgreSQL config file and have no interest in starting, Pinecone's simplicity has real value. The operational cost of running PostgreSQL well is non-zero, and pretending otherwise is dishonest.
Here's a decision framework:
| Scenario | Best Choice | Why |
|---|---|---|
| RAG app with under 1M vectors | pgvector | Zero additional cost, same DB as your app |
| RAG app with 1-10M vectors | pgvector + pgvectorscale | Still beats dedicated DBs in benchmarks |
| 10-50M vectors, mixed workload | pgvector + pgvectorscale | 471 QPS at 99% recall, hard to beat |
| 50-100M vectors, vector-primary | Evaluate both | Test your actual query patterns and write load |
| 100M+ vectors, vector-only | Dedicated vector DB | Purpose-built for this scale |
| Real-time ingestion + search | Dedicated vector DB | Better write/query concurrency |
| Complex multi-filter vector search | Dedicated vector DB | Better filter-aware query planning |
| Team with no DB expertise | Pinecone (managed) | Operational simplicity has real value |
The 5-10% That Actually Matters
Here's the insight that most people miss when they obsess over vector database selection: your vector database choice accounts for maybe 5-10% of your RAG system's quality.
Read that again. Five to ten percent.
The other 90-95% comes from:
- **Your chunking strategy.** How you split documents matters more than where you store them. Bad chunks produce bad embeddings. Bad embeddings produce bad search results. No amount of database optimization fixes garbage input. I've seen a team double their retrieval accuracy just by switching from fixed-size 512-character chunks to semantic paragraph splitting. The database didn't change at all.
- **Your embedding model.** The difference between text-embedding-3-small and a fine-tuned domain-specific model is far larger than the difference between pgvector and Pinecone. If you're searching legal documents, an embedding model fine-tuned on legal text will outperform a general-purpose model by 15-30%, regardless of what database you're using underneath.
- **Your retrieval pipeline.** Hybrid search (combining vector similarity with BM25 keyword matching), re-ranking with a cross-encoder, query expansion, hypothetical document embedding — these techniques improve retrieval quality by 20-40%. Switching from pgvector to Qdrant might improve it by 2%. The math doesn't lie: spend your engineering time on the pipeline, not the storage layer.
- **Your prompt engineering.** How you format retrieved context for the LLM, what instructions you give it, how you handle conflicting information across chunks, whether you include metadata — this is where quality lives or dies. I've watched teams go from 60% answer accuracy to 85% by changing nothing except the system prompt and how they formatted the retrieved chunks. Same database. Same embeddings. Same documents.
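To make the chunking point concrete, here's a minimal sketch of paragraph-based splitting — the splitting rule and the size cap are my illustration, not a prescribed recipe:

```python
def split_paragraphs(text: str, max_chars: int = 1500) -> list[str]:
    """Split on blank lines, then greedily pack paragraphs up to max_chars.

    Keeps semantic units (paragraphs) intact instead of cutting mid-sentence,
    which is the difference between fixed-size and semantic chunking.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


doc = "First paragraph about HNSW.\n\nSecond paragraph about recall.\n\nThird."
chunks = split_paragraphs(doc, max_chars=40)
print(len(chunks))  # 2 — short paragraphs get packed together
```

A production version would also respect headings and code blocks, but even this naive version beats slicing every 512 characters regardless of content.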
I've seen teams spend three weeks evaluating vector databases and three hours on their chunking strategy. That's backwards. Get your chunking right first. Get your embedding model right second. Get your retrieval pipeline right third. Then, and only then, think about whether pgvector is limiting you.
It almost certainly isn't.
The Commoditization Is Accelerating
The trend is clear and it's not reversing. Traditional databases are getting better at vectors faster than vector databases are getting better at traditional database features.
PostgreSQL has pgvector and pgvectorscale. MongoDB has Atlas Vector Search. Elasticsearch has dense vector fields with HNSW and quantization. Redis has vector similarity search. DuckDB has vector extensions for analytical workloads. Even Chroma did a full Rust rewrite achieving 4x faster writes and queries — not because they wanted to, but because they had to keep up with the incumbents eating their lunch.
When VentureBeat says "vector search is now a checkbox feature in cloud data platforms", they're describing a market where the standalone product is being absorbed into the platform. That's textbook commoditization. It happened to NoSQL databases in the 2010s (MongoDB survived, RethinkDB didn't, CouchDB became a niche product). It happened to graph databases (Neo4j survived, but barely, as PostgreSQL's recursive CTEs and JSON handling covered most graph-like queries). It's happening to vector databases right now, in real time, and the pace is accelerating.
The startups that will survive are the ones that find use cases beyond basic similarity search. Multi-tenant vector isolation with hardware-level separation. Hybrid search with learned sparse embeddings that adapt to your domain. Hardware-accelerated quantization for running on edge devices. There's real innovation happening at the edges of this field. But for the core use case of "store embeddings, find similar ones, return them fast" — the use case that 95% of developers actually have — PostgreSQL already does that well enough. And "well enough" is death for startups that raised hundreds of millions of dollars.
A Note on Cost
I want to put some real numbers on this because I think cost is underappreciated in the vector database discussion.
Pinecone's starter plan costs $70/month for a single pod. Their standard plans start around $200/month and go up quickly as you add replicas and pods. For a production deployment with redundancy, you're looking at $500-$2,000/month easily — and that's just for the vector search component, before you add your relational database, your cache layer, and everything else.
pgvector runs on your existing PostgreSQL instance. If you're already paying for a Neon Pro plan ($19/month) or an RDS instance, the incremental cost of adding vector columns and HNSW indexes is zero. Literally zero additional monthly spend. Your existing compute handles the vector queries alongside your normal relational queries.
Even if you need a beefier PostgreSQL instance to handle the vector workload, self-hosted pgvectorscale costs roughly 75% less than Pinecone for equivalent performance. At a startup burning through runway, saving $500-$1,500/month on infrastructure that performs better is not a trivial decision. That's $6,000-$18,000/year you can spend on an engineer's time, better embedding models, or literally anything else.
What I Actually Think
I run two production applications on Neon PostgreSQL. Both use pgvector for different things. One does semantic search over job listings at birjob.com. The other does similarity matching for content recommendations. Both handle their vector workloads without breaking a sweat. Neither has ever needed a dedicated vector database, and I don't see that changing.
My honest opinion: the vector database boom was a classic case of VCs funding a solution before verifying the size of the problem. Yes, vectors need to be stored somewhere. No, that somewhere doesn't need to be a separate specialized database for 95% of applications. The AI hype cycle created a gold rush, and vector database startups were selling shovels. But it turns out the people already selling shovels (PostgreSQL, MongoDB, Elasticsearch) just added "vector" to their existing product line, and their shovels work just as well.
If you're starting a new project today that needs vector search, start with pgvector. If you already have PostgreSQL — and you almost certainly do — add the extension. Create an HNSW index. Write a SQL query. Ship it. You can always migrate to a dedicated system later if you outgrow it, but the odds of that happening are small. Most projects never outgrow PostgreSQL. That's been true for relational data for 30 years, and it's turning out to be true for vector data too.
The companies that genuinely need Pinecone-scale infrastructure are indexing billions of documents, serving millions of users concurrently, and operating at a scale where the engineering team has dedicated infrastructure engineers who do nothing but tune databases. If that describes your company, you probably aren't reading a blog post for this kind of advice. You have a team of specialists who've already benchmarked both options on your actual workload.
For the rest of us — the people building RAG apps, semantic search features, recommendation engines, content matching systems, and AI-powered tools — PostgreSQL is enough. It's been enough for a while. The benchmarks prove it. The operational simplicity proves it. The cost savings prove it.
Stop paying for a separate vector database. Add a column to your existing one.
Sources
- Vector Database Market Size Report — GM Insights
- Vector Database Market Forecast — Markets and Markets
- Pinecone $100M Raise at $750M Valuation — TechCrunch
- Best Vector Database Startups (Weaviate Funding) — Seedtable
- Pinecone Exploring Sale — Calcalist Tech
- From Shiny Object to Sober Reality: The Vector Database Story — VentureBeat
- pgvector vs Qdrant Benchmark — Timescale/Medium
- pgvector vs Qdrant Throughput — Tiger Data
- pgvector vs Pinecone Benchmark — Tiger Data
- PostgreSQL as a Vector Database — DEV.to
- pgvector vs Dedicated Vector DBs — ZenVanRiel
- Best Vector Databases — Firecrawl
- The Case Against pgvector — Alex Jacobs
- Best Vector Databases Comparison — Encore