---
title: "Vector Store vs. Vector Database"
description: "Vector store vs vector database is easy to confuse. Learn the difference between them, how they are related, and what that means for you."
section: "Postgres and vector data"
---


*Originally published on Aug. 29, 2024*

The vector store vs. vector database distinction matters when you are choosing retrieval infrastructure for an AI application. A vector store is a storage system optimized for holding and retrieving high-dimensional vector embeddings. A vector database adds full database capabilities around that core: persistence, metadata filtering, ACID transactions, scalable indexing, and access control. In most production systems, a vector database includes a vector store as an internal component - the two are not a binary choice but a spectrum.

## Vector store vs. vector database: at a glance

The table below compares vector stores and vector databases across the eight dimensions most relevant to developers choosing between them.

| **Dimension** | **Vector store** | **Vector database** |
| --- | --- | --- |
| Primary function | Store and retrieve embeddings via ANN search | Full database with embedding storage, retrieval, and data management |
| Scalability range | Typically suited to datasets of up to roughly 1M-10M vectors | Designed for tens of millions to billions of vectors |
| Query complexity | Approximate nearest neighbor (ANN) only | ANN + metadata filtering + SQL-compatible queries |
| Data persistence | Often in-memory or file-based; optional persistence | Full ACID persistence; durable by default |
| Metadata filtering | Limited or none | Native support for filtering by structured metadata |
| Setup complexity | Low - minimal configuration | Higher - requires database administration |
| Best use case | Prototypes, MVPs, small-scale semantic search | Production RAG systems, multi-tenant applications, large-scale AI |
| Examples | Chroma (default in-memory mode), FAISS | pgvector + pgvectorscale, Pinecone, Weaviate, Qdrant |

These are not mutually exclusive categories. Chroma can be configured with persistent backends, and pgvector runs inside PostgreSQL - making it a vector store within a full relational database.

## What's the difference between a vector store and a vector database?

### What is a vector store?

A vector store is a system built specifically for storing and retrieving vector embeddings. Embeddings are high-dimensional numerical representations of data - think 768 or 1,536 floating-point values per item - used in semantic search, recommendation systems, and RAG pipelines.

Key characteristics of vector stores:

- **Optimization for high-dimensional data.** Vector embeddings typically consist of hundreds or thousands of dimensions. Standard database indexes are not designed for this, so vector stores use specialized index structures like HNSW or IVFFlat to make similarity search practical.

- **Nearest neighbor search.** The core operation is finding the K most similar vectors to a query vector, ranked by a distance metric (Euclidean distance, cosine similarity, or inner product). Vector stores are built around this operation.

- **Minimal data model.** Vector stores focus on numeric vector data. Most support optional metadata fields, but the query model is not designed to filter or join across complex structured data.

- **Optional persistence.** Many vector stores default to in-memory operation. Persistence to disk is often available but not always ACID-compliant or durable by default.

- **Low operational overhead.** Vector stores are designed to be fast to set up. Chroma and FAISS, for example, can be running inside a Python process in a few lines of code - which is exactly what makes them useful for development and prototyping.

Vector stores trade breadth for speed and simplicity. For prototyping and local development, these tradeoffs make sense. For production applications with real users and real data, they usually do not.
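The nearest neighbor operation described above can be sketched as an exact brute-force scan in plain Python. This is an illustration only, not how any particular vector store is implemented: production systems replace the scan with an ANN index such as HNSW. The item names and the 2-dimensional vectors are made up for the example.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Vector, b: Vector) -> float:
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product of the length-normalized vectors: larger means more similar."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm


def top_k(
    query: Vector,
    items: Dict[str, Vector],
    k: int,
    score: Callable[[Vector, Vector], float],
    higher_is_better: bool,
) -> List[str]:
    """Score every stored vector against the query and return the k best keys."""
    ranked = sorted(items, key=lambda key: score(query, items[key]), reverse=higher_is_better)
    return ranked[:k]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
items = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [5.0, 5.0],  # long vector pointing in a different direction
}
query = [1.0, 0.0]

print(top_k(query, items, 2, cosine_similarity, True))   # ['a', 'b']
print(top_k(query, items, 2, l2_distance, False))        # ['a', 'b']
print(top_k(query, items, 2, inner_product, True))       # ['c', 'a']
```

The last line ranks differently because inner product rewards vector length as well as direction, which is why the metric has to match what the embedding model expects.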

### What is a vector database?

A vector database does everything a vector store does and adds the features that production applications require: ACID-compliant storage, metadata filtering, access control, backup/restore, and the ability to combine vector search with structured queries.

Key characteristics of a vector database:

- **Extended database functionality.** Vector databases are often built as extensions of existing database systems - adding vector storage and retrieval to proven database technologies rather than building from scratch. PostgreSQL with pgvector is the clearest example: a 30-year-old database system gains vector search capabilities without giving up any of its existing features.

- **Integration of vector and relational data.** A vector database can store the embedding alongside the source data it represents - the document text, user ID, timestamp, and any other structured metadata. Queries can filter on any of these fields.

- **Broader query support.** Vector databases support combined queries: find the 10 most similar documents to this embedding, but only from user Y's account, created in the last 30 days, and ranked by recency within the similarity results. This requires both ANN search and relational filtering working together.

- **Flexible data model.** Vector databases handle a mix of vector and non-vector data types in the same table. This removes the need for a separate store to hold the source data alongside embeddings.

- **ACID persistence and operational features.** Data is durable by default, transactions are supported, and the system includes backup/restore, access control, and the operational tooling that production deployments need.

Most vector databases are built by wrapping a vector store component inside a database system. The database layer handles durability, indexing, and query coordination; the vector store component handles the similarity search.
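To make the combined query concrete, here is a small sketch in plain Python of "similar documents, but only from user Y's account, created in the last 30 days". The `Document` fields, the sample rows, and the exact scan are assumptions for illustration; a vector database would run the same logic inside its query engine with an ANN index rather than in application code.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Document:
    # Embedding stored next to the structured metadata it belongs to.
    doc_id: int
    user_id: str
    created_at: datetime
    embedding: List[float]


def cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def filtered_search(docs, query, user_id, since, k):
    """Apply the relational filters, then rank the survivors by similarity."""
    candidates = [d for d in docs if d.user_id == user_id and d.created_at >= since]
    candidates.sort(key=lambda d: cosine_distance(query, d.embedding))
    return candidates[:k]


now = datetime(2024, 8, 29)  # fixed date so the example is deterministic
docs = [
    Document(1, "user_y", now - timedelta(days=2), [1.0, 0.0]),
    Document(2, "user_y", now - timedelta(days=90), [1.0, 0.0]),  # too old
    Document(3, "user_x", now - timedelta(days=1), [1.0, 0.0]),   # other account
    Document(4, "user_y", now - timedelta(days=5), [0.6, 0.8]),
]

results = filtered_search(docs, [1.0, 0.0], "user_y", now - timedelta(days=30), k=10)
print([d.doc_id for d in results])  # [1, 4]
```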

## How vector stores and vector databases are related

Most systems marketed as "vector stores" are actually vector databases. The store is a component inside the database, not a standalone product. This distinction is important because it clarifies what you are actually evaluating when you compare tools.

The architecture works in layers:

- **The vector store** is the inner component that holds the embeddings and handles approximate nearest neighbor search.
- **The database wrapper** adds persistence, transactions, access control, and query integration around that store.
- **The query layer** lets you combine vector similarity search with traditional database operations in a single query.

When you use Pinecone, Weaviate, or Qdrant, you are using a vector database - each has an internal embedding store plus surrounding database infrastructure for metadata, access control, and persistence. When you use FAISS or Chroma in default in-memory mode, you are using a vector store - you get the ANN search component without the database wrapper.

**Example - pgvector in PostgreSQL:**

- Vector tables created with pgvector act as the vector store component.
- PostgreSQL with the [pgvector extension](https://www.tigerdata.com/learn/postgresql-extensions-pgvector) is the full vector database.
- A SQL query can call pgvector's similarity functions alongside any other PostgreSQL operation - joins, filters, aggregates.

The pgvector case is instructive because it makes the layers visible. The extension adds a new column type (vector) and new index types (HNSW and IVFFlat) to PostgreSQL. The embedding storage and retrieval live in those constructs - that is the vector store. Everything else - authentication, transactions, backup, SQL query planning, JSONB support, connection pooling - is provided by PostgreSQL. The combination is a vector database.
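As a rough sketch of what those constructs look like, the snippet below keeps the SQL in Python strings so it runs without a database. The table and column names (`documents`, `user_id`, `created_at`) are hypothetical, the `%(name)s` placeholders assume a psycopg-style driver, and the 3-value query vector is shortened for readability (a real one would need 768 values to match the column).

```python
# Schema: the vector column and HNSW index come from pgvector;
# everything else is ordinary PostgreSQL.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    content    text,
    embedding  vector(768)
);

-- vector_cosine_ops pairs with the <=> (cosine distance) operator.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
"""

# Query: similarity ranking and relational filters in one statement.
SEARCH_SQL = """
SELECT id, content, embedding <=> %(query)s::vector AS cosine_distance
FROM documents
WHERE user_id = %(user_id)s
  AND created_at > now() - interval '30 days'
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values):
    """Format a list of numbers in pgvector's text form, e.g. '[1.0,2.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


params = {"query": to_vector_literal([0.1, 0.2, 0.3]), "user_id": 42, "k": 10}
print(params["query"])  # [0.1,0.2,0.3]
```

With a live connection, the statement would be executed as something like `cursor.execute(SEARCH_SQL, params)`.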

### Vector stores in LangChain and LlamaIndex

If you are coming from LangChain or LlamaIndex, "vector store" means something slightly different from the architectural definition above.

In LangChain, VectorStore is an abstract interface in the retrieval layer. It is not a product - it is an API that different backends implement. Chroma (in-memory), Pinecone (cloud-native), and pgvector (PostgreSQL) all implement the same interface, so LangChain calls all of them "vector stores" regardless of what the underlying system actually is.

This is the most common source of confusion. When you use PGVector in LangChain, you are running queries against a full relational database. The LangChain label does not change what the underlying system can do. See the full walkthrough of [pgvector as a vector store in LangChain](https://www.tigerdata.com/blog/how-to-build-llm-applications-with-pgvector-vector-store-in-langchain) for implementation details.
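The pattern can be illustrated with a hypothetical interface. `EmbeddingStore` below is not LangChain's actual `VectorStore` class or API, only a minimal stand-in for the same idea, and SQLite is used in place of PostgreSQL so the sketch runs anywhere.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import List


def squared_l2(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class EmbeddingStore(ABC):
    """Hypothetical interface; not LangChain's actual VectorStore class."""

    @abstractmethod
    def add(self, doc_id: str, embedding: List[float]) -> None:
        ...

    @abstractmethod
    def search(self, query: List[float], k: int) -> List[str]:
        ...


class InMemoryStore(EmbeddingStore):
    """Backend 1: a plain dict that disappears when the process exits."""

    def __init__(self) -> None:
        self._vectors = {}

    def add(self, doc_id, embedding):
        self._vectors[doc_id] = embedding

    def search(self, query, k):
        ranked = sorted(self._vectors, key=lambda d: squared_l2(query, self._vectors[d]))
        return ranked[:k]


class SqliteStore(EmbeddingStore):
    """Backend 2: rows in a SQL database (SQLite stands in for PostgreSQL)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items (doc_id TEXT PRIMARY KEY, embedding TEXT)"
        )

    def add(self, doc_id, embedding):
        with self._conn:  # commits the insert as a transaction
            self._conn.execute(
                "INSERT OR REPLACE INTO items VALUES (?, ?)", (doc_id, json.dumps(embedding))
            )

    def search(self, query, k):
        rows = self._conn.execute("SELECT doc_id, embedding FROM items").fetchall()
        rows.sort(key=lambda row: squared_l2(query, json.loads(row[1])))
        return [row[0] for row in rows[:k]]


def retrieve(store: EmbeddingStore, query: List[float]) -> List[str]:
    """Caller code sees only the interface, so every backend looks like a 'store'."""
    return store.search(query, k=2)


results = {}
for store in (InMemoryStore(), SqliteStore()):
    store.add("doc-1", [0.0, 0.0])
    store.add("doc-2", [1.0, 1.0])
    store.add("doc-3", [5.0, 5.0])
    results[type(store).__name__] = retrieve(store, [0.9, 0.9])

print(results)  # both backends return ['doc-2', 'doc-1']
```

The calling code is identical for both backends, which is why a framework can use one label for systems with very different capabilities underneath.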

## When to use a vector store vs. a vector database

This decision comes down to application stage and scale. It is not a permanent architectural commitment - most teams start with a vector store and migrate later.

### Choose a vector store if:

- You are building a prototype, proof-of-concept, or internal demo with a small, bounded dataset (under 1M vectors)
- Your queries are pure similarity search - no metadata filtering, no joins with structured data, no access controls needed
- Setup speed matters more than production readiness (Chroma or FAISS can be running in a Python process in minutes)
- You have no requirements around data durability, backup/restore, or multi-tenancy
- You are working in-memory or from a local file during early development

### Choose a vector database if:

- You need to filter similarity search results by metadata (for example: find documents similar to X, but only from user Y's account)
- You are moving toward production and need ACID guarantees, persistent storage, and the ability to update or delete vectors without rebuilding the index
- Your dataset is growing past millions of vectors, or you anticipate horizontal scaling
- You are building a multi-tenant RAG system, a recommendation engine, or any application where vectors live alongside relational data
- Your team already runs PostgreSQL - pgvector gives you a vector database inside your existing infrastructure with little added operational overhead

### The migration path

Most teams start with Chroma or FAISS, then migrate when the application needs metadata filtering, multi-tenancy, backup/restore, or production SLAs.

Migrating to pgvector is a natural step because pgvector exposes a standard PostgreSQL interface. ORMs, connection pools, and database tools that developers already use with relational data work without modification. The transition does not require a rewrite.

For more on making this decision, see [how to choose a vector database](https://www.tigerdata.com/blog/how-to-choose-a-vector-database).

## What to look for in a vector store and vector database

### A fast, accurate vector store component

The vector store inside a production system needs to handle real-world scale:

- **Insert, update, and delete speed.** Fast writes matter as much as fast reads - a store that slows down on bulk inserts will bottleneck ingestion pipelines. Some indexes (IVFFlat, for example) compute their cluster centroids when the index is built, so they may need a full rebuild to keep recall up once the data has grown or shifted substantially.

- **Accurate ANN algorithms.** The tradeoff between recall and latency is the core design decision in [vector similarity search](https://www.tigerdata.com/learn/vector-search-vs-semantic-search). HNSW provides high recall at low latency and is the current standard for production workloads. DiskANN-based approaches (like pgvectorscale's Streaming DiskANN) extend this by allowing the index to live partially on disk, making large-scale deployment cheaper. Avoid systems that only offer IVF-based indexes - recall degrades significantly at scale without careful tuning.

- **Scale without degradation.** The store should maintain query latency as the vector count grows from millions to billions. An in-memory-only index that performs well at 1M vectors may become unusably slow or prohibitively expensive at 100M vectors.

- **Memory efficiency.** In-memory indexes are fast but expensive. At 768 dimensions, the raw float32 vectors for 50 million embeddings occupy about 154 GB before any index overhead, so an in-memory HNSW index at that size needs well over a hundred gigabytes of RAM. Systems with on-disk index support can serve large datasets without proportionally large memory footprints - a practical requirement for production deployments at scale.

- **Multiple distance metrics.** Different embedding models expect different metrics - cosine similarity, L2 distance, inner product. The store should support all three. Using the wrong metric for a given embedding model produces incorrect similarity rankings.
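The memory point in the list above comes down to simple arithmetic, sketched here. The raw-vector figure follows directly from 4-byte floats. The graph-link figure is only an assumption, based on a typical HNSW layout with about `2 * m` neighbor links per vector on the base layer and `m = 16`; actual overhead varies by implementation.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the float32 vectors themselves."""
    return num_vectors * dimensions * bytes_per_value


def hnsw_link_bytes(num_vectors: int, m: int = 16, bytes_per_link: int = 4) -> int:
    """Rough graph overhead. Assumption: ~2*m links per vector on the base
    layer, upper layers ignored. Real implementations differ."""
    return num_vectors * 2 * m * bytes_per_link


raw = raw_vector_bytes(50_000_000, 768)
links = hnsw_link_bytes(50_000_000)

print(f"raw vectors: {raw / 1e9:.1f} GB")            # raw vectors: 153.6 GB
print(f"graph links: {links / 1e9:.1f} GB")          # graph links: 6.4 GB
print(f"total:       {(raw + links) / 1e9:.1f} GB")  # total:       160.0 GB
```

Under these assumptions the vectors themselves dominate the footprint, which is why on-disk indexes and quantization matter at this scale.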

### Clean integration with the database layer

A vector store bolted onto a database creates operational friction. Look for systems where the two layers were designed together:

- **Query syntax that feels natural.** Vector operations should compose with standard SQL, not require a separate query language or a separate API call. If filtering by metadata requires fetching candidates from the vector store and then running a second query against a relational database, that is not a vector database - it is two separate systems with a thin wrapper.

- **Compatibility with database tooling.** Indexing, transactions, backups, and replication should cover vector tables the same way they cover relational tables. If vector data requires a separate backup strategy, that is a maintenance burden.

- **Filtered search at the index level.** Filtering similarity results by metadata should happen inside the ANN search pass, not by over-fetching candidates and filtering in application code. Post-hoc filtering degrades recall under strict filters - the system may not return enough results after filtering to satisfy the K-nearest-neighbor request.

- **Standard data types.** Vector columns should coexist with text, integer, timestamp, and JSONB columns in the same table without special handling. If vectors and metadata require separate tables with a join to query together, the system is not truly integrated.

- **Query planning across both.** The database optimizer should reason about combined vector-and-relational queries, not execute them as independent steps and merge results at the application layer.
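The recall problem with post-hoc filtering, described in the list above, can be shown with a toy example. Everything here is an assumption for illustration: scalar "embeddings", an exact scan standing in for the ANN index, and a `tenant` field as the metadata filter. Real systems apply the filter while traversing the index rather than by pre-selecting rows, but the returned results are what `filter_during_search` models.

```python
def nearest(items, query, k):
    """Exact top-k by distance; stands in for the ANN index."""
    return sorted(items, key=lambda item: abs(item["embedding"] - query))[:k]


def post_filter(items, query, k, tenant, overfetch=1):
    """Fetch k * overfetch candidates first, then filter in application code."""
    candidates = nearest(items, query, k * overfetch)
    return [c for c in candidates if c["tenant"] == tenant][:k]


def filter_during_search(items, query, k, tenant):
    """Only rows that pass the filter compete for the k result slots."""
    return nearest([i for i in items if i["tenant"] == tenant], query, k)


# 100 rows; only ids 24, 49, 74, and 99 belong to tenant "a".
items = [
    {"id": i, "embedding": float(i), "tenant": "a" if i % 25 == 24 else "b"}
    for i in range(100)
]

ids = lambda rows: [r["id"] for r in rows]
print(ids(post_filter(items, 0.0, 3, "a")))                # []
print(ids(post_filter(items, 0.0, 3, "a", overfetch=10)))  # [24]
print(ids(filter_during_search(items, 0.0, 3, "a")))       # [24, 49, 74]
```

Even with 10x over-fetching, the post-filter approach returns one result where three were requested, because the matching rows are rare among the nearest candidates.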

### A database system you already understand

The operational model of the underlying database matters - especially for teams that will own the system in production:

- Teams familiar with PostgreSQL do not need to learn a new query language, connection model, backup procedure, or monitoring setup. Existing ORMs, migration frameworks, and database tools work with vector tables the same way they work with any other table.
- Established database systems have broad tooling support built up over years. The ecosystem of PostgreSQL connectors, cloud providers, and extensions is larger than any purpose-built vector database.
- Large user communities mean documented solutions to common problems. When something goes wrong at 2 AM, "PostgreSQL + pgvector" has far more Stack Overflow answers than any specialized vector store.
- Long-term maintenance cost is lower when the database vendor has a decades-long track record of security patches, major version upgrades, and backward compatibility commitments.

For an algorithm-level explanation, including how HNSW graphs are built and queried, see [how vector search works](https://www.tigerdata.com/learn/understanding-vector-search).

## What Tiger Data offers: PostgreSQL as a high-performance vector database

Tiger Cloud runs Tiger Data's vector database system on fully managed PostgreSQL. Developers get pgvector, pgvectorscale, and pgai without managing database infrastructure.

- **pgvector.** The open-source extension for native vector storage and similarity search in PostgreSQL.
- **pgai.** Runs AI workflows - embedding creation, model completion - directly in PostgreSQL, simplifying RAG application architectures.
- **pgvectorscale.** Extends pgvector with Streaming DiskANN indexing and Statistical Binary Quantization for large-scale, cost-efficient vector search.

### Production-level vector store performance

The following benchmark results use 50 million Cohere embeddings at 768 dimensions and compare PostgreSQL with pgvector and pgvectorscale against a [dedicated vector database like Pinecone](https://www.tigerdata.com/blog/pgvector-vs-pinecone):

- **28x lower p95 latency** for approximate nearest neighbor queries at 99% recall
- **16x higher query throughput** vs. Pinecone's storage-optimized index (s1)
- **75-79% lower cost** than Pinecone when PostgreSQL is self-hosted

What drives these numbers:

- **Streaming DiskANN.** pgvectorscale's index stores part of the index on disk, removing the constraint that the entire index must fit in memory. This makes large-scale deployments significantly cheaper.

- **Time-based partitioning.** Tiger Cloud's hypertables automatically partition data by time, so queries that target recent embeddings skip older partitions entirely - a major latency reduction for time-sensitive workloads.

- **Single unified stack.** Vector embeddings, relational data, time-series data, and event data all run in the same PostgreSQL service. No separate vector store to sync.

- **Streaming filtering.** pgvectorscale applies secondary filters during the ANN search pass, not after - which maintains recall accuracy under strict metadata filters.

### Built on Tiger Data's PostgreSQL foundation

Tiger Cloud gives you the full PostgreSQL feature set alongside vector search:

- **Three decades of production use.** PostgreSQL's reliability at scale is documented across thousands of production deployments. The same transactional guarantees, replication behavior, and ACID semantics apply to vector tables - there is no separate set of rules for the vector layer.

- **Complete SQL support.** Combine similarity search with joins, window functions, full-text search, JSONB queries, and any other standard SQL operation. A single query can rank results by vector similarity, filter by structured metadata, apply a time-range constraint, and join to another table - without leaving SQL.

- **Time-based partitioning for vector workloads.** Tiger Data's hypertables automatically partition data by time. For applications that search recent embeddings - a common RAG pattern where you want results from documents created in the last 30 days - hypertable chunk exclusion means the query only touches relevant partitions. This keeps latency stable as the total embedding count grows.

- **Tiered storage to S3.** Tiger Data's tiered storage architecture moves older data to object storage while keeping it fully queryable via SQL. For vector workloads with long data retention requirements, this reduces storage cost without requiring a separate archival system.

- **No new tooling.** pg_dump, pg_restore, existing ORMs, connection poolers, and monitoring tools all work with vector tables. Teams that already operate PostgreSQL in production do not need to introduce a new operational model for the vector layer.
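The idea behind chunk exclusion in the list above can be sketched in a few lines. This is a conceptual model only, not Tiger Data's implementation: hypertables do this inside PostgreSQL, whereas the `PartitionedStore` class, the 7-day chunk interval, and the sample data below are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

CHUNK_INTERVAL = timedelta(days=7)  # assumed chunk size for this sketch
EPOCH = datetime(2024, 1, 1)


def chunk_number(ts: datetime) -> int:
    """Which time chunk a timestamp falls into."""
    return (ts - EPOCH) // CHUNK_INTERVAL


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PartitionedStore:
    def __init__(self):
        # chunk number -> list of (timestamp, doc_id, embedding)
        self.chunks = defaultdict(list)

    def insert(self, ts, doc_id, embedding):
        self.chunks[chunk_number(ts)].append((ts, doc_id, embedding))

    def search(self, query, since, k):
        """Return (top-k doc ids, rows scanned) for rows created at or after `since`."""
        first_chunk = chunk_number(since)
        scanned, candidates = 0, []
        for number, rows in self.chunks.items():
            if number < first_chunk:
                continue  # chunk exclusion: older partitions are never read
            for ts, doc_id, embedding in rows:
                scanned += 1
                if ts >= since:
                    candidates.append((squared_l2(query, embedding), doc_id))
        candidates.sort()
        return [doc_id for _, doc_id in candidates[:k]], scanned


store = PartitionedStore()
for day in range(100):  # one document per day for 100 days
    store.insert(EPOCH + timedelta(days=day), day, [float(day), 0.0])

top, scanned = store.search([90.0, 0.0], since=EPOCH + timedelta(days=86), k=3)
print(top, scanned)  # [90, 89, 91] 16
```

Only 16 of the 100 rows are scanned, and that number depends on the time window rather than on the total row count.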

### One database for vectors, relational data, and time-series

Tiger Cloud is not a single-purpose vector store. The same service handles vector embeddings, relational data, time-series data, and event data - which matters for AI applications that need to combine similarity search with structured filters, time-range queries, or transactional writes.

For applications that combine vector and time-series workloads - such as searching recent embeddings within a time window - Tiger Cloud includes dedicated optimization via hypertables and time-based partitioning.

## FAQ: vector store vs. vector database

### Is a vector store the same as a vector database?

No. A vector store is a storage layer optimized for embedding retrieval, while a vector database adds full database capabilities (persistence, metadata filtering, ACID transactions, access control) around that core. In practice, most production vector databases include a vector store as an internal component. Developer tooling such as LangChain often uses the two terms interchangeably, and that usage is the primary source of confusion.

### What is a vector store in LangChain?

In LangChain, VectorStore is an abstract interface in the retrieval layer that standardizes how the framework interacts with embedding storage backends. It is not a product - it is an API abstraction. Backends ranging from Chroma (in-memory) to pgvector (full PostgreSQL) all implement the same LangChain VectorStore interface, which means LangChain calls them all "vector stores" regardless of their underlying database capabilities.

### Can PostgreSQL be used as a vector store?

Yes. The pgvector extension adds native vector storage and ANN search to PostgreSQL, making it function as a vector store within a full relational database. You get embedding retrieval alongside the persistence, metadata filtering, and SQL querying that PostgreSQL already provides. Tiger Data's pgvectorscale extension extends pgvector's indexing performance further, particularly at large dataset sizes.

### When should I use a vector store instead of a vector database?

A standalone vector store is the right choice for prototyping, local development, or applications where the dataset is small (under 1M vectors) and queries are pure similarity search with no metadata filtering. When an application moves toward production - particularly when it needs persistent storage, filtering by metadata, multi-tenancy, or operational reliability - a vector database is the appropriate choice. Most teams start with a lightweight vector store and migrate to a vector database as requirements grow.

### What are examples of vector stores vs. vector databases?

Common lightweight vector stores include Chroma (in default in-memory mode) and FAISS - both are widely used in LangChain and LlamaIndex prototypes because of their minimal setup requirements. Full vector databases include pgvector + pgvectorscale (PostgreSQL-based), Pinecone (cloud-native), Weaviate (open-source), and Qdrant (open-source). The line between categories is not always sharp: Chroma can be configured with a persistent backend, and pgvector runs inside PostgreSQL, giving it the full capabilities of a relational database.

### What is the difference between a vector database and a vector index?

A vector index (such as HNSW or IVFFlat) is a data structure, together with its search algorithm, that makes approximate nearest neighbor search efficient - it is not a database. A vector database uses one or more indexes internally but adds data management capabilities around them: storage, persistence, metadata, access control, and operational features like backup and restore. Think of the index as the search mechanism and the database as the complete system it runs within.

## Conclusion

Vector stores and vector databases solve different problems at different stages of application development. Vector stores are the right starting point: fast to set up, minimal configuration, good enough for prototyping. Vector databases are the right endpoint: persistent, filterable, scalable, and operable in production.

The decision is not permanent. Most teams migrate from one to the other as requirements grow - and the migration path is straightforward when the target is a PostgreSQL-native system like pgvector, where existing ORMs, connection pools, and database tooling work without modification.

The broader pattern is consistent: choose the simplest tool that meets your current requirements, then migrate when requirements change. The cost of starting with a vector database when you only needed a vector store is low. The cost of staying with a vector store when you need a vector database - rewriting queries, rebuilding persistence, adding metadata filtering by hand - is high.

Tiger Cloud gives development teams the vector database capabilities they need for production - benchmarked at 28x lower p95 latency than Pinecone at 99% recall and 75-79% lower cost, and running on PostgreSQL infrastructure that most backend teams already know how to operate.
