---
title: "Using Pgvector With Python"
description: "Pgvector is a powerful PostgreSQL vector database extension. Learn how to use it in Python for powerful machine learning-powered applications. "
section: "Postgres and vector data"
---

> **TimescaleDB is now Tiger Data.**

*Written by* [*Haziqa Sajid*](https://www.timescale.com/blog/author/haziqa/)


Have you ever wondered how artificial intelligence (AI) systems can understand our language? The key concept is embeddings, where words and phrases are converted into high-dimensional vectors that capture their meanings and relationships. These vectors allow computers to perform mathematical operations on language data. The challenge then becomes storing these high-dimensional vectors efficiently.
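To make this concrete, here is a toy sketch of how similarity between embeddings works. The three-dimensional vectors below are made up purely for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D "embeddings" (real models produce far higher-dimensional vectors)
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

# Related words end up closer together than unrelated ones
print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))  # high
print(cosine_similarity(embeddings["dog"], embeddings["car"]))    # much lower
```

Because "dog" and "puppy" point in nearly the same direction, their similarity is close to 1, while "dog" and "car" score much lower. Vector databases make this kind of comparison fast at scale.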

The good news is that you can use an old friend, PostgreSQL, and transform it into a full-fledged vector database with the [<u>pgvector extension</u>](https://www.timescale.com/learn/postgresql-extensions-pgvector). Adding Python to the mix allows you to build Python applications with machine learning elements. 

For those who aren’t sure about how to achieve this, here’s what you’ll learn in this guide:

- What vector databases and pgvector are
- How to install and use pgvector in Python
- How to leverage PostgreSQL for your AI applications




## Pgvector and Python

Let’s start by explaining the concepts required to understand pgvector. Then, we will provide the technical details to get up and running with pgvector in Python.



### What is pgvector (and how does it make PostgreSQL a vector database)

A vector database stores data as high-dimensional vectors, mathematical representations of features or attributes. Vectors, with dimensions ranging from tens to thousands, are derived from raw data like text, images, and even videos through embedding algorithms. These databases are designed for fast and precise similarity search and enhanced search based on semantic meaning.

> **Editor’s Note:** Check this article to learn the difference between [<u>vector search and semantic search</u>](https://www.timescale.com/learn/vector-search-vs-semantic-search).



PostgreSQL does not have native vector capabilities (as of PostgreSQL 16), and **pgvector** is designed to fill this gap. [<u>Pgvector is an open-source vector similarity search extension for PostgreSQL</u>](https://github.com/pgvector/pgvector). Here's how it enhances PostgreSQL, making it a proficient vector database:


1. **Vector and traditional storage combined:** With pgvector, you can store vector data alongside traditional relational data, retaining compatibility with standard database features while gaining newer capabilities.

2. **Advanced search:** Pgvector stores data as vectors and supports nearest-neighbor search, both exact and approximate (the latter is used when the search space is extensive), with distance metrics such as L2 distance, inner product, and cosine distance. This makes it efficient to find the content most similar to a given query. The full list of supported operators is linked below:



*[<u>Supported operators in pgvector</u>](https://github.com/pgvector/pgvector)*
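As a quick illustration of the math behind pgvector's three core distance operators (`<->` for L2 distance, `<#>` for negative inner product, and `<=>` for cosine distance), here is a minimal NumPy sketch of what each one computes:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0, 4.0])

# <-> : L2 (Euclidean) distance
l2 = np.linalg.norm(a - b)

# <#> : negative inner product (negated so that ORDER BY ... ASC ranks best first)
neg_inner = -np.dot(a, b)

# <=> : cosine distance = 1 - cosine similarity
cosine_dist = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(l2, neg_inner, cosine_dist)
```

Smaller values mean more similar vectors for all three operators, which is why similarity queries in pgvector use `ORDER BY` ascending.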


3. **Integration with PostgreSQL features:** Pgvector integrates seamlessly with standard PostgreSQL features, including the following:

- [*<u>ACID compliance</u>*](https://www.timescale.com/learn/understanding-acid-compliance): ensuring transactional integrity
- *Point-in-time recovery*: enabling database restoration to specific moments
- [*<u>JOIN support</u>*](https://www.timescale.com/learn/strategies-for-improving-postgres-join-performance): facilitating data combination from multiple tables

Enough theory; let’s get practical.





## Setting Up Pgvector for Python

In the previous section, we built an understanding of pgvector. This section will set the foundation for seeing it in action using Python. 



### Set up a PostgreSQL database

If you haven't already, start by [<u>installing a PostgreSQL database</u>](https://www.postgresql.org/download/). Then, [<u>install psql on your OS</u>](https://www.timescale.com/blog/how-to-install-psql-on-mac-ubuntu-debian-windows/) to connect to it more easily. **Psql**, a terminal-based interactive program from PostgreSQL, lets you interact directly with the PostgreSQL server, execute SQL statements, and manage database objects.

If you have installed PostgreSQL correctly, executing `psql --version` in the terminal will print the version information:



*PostgreSQL installation verification*


**Note:** You must manually add the path to the environment variables on Windows.

Now, we are ready to connect to the database.



### Connect to the database

To connect to the database using **psql**, we will run the following command in the terminal:

```shell
> psql -U postgres
```

It will prompt for the password, and upon successful authentication, you will be connected to the database.



*Connecting to the database using psql*



### Adding the pgvector extension

Let’s add the pgvector extension and see the relevant requirements. Later, we will walk you through how to use pgvector with Python.

**Installing pgvector**

On Windows, ensure that [<u>C++ support in Visual Studio</u>](https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170#download-and-install-the-tools) is installed. If not, install the Visual Studio Build Tools and check **Desktop development with C++** during setup.



*C++ support in Visual Studio*


After this, run the following command in the Command Prompt. Make sure it is opened in administrator mode.

```shell
> call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
```

**Note:** The specific path will vary based on the version and edition of Visual Studio.

In the next steps, we will set `PGROOT`, clone the **pgvector** repository, and build it with `nmake`:

```shell
> set "PGROOT=C:\Program Files\PostgreSQL\16"
> cd %TEMP%
> git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
> cd pgvector
> nmake /F Makefile.win
> nmake /F Makefile.win install
```


**Note:** `nmake` is Microsoft's implementation of the make tool. It comes packaged with the C++ development kit. If the path is not recognized in the command prompt, ensure it is added.

After a successful build, run the following query in **psql**:

```sql
postgres=# CREATE EXTENSION vector;
```

To ensure `pgvector` is installed, run this query:

```sql
postgres=# \dx
```

**Importing pgvector to Python**

To use the `pgvector` extension in your Python projects, follow these steps:

1. Install the `pgvector` package in your Python environment. This can be done using `pip`:

```shell
%pip install pgvector
```

2. Once installed, import the `pgvector` package in your Python script:

```python
import pgvector
```

3. Additional packages are required to connect to a PostgreSQL database. A commonly used package is `psycopg2`:

```python
import psycopg2
```

4. Depending on your specific requirements, you may need to install and import other packages. For example:

- For data manipulation and analysis, you might use `pandas`:

```shell
%pip install pandas
```

```python
import pandas as pd
```

- For object-relational mapping (ORM), you might use `SQLAlchemy`:

```shell
%pip install SQLAlchemy
```

```python
from sqlalchemy import create_engine
```

These steps will give you the tools and packages to work with `pgvector` and PostgreSQL in your Python environment.





## Using Pgvector in Python

This section will explore using the `pgvector` extension in PostgreSQL with Python, from connecting to the database to querying. We set up all the installations and imports, so let’s get started. 



### Setting up the connection 

First, we must connect to our PostgreSQL database using `psycopg2`. Here's how you can do it:

```python
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    database="your_database_name",
    user="your_username",
    password="your_password"
)
```

In this snippet, we import the `psycopg2` library and use it to connect to the PostgreSQL server. Replace `your_database_name`, `your_username`, and `your_password` with your actual database credentials.



### Creating a cursor 

Next, create a cursor to execute SQL commands. 

```python
cur = conn.cursor()
```

The cursor is used to execute SQL commands and fetch results. We can also verify the connection with the following “hello world” example.

```python
cur = conn.cursor()
cur.execute("SELECT 'hello world'")
print(cur.fetchone())
```

### Creating a pgvector table 

- **Basic example:** Start by creating a simple table with an embedding vector. In the following snippet, we define and execute an SQL command to create a table named `vectors` with an `embedding` column of dimension three (3).

```python
# Define a SQL command to create a table
create_table_command = """
CREATE TABLE vectors (
    id bigserial primary key,
    embedding vector(3)  -- specify the dimension of the vector
);
"""
# Execute the SQL command
cur.execute(create_table_command)
# Commit the transaction
conn.commit()
```

We can confirm the creation of the table by running `\dt` in **psql**:



*Psql service to ensure the table creation*


- **Example with more features:** Let's create a table with more features for language model training. This command creates a table named `embeddings` with additional columns such as `label`, `url`, `content`, and `tokens`, plus an `embedding` vector of dimension three (3).


```python
# Define a SQL command to create a more complex table
create_table_command = """
CREATE TABLE embeddings (
    id bigserial primary key,
    label text,
    url text,
    content text,
    tokens integer,
    embedding vector(3)  -- specify the dimension of the vector
);
"""
# Execute the SQL command
cur.execute(create_table_command)
# Commit the transaction
conn.commit()
```

### Data insertion 

Let's insert some data into the newly created table. In the real world, embeddings are generated by a model trained on a vast corpus of data; such models capture the semantics behind the words.

For this example, we resort to random arrays instead. We use the NumPy library to generate and insert random vectors into the `embeddings` table. Each entry includes a label, URL, content, token count, and a vector.


```python
import numpy as np

# Define the SQL command for inserting data
insert_command = """
INSERT INTO embeddings (label, url, content, tokens, embedding)
VALUES (%s, %s, %s, %s, %s);
"""

# Create sample data
data = [
    ("label1", "http://example.com/1", "The article on dogs", 100, np.random.rand(3).tolist()),
    ("label2", "http://example.com/2", "The article on cats", 150, np.random.rand(3).tolist()),
    ("label3", "http://example.com/3", "The article on cars", 200, np.random.rand(3).tolist()),
    ("label4", "http://example.com/4", "The article on books", 250, np.random.rand(3).tolist()),
    ("label5", "http://example.com/5", "The article on embeddings", 300, np.random.rand(3).tolist())
]

# Insert data into the table
for record in data:
    cur.execute(insert_command, record)
# Commit the transaction
conn.commit()
```

Let's check the data stored in the `embeddings` table. We will use `fetchall()` to retrieve the data from the table.

```python
fetch_contents = """
SELECT * FROM embeddings
"""
# Execute the SQL command
cur.execute(fetch_contents)
# Fetch all rows
cur.fetchall()
```

### Querying similar objects 

Retrieve the top five similar objects from the database using nearest-neighbor search. In this example, we generate a random query vector and find the top five most similar entries in the `embeddings` table based on vector similarity. Note the explicit type cast `::vector`, which ensures PostgreSQL interprets the parameter as a vector.


```python
# Generate a random query vector
query_vector = np.random.rand(3).tolist()  # Example query vector
print(query_vector)

# Define the SQL command to retrieve similar objects
retrieve_command = """
SELECT content
FROM embeddings
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

# Execute the command with the query vector
cur.execute(retrieve_command, (query_vector,))
# Fetch the results
similar_objects = cur.fetchall()

# Print the similar objects
for obj in similar_objects:
    print(obj[0])
```
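To build intuition for what the `<=>` ordering does, here is a small self-contained sketch that performs the same ranking in plain NumPy. The `rows` below are hypothetical in-memory stand-ins for the `(content, embedding)` pairs in the table, not data from the database:

```python
import numpy as np

def cosine_distance(a, b):
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical in-memory stand-ins for (content, embedding) rows
rows = [
    ("The article on dogs", np.array([0.9, 0.1, 0.1])),
    ("The article on cats", np.array([0.8, 0.2, 0.1])),
    ("The article on cars", np.array([0.1, 0.9, 0.2])),
]

query = np.array([0.85, 0.15, 0.1])

# Equivalent of: SELECT content FROM ... ORDER BY embedding <=> query LIMIT 2
ranked = sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:2]
for content, _ in ranked:
    print(content)
```

The database does the same thing, but with indexes that avoid scanning every row.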


### Closing the connection 

Finally, close the cursor and connection. Closing the cursor and connection ensures that all resources are properly released.

```python
cur.close()
conn.close()
```


## Supercharged Pgvector With Python

So far, we've explored `pgvector` for vector storage and similarity search in PostgreSQL. But you can experience a supercharged version of `pgvector`: with [<u>Timescale Cloud</u>](https://www.timescale.com/ai), developers can use `pgvector` alongside `pgvectorscale` and [`pgai`](https://www.tigerdata.com/blog/pgai-giving-postgresql-developers-ai-engineering-superpowers), two open-source extensions that turn PostgreSQL into an easy-to-use and high-performance vector database, plus a fully managed cloud database experience.

- Timescale Cloud enhances PostgreSQL for AI applications by packaging `pgvector` as part of its offering. It retains all the capabilities of `pgvector`, such as the vector data type and indexing methods like [`HNSW`](https://www.timescale.com/blog/vector-database-basics-hnsw) and [`IVFFlat`](https://www.timescale.com/blog/nearest-neighbor-indexes-what-are-ivfflat-indexes-in-pgvector-and-how-do-they-work), making it a complement rather than a replacement for `pgvector`. 
- Timescale Cloud introduces new features on top of `pgvector`, such as time-based vector search. This makes it easy to migrate your existing pgvector deployment and take advantage of additional features for scale. You also have the flexibility to create different index types suited to your needs.
- With pgvectorscale, Timescale Cloud accelerates approximate nearest-neighbor (ANN) searches on large-scale vector datasets by incorporating a cutting-edge ANN index inspired by the DiskANN algorithm (StreamingDiskANN). Also included is pgai, which brings more AI workflows to PostgreSQL, making it easier for developers to build search and retrieval-augmented generation (RAG) applications. This complete stack will make the development of AI applications faster, more efficient, and scalable. 

Check out this blog post to [<u>learn more about </u>`<u>pgai</u>`](https://www.timescale.com/blog/pgai-giving-postgresql-developers-ai-engineering-superpowers/) and how it brings embedding and generation models closer to the database. We also explained [<u>how </u>`<u>pgvectorscale</u>`<u> makes PostgreSQL faster and cheaper than other specialized vector databases</u>](https://www.timescale.com/blog/pgvector-is-now-as-fast-as-pinecone-at-75-less-cost/), like Pinecone.

Let’s set up **Timescale Cloud** with Python and see it in action.



### Set up Timescale Cloud with Python

1. **Installations:** Ensure you have installed the required Python packages, `timescale_vector` and `python-dotenv`, using the following command:

```shell
%pip install timescale_vector python-dotenv
```


2. **Import required libraries:** Import the necessary libraries in your Python script as shown below:

```python
from dotenv import load_dotenv, find_dotenv
import os
from timescale_vector import client
import uuid
from datetime import datetime, timedelta
```
3. **Setting up Timescale:** To start, [<u>sign up</u>](https://console.cloud.timescale.com/signup), create a new database, and follow the provided instructions. For more information, refer to the [<u>Getting Started with Timescale guide</u>](https://docs.timescale.com/getting-started/latest/services#create-your-timescale-account).

After signing up, connect to the Timescale database by providing the service URI, which can be found under the service section on the dashboard. The URI will look something like this:


```
postgres://tsdbadmin:@.tsdb.cloud.timescale.com:/tsdb?sslmode=require
```



*Configuration dashboard for connecting to the service*


The password can be created by going to **project settings** and clicking **Create credentials**. 



*Project settings page for creating credentials*


4. **Set up .env file:** Create a `.env` file in your project directory and include your PostgreSQL credentials in the following format:

```shell
TIMESCALE_SERVICE_URL=your_service_url_here
```

5. **Load PostgreSQL credentials:** Load your PostgreSQL credentials from the `.env` file:

```python
_ = load_dotenv(find_dotenv(), override=True)
service_url = os.environ['TIMESCALE_SERVICE_URL']
```




### Using Timescale Cloud With Python

A vector client can be created with the following arguments:

| **Name** | **Description** |
| --- | --- |
| service_url | Timescale service URL / connection string |
| table_name | Name of the table to use for storing the embeddings |
| num_dimensions | Number of dimensions in the vector |

Initialize the vector client with your service URL, table name, and the number of dimensions in the vector:

```python
vec = client.Sync(service_url, "embeddings", 3)
```


### Create tables and insert data 

Now, we will create tables for the collection and insert data. The data is represented as follows: 

- A UUID serves as a unique identifier for the embedding.
- Metadata about the embedding is stored in JSON format.
- The text the embedding represents.
- The embedding vector itself.


```python
vec.create_tables()
```


```python
vec.upsert([
    (uuid.uuid1(), {"service": "pgvectors"}, "Pgvector is an open-source vector similarity search extension of PostgreSQL.", [1.0, 1.3, 2.0]),
    (uuid.uuid1(), {"service": "timescale_vector"}, "Timescale Vector amplifies Pgvector's capabilities by integrating it into PostgreSQL for AI applications, adding new functionalities like the Timescale Vector index and time-based vector search, while also enhancing scalability with advanced ANN indexing inspired by the DiskANN algorithm.", [1.0, 10.8, 3.0]),
])
```

Let's ensure that the data is updated in the table as well. We can see it in the **Explorer** tab in the Timescale console:



*Ensuring data insertion in the Timescale console*



### Query similar objects 

Timescale Cloud provides a very simple API for searching similar vectors. As mentioned before, these vectors would normally be generated by specialized embedding models; in this example, the query is just an arbitrary array.

```python
vec.search([2.0, 9.4, 3.0])
```
Here’s the result:

```python
>> [[UUID('5773dc0f-20d6-11ef-8777-dae2664cc367'),
  {'service': 'timescale_vector'},
  "Timescale Vector amplifies Pgvector's capabilities by integrating it into PostgreSQL for AI applications, adding new functionalities like the Timescale Vector index and time-based vector search, while also enhancing scalability with advanced ANN indexing inspired by the DiskANN algorithm.",
  array([ 1. , 10.8,  3. ], dtype=float32),
  0.006858884829482381],
 [UUID('5773dc0e-20d6-11ef-a334-dae2664cc367'),
  {'service': 'pgvectors'},
  'Pgvector is an open-source vector similarity search extension of PostgreSQL.',
  array([1. , 1.3, 2. ], dtype=float32),
  0.2235118982678087]]
```
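The last element of each result row is a distance score: smaller means more similar. Assuming the search uses cosine distance (which the numbers above are consistent with), we can sanity-check the top result's score by recomputing it locally:

```python
import numpy as np

query = np.array([2.0, 9.4, 3.0])
stored = np.array([1.0, 10.8, 3.0])  # embedding of the top result above

# Cosine distance = 1 - cosine similarity
cosine_distance = 1 - np.dot(query, stored) / (np.linalg.norm(query) * np.linalg.norm(stored))
print(cosine_distance)  # roughly 0.0069, matching the reported score
```

The recomputed value agrees with the `0.00685...` returned by `vec.search`, confirming how the ranking is produced.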



## Conclusion

In this article, we briefly covered the concept of vector databases and took a closer look at pgvector. We walked through installing the extension and experimented with it from Python. Finally, we used [<u>Timescale Cloud</u>](https://www.timescale.com/ai)—which includes pgvector, pgvectorscale, and pgai—to leverage its faster search capabilities.

Utilizing pgvector with Python empowers the development of robust machine-learning applications. With [<u>pgai</u>](https://github.com/timescale/pgai/) and [<u>pgvectorscale</u>](https://github.com/timescale/pgvectorscale), Timescale Cloud elevates [PostgreSQL vector](https://www.tigerdata.com/blog/postgresql-as-a-vector-database-using-pgvector) databases to new heights, offering enhanced capabilities and performance at scale. 


Pgai and pgvectorscale are both open source under the PostgreSQL License and available for you to use in your AI projects today. You can find installation instructions on the [<u>pgai</u>](https://github.com/timescale/pgai/?ref=timescale.com) and [<u>pgvectorscale</u>](https://github.com/timescale/pgvectorscale) GitHub repositories (Git ⭐s welcome!).  You can also access them on any database service on [<u>Timescale’s cloud PostgreSQL platform</u>](https://console.cloud.timescale.com/signup?ref=timescale.com).