Configure PQ index compression

Overview

Product quantization (PQ) is a form of data compression that reduces the memory footprint of a vector index. HNSW is an in-memory vector index, so enabling PQ for HNSW lets you work with larger datasets. For a discussion of how PQ saves memory, see this concepts section.

PQ trades recall and query performance for memory savings. This means a PQ configuration that reduces memory may also reduce recall. HNSW has similar trade-offs of its own, so if you use PQ compression, you should also tune HNSW so that the two configurations complement each other.

To configure HNSW, see Configuration: Indexes.

To learn how to configure PQ, follow the discussion on this page.

note

Before you enable PQ, be sure to provide a set of vectors to train the algorithm. For details, see Enable and train PQ.

Prerequisites

This how-to page uses a dataset of 1,000 Jeopardy! questions. Download the data:

import requests
import json

# Download the data
resp = requests.get(
    "https://raw.githubusercontent.com/weaviate-tutorials/intro-workshop/main/data/jeopardy_1k.json"
)

# Parse the JSON response
data = json.loads(resp.text)

# Preview the data
print(type(data), len(data))
print(json.dumps(data[1], indent=2))

Enable PQ compression

To enable PQ compression, complete the following steps.

  1. Connect to a Weaviate instance
  2. Configure an initial schema without PQ
  3. Load some training data
  4. Enable and train PQ
  5. Load the rest of your data

The next few sections work through these steps.

Step 1. Connect to a Weaviate instance

Use one of the Weaviate client libraries to connect to your instance.

After you install the client, connect to your instance. This example connects to a local instance and passes an OpenAI API key, which the text2vec-openai vectorizer uses in later steps.

import weaviate, os, json
import weaviate.classes as wvc

client = weaviate.connect_to_local(
    headers={
        # Replace with your OpenAI API key
        "X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"],
    }
)

client.is_ready()

Weaviate returns True if the connection is successful.
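
If your instance runs in Weaviate Cloud instead, the connection helper differs. The following is a minimal sketch only; it assumes the connect_to_wcs helper from the v4 Python client and placeholder environment variables for the cluster URL and API key.

import weaviate, os

# Sketch only: replace the environment variables with your own values.
client = weaviate.connect_to_wcs(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=weaviate.auth.AuthApiKey(os.environ["WEAVIATE_API_KEY"]),
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)

client.is_ready()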

Step 2. Configure an initial schema without PQ

Every collection in your Weaviate instance is defined by a schema. This example defines a collection called Question. Weaviate uses this schema during your initial data load. After the initial data set is loaded, you modify this schema to enable PQ.

client.collections.create(
    name="Question",
    description="A Jeopardy! question",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_openai(),
    generative_config=wvc.Configure.Generative.openai(),
    properties=[
        wvc.Property(name="title", data_type=wvc.DataType.TEXT),
    ],
)
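
To confirm that the collection was created, you can fetch its configuration. This reuses the config.get() pattern that appears later in this guide; printing the whole object is just a quick sanity check.

jeopardy = client.collections.get("Question")
config = jeopardy.config.get()

# Print the full collection configuration, including the vector index settings
print(config)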

Step 3. Load some training data

This example uses a relatively small data set to demonstrate loading data.

If you are starting with a new Weaviate instance, load between 10,000 and 100,000 objects from your data set. If you have multiple shards, load that many objects on each shard.

You can use any of the objects in your data set. If possible, choose the objects at random so that they are independent and identically distributed.

By default, Weaviate uses the first 100,000 objects in your database for the training step. If you have more than 100,000 objects, Weaviate ignores the excess objects during training, but they still take up memory. If you have a large dataset, consider training PQ on an initial set of 10,000 to 100,000 objects first, then uploading the rest of your data after PQ is enabled.
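
The following sketch illustrates that split. It is not part of the tutorial flow for the 1,000-object dataset, which is small enough to load in one pass; the names all_objects, training_objects, and remaining_objects are placeholders, and parse_data() is the helper defined in the code block further below.

import random

# Illustrative only: split a large, parsed object list into a random training
# subset (loaded now, before PQ is enabled) and a remainder (loaded in Step 5).
all_objects = parse_data()
random.shuffle(all_objects)

training_objects = all_objects[:100_000]   # Weaviate trains PQ on these
remaining_objects = all_objects[100_000:]  # import these after PQ is enabled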

If you already have data in your Weaviate instance, you can move ahead to the next step.

def parse_data():
    object_list = []
    for obj in data:
        object_list.append(
            {
                "question": obj["Question"],
                "answer": obj["Answer"],
                "round": obj["Round"],
            }
        )

    return object_list

jeopardy = client.collections.get("Question")
jeopardy.data.insert_many(parse_data())

# Check upload
response = jeopardy.aggregate.over_all(total_count=True)

# Should equal the number of objects uploaded
print(response.total_count)

Step 4. Enable and train PQ

You can enable PQ compression by changing the relevant configuration at the collection (i.e. class) level.

PQ relies on a codebook to compress the original vectors. The codebook defines "centroids" that are used to calculate the compressed vectors. Weaviate's PQ implementation trains the codebook on existing data, so you must load some vectors before you enable PQ; aim for 10,000 to 100,000 vectors per shard.

After you update the schema, Weaviate trains PQ on the first 100,000 objects in your database. To use a different value, set a new trainingLimit. A higher trainingLimit lengthens the training period and can also cause memory problems.

To change the compression rate, specify the number of segments. The number of vector dimensions must be evenly divisible by the number of segments. Using fewer segments produces smaller quantized vectors.
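
As a rough illustration (plain Python arithmetic, not a Weaviate API call), here is the back-of-the-envelope math for 1536-dimensional vectors, such as those produced by text2vec-openai, compressed with 96 segments and the default 256 centroids:

dimensions = 1536    # e.g. text2vec-openai embeddings
segments = 96        # must divide the number of dimensions evenly
assert dimensions % segments == 0

uncompressed_bytes = dimensions * 4   # float32 stores 4 bytes per dimension
compressed_bytes = segments * 1       # with 256 centroids, each segment fits in one byte

print(uncompressed_bytes, compressed_bytes)  # 6144 vs. 96 bytes per vector

The codebook and index structures consume memory of their own, so treat this as an upper bound on the per-vector savings.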

For additional configuration options, see the parameter table.

To enable PQ, update your schema as shown below.

import weaviate.classes as wvc

jeopardy = client.collections.get("Question")
jeopardy.config.update(
    vector_index_config=wvc.Reconfigure.vector_index(
        pq_enabled=True, pq_segments=96, pq_training_limit=100000
    )
)

Step 5. Load the rest of your data

If you are starting with a new Weaviate instance, you can load the rest of your data now. Weaviate compresses the new data when it adds it to the database.

If you already have data in your Weaviate instance, Weaviate automatically compresses the remaining objects (the ones after the initial training set).
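
Continuing the illustrative split from Step 3, loading the held-back objects might look like this (remaining_objects is the placeholder list from that sketch):

jeopardy = client.collections.get("Question")

# Weaviate compresses these vectors as it imports them.
if remaining_objects:
    jeopardy.data.insert_many(remaining_objects)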

PQ parameters

You can configure PQ compression by setting the following parameters at the collection level.

enabled (boolean, default: false)
Enables PQ. Weaviate uses product quantization (PQ) compression when true.

trainingLimit (integer, default: 100000)
The maximum number of objects, per shard, used to fit the centroids. Larger values increase the time it takes to fit the centroids and also require more memory.

segments (integer, default: the number of vector dimensions)
The number of segments to use. Reducing the number of segments reduces the size of the quantized (PQ-compressed) vectors. The number of vector dimensions must be evenly divisible by the number of segments.

centroids (integer, default: 256)
The number of centroids to use. Reducing the number of centroids reduces the size of the quantized (PQ-compressed) vectors at the price of recall. If you use the kmeans encoder, centroids is set to 256 (one byte) by default.

encoder (string, default: kmeans)
The encoder to use. Specify either kmeans (default) or tile.

distribution (string, default: log-normal)
The encoder distribution type. Only used with the tile encoder. If you use the tile encoder, you can specify the distribution as log-normal (default) or normal.

Additional tools and considerations

Check the system logs

When compression is enabled, Weaviate logs diagnostic messages like these.

pq-conf-demo-1  | {"action":"compress","level":"info","msg":"switching to compressed vectors","time":"2023-11-13T21:10:52Z"}

pq-conf-demo-1 | {"action":"compress","level":"info","msg":"vector compression complete","time":"2023-11-13T21:10:53Z"}

If you use Docker Compose to run Weaviate, you can view the logs in the system console:

docker compose logs -f --tail 10 weaviate
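
To show only compression-related messages, you can pipe the log output through grep, filtering on the action field shown in the sample messages above:

docker compose logs weaviate | grep '"action":"compress"'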

You can also view the log file directly. Use docker inspect to get the file location:

docker inspect --format='{{.LogPath}}' <your-weaviate-container-id>

Review the current PQ configuration

To review the current PQ configuration, retrieve it as shown below.

jeopardy = client.collections.get("Question")
config = jeopardy.config.get()
pq_config = config.vector_index_config.pq

# print some of the config properties
print(f"Enabled: { pq_config.enabled }")
print(f"Training: { pq_config.training_limit }")
print(f"Segments: { pq_config.segments }")
print(f"Centroids: { pq_config.centroids }")