PQ vector compression

Note: Starting in v1.23, AutoPQ simplifies configuring PQ on new collections.

Product quantization (PQ) is a form of data compression for vectors. PQ reduces the memory footprint of a vector index, so enabling PQ for HNSW lets you work with larger datasets. For a discussion of how PQ saves memory, see HNSW with compression.
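To get a rough feel for the savings, the sketch below compares the memory needed for uncompressed vectors with the memory needed for PQ codes. It assumes 4-byte float32 components and one byte per segment (256 centroids), and it ignores the HNSW graph and the codebook itself. The function name and the example numbers are illustrative and are not part of Weaviate.

```python
def vector_memory_bytes(num_vectors: int, dimensions: int, segments: int) -> tuple[int, int]:
    """Return (uncompressed_bytes, pq_bytes) for a set of vectors.

    Assumes float32 components (4 bytes each) and one byte per PQ segment,
    which corresponds to 256 centroids. Graph and codebook overhead are ignored.
    """
    uncompressed = num_vectors * dimensions * 4
    compressed = num_vectors * segments * 1
    return uncompressed, compressed


# Hypothetical example: one million 1536-dimensional vectors, 256 segments
raw, pq = vector_memory_bytes(1_000_000, 1536, 256)
print(f"uncompressed: {raw / 1e9:.2f} GB, PQ codes: {pq / 1e9:.2f} GB, ratio: {raw / pq:.0f}x")
```

Under these assumptions the ratio depends only on the number of dimensions and the number of segments, which is why fewer segments save more memory at the cost of recall.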

PQ makes trade-offs between recall, performance, and memory usage. This means a PQ configuration that reduces memory may also reduce recall. There are similar trade-offs when you use HNSW without PQ. If you use PQ compression, you should also tune HNSW so that they complement each other.

To configure HNSW, see Configuration: Vector index.

Enable PQ compression

AutoPQ is a feature that streamlines PQ configuration for new collections. AutoPQ is not currently available in Weaviate Cloud Services (WCS).

If you are using WCS, or cannot enable asynchronous indexing, you can still use the manual, two-phase method to enable PQ.

Configure AutoPQ

Added in v1.23.0

If you have a new collection, enable AutoPQ. AutoPQ automates the PQ training step so you don't have to load your data in two phases.

1. Set the environment variable

AutoPQ requires asynchronous indexing. To enable AutoPQ, set the environment variable ASYNC_INDEXING=true and restart your Weaviate instance. You cannot enable AutoPQ without asynchronous indexing.

AutoPQ is not currently available in WCS.

2. Configure PQ

To enable PQ for a collection, update your collection definition. Once you enable PQ, AutoPQ automates the PQ training step for you.

For additional configuration options, see the PQ parameters.

import weaviate.classes as wvc

# Assumes `client` is an already-connected Weaviate client
jeopardy = client.collections.get("Question")
jeopardy.config.update(
    vector_index_config=wvc.config.Reconfigure.VectorIndex.hnsw(
        quantizer=wvc.config.Reconfigure.VectorIndex.Quantizer.pq()
    )
)

client.close()

3. Load your data

Load your data. You do not have to load an initial set of training data. AutoPQ creates the PQ codebook when the object counts reach the training limit. By default, the training limit is 100,000 objects per shard.
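The trigger condition can be pictured with a small check. This is only an illustration of the per-shard threshold described above, not Weaviate's internal logic; the function name and the shard counts are hypothetical.

```python
DEFAULT_TRAINING_LIMIT = 100_000  # default training limit, per shard


def shards_ready_for_training(object_counts: dict[str, int],
                              training_limit: int = DEFAULT_TRAINING_LIMIT) -> list[str]:
    """Return the shards whose object count has reached the training limit."""
    return [shard for shard, count in object_counts.items() if count >= training_limit]


# Hypothetical per-shard object counts during an import
counts = {"shard-a": 120_000, "shard-b": 99_999, "shard-c": 100_000}
print(shards_ready_for_training(counts))
```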

Manually configure PQ

If you cannot enable AutoPQ, use the manual method to enable PQ. When you manually configure PQ on a new collection, be sure to import a set of 10,000 to 100,000 training objects per shard before you enable PQ.

Weaviate logs messages when PQ is enabled and when vector compression is complete. Do not import the rest of your data until the training step is complete.

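To manually enable PQ compression, follow these steps:

1. Configure an initial schema without PQ
2. Load some training data
3. Enable PQ and create the codebook
4. Load the rest of your data

The next few sections work through these steps.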

1. Configure an initial schema without PQ

Use one of the Weaviate client libraries to connect to your instance.

Every collection in your Weaviate instance is defined by a schema. Weaviate uses the schema during your initial data load.

import weaviate.classes as wvc

# Assumes `client` is an already-connected Weaviate client
client.collections.create(
    name="Question",
    description="A Jeopardy! question",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
    generative_config=wvc.config.Configure.Generative.openai(),
    properties=[
        wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
    ],
)

client.close()

2. Load some training data

If you are starting with a new collection, load between 10,000 and 100,000 objects from your data set. If you have multiple shards, you need to load between 10,000 and 100,000 objects on each shard.

If you already have data in an existing collection, move to the next step.

When you load data for this training phase, you can use any of the objects in your data set to create the codebook. However, try to choose the objects at random so that they are independent and identically distributed.
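One simple way to pick training objects at random is a uniform sample without replacement. The sketch below uses only the Python standard library; the helper name, the seed, and the toy data set are assumptions made for illustration.

```python
import random


def pick_training_sample(objects: list, sample_size: int, seed=None) -> list:
    """Return a uniform random sample of objects to use as PQ training data.

    If fewer objects than sample_size are available, all of them are returned
    in shuffled order.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(objects))
    return rng.sample(objects, k)


# Hypothetical data set of 50,000 objects; draw 10,000 for the training phase
dataset = [{"id": i} for i in range(50_000)]
training_set = pick_training_sample(dataset, 10_000, seed=42)
print(len(training_set))
```

Passing a fixed seed makes the selection reproducible, which can help when you compare recall across different PQ settings.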

Download sample data
Use these scripts to get the data for these examples. If you are configuring your own system, you do not need to import this sample data.
import requests
import json

# Download the data
resp = requests.get(
    "https://raw.githubusercontent.com/weaviate-tutorials/intro-workshop/main/data/jeopardy_1k.json"
)

# Parse the JSON response
data = json.loads(resp.text)

# Preview the data: its type, its size, and one sample object
print(type(data), len(data))
print(json.dumps(data[1], indent=2))


def parse_data():
    """Convert the downloaded records into objects ready for import."""
    object_list = []
    for obj in data:
        object_list.append(
            {
                "question": obj["Question"],
                "answer": obj["Answer"],
                "round": obj["Round"],
            }
        )

    return object_list


# Assumes `client` is an already-connected Weaviate client
jeopardy = client.collections.get("Question")
jeopardy.data.insert_many(parse_data())

# Check upload
response = jeopardy.aggregate.over_all(total_count=True)

# Should equal the number of objects uploaded
print(response.total_count)

client.close()

3. Enable PQ and create the codebook

To enable PQ compression, update your collection (class) schema to set pq_enabled=True. If you use the Python client v4, define a quantizer instead. After you update the schema, Weaviate uses up to pq_training_limit objects to train PQ.

PQ relies on a codebook to compress the original vectors. The codebook defines "centroids" that are used to calculate the compressed vector. If you are not using AutoPQ, you must have some vectors loaded before you enable PQ so Weaviate can define the centroids. You should have from 10,000 to 100,000 vectors per shard loaded before you enable PQ.
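To make the role of the codebook concrete, here is a minimal, generic sketch of product quantization encoding in plain Python. It is not Weaviate's implementation: the codebook is written by hand instead of being learned from training data, and all function names are illustrative. Each segment of the vector is replaced by the index of its nearest centroid.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebook):
    """Compress a vector into one centroid index per segment.

    codebook[s] is the list of centroids for segment s; every centroid has
    the same length as the segment it belongs to.
    """
    num_segments = len(codebook)
    if len(vector) % num_segments != 0:
        raise ValueError("dimensions must be evenly divisible by the number of segments")
    seg_len = len(vector) // num_segments
    codes = []
    for s, centroids in enumerate(codebook):
        segment = vector[s * seg_len:(s + 1) * seg_len]
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(segment, centroids[i]))
        codes.append(nearest)
    return codes


def pq_decode(codes, codebook):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx = []
    for s, code in enumerate(codes):
        approx.extend(codebook[s][code])
    return approx


# Toy codebook: 2 segments with 2 centroids each (real codebooks are learned)
codebook = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for segment 0
    [[0.0, 1.0], [1.0, 0.0]],  # centroids for segment 1
]
codes = pq_encode([0.9, 0.8, 0.1, 0.7], codebook)
print(codes)                       # one small integer per segment
print(pq_decode(codes, codebook))  # lossy reconstruction of the original
```

The decoded vector is only an approximation of the original, which is where the recall trade-off comes from. The centroids need to be fitted to representative data, which is why training vectors must be loaded first.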

To enable PQ, update your schema as shown below. For additional configuration options, see the PQ parameter table.

import weaviate.classes as wvc

# Assumes `client` is an already-connected Weaviate client
jeopardy = client.collections.get("Question")
jeopardy.config.update(
    vector_index_config=wvc.config.Reconfigure.VectorIndex.hnsw(
        quantizer=wvc.config.Reconfigure.VectorIndex.Quantizer.pq()
    )
)

client.close()

4. Load the rest of your data

If you are starting with a new Weaviate instance, you can load the rest of your data after PQ creates the codebook. Weaviate compresses the new data when it adds it to the database.

If you already have data in your Weaviate instance when you create the codebook, Weaviate automatically compresses the remaining objects (the ones after the initial training set).

PQ Parameters

You can configure PQ compression by setting the following parameters at the collection level.

| Parameter | Type | Default | Details |
| :-- | :-- | :-- | :-- |
| enabled | boolean | false | Enable PQ when true. The Python client v4 does not use the enabled parameter. To enable PQ with the v4 client, set a quantizer in the collection definition. |
| trainingLimit | integer | 100000 | The maximum number of objects, per shard, used to fit the centroids. Larger values increase the time it takes to fit the centroids. Larger values also require more memory. |
| segments | integer | -- | The number of segments to use. The number of vector dimensions must be evenly divisible by the number of segments. Starting in v1.23, Weaviate uses the number of dimensions to optimize the number of segments. |
| centroids | integer | 256 | The number of centroids to use. Reducing the number of centroids reduces the size of the quantized (PQ compressed) vectors at the price of recall. If you use the kmeans encoder, centroids is set to 256 (one byte) by default. |
| encoder | string | kmeans | Encoder specification. There are two encoders. You can specify the type of encoder as either kmeans (default) or tile. |
| distribution | string | log-normal | Encoder distribution type. Only used with the tile encoder. If you use the tile encoder, you can specify the distribution as log-normal (default) or normal. |
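The divisibility rule for segments is easy to check before you set the value yourself. The helpers below are a small sketch in plain Python; their names are hypothetical and they are not part of any Weaviate client.

```python
def valid_segment_counts(dimensions: int) -> list[int]:
    """Return every segment count that evenly divides the vector dimensions."""
    return [s for s in range(1, dimensions + 1) if dimensions % s == 0]


def check_segments(dimensions: int, segments: int) -> int:
    """Return the number of dimensions per segment, or raise if they do not divide evenly."""
    if segments <= 0 or dimensions % segments != 0:
        raise ValueError(
            f"{dimensions} dimensions cannot be split into {segments} equal segments"
        )
    return dimensions // segments


print(valid_segment_counts(96))
print(check_segments(768, 96))
```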

Additional tools and considerations

Change the codebook training limit

For most use cases, 100,000 objects is an optimal training size. There is little benefit to increasing trainingLimit. If you do increase trainingLimit, the training period will take longer. You could also have memory problems if you set a high trainingLimit.

If you have fewer than 100,000 objects per shard and want to enable compression, consider using binary quantization (BQ) instead. BQ is a better choice for smaller data sets.

Check the system logs

When compression is enabled, Weaviate logs diagnostic messages like these.

pq-conf-demo-1  | {"action":"compress","level":"info","msg":"switching to compressed vectors","time":"2023-11-13T21:10:52Z"}

pq-conf-demo-1 | {"action":"compress","level":"info","msg":"vector compression complete","time":"2023-11-13T21:10:53Z"}

If you use Docker Compose to run Weaviate, you can view the logs on the system console.

docker compose logs -f --tail 10 weaviate

You can also view the log file directly. Use docker inspect to find the file location.

docker inspect --format='{{.LogPath}}' <your-weaviate-container-id>
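If you want to wait for compression to finish from a script, you can scan the log output for the completion message. The sketch below parses lines in the format shown in the log sample earlier in this section; the function name is hypothetical, and the message text is taken from that sample, so adjust it if your Weaviate version logs something different.

```python
import json


def compression_complete(log_line: str) -> bool:
    """Return True if a log line reports that vector compression finished."""
    # Skip any "container-name |" prefix by starting at the first "{"
    start = log_line.find("{")
    if start == -1:
        return False
    try:
        entry = json.loads(log_line[start:])
    except json.JSONDecodeError:
        return False
    return (
        entry.get("action") == "compress"
        and entry.get("msg") == "vector compression complete"
    )


sample_lines = [
    'pq-conf-demo-1  | {"action":"compress","level":"info","msg":"switching to compressed vectors","time":"2023-11-13T21:10:52Z"}',
    'pq-conf-demo-1  | {"action":"compress","level":"info","msg":"vector compression complete","time":"2023-11-13T21:10:53Z"}',
]
print([compression_complete(line) for line in sample_lines])
```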

Review the current PQ configuration

To review the current PQ configuration, you can retrieve it as shown below.

jeopardy = client.collections.get("Question")
config = jeopardy.config.get()
pq_config = config.vector_index_config.pq

# print some of the config properties
print(f"Enabled: { pq_config.enabled }")
print(f"Training: { pq_config.training_limit }")
print(f"Segments: { pq_config.segments }")
print(f"Centroids: { pq_config.centroids }")


client.close()

Multiple vectors

Added in v1.24.0

Weaviate collections support multiple, named vectors. Each vector is independent: each vector space has its own index, its own compression, and its own vectorizer. This means you can create vectors for properties, use different vectorization models, and apply different metrics to the same object.

You do not have to use multiple vectors in your collections, but if you do, you need to adjust your queries to specify which vector you want to use.

Similarly, compression must be enabled independently for each vector. The procedure varies slightly by client language, but in each case the idea is the same. Each vector is independent and can use PQ, BQ, or no compression.