Product Quantization (compression)

note

Starting in v1.23, AutoPQ simplifies configuring PQ on new collections.

Product quantization (PQ) is a form of data compression for vectors. PQ reduces the memory footprint of a vector index, so enabling PQ for HNSW lets you work with larger datasets. For a discussion of how PQ saves memory, see this page.
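As a back-of-the-envelope illustration of the memory savings (a sketch with assumed numbers, not Weaviate's internal accounting, and ignoring index overhead and the codebook itself):

```python
# Rough PQ memory math (illustration only, not Weaviate internals).
def original_bytes(dimensions: int) -> int:
    # Uncompressed vectors are stored as 32-bit floats: 4 bytes per dimension.
    return 4 * dimensions

def pq_compressed_bytes(dimensions: int, segments: int) -> int:
    # With up to 256 centroids per segment, each segment id fits in 1 byte.
    assert dimensions % segments == 0, "dimensions must be divisible by segments"
    return segments

dims, segs = 1536, 256  # e.g. a 1536-dimensional embedding, 256 segments (assumed)
print(original_bytes(dims))             # 6144 bytes per vector
print(pq_compressed_bytes(dims, segs))  # 256 bytes per vector: a 24x reduction
```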

PQ makes trade-offs between recall, performance, and memory usage. This means a PQ configuration that reduces memory may also reduce recall. There are similar trade-offs when you use HNSW without PQ. If you use PQ compression, you should also tune HNSW so that the two complement each other.

To configure HNSW, see Configuration: Vector index.

Enable PQ compression

PQ is configured at a collection level. There are two ways to enable PQ compression:

Configure AutoPQ

Added in v1.23.0

For new collections, use AutoPQ. AutoPQ automatically triggers the PQ training step based on the size of the collection.

1. Set the environment variable

AutoPQ requires asynchronous indexing.

  • Open-source Weaviate users: To enable AutoPQ, set the environment variable ASYNC_INDEXING=true and restart your Weaviate instance.
  • Weaviate Cloud (WCD) users: Enable async indexing through the WCD Console and restart your Weaviate instance.
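If you run open-source Weaviate with Docker Compose, the variable can be set in the service definition. This is a sketch; adapt the service name and the rest of the file to your deployment:

```yaml
services:
  weaviate:
    environment:
      ASYNC_INDEXING: 'true'
```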

2. Configure PQ

Specify PQ settings for each collection where you want it enabled.

For additional configuration options, see the PQ parameters.

import weaviate.classes.config as wc

client.collections.create(
    name="Question",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
    vector_index_config=wc.Configure.VectorIndex.hnsw(
        # Set the object count threshold at which training begins
        quantizer=wc.Configure.VectorIndex.Quantizer.pq(training_limit=50000)
    ),
    properties=[
        wc.Property(name="question", data_type=wc.DataType.TEXT),
        wc.Property(name="answer", data_type=wc.DataType.TEXT),
    ],
)


client.close()

3. Load your data

Load your data. You do not have to load an initial set of training data.

AutoPQ creates the PQ codebook when the object count reaches the training limit. By default, the training limit is 100,000 objects per shard.
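Conceptually, the AutoPQ trigger behaves like the sketch below (a pure-Python illustration of the idea, not Weaviate's implementation): objects accumulate uncompressed until the training limit is reached, the codebook is trained once, and later inserts are compressed.

```python
# Conceptual sketch of AutoPQ (illustration only, not Weaviate internals).
class AutoPQSketch:
    def __init__(self, training_limit: int = 100_000):
        self.training_limit = training_limit
        self.vectors = []              # objects stored before training
        self.codebook_trained = False

    def insert(self, vector):
        self.vectors.append(vector)
        if not self.codebook_trained and len(self.vectors) >= self.training_limit:
            self.train_codebook()      # triggered automatically at the threshold

    def train_codebook(self):
        # Real PQ fits centroids per segment here; we only flip the flag.
        self.codebook_trained = True

store = AutoPQSketch(training_limit=3)
for v in ([0.1], [0.2], [0.3]):
    store.insert(v)
print(store.codebook_trained)  # True: training ran when the 3rd object arrived
```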

Manually configure PQ

As an alternative to AutoPQ, you can manually enable PQ on an existing collection. When you enable PQ, Weaviate trains the PQ codebook using the objects that are already loaded.

To manually enable PQ, follow these steps:

How large should the training set be?

When PQ is enabled, Weaviate uses the smaller of the training limit and the collection object count to train PQ.

We recommend importing a set of 10,000 to 100,000 training objects per shard before you enable PQ.

note

Weaviate logs messages when PQ is enabled and when vector compression is complete. Do not import the rest of your data until the training step is complete.

The next few sections work through these steps.

1. Configure an initial schema without PQ

Create a collection without specifying a quantizer.

import weaviate.classes.config as wc

client.collections.create(
    name="Question",
    description="A Jeopardy! question",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
    generative_config=wc.Configure.Generative.openai(),
    properties=[
        wc.Property(name="question", data_type=wc.DataType.TEXT),
        wc.Property(name="answer", data_type=wc.DataType.TEXT),
    ],
)


client.close()

2. Load training data

Add objects that will be used to train PQ. Weaviate uses the smaller of the training limit and the collection size to train PQ.

We recommend loading a representative sample such that the trained centroids are representative of the entire dataset.

3. Enable PQ and create the codebook

Update your collection definition to enable PQ. Once PQ is enabled, Weaviate trains the codebook using the training data.

Which objects are used for training?
  • If the collection has more objects than the training limit, Weaviate randomly selects objects from the collection to train the codebook.
  • If the collection has fewer objects than the training limit, Weaviate uses all objects in the collection to train the codebook.

PQ relies on a codebook to compress the original vectors. The codebook defines "centroids" that are used to calculate the compressed vector. If you are not using AutoPQ, you must load some vectors before you enable PQ so that Weaviate can define the centroids. You should have from 10,000 to 100,000 vectors loaded before you enable PQ.
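To make the codebook idea concrete, here is a minimal product-quantization sketch in pure Python (an illustration with a toy, hand-picked codebook; not Weaviate's implementation): each vector is split into segments, and each segment is replaced by the index of its nearest centroid.

```python
# Minimal PQ encode/decode sketch (illustration only).
def encode(vector, codebooks):
    seg_len = len(vector) // len(codebooks)
    codes = []
    for s, centroids in enumerate(codebooks):
        segment = vector[s * seg_len:(s + 1) * seg_len]
        # Pick the nearest centroid by squared Euclidean distance
        codes.append(min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(segment, centroids[i])),
        ))
    return codes

def decode(codes, codebooks):
    # The decompressed vector is the concatenation of the chosen centroids.
    out = []
    for code, centroids in zip(codes, codebooks):
        out.extend(centroids[code])
    return out

# Two segments with two centroids each (toy codebook, assumed values)
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # centroids for segment 0
    [[0.0, 1.0], [1.0, 0.0]],   # centroids for segment 1
]
codes = encode([0.9, 1.1, 0.1, 0.8], codebooks)
print(codes)                     # [1, 0] — one small id per segment
print(decode(codes, codebooks))  # [1.0, 1.0, 0.0, 1.0] — an approximation
```

This is why the training set matters: the centroids in the codebook are only as representative as the vectors they were fitted on.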

To enable PQ, update your schema as shown below. For additional configuration options, see the PQ parameter table.

import weaviate.classes.config as wc

jeopardy = client.collections.get("Question")
jeopardy.config.update(
    vector_index_config=wc.Reconfigure.VectorIndex.hnsw(
        quantizer=wc.Reconfigure.VectorIndex.Quantizer.pq()
    )
)

client.close()

4. Load the rest of your data

Once the codebook has been trained, you can continue to add data as normal. Weaviate compresses new data as it is added to the database.

If you already have data in your Weaviate instance when you create the codebook, Weaviate automatically compresses the remaining objects (the ones after the initial training set).

PQ Parameters

You can configure PQ compression by setting the following parameters at the collection level.

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| `enabled` | boolean | `false` | Enable PQ when `true`. The Python client v4 does not use the `enabled` parameter; to enable PQ with the v4 client, set a quantizer in the collection definition. |
| `trainingLimit` | integer | `100000` | The maximum number of objects, per shard, used to fit the centroids. Larger values increase the time it takes to fit the centroids and require more memory. |
| `segments` | integer | -- | The number of segments to use. The number of vector dimensions must be evenly divisible by the number of segments. Starting in v1.23, Weaviate uses the number of dimensions to optimize the number of segments. |
| `centroids` | integer | `256` | The number of centroids to use (max: 256). We generally recommend you do not change this value. Due to the data structure used, a smaller centroids value will not result in smaller vectors, but may result in faster compression at the cost of recall. |
| `encoder` | string | `kmeans` | Encoder specification. There are two encoders; specify the type as either `kmeans` (default) or `tile`. |
| `distribution` | string | `log-normal` | Encoder distribution type. Only used with the `tile` encoder; specify the distribution as `log-normal` (default) or `normal`. |
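Because the number of vector dimensions must be evenly divisible by the number of segments, the valid `segments` values for a given model are exactly the divisors of its dimension count. A quick helper to enumerate them (an illustration, not part of the Weaviate client):

```python
# List valid `segments` values for a given vector dimensionality:
# any divisor of the dimension count is allowed.
def valid_segment_counts(dimensions: int) -> list[int]:
    return [s for s in range(1, dimensions + 1) if dimensions % s == 0]

print(valid_segment_counts(12))  # [1, 2, 3, 4, 6, 12]
```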

Additional tools and considerations

Change the codebook training limit

For most use cases, 100,000 objects is an optimal training size, and there is little benefit to increasing trainingLimit. A higher trainingLimit lengthens the training period and can also cause memory problems.

If you have a small dataset and wish to enable compression, consider using binary quantization (BQ). BQ is a simpler compression method that does not require training.

Check the system logs

When compression is enabled, Weaviate logs diagnostic messages like these:

pq-conf-demo-1 | {"action":"compress","level":"info","msg":"switching to compressed vectors","time":"2023-11-13T21:10:52Z"}

pq-conf-demo-1 | {"action":"compress","level":"info","msg":"vector compression complete","time":"2023-11-13T21:10:53Z"}

If you use docker-compose to run Weaviate, you can get the logs on the system console.

docker compose logs -f --tail 10 weaviate

You can also view the log file directly. Check docker to get the file location.

docker inspect --format='{{.LogPath}}' <your-weaviate-container-id>

Review the current PQ configuration

To review the current PQ configuration, retrieve it as shown below.

jeopardy = client.collections.get("Question")
config = jeopardy.config.get()
pq_config = config.vector_index_config.pq

# Print some of the config properties
print(f"Enabled: {pq_config.enabled}")
print(f"Training limit: {pq_config.training_limit}")
print(f"Segments: {pq_config.segments}")
print(f"Centroids: {pq_config.centroids}")

client.close()

Multiple vectors

Added in v1.24.0

Weaviate collections support multiple, named vectors. Each vector space is independent: it has its own index, its own compression settings, and its own vectorizer. This means you can create separate vectors for different properties, use different vectorization models, and apply different distance metrics to the same object.

You do not have to use multiple vectors in your collections, but if you do, you need to adjust your queries to specify a target vector for vector or hybrid queries.

Similarly, compression must be enabled independently for each vector. The procedure varies slightly by client language, but in each case the idea is the same. Each vector is independent and can use PQ, BQ, or no compression.
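With the Python client v4, this might look like the following sketch, which assumes the named-vectors API (`Configure.NamedVectors`) and uses example collection and property names; adapt it to your schema:

```python
import weaviate.classes.config as wc

# Sketch: two named vectors with independent compression settings
# (collection and vector names here are examples, not requirements).
client.collections.create(
    name="Question",
    vectorizer_config=[
        # PQ on the "question" vector
        wc.Configure.NamedVectors.text2vec_openai(
            name="question",
            source_properties=["question"],
            vector_index_config=wc.Configure.VectorIndex.hnsw(
                quantizer=wc.Configure.VectorIndex.Quantizer.pq()
            ),
        ),
        # BQ on the "answer" vector
        wc.Configure.NamedVectors.text2vec_openai(
            name="answer",
            source_properties=["answer"],
            vector_index_config=wc.Configure.VectorIndex.hnsw(
                quantizer=wc.Configure.VectorIndex.Quantizer.bq()
            ),
        ),
    ],
)
```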

Questions and feedback

If you have any questions or feedback, let us know in the user forum.