
Compression strategy

Given the choice of PQ, BQ, or no compression, which should you choose? The answer is, it depends.

PQ and BQ are both lossy compression techniques, and the choice between them depends on your circumstances, your model and the use case.

The index type

PQ is currently only supported for the HNSW index, while BQ is supported for both the HNSW and flat indexes. If you are using the flat index, you will need to use BQ.
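For example, here is a minimal sketch of enabling BQ on a flat index, assuming the v4 Weaviate Python client. The collection name and connection method are illustrative, and the exact parameters may vary by client version.

```python
import weaviate
from weaviate.classes.config import Configure

client = weaviate.connect_to_local()

# The flat index only supports BQ, so the quantizer here must be bq()
client.collections.create(
    "Articles",  # illustrative collection name
    vector_index_config=Configure.VectorIndex.flat(
        quantizer=Configure.VectorIndex.Quantizer.bq()
    ),
)

client.close()
```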

Model suitability

PQ is generally a more robust compression technique, as it is fitted on your specific data during the training step. This means that you don't need to worry as much about whether your model is suitable for PQ.

On the other hand, BQ can be more sensitive to the model, for example the length (dimensionality) of its vectors and whether it was designed with binary quantization in mind.

If you do not know whether your model is suitable for BQ, we recommend using PQ.

Tunability

As you've seen, PQ exposes tunable parameters whereas BQ does not. This means that you can adjust PQ to be more or less aggressive with respect to performance targets such as recall and QPS, while still benefiting from some compression.
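As a sketch of what this tuning looks like, the following assumes the v4 Weaviate Python client; the parameter values shown are illustrative rather than recommendations, and the available parameters may differ by client version.

```python
import weaviate
from weaviate.classes.config import Configure

client = weaviate.connect_to_local()

client.collections.create(
    "Articles",  # illustrative collection name
    vector_index_config=Configure.VectorIndex.hnsw(
        quantizer=Configure.VectorIndex.Quantizer.pq(
            training_limit=50_000,  # number of vectors used to fit the codebooks
            segments=256,           # more segments: higher recall, less compression
            centroids=256,          # codebook size per segment
        )
    ),
)

client.close()
```

BQ, by contrast, is simply switched on or off, as in the earlier flat-index sketch.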

Complexity

If you are looking for the easiest solution to implement, BQ is the way to go. It is a simple configuration that can be enabled immediately, without waiting for a training set of vectors to accumulate.

Conclusion

In summary, your choice of compression technique depends on your circumstances, your model and the use case.

But as a general rule of thumb, if you are not sure which to choose, we recommend using PQ. It is more robust, tunable, and generally more suitable for a wider range of models and use cases.

And if resource constraints are not a concern, you can always choose to use no compression at all. This will give you the best performance, but at the cost of increased resource requirements.

But do note that you will likely not be able to switch on compression later, as it requires reindexing the data. (PQ is the exception: it may be enabled on an existing collection later, unless your dataset is too large.)
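For illustration, enabling PQ on an existing collection might look like the following sketch, assuming the v4 Weaviate Python client. The collection name is illustrative, and the exact update API (for example, whether config.update() accepts vector_index_config) may differ by client version.

```python
import weaviate
from weaviate.classes.config import Reconfigure

client = weaviate.connect_to_local()

articles = client.collections.get("Articles")  # illustrative collection name

# Switch the existing HNSW index over to PQ; the quantizer is then
# trained on the vectors already stored in the collection.
articles.config.update(
    vector_index_config=Reconfigure.VectorIndex.hnsw(
        quantizer=Reconfigure.VectorIndex.Quantizer.pq()
    )
)

client.close()
```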

Questions and feedback

If you have any questions or feedback, let us know in the user forum.