Managing resources (Hot, Warm, Cold)

Weaviate provides flexible resource management features that help you balance search speed, search accuracy and recall, and system resource costs.

This guide provides an overview of these topics to help you allocate resources effectively:

Resource management Tips
  • Start with the dynamic index type unless you have a reason not to.
  • Consider vector compression techniques if some loss of accuracy is acceptable.
    • This will improve query speeds.
    • For HNSW indexes, this will reduce memory usage.
  • Avoid overprovisioning storage, especially hot storage.

Storage Tiers - Temperatures

We categorize storage resources using three tiers: hot, warm, and cold. Each tier has different performance characteristics and costs.

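| Tier    | Vector Index Type | Vector Compression | Tenant State | Storage | Performance            | Cost     |
|---------|-------------------|--------------------|--------------|---------|------------------------|----------|
| 🟥 Hot  | HNSW              | PQ, SQ, BQ         | Active       | Memory  | Fastest                | High     |
| 🟨 Warm | Flat              | BQ                 | Active       | SSD     | Slower                 | Moderate |
| 🟦 Cold | Any               | Any                | Inactive     | Cloud   | Resource not available | Low      |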

🟥 Hot

  • Describes memory usage
  • Fastest and most expensive
  • Primarily driven by HNSW vector indexes
  • Always available (active) for use
  • Costs increase rapidly with scale
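
To get a feel for how hot-tier usage scales, the sketch below estimates the memory needed for the raw vector values alone. It is a rough lower bound, not Weaviate's own sizing formula: it assumes uncompressed float32 values (4 bytes each) and ignores the HNSW graph, object data, and any other overhead. The function names and the 768-dimension figure are illustrative assumptions.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the uncompressed vector values alone.

    Assumes float32 values (4 bytes each) and ignores HNSW graph links,
    object data, and any other overhead.
    """
    if num_vectors < 0 or dimensions <= 0 or bytes_per_value <= 0:
        raise ValueError("num_vectors must be >= 0; dimensions and bytes_per_value must be > 0")
    return num_vectors * dimensions * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    dimensions = 768  # hypothetical embedding size
    for count in (1_000_000, 10_000_000, 100_000_000):
        gib = to_gib(raw_vector_bytes(count, dimensions))
        print(f"{count:>11,} vectors x {dimensions} dims ~ {gib:,.1f} GiB")
```

Because the estimate grows linearly with both object count and vector dimensionality, every tenfold increase in data needs roughly ten times the memory, which is usually the most expensive resource to provision.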

🟨 Warm

  • Describes data stored on disk (SSD)
  • Slower than hot tier but less expensive
  • Driven by flat vector index, object data, and inverted indexes
  • Always available (active) for use
  • Costs increase more slowly than hot tier as data grows

🟦 Cold

Offloading: AWS S3 only

As of Weaviate v1.26.0, tenants can only be offloaded to cold storage in AWS S3. Additional storage options may be added in future releases.

To offload a tenant, use the offload-s3 module.

  • Describes data stored in cloud storage
  • Slowest and least expensive tier
  • Primarily driven by offloaded tenants
  • Resources are not available (inactive) for use
  • Requires reactivation to access

Resource Management - Key Factors

Effective resource management in Weaviate involves balancing performance, cost, and data accessibility. The key levers to manage resources are:

  • Vector index types: Choose the right index type based on the number of objects and desired performance.
  • Vector compression: Use compression techniques to reduce memory usage and improve query performance at the cost of some accuracy.
  • Tenant states: Manage tenant states to balance cost and performance.

Vector index types

The choice of vector index type can have a significant impact on performance and resource usage. Weaviate supports the following index types:

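| Index Type | Resource Usage | Performance | Suitable for     | Description                                                    |
|------------|----------------|-------------|------------------|----------------------------------------------------------------|
| HNSW       | 🟥 Hot         | Fast        | Any object count | A memory-based, fast index                                     |
| Flat       | 🟨 Warm        | Medium      | <~10k objects    | A disk-based, brute-force index                                |
| Dynamic    | Depends        | Depends     | Any object count | Transitions from flat to HNSW index at a specified threshold   |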

The choice of index type depends on the number of objects and the desired performance.

If you are unsure which index type to use, the dynamic index type is a good starting point, particularly for a multi-tenant collection, as it automatically transitions from a flat to an HNSW index once the number of objects passes a specified threshold.
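
The threshold behavior of the dynamic index can be pictured with a small, self-contained sketch. This is not Weaviate code: the function name, the tenant names, and the 10,000-object threshold (borrowed from the "<~10k objects" guidance for flat indexes) are assumptions for illustration, and the exact comparison at the boundary may differ in practice.

```python
FLAT = "flat"
HNSW = "hnsw"


def effective_index_type(object_count: int, threshold: int = 10_000) -> str:
    """Return the index type a dynamic index would be using at this size.

    Hypothetical model: flat up to and including the threshold, HNSW beyond it.
    """
    if object_count < 0:
        raise ValueError("object_count must be >= 0")
    return HNSW if object_count > threshold else FLAT


if __name__ == "__main__":
    # Hypothetical tenants of very different sizes in one multi-tenant collection.
    tenant_sizes = {"tenant_small": 800, "tenant_medium": 9_500, "tenant_large": 250_000}
    for name, size in tenant_sizes.items():
        print(f"{name}: {size:,} objects -> {effective_index_type(size)}")
```

In a multi-tenant collection, this means small tenants can stay on the disk-based flat index while only the large ones pay the memory cost of HNSW.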

Vector compression

Vector compression techniques reduce the size of vectors by quantizing them into a smaller representation.

This can have the impact of reducing memory usage, or improving performance by reducing the amount of data that needs to be read from disk. The trade-off is that the resulting search quality may be lower.

Weaviate supports the following vector compression methods:

| Compression Method        | Index Type | Requires Training | Description                                               |
|---------------------------|------------|-------------------|-----------------------------------------------------------|
| Product Quantization (PQ) | HNSW       | Yes               | Each vector becomes an array of integer-based centroids   |
| Binary Quantization (BQ)  | HNSW, Flat | No                | Each vector dimension becomes a bit                       |
| Scalar Quantization (SQ)  | HNSW       | Yes               | Each vector dimension becomes an integer                  |
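
To make the table concrete, here is a toy sketch of binary and scalar quantization on a single 8-dimensional vector. It illustrates the general ideas only and is not Weaviate's implementation: the sign-based bit rule for BQ, the 8-bit (256-level) range for SQ, and all function names are assumptions. The `fit_range` step stands in for the "training" that SQ needs, which is why a representative data sample matters. PQ is omitted because it requires clustering.

```python
def binary_quantize(vector):
    """BQ sketch: keep one bit per dimension (1 if the value is positive, else 0)."""
    return [1 if v > 0 else 0 for v in vector]


def pack_bits(bits) -> bytes:
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


def fit_range(sample):
    """'Training' for the SQ sketch: learn the value range from sample vectors."""
    values = [v for vec in sample for v in vec]
    return min(values), max(values)


def scalar_quantize(vector, lo: float, hi: float, levels: int = 256):
    """SQ sketch: map each value in [lo, hi] to an integer in 0..levels-1."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = (levels - 1) / (hi - lo)
    # Values outside the learned range are clamped to the nearest end.
    return [min(levels - 1, max(0, round((v - lo) * scale))) for v in vector]


if __name__ == "__main__":
    vector = [0.5, -0.25, 0.0, 1.0, -1.0, 0.75, -0.5, 0.25]
    lo, hi = fit_range([vector])

    bq = pack_bits(binary_quantize(vector))
    sq = bytes(scalar_quantize(vector, lo, hi))

    print("float32 size:", 4 * len(vector), "bytes")
    print("SQ size:     ", len(sq), "bytes")
    print("BQ size:     ", len(bq), "bytes")
```

Under these assumptions, SQ stores one byte per dimension instead of four, and BQ stores one bit per dimension, which is where the memory savings and the loss of precision both come from.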

If you are unsure which compression method to use, scalar quantization is a good starting point, provided that you have a representative sample of your likely final dataset to train it on.

Tenant states

Multi-tenant collections enable you to efficiently manage isolated subsets of data. All tenants in a collection share the same schema and configuration.

Weaviate supports the following tenant states:

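| Tenant state     | CRUD & Queries | Vector Index | Inverted Index | Object Data | Time to Activate | Description                                                 |
|------------------|----------------|--------------|----------------|-------------|------------------|-------------------------------------------------------------|
| Active (default) | Yes            | Hot/Warm     | Warm           | Warm        | None             | Tenant is available for use                                 |
| Inactive         | No             | Warm         | Warm           | Warm        | Fast             | Tenant is locally stored but not available for use          |
| Offloaded        | No             | Cold         | Cold           | Cold        | Slow             | Tenant is stored in cloud storage and not available for use |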

Hot tenants can be deactivated to warm storage to reduce memory usage, and any tenant can be offloaded to cold storage to reduce memory and disk usage. Conversely, any tenant can be reactivated when needed.

Consider a strategy of deactivating tenants that are not frequently accessed, and offloading tenants that are rarely accessed.
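
One way to express such a strategy is a small policy function that maps a tenant's idle time to a target state. The sketch below is illustrative only: the 7-day and 30-day cutoffs, the function name, and the tenant names are assumptions, and it only decides on a state. Applying that state through a Weaviate client is left out.

```python
from datetime import datetime, timedelta, timezone

ACTIVE = "ACTIVE"
INACTIVE = "INACTIVE"
OFFLOADED = "OFFLOADED"


def target_state(
    last_access: datetime,
    now: datetime,
    deactivate_after: timedelta = timedelta(days=7),   # hypothetical cutoff
    offload_after: timedelta = timedelta(days=30),     # hypothetical cutoff
) -> str:
    """Pick a tenant state from how long the tenant has been idle."""
    if offload_after < deactivate_after:
        raise ValueError("offload_after must not be shorter than deactivate_after")
    idle = now - last_access
    if idle >= offload_after:
        return OFFLOADED   # rarely accessed: move to cold storage
    if idle >= deactivate_after:
        return INACTIVE    # not frequently accessed: keep on disk only
    return ACTIVE


if __name__ == "__main__":
    now = datetime(2024, 8, 1, tzinfo=timezone.utc)
    last_seen = {
        "tenant_daily": now - timedelta(hours=3),
        "tenant_monthly": now - timedelta(days=12),
        "tenant_dormant": now - timedelta(days=90),
    }
    for name, seen in last_seen.items():
        print(f"{name}: {target_state(seen, now)}")
```

Suitable cutoffs depend on how quickly tenants need to become available again, since reactivating an offloaded tenant is slower than reactivating an inactive one.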

Tips

Best Practices

  • Start with the dynamic index type for new collections. This is particularly useful for multi-tenant collections, as it allows each tenant to use the most appropriate index type.
  • Use vector compression techniques to optimize storage and query performance, especially for large collections or tenants.
  • Conduct thorough testing when changing index types or compression methods to ensure performance meets your requirements.

Common Pitfalls

  • Overprovisioning hot storage: Keeping all data in hot storage can lead to unnecessary costs. Regularly assess what data truly needs the fastest access.
  • Neglecting to plan for growth: Not anticipating data growth can lead to performance issues. Always design your resource management strategy with scalability in mind.
  • Improper tenant management: In multi-tenant scenarios, forgetting to offload inactive tenants can lead to resource waste. Implement automated processes to manage tenant states based on usage patterns.
  • Mismatch between quantization technique, model, and data: When using a compression technique, ensure that it is compatible with the model (e.g. BQ) and that the data is sufficient and representative for training (e.g. PQ, SQ).

Questions and feedback

If you have any questions or feedback, let us know in the user forum.