BlockMax WAND migration guide
This guide covers how to migrate existing data to use the BlockMax WAND algorithm for the inverted index in Weaviate v1.30 and above.
BlockMax WAND offers improved search performance for BM25 and hybrid searches. While new collections created in Weaviate v1.30 or newer will automatically use this optimized format, existing collections created before v1.30 require migration to take advantage of these improvements.
To learn more about the technical details of this change, you can refer to the BlockMax WAND blog post.
Prerequisites
Before beginning the migration, ensure that:
- You have a Weaviate version before v1.30 that doesn't use the BlockMax WAND algorithm
- You're on the latest version of Weaviate before migrating (currently 1.30.0)
Migration considerations
Before starting the migration process, be aware of the following:
- Migration is recommended primarily if you use BM25 or hybrid search and aren't satisfied with your current query times
- Migration will increase server resource load, especially during the re-indexing stage
- Migration can take hours for shards with millions of documents, depending on the number of searchable properties
- If possible, schedule migration to avoid large data imports/updates/deletes, especially during the re-indexing stage
- BlockMax WAND may return slightly different scores than WAND if you search on multiple properties or have a delete-heavy workflow, as it computes term statistics differently
Migration process
The migration involves three main stages:
- Re-indexing: Converting the internal representation from the mapcollection format to the inverted blockmax format
- Swapping: Making the system use the new format while maintaining the ability to roll back
- Cleanup: Cleaning up old data after successful migration
Stage 1: Re-indexing
During this stage:
- Existing data will be re-ingested in the new BlockMax format
- New incoming data/updates/deletes will be persisted in both old and BlockMax formats (double writing)
- Searches will continue using the old format
- Restart your Weaviate instance with the following configuration:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=weaviate
If your node has a different name than "weaviate", replace it with your actual node name.
- Monitor the migration progress by checking:
http://<node_name>:<debug_endpoint_port>/debug/index/rebuild/inverted/status?collection=<collection_name>
Replace:
  - <node_name> with the name of your Weaviate node
  - <debug_endpoint_port> with the actual port (default is 6060)
  - <collection_name> with the name of the collection being migrated
- Track the migration status through the returned JSON:
  - Initially, you may see "collection not found or not ready"
  - During migration, the status will be "in progress"
  - Progress updates appear in the latest_snapshot field approximately every 15 minutes. If they stop appearing, there is likely an error in the process, and you will need to check the logs for more information.
{
  "shards": [
    {
      "latest_snapshot": "2025-03-31T16:34:08.477558Z, 00000000-0000-0000-0000-00000003b38f, all 241446, idx 241446",
      "message": "migration started recently, no snapshots yet",
      "properties": "<searchable properties to migrate>",
      "shard": "<shard path>",
      "snapshot_count": "1",
      "objects_migrated": "241446",
      "start_time": "2025-03-31T16:33:20.21005Z",
      "status": "in progress"
    }
  ]
}
- When re-indexing completes, the status field will change to "reindexed" and the message will show "reindexing done, merging buckets"
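The status check in this stage can be scripted. The following is a minimal sketch, assuming the endpoint answers a plain unauthenticated HTTP GET with JSON in the shape shown above. The helper names (status_url, summarize, all_reindexed, fetch_status) are illustrative and not part of Weaviate.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def status_url(node_name: str, collection: str, port: int = 6060) -> str:
    """Build the re-indexing status URL for one node and collection."""
    return (
        f"http://{node_name}:{port}"
        f"/debug/index/rebuild/inverted/status?collection={quote(collection)}"
    )


def summarize(payload: dict) -> dict:
    """Map each shard path to the status it reports."""
    return {
        shard.get("shard", "?"): shard.get("status", "?")
        for shard in payload.get("shards", [])
    }


def all_reindexed(payload: dict) -> bool:
    """True when at least one shard is listed and every shard reports 'reindexed'."""
    shards = payload.get("shards", [])
    return bool(shards) and all(s.get("status") == "reindexed" for s in shards)


def fetch_status(node_name: str, collection: str, port: int = 6060) -> dict:
    """Fetch and decode the status document.

    Assumption: the response body is JSON. Early in the process the endpoint
    may answer differently, in which case decoding can raise an exception.
    """
    with urlopen(status_url(node_name, collection, port), timeout=10) as resp:
        return json.load(resp)
```

A call such as all_reindexed(fetch_status("weaviate", "MyCollection")) could then gate the move to the swapping stage; "MyCollection" is a placeholder collection name.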
Stage 2: Swapping
During this stage:
- The system begins using the BlockMax format
- Double writing continues
- Old format data is kept on disk
- Rolling back to the old format remains possible
After re-indexing is finished (status is "reindexed"), proceed to swap buckets:
- Restart your Weaviate instance with these settings:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=weaviate
REINDEX_MAP_TO_BLOCKMAX_SWAP_BUCKETS=weaviate
After restart, check the status endpoint again:
- The status will briefly change to "merged" (may be too fast to see in logs)
- Then it will show "done"
At this point, keyword search on your node will start using BlockMax WAND by default
- We recommend testing BM25 and hybrid search performance and results before proceeding to the next stage
Stage 3: Cleanup
Perform this step only after detailed validation and verification that your migration completed successfully. After this step, rolling back is no longer possible.
During this stage:
- Double writing is disabled, using BlockMax format only
- Old format data is cleaned from disk
When everything is working as expected and you've validated keyword search functionality, restart all nodes with:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=true
REINDEX_MAP_TO_BLOCKMAX_TIDY_BUCKETS=true
Multi-node deployments
For multi-node servers deployed with the same configuration, you can migrate one server at a time:
- Set the node name for the current node being migrated.
- Maintain a comma-separated list of already migrated nodes.
For a node currently being migrated (<node_name>) and previously migrated nodes (<migrated_node_names>):
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=<migrated_node_names>,<node_name>
REINDEX_MAP_TO_BLOCKMAX_SWAP_BUCKETS=<migrated_node_names>,<node_name>
Example steps for multi-node migration
With nodes named weaviate-0, weaviate-1, weaviate-2, etc.
- Migrate weaviate-0:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=weaviate-0
- Once weaviate-0 is done, migrate weaviate-1:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=weaviate-0,weaviate-1
REINDEX_MAP_TO_BLOCKMAX_SWAP_BUCKETS=weaviate-0
- Once weaviate-1 is done, migrate weaviate-2:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=weaviate-0,weaviate-1,weaviate-2
REINDEX_MAP_TO_BLOCKMAX_SWAP_BUCKETS=weaviate-0,weaviate-1
Repeat the process for other nodes.
At the end of migration, before cleaning up, you can use true instead of the full node list:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=true
REINDEX_MAP_TO_BLOCKMAX_SWAP_BUCKETS=true
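When nodes are migrated one at a time, the comma-separated lists have to be rebuilt for every restart. The following is a minimal sketch of a helper that builds them. The function name migration_env is hypothetical, and the check that every swapped node also appears in the AT_STARTUP list is an assumption drawn from the rule that AT_STARTUP is required for the other variables to work.

```python
PREFIX = "REINDEX_MAP_TO_BLOCKMAX_"


def migration_env(reindex_nodes, swap_nodes=()):
    """Build the node-list variables for one restart.

    reindex_nodes: every node that should have the migration enabled.
    swap_nodes: the nodes whose buckets should be swapped on this restart.
    """
    reindex = list(reindex_nodes)
    swap = list(swap_nodes)
    if not reindex:
        raise ValueError("at least one node must be listed for AT_STARTUP")
    # Assumption: a node is only swapped if the migration is enabled for it.
    unknown = [node for node in swap if node not in reindex]
    if unknown:
        raise ValueError(f"swap nodes not enabled for migration: {unknown}")
    env = {PREFIX + "AT_STARTUP": ",".join(reindex)}
    if swap:
        env[PREFIX + "SWAP_BUCKETS"] = ",".join(swap)
    return env
```

For the first node, migration_env(["weaviate-0"]) yields only REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=weaviate-0, matching the first step of the example.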
Multi-tenancy specific notes
Due to the dynamic nature of tenants, multi-tenancy collections behave slightly differently:
- With REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP set, tenants will be migrated as they are activated
- Deactivation stops migration and should be avoided, as the migration may not make enough progress if the tenant is only activated for a short period
- Reactivation of a tenant (active → cold → active) is equivalent to a restart:
  - To swap buckets on a tenant, you need to reactivate it (with REINDEX_MAP_TO_BLOCKMAX_SWAP_BUCKETS=true set)
  - For tidying up, if REINDEX_MAP_TO_BLOCKMAX_TIDY_BUCKETS=true is set, a swapped tenant will tidy on reactivation
  - Setting server variables between steps still requires a restart
- Tenants created during migration will be created in the old format and migrated like other tenants
  - They'll start using BlockMax format by default if REINDEX_MAP_TO_BLOCKMAX_TIDY_BUCKETS is set
- After the final tidying step, the server should keep all migration variables set to ensure all tenants are eventually migrated
Monitoring migration status
The migration process progresses through several stages, which can be monitored via the status endpoint:
Not active (Multi-tenancy only)
- Status: shard_not_loaded
- Message: shard not loaded
- Tenant isn't active
Not Started
- Status: not_started
- Message: "no searchable_map_to_blockmax found" or "no started.mig found"
- No migration files exist yet or process hasn't been initiated
Started
- Status: started
- Records start time from started.mig
- If properties.mig doesn't exist, message is "computing properties to reindex"
In Progress
- Status: in progress
- Tracks progress through progress.mig.* files (snapshots)
- Updates latest_snapshot approximately every 15 minutes
- If no progress files found, message is "no progress.mig.* files found, no snapshots created yet"
Reindexed
- Status: reindexed
- Message: "reindexing done" or "reindexing done, merging buckets"
- Indicates reindexing is complete
Merged
- Status: merged
- Message: merged reindex and ingest buckets
- Buckets have been merged but not yet swapped
Swapped
- Status: swapped
- Message: "swapped buckets" or "swapped X files"
- Multiple swapped.mig.* files may exist
Done
- Status: done
- Message: reindexing done
- Final state indicating migration is complete
Error (can occur at any stage)
- Status: error
- Various error messages depending on which file operation failed
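With many shards or tenants, it helps to reduce the per-shard statuses to the one that is furthest behind. The following is a minimal sketch, assuming the order in which the statuses are listed above is also the order in which a shard moves through them. STATUS_ORDER and least_advanced are illustrative names, not part of Weaviate.

```python
# Status values in the order this guide lists them; "error" is handled separately.
STATUS_ORDER = [
    "shard_not_loaded",
    "not_started",
    "started",
    "in progress",
    "reindexed",
    "merged",
    "swapped",
    "done",
]


def least_advanced(statuses):
    """Return the status of the shard that is furthest behind.

    Any "error" takes precedence, since it needs attention before the
    migration can continue. An unrecognized status raises ValueError.
    """
    statuses = list(statuses)
    if not statuses:
        raise ValueError("no statuses given")
    if "error" in statuses:
        return "error"
    return min(statuses, key=STATUS_ORDER.index)
```

For example, a node is ready for the swapping stage only once least_advanced returns "reindexed" or a later status for all of its shards.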
Troubleshooting
Weaviate crashes during migration
- The conversion will resume when Weaviate is restarted if the migration variables are still set
- If your environment variables are reset during restart, the migration will stop
- With variables unset and new data incoming, you'll need to restart the migration process from the beginning
Aborting migration and rolling back
If you have issues with the reindexing process and want to stop it:
Call the abort endpoint:
http://<node_name>:<debug_endpoint_port>/debug/index/rebuild/inverted/abort
To fully stop and rollback data to the initial stage, restart the server with:
REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=true
REINDEX_MAP_TO_BLOCKMAX_ROLLBACK=true
Rolling back is only possible before cleanup!
Environment variable reference
- REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP=<node_names>: Enables and starts the migration process that converts the property_<property_name>_searchable from mapcollection to the inverted blockmax format (double writes are done). Required for other REINDEX_MAP_TO_BLOCKMAX_* variables to work
- REINDEX_MAP_TO_BLOCKMAX_SWAP_BUCKETS=<node_names>: Swaps mapcollection buckets to inverted/blockmax, keeping double writes. Only runs on restart and if migration is finished
- REINDEX_MAP_TO_BLOCKMAX_UNSWAP_BUCKETS=<node_names>: Unswaps inverted/blockmax buckets back to mapcollection, keeping double writes. Only runs on restart and if buckets were already swapped
- REINDEX_MAP_TO_BLOCKMAX_TIDY_BUCKETS=<node_names>: Deletes the mapcollection buckets and stops double writes
- REINDEX_MAP_TO_BLOCKMAX_ROLLBACK=<node_names>: Rolls back the migration process, restores mapcollection buckets (if not yet tidied) and removes created inverted/blockmax buckets
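Because the other variables only take effect when REINDEX_MAP_TO_BLOCKMAX_AT_STARTUP is set, a small pre-restart check can catch a configuration that would silently do nothing. The following is a minimal sketch; check_env is a hypothetical helper, and treating an empty value as "not set" is an assumption.

```python
PREFIX = "REINDEX_MAP_TO_BLOCKMAX_"
# Variables that this reference describes as depending on AT_STARTUP.
DEPENDENT = ("SWAP_BUCKETS", "UNSWAP_BUCKETS", "TIDY_BUCKETS", "ROLLBACK")


def check_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    if not env.get(PREFIX + "AT_STARTUP"):
        for suffix in DEPENDENT:
            if env.get(PREFIX + suffix):
                problems.append(
                    f"{PREFIX}{suffix} is set but {PREFIX}AT_STARTUP is not"
                )
    return problems
```

Running check_env(dict(os.environ)) (after import os) inside the container before starting Weaviate would report any such mismatch.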
Using BlockMax WAND in v1.29 (technical preview)
The BlockMax WAND algorithm is available in v1.29 as a technical preview. We do not recommend using this feature in production environments in this version and suggest you use v1.30+.
To use BlockMax WAND in Weaviate v1.29, it must be enabled prior to collection creation. As of this version, Weaviate will not migrate existing collections to use BlockMax WAND.
Enable BlockMax WAND by setting the environment variables USE_BLOCKMAX_WAND and USE_INVERTED_SEARCHABLE to true.
Now, all new data added to Weaviate will use BlockMax WAND for BM25 and hybrid searches. However, preexisting data will continue to use the default WAND algorithm.