Import data
Code
This example imports the movie data into our collection.
import weaviate, { WeaviateClient, generateUuid5 } from "weaviate-client";
let client: WeaviateClient;
// Instantiate your client (not shown). e.g.:
// client = await weaviate.connectToWeaviateCloud(...) or
// client = await weaviate.connectToLocal(...)
const dataUrl = "https://raw.githubusercontent.com/weaviate-tutorials/edu-datasets/main/movies_data_1990_2024.json"
const response = await fetch(dataUrl)
const data = await response.json()
// Get the collection
const movies = client.collections.get("Movie")
// Set a counter and initialize the array of objects to insert
let itemsToInsert: Object[] = []
let counter = 0;
// Iterate through data
for (const key of Object.keys(data['title'])) {
  counter++;
  // Log progress every 1000 objects
  if (counter % 1000 === 0)
    console.log(`Import: ${counter}`)
  // Format genre_ids and release_date
  const parsedArray = JSON.parse(data['genre_ids'][key]);
  const genreIds: number[] = parsedArray.map(item => parseInt(item, 10));
  let releaseDate = new Date(data['release_date'][key])
  // Build the object payload
  let movieObject = {
    title: data['title'][key],
    overview: data['overview'][key],
    vote_average: data['vote_average'][key],
    genre_ids: genreIds,
    release_date: releaseDate,
    tmdb_id: data['id'][key],
  }
  // Combine the properties with a UUID generated from the title
  let objectToInsert = {
    properties: movieObject,
    uuid: generateUuid5(data['title'][key])
  }
  // Add object to batching array
  itemsToInsert.push(objectToInsert)
  if (itemsToInsert.length === 2000) {
    // Batch insert 2000 items and clear batch array
    const response = await movies.data.insertMany(itemsToInsert)
    itemsToInsert = []
    if (response.hasErrors) {
      throw new Error("Something went wrong in import!")
    }
  }
  // ... other operations
}
// Insert the remaining objects
if (itemsToInsert.length > 0) {
  // Batch insert any remaining items
  const response = await movies.data.insertMany(itemsToInsert)
  if (response.hasErrors) {
    throw new Error("Something went wrong in import!")
  }
}
console.log("Done Importing")
await client.close()
The code:
- Loads the source data & gets the collection
- Loops through the data, adds objects to a batch array, and inserts them in batches of 2000
- Throws an error if any batch reports import errors
Explain the code
Preparation
We use the fetch API to load the data from the source, in this case a JSON file.
Then, we create a collection object (with client.collections.get) so we can interact with the collection.
Iterating over data
The for loop is used with Object.keys() to iterate through the entries in our JSON file. While iterating, we increment the counter variable and log progress every 1000 objects.
for (const key of Object.keys(data['title'])) {
  counter++;
  if (counter % 1000 === 0)
    console.log(`Import: ${counter}`)
  // ... other operations
}
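The loop iterates over the keys of data['title'] because, as the indexing in the code suggests, the source file is column-oriented: each field maps a row key to a value, instead of the file being an array of records. The following is a minimal sketch of that pattern using a tiny made-up dataset. The names sample and toRows are hypothetical and are not part of the tutorial code or the Weaviate client.

```typescript
// Hypothetical sample mimicking the column-oriented layout implied by
// data['title'][key], data['vote_average'][key], and so on.
type Columns = Record<string, Record<string, unknown>>;

const sample: Columns = {
  title: { "0": "Movie A", "1": "Movie B" },
  vote_average: { "0": 7.1, "1": 6.4 },
};

// Turn the column-oriented layout into one record per row key.
function toRows(columns: Columns, keyColumn: string): Record<string, unknown>[] {
  const rows: Record<string, unknown>[] = [];
  for (const key of Object.keys(columns[keyColumn])) {
    const row: Record<string, unknown> = {};
    for (const field of Object.keys(columns)) {
      row[field] = columns[field][key];
    }
    rows.push(row);
  }
  return rows;
}

const rows = toRows(sample, "title");
console.log(rows.length); // one row per key in the "title" column
```

The tutorial code skips this intermediate step and reads each field directly by key inside the loop, which avoids building a second copy of the data.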
Add data to the Object
Convert data types and build the Object
The data is converted from strings to the correct data types for Weaviate. For example, the release_date is converted to a Date object, and the genre_ids are converted to a list of integers.
const parsedArray = JSON.parse(data['genre_ids'][key]);
const genreIds: number[] = parsedArray.map(item => parseInt(item, 10));
let releaseDate = new Date(data['release_date'][key])
// Build the object payload
let movieObject = {
  title: data['title'][key],
  overview: data['overview'][key],
  vote_average: data['vote_average'][key],
  genre_ids: genreIds,
  release_date: releaseDate,
  tmdb_id: data['id'][key],
}
After converting the data to the correct types, we build an object from its properties so that it is ready to be inserted into Weaviate.
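The conversions above assume that every value parses cleanly. As a hedged sketch, the same steps could be wrapped in small helpers that fail loudly on malformed input. The helper names parseGenreIds and parseReleaseDate are hypothetical, and the input formats shown (a JSON-encoded array and an ISO date string) are assumptions based on how the tutorial code handles these fields.

```typescript
// Hypothetical helpers; the input formats are assumptions based on the tutorial code.

// Parse a JSON-encoded array such as "[28, 12]" into integers.
function parseGenreIds(raw: string): number[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error(`Expected a JSON array, got: ${raw}`);
  }
  return parsed.map((item) => parseInt(String(item), 10));
}

// Convert a date string into a Date, rejecting values that cannot be parsed.
function parseReleaseDate(raw: string): Date {
  const date = new Date(raw);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid release date: ${raw}`);
  }
  return date;
}

// Made-up example values.
const genreIds = parseGenreIds("[28, 12, 878]");
const releaseDate = parseReleaseDate("1999-03-31");
console.log(genreIds, releaseDate.toISOString());
```

Validating at this point means a bad record is reported with its raw value, before it becomes part of a batch.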
Bulk insert data
Next, we create an object that combines the properties defined above with a UUID generated by generateUuid5 from the Weaviate client. We push this object to itemsToInsert, and once the array holds 2000 objects, they are bulk inserted with insertMany() and the array is cleared.
let objectToInsert = {
  properties: movieObject,
  uuid: generateUuid5(data['title'][key])
}
// Add object to batching array
itemsToInsert.push(objectToInsert)
if (itemsToInsert.length === 2000) {
  // Batch insert 2000 items and clear batch array
  const response = await movies.data.insertMany(itemsToInsert)
  itemsToInsert = []
  if (response.hasErrors) {
    throw new Error("Something went wrong in import!")
  }
}
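The snippet above flushes the array whenever it reaches 2000 objects, which is why a final insert is needed after the loop for any leftovers. One possible alternative, shown here only as a sketch, is to prepare all objects first and then split them into fixed-size batches. The chunk helper is hypothetical and is not part of the Weaviate client.

```typescript
// Hypothetical helper: split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Example with made-up data: 5 items in batches of 2.
const batches = chunk([1, 2, 3, 4, 5], 2);
console.log(batches.length); // three batches: [1, 2], [3, 4], [5]

// In the import script, each batch could then be sent in turn, e.g.:
// for (const batch of chunk(itemsToInsert, 2000)) {
//   const response = await movies.data.insertMany(batch)
//   if (response.hasErrors) throw new Error("Something went wrong in import!")
// }
```

The trade-off is memory: this approach holds every prepared object at once, whereas the tutorial's loop never keeps more than 2000.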
Error handling
If any object fails during a bulk insert, you need to know about it so you can decide how to handle the failure. In this example, we check response.hasErrors after each insertMany() call and throw an error if it is set.
if (response.hasErrors) {
  throw new Error("Something went wrong in import!")
}
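The hasErrors flag only tells you that at least one object failed. To decide how to handle failures, it can help to see which objects failed and why. The sketch below assumes that the insertMany() result also exposes an errors map keyed by each object's position in the batch, with a message on every entry. That is an assumption about the client's return value, so verify it against the client version you use. BatchResultLike and summarizeErrors are hypothetical names.

```typescript
// Assumed minimal shape of the insertMany() result (verify against your client version).
interface BatchResultLike {
  hasErrors: boolean;
  errors: Record<number, { message: string }>;
}

// Hypothetical helper: build one readable line per failed object.
function summarizeErrors(result: BatchResultLike): string[] {
  if (!result.hasErrors) return [];
  return Object.entries(result.errors).map(
    ([index, error]) => `Object ${index}: ${error.message}`
  );
}

// Made-up result for illustration only.
const fakeResult: BatchResultLike = {
  hasErrors: true,
  errors: { 3: { message: "example failure" } },
};
const lines = summarizeErrors(fakeResult);
lines.forEach((line) => console.error(line));
```

In the import script, these lines could be logged before throwing, so the failing objects are identified in the output.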
Where do the vectors come from?
When insertMany() sends a batch to Weaviate, the objects are added to the collection, in our case the Movie collection.
Recall that the collection has a vectorizer module, and we do not specify vectors here. So Weaviate uses the specified vectorizer to generate vector embeddings from the data.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.