Import data

Code

This example imports the movie data into our collection.

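import weaviate, { WeaviateClient, generateUuid5, toBase64FromMedia } from "weaviate-client";
import { promises as fs } from "fs";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import AdmZip from "adm-zip";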

let client: WeaviateClient;

// Instantiate your client (not shown). e.g.:
// client = weaviate.connectToWeaviateCloud(...) or
// client = weaviate.connectToLocal(...)

const dataUrl = "https://raw.githubusercontent.com/weaviate-tutorials/edu-datasets/main/movies_data_1990_2024.json"
const textResponse = await fetch(dataUrl)
const data = await textResponse.json()

// Get current file's directory
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const imgDir = join(__dirname, "images");


// Create directory if it doesn't exist
await fs.mkdir(imgDir, { recursive: true });

// Download images
const postersUrl = "https://raw.githubusercontent.com/weaviate-tutorials/edu-datasets/main/movies_data_1990_2024_posters.zip";
const postersPath = join(imgDir, "movies_data_1990_2024_posters.zip");

const response = await fetch(postersUrl);
if (!response.ok) {
  throw new Error(`HTTP error! status: ${response.status}`);
}
const arrayBuffer = await response.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);

// Write the zip file
await fs.writeFile(postersPath, buffer);

// Unzip the files
const zip = new AdmZip(postersPath);
zip.extractAllTo(imgDir, true);

// Get the collection
const movies = client.collections.get("Movie")

// Set a counter and initialize Weaviate Object
let itemsToInsert: Object[] = []
let counter = 0;

// Iterate through data
for (const key of Object.keys(data['title'])) {

  // Log progress every 20 items
  counter++;
  if (counter % 20 == 0)
    console.log(`Import: ${counter}`)

  let genreIds: number[]

  // Format genre_ids and release_date
  const parsedArray = JSON.parse(data['genre_ids'][key]);
  genreIds = parsedArray.map((item: string) => parseInt(item, 10));
  let releaseDate = new Date(data['release_date'][key])

  const imgPath = join(imgDir, `${data['id'][key]}_poster.jpg`)
  // Convert poster to base64
  const posterBase64 = await toBase64FromMedia(imgPath)

  // Build the object payload
  let movieObject = {
    title: data['title'][key],
    overview: data['overview'][key],
    vote_average: data['vote_average'][key],
    genre_ids: genreIds,
    release_date: releaseDate,
    tmdb_id: data['id'][key],
    poster: posterBase64
  }

  // Pair the properties with a UUID derived from the title
  let objectToInsert = {
    properties: movieObject,
    uuid: generateUuid5(data['title'][key])
  }

  // Add object to batching array
  itemsToInsert.push(objectToInsert)

  // Send the batch once it holds 20 objects
  if (itemsToInsert.length == 20) {
    try {
      const response = await movies.data.insertMany(itemsToInsert);
      // Handle errors
      if (response.hasErrors) {
        throw new Error("Error in batch import!");
      }
      console.log(`Successfully imported batch of ${itemsToInsert.length} items`);
      itemsToInsert = [];
    } catch (error) {
      console.error('Error importing batch:', error);
    }
  }
  // ... other operations
}


client.close()

The code:

  • Loads the source text and image data
  • Gets the collection
  • Loops through the data and:
    • Finds the image that corresponds to the text
    • Converts the image to base64
    • Bulk inserts objects in batches of 20
  • Prints out any import errors

Explain the code

Preparation

We use the native Node.js fetch() to load the data from the source: a JSON file containing the text data and a ZIP file containing the posters. The JSON response is parsed into a JavaScript object for easier manipulation, and the images are extracted from the ZIP file into a local images directory.

Then, we create a collection object (with client.collections.get) so we can interact with the collection.

Iterating over data

The for loop is used in conjunction with Object.keys() to iterate through the elements in our JSON file. While iterating, we increment the counter variable, which we use to log progress every 20 items.

for (const key of Object.keys(data['title'])) {

  counter++;
  if (counter % 20 == 0)
    console.log(`Import: ${counter}`)
  // ... other operations
}


client.close()
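The loop above indexes each field as data[field][key], which suggests the JSON is organized by column rather than by row. As a minimal sketch of that access pattern, the snippet below turns a small column-oriented object into row objects. The sample titles and ids and the toRows helper are hypothetical and are not part of the tutorial code.

```typescript
// Column-oriented data: each field maps a row key to that row's value.
type Columns = Record<string, Record<string, unknown>>;

// Hypothetical helper: build one row object per key of the first column.
function toRows(columns: Columns): Record<string, unknown>[] {
  const firstColumn = Object.keys(columns)[0];
  if (firstColumn === undefined) return [];
  return Object.keys(columns[firstColumn]).map((key) => {
    const row: Record<string, unknown> = {};
    for (const [name, values] of Object.entries(columns)) {
      row[name] = values[key];
    }
    return row;
  });
}

// Made-up sample in the same shape as the movie data
const sample: Columns = {
  title: { "0": "Movie A", "1": "Movie B" },
  id: { "0": 101, "1": 102 },
};

const rows = toRows(sample);
console.log(rows);
```

The tutorial code skips this intermediate step and reads each field directly inside the loop, which avoids building a second copy of the data.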

Add data to the Object

Convert data types and build the Object

The raw values are strings, so we convert them to the data types that the Weaviate collection expects. For example, the release_date is converted to a Date object, and the genre_ids are converted to an array of integers.

const parsedArray = JSON.parse(data['genre_ids'][key]);
genreIds = parsedArray.map((item: string) => parseInt(item, 10));
let releaseDate = new Date(data['release_date'][key])

const imgPath = join(imgDir, `${data['id'][key]}_poster.jpg`)
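To make the conversions concrete, here is a small standalone sketch with basic validation added. The helper names parseGenreIds and parseReleaseDate and the sample input strings are assumptions made for illustration; the actual values in the dataset may be formatted differently.

```typescript
// Hypothetical helper: parse a JSON-encoded array into integers.
function parseGenreIds(raw: string): number[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error(`Expected a JSON array, got: ${raw}`);
  }
  // String() lets this accept both [28, 12] and ["28", "12"]
  return parsed.map((item) => Number.parseInt(String(item), 10));
}

// Hypothetical helper: parse a date string and reject invalid dates.
function parseReleaseDate(raw: string): Date {
  const date = new Date(raw);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid date: ${raw}`);
  }
  return date;
}

// Made-up sample inputs
const sampleGenreIds = parseGenreIds("[28, 12, 878]");
const sampleReleaseDate = parseReleaseDate("1999-03-31");
console.log(sampleGenreIds, sampleReleaseDate.toISOString());
```

Validating at this point means a malformed row fails with a clear message before it reaches the batch, rather than surfacing later as an import error.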

To save the image data as a BLOB (binary large object) data type, we convert the image to base64 using the helpful toBase64FromMedia utility that comes with the Weaviate client.

const posterBase64 = await toBase64FromMedia(imgPath)

// Build the object payload
let movieObject = {
  title: data['title'][key],
  overview: data['overview'][key],
  vote_average: data['vote_average'][key],
  genre_ids: genreIds,
  release_date: releaseDate,
  tmdb_id: data['id'][key],
  poster: posterBase64
}

After converting the data to the correct formats, we collect the values into a single properties object, ready to be inserted into Weaviate.
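For intuition about the base64 step, the sketch below encodes raw bytes with Node's built-in Buffer. It illustrates what base64 encoding produces and makes no claim about how toBase64FromMedia is implemented; the encodeBase64 helper and the sample bytes are hypothetical.

```typescript
// Hypothetical helper: encode raw bytes as a base64 string.
function encodeBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// The ASCII bytes for "hello" stand in for image file contents here
const sampleBytes = new Uint8Array([104, 101, 108, 108, 111]);
const encoded = encodeBase64(sampleBytes);
console.log(encoded);
```

Base64 represents every 3 bytes as 4 text characters, so the encoded poster is roughly a third larger than the original file.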

Bulk insert data

Then we create an object that pairs the properties we defined previously with a uuid generated by the generateUuid5 helper from the Weaviate client. We push this object to itemsToInsert, and once the array holds 20 objects we bulk insert them with insertMany() and reset the array.

let objectToInsert = {
  properties: movieObject,
  uuid: generateUuid5(data['title'][key])
}

// Add object to batching array
itemsToInsert.push(objectToInsert)

if (itemsToInsert.length == 20) {
  try {
    const response = await movies.data.insertMany(itemsToInsert);
    if (response.hasErrors) {
      throw new Error("Error in batch import!");
    }
    console.log(`Successfully imported batch of ${itemsToInsert.length} items`);
    itemsToInsert = [];
  } catch (error) {
    console.error('Error importing batch:', error);
  }
}
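The snippet above sends a batch only when the array reaches exactly 20 objects, so if the number of items is not a multiple of 20, the remaining objects would need one more insertMany() call after the loop. One alternative, sketched here under that assumption, is to split the prepared objects into batches up front so the final partial batch is included. The chunk helper is hypothetical and is not part of the Weaviate client.

```typescript
// Hypothetical helper: split an array into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    // slice() stops at the end of the array, so the last batch may be shorter
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// 45 placeholder items produce two full batches and one partial batch
const placeholderItems = Array.from({ length: 45 }, (_, i) => i);
const batchSizes = chunk(placeholderItems, 20).map((batch) => batch.length);
console.log(batchSizes);
```

Each batch could then be passed to movies.data.insertMany() in turn, with the same hasErrors check applied to every response.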

Error handling

If any object in a bulk insertion fails, you want to know about it so that you can decide how to respond, for example by raising an exception. In this example, we throw an error when response.hasErrors is set, and the surrounding catch block prints it.

if (response.hasErrors) {
  throw new Error("Error in batch import!");
}
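A single "Error in batch import!" message does not say which objects failed. As a rough sketch, the function below builds one line per failed object. It assumes the response carries an errors map keyed by the object's position in the batch, with a message on each entry. That shape is an assumption modeled with local interfaces here, so check it against the types of your client version. The describeBatchErrors helper and the sample result are hypothetical.

```typescript
// Assumed shape of a batch response; these interfaces are local stand-ins,
// not types imported from the Weaviate client.
interface BatchErrorLike {
  message: string;
}

interface BatchResultLike {
  hasErrors: boolean;
  errors: Record<number, BatchErrorLike>;
}

// Hypothetical helper: one readable line per failed object.
function describeBatchErrors(result: BatchResultLike): string[] {
  if (!result.hasErrors) return [];
  return Object.entries(result.errors).map(
    ([index, err]) => `Object ${index}: ${err.message}`
  );
}

// Made-up result for illustration
const fakeResult: BatchResultLike = {
  hasErrors: true,
  errors: { 3: { message: "invalid date" } },
};

const errorLines = describeBatchErrors(fakeResult);
console.log(errorLines);
```

Logging these lines before throwing makes it easier to fix or retry only the objects that failed.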

Where do the vectors come from?

When insertMany() sends the items to Weaviate, the objects are added to the collection. In our case, that is the Movie collection.

Recall that the collection has a vectorizer module, and we do not specify vectors here. So Weaviate uses the specified vectorizer to generate vector embeddings from the data.
