Create and Load a Weaviate Cluster

July 2, 2024 · 3 min read

Senior iOS Developer

Developer Growth Engineer

Set up a Weaviate cluster on Weaviate Cloud (WCD) and load book data with a Python data pipeline, so that the data can be searched from your iOS and other Apple ecosystem applications.

In this article we will set up a Weaviate cluster and load the same book data used in the article IOS Intro - Search with Benefits.

You can download the resources used in this blog here.

Create a Cluster

Follow account setup and cluster creation as explained in the Quickstart Tutorial and take note of the following.

REST Endpoint URL

Weaviate API Key

Create Xcode Project

Create an Xcode Project using External Builder as described in External Builder (Xcode) for Python.

Install Weaviate Client

From the root directory of your Xcode Project, install the Weaviate Client.

pip install -U weaviate-client

Refer to the section Install Python Packages and Setup a Virtual Environment in the article External Builder (Xcode) for Python for guidance.

OpenAI Embeddings Access

Create an OpenAI API key and note the key that is created.

https://www.howtogeek.com/885918/how-to-get-an-openai-api-key/

OpenAI is needed to convert the text of the book data into vector embeddings.

This is a one-time use of OpenAI, to create the vectors that are loaded into the cluster. As of this writing, it costs less than 10 cents.
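As a rough check on that cost, the arithmetic can be sketched as below. The tokens-per-book figure and the price per million tokens are assumptions for illustration, not values taken from this article or measured from the data; check OpenAI's current pricing before relying on them.

```python
def estimate_embedding_cost(num_records: int,
                            avg_tokens_per_record: float,
                            usd_per_million_tokens: float) -> float:
    """Return the estimated one-time embedding cost in US dollars."""
    total_tokens = num_records * avg_tokens_per_record
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumed inputs: about 7,000 books, about 100 tokens of text per book,
# and an assumed price of $0.10 per million tokens.
cost = estimate_embedding_cost(7_000, 100, 0.10)
print(f"Estimated cost: ${cost:.2f}")
```

Doubling the assumed tokens per book doubles the estimate, so the result is only as good as those two inputs.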

Enter Run Arguments

As described in the article External Builder (Xcode) for Python, enter the Run Arguments.

The Cohere key can remain blank.
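A minimal sketch of how a script might pick these values up, assuming they are exposed to Python as environment variables. The variable names below are hypothetical; use whatever names you entered in Xcode.

```python
import os

# Hypothetical variable names, not taken from the article.
REQUIRED = ("WCD_URL", "WCD_API_KEY", "OPENAI_API_KEY")
OPTIONAL = ("COHERE_API_KEY",)  # may be left blank


def load_settings(env=os.environ) -> dict:
    """Collect the settings, failing early if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        settings[name] = env.get(name, "")
    return settings


# Demonstration with placeholder values instead of real keys.
demo = load_settings({
    "WCD_URL": "https://example.invalid",
    "WCD_API_KEY": "placeholder",
    "OPENAI_API_KEY": "placeholder",
})
print(sorted(demo))
```

Failing early on a missing key gives a clearer error than a rejected request partway through the load.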

Support Files

What is included: a Python script, main.py, that defines the cluster and loads the book data, and the book data file, 7k-books-kaggle.csv.

These files were copied from the GitHub project by Adam of Weaviate.

The Python script was modified slightly for clarity.

https://github.com/weaviate/BookRecs

Copy these files to your Xcode project.

Book Data

A CSV of approximately 7,000 books.

Python Script to Create and Load Cluster

The main.py Python script will do the following:

Define the schema of a Books cluster
Create vector embeddings of the book data
Load the embeddings into the Books cluster

Each step is logged in the console.

The longest step to run is the “Load Data” step, which may take a couple of minutes.

Don’t forget to make sure that the virtual environment is activated before running.

source venv/bin/activate
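If you would like the script itself to warn you, a small check like the following can be placed near the top. This is an optional sketch, not part of the original main.py, and it assumes the environment was created with Python's built-in venv module.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if not in_virtualenv():
    print("Warning: no virtual environment appears to be active.",
          file=sys.stderr)
```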

main.py: Startup

API Keys are specified.
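A sketch of what this step might look like with the v4 Python client, assuming the REST Endpoint URL and keys noted earlier. The function names are illustrative, and the actual main.py may be organized differently.

```python
def build_headers(openai_api_key: str, cohere_api_key: str = "") -> dict:
    """Build the extra request headers that carry the model-provider keys."""
    headers = {"X-OpenAI-Api-Key": openai_api_key}
    if cohere_api_key:  # the Cohere key may be left blank
        headers["X-Cohere-Api-Key"] = cohere_api_key
    return headers


def connect_to_cluster(cluster_url: str, weaviate_api_key: str,
                       openai_api_key: str, cohere_api_key: str = ""):
    """Open a connection to the cluster; the caller must close() it."""
    # Imported here so the sketch loads even without the client installed.
    import weaviate
    from weaviate.classes.init import Auth

    return weaviate.connect_to_weaviate_cloud(
        cluster_url=cluster_url,
        auth_credentials=Auth.api_key(weaviate_api_key),
        headers=build_headers(openai_api_key, cohere_api_key),
    )
```

The OpenAI key travels as a request header so that the cluster can call OpenAI on your behalf when it vectorizes the book text.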

main.py: Cluster Defined

Defines the Weaviate Books cluster.

The text2vec-openai parameter defines a vectorizer to convert text into numerical vectors.

The specific OpenAI model used for embedding is ADA Version 2 (text-embedding-ada-002).

https://openai.com/index/new-and-improved-embedding-model
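The definition could look roughly like the sketch below, written against the v4 Python client as of mid-2024; newer client releases may name the vectorizer parameter differently. The collection name and the property names are assumptions chosen to resemble the book CSV, not the exact schema from main.py, and the vectorizer is left on its default model settings.

```python
# Hypothetical property names, chosen to resemble columns of the book CSV.
BOOK_TEXT_PROPERTIES = ("isbn13", "title", "authors", "categories", "description")


def create_books_collection(client, name: str = "Book"):
    """Create the collection if it does not exist yet and return it."""
    import weaviate.classes.config as wc  # requires weaviate-client v4

    if client.collections.exists(name):
        return client.collections.get(name)

    return client.collections.create(
        name=name,
        # text2vec-openai turns the text properties into vectors on insert.
        vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
        properties=[
            wc.Property(name=prop, data_type=wc.DataType.TEXT)
            for prop in BOOK_TEXT_PROPERTIES
        ],
    )
```

Returning the existing collection instead of deleting it keeps a rerun from discarding data that has already been loaded.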

main.py: Read Book Data

Read the file of Book Data.

Make sure to define your directory path.
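Reading the file needs only Python's standard csv module. The sketch below assumes the CSV has a header row; the column names and the sample row are made up for the demonstration.

```python
import csv
import io


def parse_books(file_obj) -> list:
    """Return every CSV row as a dict keyed by the header line."""
    return list(csv.DictReader(file_obj))


def read_books(csv_path) -> list:
    # newline="" is what the csv module recommends when opening files.
    with open(csv_path, newline="", encoding="utf-8") as f:
        return parse_books(f)


# Demonstration on an in-memory sample instead of the real file.
sample = io.StringIO("isbn13,title\n9780000000000,Example Book\n")
rows = parse_books(sample)
print(rows[0]["title"])
```

In your own project you would call read_books with the full path to 7k-books-kaggle.csv, for example a path built relative to the script's directory.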

main.py: Load Data

Properties are mapped from the data file to the Weaviate cluster.

For each book record, text is converted to a vector and loaded into the Weaviate cluster.
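A sketch of the mapping and load, assuming the v4 Python client's batch interface and the same hypothetical property names as a collection with text properties for ISBN, title, authors, categories and description. Vectorization happens on the Weaviate side as each object is inserted.

```python
# Hypothetical mapping: property name in Weaviate -> column name in the CSV.
COLUMN_MAP = {
    "isbn13": "isbn13",
    "title": "title",
    "authors": "authors",
    "categories": "categories",
    "description": "description",
}


def to_properties(row: dict) -> dict:
    """Pick the mapped columns from one CSV row; missing values become ''."""
    return {prop: (row.get(col) or "") for prop, col in COLUMN_MAP.items()}


def load_books(collection, rows) -> int:
    """Insert all rows in batches and return the number of failed objects."""
    with collection.batch.dynamic() as batch:
        for row in rows:
            batch.add_object(properties=to_properties(row))
    return len(collection.batch.failed_objects)


# Demonstration of the mapping alone, with a made-up row.
props = to_properties({"isbn13": "9780000000000", "title": "Example Book",
                       "extra": "ignored"})
print(props["title"])
```

Batching avoids one network round trip per book, and checking the failed-object count afterwards makes problems such as rate limits visible.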

main.py: Clean-up

Close the book data file and end the script.
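One way to make sure the clean-up runs even when an earlier step fails is to wrap the work in a context manager. The sketch below uses a stand-in class instead of a real Weaviate client, so it runs without a cluster; the only assumption is that the client object has a close() method.

```python
from contextlib import closing


class FakeClient:
    """Stand-in for a Weaviate client; only close() matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = FakeClient()
with closing(client):
    pass  # define the collection and load the book data here

print("client closed:", client.closed)
```

Opening the book data file in a with statement gives the same guarantee for the file handle.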

Stay connected

Thank you so much for reading! If you would like to talk to us more about this topic, please connect with us:

Community forum
GitHub Don’t forget to give us a ⭐️!
Slack
X (Twitter)
LinkedIn
