Retrieval augmented generation (RAG)
Overview
This page introduces retrieval augmented generation (RAG) using Weaviate. It covers:
- What RAG is.
- How to configure Weaviate for RAG.
- How to perform RAG.
- Importing data with RAG in mind.
Prerequisites
Some familiarity with Weaviate is helpful for this guide, but not required. If you are new to Weaviate, we suggest starting with the Weaviate Quickstart guide.
Background
What is retrieval augmented generation?
Retrieval augmented generation is a powerful technique that retrieves relevant data to provide to large language models (LLMs) as context, along with the task prompt. It is also called RAG, generative search, or in-context learning in some cases.
Why use RAG?
LLMs are incredibly powerful, but can suffer from two important limitations:
- They can confidently produce incorrect, or outdated, information (also called 'hallucination'); and
- They might simply not be trained on the information you need.
RAG remedies these limitations with a two-step process.
The first step is to retrieve relevant data through a query. Then, in the second step, the LLM is prompted with a combination of the retrieved data and a user-provided prompt.
This provides in-context learning for the LLM, which leads it to use the relevant, up-to-date data rather than rely on recall from its training or, worse, hallucinated outputs.
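Conceptually, the flow looks like the following minimal sketch. Note that retrieve and llm_complete are hypothetical stand-ins for your search engine and LLM, not real Weaviate or model-provider APIs:
from typing import List

def retrieve(query: str, limit: int = 3) -> List[str]:
    # Hypothetical stand-in for a search over your data (e.g. a vector search)
    return ["<relevant passage 1>", "<relevant passage 2>", "<relevant passage 3>"][:limit]

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a call to a large language model
    return f"<text generated from: {prompt[:40]}...>"

def rag(user_query: str, task: str) -> str:
    context = "\n\n".join(retrieve(user_query))  # Step 1: retrieve relevant data
    prompt = f"{task}\n\nContext:\n{context}"    # Step 2: prompt the LLM with data + task
    return llm_complete(prompt)

print(rag("history of git", "Summarize the key information in bullet points"))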
Weaviate and retrieval augmented generation
Weaviate incorporates key functionalities to make RAG easier and faster.
For one, Weaviate's search capabilities make it easier to find relevant information. You can use similarity, keyword, and hybrid searches, along with filtering, to find the information you need.
Additionally, Weaviate has integrated RAG capabilities, so that the retrieval and generation steps are combined into a single query. This means that you can use Weaviate's search capabilities to retrieve the data you need, and then, in the same query, prompt the LLM with that data.
This makes it easier, faster and more efficient to implement RAG workflows in your application.
Examples of RAG
Let's begin by viewing examples of RAG in action. We will then explore how to configure Weaviate for RAG.
We have run this demo with an OpenAI language model and a cloud instance of Weaviate. But you can run it with any deployment method and with any generative AI model integration.
Connect to the instance like so, remembering to replace the API key for the LLM used (OpenAI in this case) with your own API key:
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
import weaviate
from weaviate.classes.init import Auth
import os
client = weaviate.connect_to_weaviate_cloud(
cluster_url=os.getenv("WCD_DEMO_URL"),
auth_credentials=Auth.api_key(api_key=os.getenv("WCD_DEMO_RO_KEY")),
headers={
"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")
}
)
import weaviate
import os
client = weaviate.Client(
url=os.getenv("WCD_DEMO_URL"),
auth_client_secret=weaviate.auth.AuthApiKey(api_key=os.getenv("WCD_DEMO_RO_KEY")),
additional_headers={
"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")
}
)
import weaviate, { WeaviateClient } from 'weaviate-client';
const client: WeaviateClient = await weaviate.connectToWeaviateCloud(
'https://WEAVIATE_INSTANCE_URL', // Replace with your Weaviate endpoint
{
authCredentials: new weaviate.ApiKey('YOUR-WEAVIATE-API-KEY'), // Replace with your Weaviate instance API key
headers: {
'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY || '', // Replace with your inference API key
}
}
)
import weaviate, { WeaviateClient, ApiKey } from 'weaviate-ts-client';
const client: WeaviateClient = weaviate.client({
scheme: 'https',
host: 'WEAVIATE_INSTANCE_URL', // Replace WEAVIATE_INSTANCE_URL with your instance URL
apiKey: new ApiKey('api-key'),
headers: { 'X-OpenAI-Api-Key': process.env.OPENAI_APIKEY }, // Replace with your inference API key
});
Data retrieval
Let's take an illustrative example with passages from a book. Here, the Weaviate instance contains a collection of passages from the Pro Git book.
Before we can generate text, we need to retrieve relevant data. Let's retrieve the three most similar passages to the meaning of "history of git" with a semantic search.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
collection_name = "GitBookChunk"
chunks = client.collections.get(collection_name)
response = chunks.query.near_text(query="history of git", limit=3)
collection_name = "GitBookChunk"
response = (
client.query
.get(class_name=collection_name, properties=["chunk", "chapter_title", "chunk_index"])
.with_near_text({"concepts": ["history of git"]})
.with_limit(3)
.do()
)
const myCollection = client.collections.get('GitBookChunk');
const dataRetrievalResult = await myCollection.query.nearText('history of git', {
  returnProperties: ['chunk', 'chapter_title', 'chunk_index'],
  limit: 3,
})
console.log(JSON.stringify(dataRetrievalResult, null, 2));
const dataRetrievalResult = await client.graphql
.get()
.withClassName('GitBookChunk')
.withNearText({ concepts: ['history of git'] })
.withFields('chunk chapter_title chunk_index')
.withLimit(3)
.do();
console.log(JSON.stringify(dataRetrievalResult, null, 2));
This should return a set of results like the following (truncated for brevity):
{
"data": {
"Get": {
"GitBookChunk": [
{
"chapter_title": "01-introduction",
"chunk": "=== A Short History of Git\n\nAs with many great things in life, Git began with a bit of creative ...",
"chunk_index": 0
},
{
"chapter_title": "01-introduction",
"chunk": "== Nearly Every Operation Is Local\n\nMost operations in Git need only local files and resources ...",
"chunk_index": 2
},
{
"chapter_title": "02-git-basics",
"chunk": "==\nYou can specify more than one instance of both the `--author` and `--grep` search criteria...",
"chunk_index": 2
}
]
}
}
}
Transform result sets
We can transform this result set into new text using RAG with just a minor modification of the code. First, let's use a grouped task prompt to summarize this information.
Run the following code snippet, and inspect the results:
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
collection_name = "GitBookChunk"
chunks = client.collections.get(collection_name)
response = chunks.generate.near_text(
query="history of git",
limit=3,
grouped_task="Summarize the key information here in bullet points"
)
print(response.generated)
collection_name = "GitBookChunk"
response = (
client.query
.get(class_name=collection_name, properties=["chunk", "chapter_title", "chunk_index"])
.with_near_text({"concepts": ["history of git"]})
.with_limit(3)
.with_generate(grouped_task="Summarize the key information here in bullet points")
.do()
)
print(response["data"]["Get"][collection_name][0]["_additional"]["generate"]["groupedResult"])
const groupedTaskResponse = await myCollection.generate.nearText('history of git', {
  groupedTask: 'Summarize the key information here in bullet points'
},
{
  returnProperties: ['chunk', 'chapter_title', 'chunk_index'],
  limit: 3,
})
console.log(groupedTaskResponse.generated);
const groupedTaskResponse = await client.graphql
.get()
.withClassName('GitBookChunk')
.withNearText({ concepts: ['history of git'] })
.withFields('chunk chapter_title chunk_index')
.withLimit(3)
.withGenerate({
groupedTask: 'Summarize the key information here in bullet points'
})
.do();
console.log(groupedTaskResponse.data.Get['GitBookChunk'][0]._additional.generate.groupedResult);
Here is our generated text:
- Git began as a replacement for the proprietary DVCS called BitKeeper, which was used by the Linux kernel project.
- The relationship between the Linux development community and BitKeeper broke down in 2005, leading to the development of Git by Linus Torvalds.
- Git was designed with goals such as speed, simple design, strong support for non-linear development, and the ability to handle large projects efficiently.
- Most operations in Git only require local files and resources, making them fast and efficient.
- Git allows browsing project history instantly and can calculate differences between file versions locally.
- Git allows offline work and does not require a network connection for most operations.
- This book was written using Git version 2, but most commands should work in older versions as well.
In a grouped task RAG query, Weaviate:
- Retrieves the three most similar passages to the meaning of "history of git".
- Then prompts the LLM with a combination of:
  - Text from all of the search results, and
  - The user-provided prompt, "Summarize the key information here in bullet points".
Note that the user-provided prompt did not contain any information about the subject matter. But because Weaviate retrieved the relevant data about the history of git, it was able to summarize the information relating to this subject matter using verifiable data.
That's how easy it is to perform RAG queries in Weaviate.
The actual generated text will vary between runs. This is due to randomness in LLM behavior and variability across models, and is perfectly normal.
Transform individual objects
In this example, we will take a look at how to transform individual objects. This is useful when you want to generate text for each object individually, rather than for the entire result set.
Here we prompt the model to translate individual wine reviews into French, using emojis. The reviews are a subset of a publicly available dataset of wine reviews.
Note that in this query, we apply a single prompt parameter. This means that the LLM is prompted with each object individually, rather than with the entire result set.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
collection_name = "WineReview"
reviews = client.collections.get(collection_name)
response = reviews.generate.near_text(
    query="fruity white wine",
    limit=5,
    single_prompt="""
    Translate this review into French, using emojis:
    ===== Country of origin: {country}, Title: {title}, Review body: {review_body}
    """
)
for o in response.objects:
    print(o.generated)
collection_name = "WineReview"
response = (
client.query
.get(class_name=collection_name, properties=["review_body", "title", "country", "points"])
.with_near_text({"concepts": ["fruity white wine"]})
.with_limit(5)
.with_generate(single_prompt="""
Translate this review into French, using emojis:
===== Country of origin: {country}, Title: {title}, Review body: {review_body}
""")
.do()
)
for r in response["data"]["Get"][collection_name]:
    print(r["_additional"]["generate"]["singleResult"])
const myWineCollection = client.collections.get('WineReview');
const singlePromptResult = await myWineCollection.generate.nearText('fruity white wine', {
  singlePrompt: `Translate this review into French, using emojis:
  ===== Country of origin: {country}, Title: {title}, Review body: {review_body}`
},{
  returnProperties: ['review_body', 'title', 'country', 'points'],
  limit: 5,
})
console.log(JSON.stringify(singlePromptResult.objects, null, 2));
const singlePromptResult = await client.graphql
.get()
.withClassName('WineReview')
.withNearText({ concepts: ['fruity white wine'] })
.withFields('review_body title country points')
.withLimit(5)
.withGenerate({
singlePrompt:
`Translate this review into French, using emojis:
===== Country of origin: {country}, Title: {title}, Review body: {review_body}`
})
.do();
for (const r of singlePromptResult.data.Get['WineReview']) {
console.log(r._additional.generate.singleResult)
}
As the query was run with a limit of 5, you should see 5 objects returned, including generated texts.
Here is our generated text for the first object, and the source text:
===== Generated text =====
🇺🇸🍷🌿🍑🌼🍯🍊🍮🍽️🌟
Origine : États-Unis
Titre : Schmitz 24 Brix 2012 Sauvignon Blanc (Sierra Foothills)
Corps de la critique : Pas du tout un Sauvignon Blanc typique, il sent l'abricot et le chèvrefeuille et a le goût de la marmelade. Il est sec, mais a le goût d'un vin de dessert tardif. Attendez-vous à une petite aventure gustative ici.
===== Original review =====
Country: US,
Title: Schmitz 24 Brix 2012 Sauvignon Blanc (Sierra Foothills)
Review body: Not at all a typical Sauvignon Blanc, this smells like apricot and honeysuckle and tastes like marmalade. It is dry, yet tastes like a late-harvest dessert wine. Expect a little taste adventure here.
Here, Weaviate has:
- Retrieved the five most similar wine reviews to the meaning of "fruity white wine".
- For each result, prompted the LLM with the user-provided prompt, replacing {country}, {title}, and {review_body} with the corresponding text.
In both examples, you saw Weaviate return new text that is original, but grounded in the retrieved data. This is what makes RAG powerful: it combines the best of data retrieval and language generation.
RAG, end-to-end
Now, let's go through an end-to-end example for using Weaviate for RAG.
Your own Weaviate instance
For this example, you will need access to a Weaviate instance that you can write to. You can use any Weaviate instance, such as a local Docker instance, or a WCD instance.
Configure Weaviate for RAG
A collection's generative model integration configuration is mutable from v1.25.23, v1.26.8 and v1.27.1. See this section for details on how to update the collection configuration.
To use RAG, the appropriate generative-xxx module must be:
- Enabled in Weaviate, and
- Specified in the collection definition.
Each module is tied to a specific group of LLMs, such as generative-cohere for Cohere models, generative-openai for OpenAI models and generative-google for Google models.
If you are using WCD, you will not need to do anything to enable modules.
How to list enabled modules
You can check which modules are enabled by viewing the meta information for your Weaviate instance, as shown below:
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
response = client.get_meta()
print(response)
response = client.get_meta()
print(response)
const metaResponse = await client.getMeta()
console.log(metaResponse)
const metaResponse = await client.misc
.metaGetter().do();
console.log(metaResponse)
The response will include a list of modules. Check that your desired module is enabled.
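For example, with the Python client you could inspect the modules key of the meta response. This is a quick sketch; the exact response shape may vary by Weaviate version:
response = client.get_meta()
# The meta response is a dict; its "modules" key lists the enabled modules
print(list(response.get("modules", {}).keys()))
# e.g. ['generative-openai', 'text2vec-openai', ...]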
How to enable modules
For configurable deployments, you can specify enabled modules. For example, in a Docker deployment, you can do so by listing them in the ENABLE_MODULES environment variable, as shown below:
services:
weaviate:
environment:
ENABLE_MODULES: 'text2vec-cohere,text2vec-huggingface,text2vec-openai,text2vec-google,generative-cohere,generative-openai,generative-google'
Check the specific documentation for your deployment method (Docker, Kubernetes, Embedded Weaviate) for more information on how to configure it.
How to configure the language model
Model properties are exposed through the Weaviate module configuration. Accordingly, you can customize them through the moduleConfig parameter in the collection definition.
For example, the generative-cohere module has the following properties:
"moduleConfig": {
"generative-cohere": {
"model": "command-xlarge-nightly", // Optional - Defaults to `command-xlarge-nightly`. Can also use`command-xlarge-beta` and `command-xlarge`
"temperatureProperty": <temperature>, // Optional
"maxTokensProperty": <maxTokens>, // Optional
"kProperty": <k>, // Optional
"stopSequencesProperty": <stopSequences>, // Optional
"returnLikelihoodsProperty": <returnLikelihoods>, // Optional
}
}
And the generative-openai module may be configured as follows:
"moduleConfig": {
"generative-openai": {
"model": "gpt-3.5-turbo", // Optional - Defaults to `gpt-3.5-turbo`
"temperatureProperty": <temperature>, // Optional, applicable to both OpenAI and Azure OpenAI
"maxTokensProperty": <max_tokens>, // Optional, applicable to both OpenAI and Azure OpenAI
"frequencyPenaltyProperty": <frequency_penalty>, // Optional, applicable to both OpenAI and Azure OpenAI
"presencePenaltyProperty": <presence_penalty>, // Optional, applicable to both OpenAI and Azure OpenAI
"topPProperty": <top_p>, // Optional, applicable to both OpenAI and Azure OpenAI
},
}
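If you are using the Python client v4, equivalent options are set through the collection configuration instead. Here is a sketch, assuming your client version's Configure.Generative.openai helper accepts these parameters:
import weaviate.classes as wvc

# A sketch: set generative model options at collection creation time.
# Parameter names follow the Python client v4; check your client version.
generative_config = wvc.config.Configure.Generative.openai(
    model="gpt-3.5-turbo",  # Optional - defaults to `gpt-3.5-turbo`
    temperature=0.7,        # Optional
    max_tokens=500,         # Optional
)
# Pass this as `generative_config=...` to client.collections.create()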
See the documentation for various model provider integrations.
Populate database
Adding data to Weaviate for RAG is similar to adding data for other purposes. However, there are some important considerations to keep in mind, such as chunking and data structure.
You can read further discussions in the Best practices & tips section. Here, we will use a chunk length of 150 words and a 25-word overlap. We will also include the title of the book, the chapter it is from, and the chunk number. This will allow us to search through the chunks, as well as filter them.
Download & chunk
In the following snippet, we download a chapter of the Pro Git book, clean it and chunk it.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
from typing import List
def download_and_chunk(src_url: str, chunk_size: int, overlap_size: int) -> List[str]:
import requests
import re
response = requests.get(src_url) # Retrieve source text
source_text = re.sub(r"\s+", " ", response.text) # Remove multiple whitespaces
text_words = re.split(r"\s", source_text) # Split text by single whitespace
chunks = []
for i in range(0, len(text_words), chunk_size): # Iterate through & chunk data
chunk = " ".join(text_words[max(i - overlap_size, 0): i + chunk_size]) # Join a set of words into a string
chunks.append(chunk)
return chunks
pro_git_chapter_url = "https://raw.githubusercontent.com/progit/progit2/main/book/01-introduction/sections/what-is-git.asc"
chunked_text = download_and_chunk(pro_git_chapter_url, 150, 25)
from typing import List
def download_and_chunk(src_url: str, chunk_size: int, overlap_size: int) -> List[str]:
import requests
import re
response = requests.get(src_url) # Retrieve source text
source_text = re.sub(r"\s+", " ", response.text) # Remove multiple whitespaces
text_words = re.split(r"\s", source_text) # Split text by single whitespace
chunks = []
for i in range(0, len(text_words), chunk_size): # Iterate through & chunk data
chunk = " ".join(text_words[max(i - overlap_size, 0): i + chunk_size]) # Join a set of words into a string
chunks.append(chunk)
return chunks
pro_git_chapter_url = "https://raw.githubusercontent.com/progit/progit2/main/book/01-introduction/sections/what-is-git.asc"
chunked_text = download_and_chunk(pro_git_chapter_url, 150, 25)
async function downloadAndChunk(srcUrl: string, chunkSize: number, overlapSize: number) {
const response = await fetch(srcUrl);
const sourceText = await response.text();
const textWords = sourceText.replace(/\s+/g, ' ').split(' ');
let chunks = [];
for (let i = 0; i < textWords.length; i += chunkSize) {
let chunk = textWords.slice(Math.max(i - overlapSize, 0), i + chunkSize).join(' ');
chunks.push(chunk);
}
return chunks;
}
const proGitChapterUrl = 'https://raw.githubusercontent.com/progit/progit2/main/book/01-introduction/sections/what-is-git.asc';
const chunks = await downloadAndChunk(proGitChapterUrl, 150, 25)
import fetch from 'node-fetch';
async function downloadAndChunk(srcUrl, chunkSize, overlapSize) {
const response = await fetch(srcUrl);
const sourceText = await response.text();
const textWords = sourceText.replace(/\s+/g, ' ').split(' ');
let chunks = [];
for (let i = 0; i < textWords.length; i += chunkSize) {
let chunk = textWords.slice(Math.max(i - overlapSize, 0), i + chunkSize).join(' ');
chunks.push(chunk);
}
return chunks;
}
const proGitChapterUrl = 'https://raw.githubusercontent.com/progit/progit2/main/book/01-introduction/sections/what-is-git.asc';
const chunks = await downloadAndChunk(proGitChapterUrl, 150, 25)
This will download the text from the chapter and return a list/array of strings, each a 150-word chunk with a 25-word overlap prepended from the previous chunk.
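As a quick sanity check before importing, you can inspect the output. The exact chunk count depends on the source text; for this chapter, expect around 10:
print(f"Number of chunks: {len(chunked_text)}")  # Expect around 10 for this chapter
print(chunked_text[0][:200])                     # Preview the start of the first chunk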
Create collection definitions
We can now create a collection definition for the chunks. To use RAG, your desired generative module must be specified at the collection level, as shown below.
The below collection definition for the GitBookChunk collection specifies text2vec-openai as the vectorizer and generative-openai as the generative module. Note that the generative-openai parameter can have an empty dictionary/object as its value, which will use the default parameters.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
import weaviate.classes as wvc
collection_name = "GitBookChunk"
if client.collections.exists(collection_name): # In case we've created this collection before
client.collections.delete(collection_name) # THIS WILL DELETE ALL DATA IN THE COLLECTION
chunks = client.collections.create(
name=collection_name,
properties=[
wvc.config.Property(
name="chunk",
data_type=wvc.config.DataType.TEXT
),
wvc.config.Property(
name="chapter_title",
data_type=wvc.config.DataType.TEXT
),
wvc.config.Property(
name="chunk_index",
data_type=wvc.config.DataType.INT
),
],
vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(), # Use `text2vec-openai` as the vectorizer
generative_config=wvc.config.Configure.Generative.openai(), # Use `generative-openai` with default parameters
)
collection_name = "GitBookChunk"
chunk_class = {
"class": collection_name,
"properties": [
{
"name": "chunk",
"dataType": ["text"],
},
{
"name": "chapter_title",
"dataType": ["text"],
},
{
"name": "chunk_index",
"dataType": ["int"],
}
],
"vectorizer": "text2vec-openai", # Use `text2vec-openai` as the vectorizer
"moduleConfig": {
"generative-openai": {} # Use `generative-openai` with default parameters
}
}
if client.schema.exists(collection_name): # In case we've created this collection before
client.schema.delete_class(collection_name) # THIS WILL DELETE ALL DATA IN THE CLASS
client.schema.create_class(chunk_class)
const schemaDefinition = {
name: 'GitBookChunk',
properties: [
{
name: 'chunk',
dataType: 'text' as const,
},
{
name: 'chapter_title',
dataType: 'text' as const,
},
{
name: 'chunk_index',
dataType: 'int' as const,
}
],
vectorizers: weaviate.configure.vectorizer.text2VecOpenAI(),
generative: weaviate.configure.generative.openAI()
}
const newCollection = await client.collections.create(schemaDefinition)
console.log('We have a new class!', newCollection['name']);
const classDefinition = {
class: `GitBookChunk`,
properties: [
{
name: 'chunk',
dataType: ['text']
},
{
name: 'chapter_title',
dataType: ['text']
},
{
name: 'chunk_index',
dataType: ['int']
},
],
vectorizer: 'text2vec-openai', // Use `text2vec-openai` as the vectorizer
moduleConfig: {
'generative-openai': {} // Use `generative-openai` with default parameters
}
};
const returnedClassDefinition = await client
.schema
.classCreator()
.withClass(classDefinition)
.do();
console.log(JSON.stringify(returnedClassDefinition, null, 2));
Import data
Now, we can import the data into Weaviate.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
chunks_list = list()
for i, chunk in enumerate(chunked_text):
data_properties = {
"chapter_title": "What is Git",
"chunk": chunk,
"chunk_index": i
}
data_object = wvc.data.DataObject(properties=data_properties)
chunks_list.append(data_object)
chunks.data.insert_many(chunks_list)
client.batch.configure(batch_size=100)
with client.batch as batch:
for i, chunk in enumerate(chunked_text):
data_object = {
"chapter_title": "What is Git",
"chunk": chunk,
"chunk_index": i
}
batch.add_data_object(data_object=data_object, class_name=collection_name)
const gitCollection = client.collections.get('GitBookChunk');
async function importData(chunkData: Array<string>) {
const list:Array<any> = [];
for (const [index, chunk] of chunkData.entries()) {
const obj = {
properties: {
chunk: chunk,
chunk_index: index,
chapter_title: 'What is Git',
},
};
list.push(obj);
}
const result = await gitCollection.data.insertMany(list)
console.log('just bulk inserted',result);
};
await importData(chunks);
async function importData(chunkData) {
// Prepare a batcher
let batcher = client.batch.objectsBatcher();
let counter = 0;
const batchSize = 100;
for (const [index, c] of chunkData.entries()) {
const obj = {
class: 'GitBookChunk',
properties: {
chunk: c,
chunk_index: index,
chapter_title: 'What is Git',
},
};
batcher.withObject(obj);
if (++counter === batchSize) {
// flush the batch queue
const res = await batcher.do();
console.log(res);
// restart the batch queue
counter = 0;
batcher = client.batch.objectsBatcher();
}
}
// Flush the remaining objects
const res = await batcher.do();
console.log(res);
};
await importData(chunks);
Once this is done, you should have imported a collection of chunks from the chapter into Weaviate. You can check this by running a simple aggregation query:
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
response = chunks.aggregate.over_all(total_count=True)
print(response.total_count)
response = client.query.aggregate("GitBookChunk").with_meta_count().do()
print(response)
const objectCount = await gitCollection.aggregate.overAll()
console.log(JSON.stringify(objectCount.totalCount));
const objCount = await client
.graphql
.aggregate()
.withClassName('GitBookChunk')
.withFields('meta { count }')
.do();
console.log(JSON.stringify(objCount, null, 2));
This should indicate that there are 10 chunks in the database.
Generative queries
Now that we have configured Weaviate and populated it with data, we can perform generative queries as you saw in the examples above.
Single (per-object) prompts
Single prompts tell Weaviate to generate text based on each retrieved object and the user-provided prompt. In this example, we retrieve two objects and prompt the language model to write a haiku based on the text of each chunk.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
response = chunks.generate.fetch_objects(
limit=2,
single_prompt="Write the following as a haiku: ===== {chunk} "
)
for o in response.objects:
print(f"\n===== Object index: [{o.properties['chunk_index']}] =====")
print(o.generated)
response = (
client.query
.get(collection_name, ["chunk", "chunk_index"])
.with_generate(
single_prompt="Write the following as a haiku: ===== {chunk} "
)
.with_limit(2)
.do()
)
for r in response["data"]["Get"][collection_name]:
print(f"\n===== Object index: [{r['chunk_index']}] =====")
print(r["_additional"]["generate"]["singleResult"])
const haikuResponse = await gitCollection.generate.fetchObjects({
singlePrompt: `Write the following as a haiku: ===== {chunk}`
},{
returnProperties: ['chunk','chunk_index'],
limit: 2,
})
if (haikuResponse) {
for (const result of haikuResponse.objects) {
console.log(`\n===== Object index: [${result.properties['chunk_index']}] =====`)
console.log(result.generated)
}
}
let haikuResponse = await client.graphql
.get()
.withClassName('GitBookChunk')
.withFields('chunk chunk_index')
.withLimit(2)
.withGenerate({
singlePrompt: `Write the following as a haiku: ===== {chunk}`
})
.do();
for (const r of haikuResponse.data.Get['GitBookChunk']) {
console.log(`\n===== Object index: [${r['chunk_index']}] =====`)
console.log(r._additional.generate.singleResult)
}
It should return haiku-like text, such as:
===== Object index: [1] =====
Git's data stored
As snapshots of files, not changes
Efficient and unique
===== Object index: [6] =====
Git has three states:
Untracked, modified, staged.
Commit to save changes.
Grouped tasks
A grouped task is a prompt that is applied to a group of objects. This allows you to prompt the language model with the entire set of search results, such as source documents or relevant passages.
In this example, we prompt the language model to write a trivia tweet based on the result.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
response = chunks.generate.fetch_objects(
limit=2,
grouped_task="Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
)
print(response.generated)
response = (
client.query
.get(collection_name, ["chunk", "chunk_index"])
.with_generate(
grouped_task="Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
)
.with_limit(2)
.do()
)
print(response["data"]["Get"][collection_name][0]["_additional"]["generate"]["groupedResult"])
const triviaResponse = await gitCollection.generate.fetchObjects({
groupedTask: `Write a trivia tweet based on this text. Use emojis and make it succinct and cute.`
},{
limit: 2,
})
console.log(triviaResponse.generated)
const triviaResponse = await client.graphql
.get()
.withClassName('GitBookChunk')
.withFields('chunk chunk_index')
.withLimit(2)
.withGenerate({
groupedTask: 'Write a trivia tweet based on this text. Use emojis and make it succinct and cute.'
})
.do();
console.log(triviaResponse.data.Get['GitBookChunk'][0]._additional.generate.groupedResult);
It should return a factoid written for social media, such as:
Did you know? 🤔 Git thinks of its data as snapshots, not just changes to files.
📸 Every time you commit, Git takes a picture of all your files and stores a reference to that snapshot.
📂🔗 #GitTrivia
Pairing with search
RAG in Weaviate is a two-step process under the hood, involving retrieval of objects and then generation of text. This means that you can use the full power of Weaviate's search capabilities to retrieve the objects you want to use for generation.
In this example, we search the chapter for passages that relate to the states of git before generating a tweet as before.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
response = chunks.generate.near_text(
query="states of git",
limit=2,
grouped_task="Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
)
print(response.generated)
response = (
client.query
.get(collection_name, ["chunk", "chunk_index"])
.with_near_text({"concepts": ["states of git"]})
.with_generate(
grouped_task="Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
)
.with_limit(2)
.do()
)
print(response["data"]["Get"][collection_name][0]["_additional"]["generate"]["groupedResult"])
const searchResponse = await gitCollection.generate.nearText("states of git",{
groupedTask: "Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
},{
limit: 2,
})
console.log('concept',JSON.stringify(searchResponse.generated, null, 2));
const nearTextTriviaResponse = await client.graphql
.get()
.withClassName('GitBookChunk')
.withFields('chunk chunk_index')
.withNearText({concepts: ['states of git']})
.withLimit(2)
.withGenerate({
groupedTask: 'Write a trivia tweet based on this text. Use emojis and make it succinct and cute.'
})
.do();
console.log(nearTextTriviaResponse.data.Get['GitBookChunk'][0]._additional.generate.groupedResult);
This should return text like:
📝 Did you know? Git has three main states for files: modified, staged, and committed.
🌳📦📂 Learn more about these states and how they affect your Git project!
#GitBasics #Trivia
Now, simply by changing the search query, we can generate similar content about different topics.
- Python Client v4
- Python Client v3
- JS/TS Client v3
- JS/TS Client v2
response = chunks.generate.near_text(
query="how git saves data",
limit=2,
grouped_task="Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
)
print(response.generated)
response = (
client.query
.get(collection_name, ["chunk", "chunk_index"])
.with_near_text({"concepts": ["how git saves data"]})
.with_generate(
grouped_task="Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
)
.with_limit(2)
.do()
)
print(response["data"]["Get"][collection_name][0]["_additional"]["generate"]["groupedResult"])
const anotherSearchResponse = await gitCollection.generate.nearText("how git saves data",{
groupedTask: "Write a trivia tweet based on this text. Use emojis and make it succinct and cute."
},{
limit: 2,
})
console.log('concept',JSON.stringify(anotherSearchResponse.generated, null, 2));
const anotherNearTextResponse = await client.graphql
.get()
.withClassName('GitBookChunk')
.withFields('chunk chunk_index')
.withNearText({concepts: ['how git saves data']})
.withLimit(2)
.withGenerate({
groupedTask: 'Write a trivia tweet based on this text. Use emojis and make it succinct and cute.'
})
.do();
console.log(anotherNearTextResponse.data.Get['GitBookChunk'][0]._additional.generate.groupedResult);
In this case, the result should be something like:
Did you know? 🤔 Git stores everything by the hash value of its contents, not by file name!
📁🔍 It's hard to lose data in Git, making it a joy to use!
😄🔒 Git thinks of its data as a stream of snapshots, making it more than just a VCS!
📸🌟 Most Git operations are local, so no need for network latency!
🌐💨 #GitTrivia
As you can see, Weaviate allows you to use the full power of its search capabilities to retrieve the objects you want to use for generation. This grounds the language model in the context of relevant, up-to-date information.
Best practices & tips
Chunking
In the context of language processing, "chunking" refers to the process of splitting text into smaller pieces, i.e. "chunks".
For RAG, chunking affects both the quality of information retrieval and the amount of contextual information provided to the LLM.
While there is no one-size-fits-all chunking strategy that we can recommend, we can provide some general guidelines. Chunking by semantic markers or by text length may both be viable strategies.
Chunking by semantic markers
Using semantic markers, such as paragraphs or sections, can be a good strategy that allows you to retain related information in each chunk. Potential risks are that chunk lengths may vary significantly, and outliers may be common (e.g. chunks that consist of headers that are not particularly meaningful).
Chunking by text length
Using text length, such as 100-150 words, can be a robust baseline strategy. This will allow you to retrieve relevant information without having to worry about the exact length of the text. One potential risk is that chunks may be split at points that are not semantically meaningful, cutting off important contextual information.
You could use a sliding window approach to mitigate this risk, by overlapping chunks. The length of each chunk can be adjusted to your needs, and based on any unit, such as words, tokens, or even characters.
A baseline strategy could involve using chunks created with a 100-200 word sliding window and a 50-word overlap.
Mixed-strategy chunking
Another, slightly more complicated, strategy is to use paragraph-based chunks with a maximum and a minimum length, say of 200 words and 50 words respectively, as sketched below.
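Here is a minimal sketch of such a mixed strategy. The merge-and-split behavior shown (short paragraphs are merged forward until the minimum is reached; overlong buffers are split at the maximum) is one reasonable interpretation, not a prescribed algorithm:
from typing import List

def chunk_by_paragraph(text: str, min_words: int = 50, max_words: int = 200) -> List[str]:
    chunks: List[str] = []
    buffer: List[str] = []  # Accumulates words until a chunk is long enough
    for paragraph in text.split("\n\n"):
        buffer.extend(paragraph.split())
        if len(buffer) >= min_words:
            while len(buffer) > max_words:  # Split overlong buffers at the maximum
                chunks.append(" ".join(buffer[:max_words]))
                buffer = buffer[max_words:]
            chunks.append(" ".join(buffer))
            buffer = []
    if buffer:  # Flush any trailing short paragraph
        chunks.append(" ".join(buffer))
    return chunks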
Data structure
Another important consideration is the data structure. For example, your chunk object could also contain any additional source-level data, such as the title of the book, the chapter it is from, and the chunk number.
This will allow you to search through the chunks, as well as filter them. Then, you could use this information to control the generation process, such as by prompting the LLM with contextual data (chunks) in the order that they appear in the source document.
Additionally, you could link the chunks to the source document, allowing you to retrieve more of its surrounding text, or even the entire source document, if needed.
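For example, a chunk object might look like the following. The properties beyond those used in this guide (book_title, source_url) are illustrative, not a required schema:
# An illustrative chunk object with source-level metadata
chunk_object = {
    "chunk": "=== A Short History of Git ...",  # The chunk text itself
    "chunk_index": 0,                           # Position within the source
    "chapter_title": "What is Git",
    "book_title": "Pro Git",                            # Illustrative source-level field
    "source_url": "https://github.com/progit/progit2",  # Illustrative link to the source
}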
Complex prompts
While the field of prompting is relatively new, it has seen significant advancements already.
As one example, a technique called "chain-of-thought prompting" can be effective. The idea is that the prompt can be used to nudge the model towards producing intermediate reasoning steps, which improves the quality of the answer.
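For instance, a chain-of-thought style grouped task against the collection from this guide might look like this sketch; the prompt wording is just an illustration:
response = chunks.generate.near_text(
    query="states of git",
    limit=2,
    grouped_task=(
        "First, list the key facts in this text step by step. "
        "Then, using only those facts, explain how a file moves "
        "between the three states of git."
    )
)
print(response.generated)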
We recommend keeping up to date with the latest developments in the field, and experimenting with different techniques.
Our own Connor Shorten's podcast is a great resource for keeping up with the research, as are resources such as arXiv and Papers with Code.
Wrap-up
We've explored the capabilities of RAG in Weaviate, showcasing how retrieval augmented generation enhances large language models with retrieved context.
To learn more about specific search capabilities, check out the How-to: search guide. And to learn more about individual modules, check out the Modules section.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.