This is a preview version of this unit, so some sections, such as videos and quiz questions, are not yet complete. Please check back later for the full version. In the meantime, feel free to provide feedback through the comments below.
In this unit, you learned about chunking, a technique for splitting longer texts into smaller pieces, or "chunks".
We covered how chunking can impact information retrieval with vector databases, and how it can affect the performance of retrieval-augmented generation (RAG).
We then moved on to various chunking techniques, including fixed-size chunking, variable-size chunking, and hybrid chunking. We also discussed key considerations when deciding on a chunking strategy, as well as some suggested starting points.
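As a quick refresher, the two basic techniques can be sketched roughly as follows. This is a minimal illustration, not the exact code used in the unit: the function names `fixed_size_chunks` and `paragraph_chunks`, the word-based sizing, the default values, and the use of blank lines as paragraph markers are all assumptions made for this example.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Fixed-size chunking: split text into chunks of `chunk_size` words,
    where neighboring chunks share `overlap` words. (Hypothetical helper.)"""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def paragraph_chunks(text: str) -> List[str]:
    """Variable-size chunking: split on blank lines (assumed paragraph
    markers) and drop empty pieces. (Hypothetical helper.)"""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


sample = " ".join(f"w{i}" for i in range(10))
print(fixed_size_chunks(sample, chunk_size=4, overlap=1))
print(paragraph_chunks("First paragraph.\n\nSecond paragraph."))
```

A hybrid approach would combine the two, for example by starting from marker-based chunks and then adjusting any that are too long or too short for your use case.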
The unit closed with a discussion of factors to weigh when chunking data: the length of text per search result, the input query length, the size of the database, the requirements of the language model, and the RAG workflow.
We hope that you now have a good understanding of chunking in general, and are able to implement some solid chunking strategies based on your actual needs.
Having finished this unit, you should be able to:
- Describe what chunking is at a high level
- Explain the impact of chunking in vector search and generative search
- Implement various chunking methods and know where to explore others
- Evaluate chunking strategies based on your needs