275 (Keyword) Tokenization
Course overview
This course is self-contained. However, we recommend that you first complete one of the 101-level courses, such as the ones for working with text, your own vectors, or multimodal data.
This course introduces tokenization and how it relates to Weaviate. Specifically, it covers what tokenization is, how it relates to search, and how to configure it.
Note that tokenization applies to keyword search and filtering, and also to language models.
This course focuses on the keyword aspect, but will briefly discuss how tokenization impacts language models.
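To make the idea concrete before the units begin, here is a minimal, illustrative sketch in plain Python. It approximates four of Weaviate's general-purpose keyword tokenization options (`word`, `lowercase`, `whitespace`, `field`) and then uses a simplified match check to show why the choice matters for keyword filtering. The helper names `tokenize` and `matches` are hypothetical, and the rules are a simplified model written for this example, not Weaviate's actual implementation.

```python
import re

# Illustrative approximations of Weaviate's keyword tokenization options.
# This is a simplified model for learning purposes, not Weaviate's own code.
def tokenize(text: str, method: str) -> list[str]:
    if method == "word":
        # Keep runs of letters/digits (punctuation and underscores act as separators), lowercased.
        return [t.lower() for t in re.findall(r"[^\W_]+", text)]
    if method == "lowercase":
        # Lowercase, then split on whitespace; punctuation stays attached to tokens.
        return text.lower().split()
    if method == "whitespace":
        # Split on whitespace only; case and punctuation are preserved.
        return text.split()
    if method == "field":
        # Treat the whole (trimmed) value as a single token.
        stripped = text.strip()
        return [stripped] if stripped else []
    raise ValueError(f"unknown tokenization method: {method}")


def matches(document: str, query: str, method: str) -> bool:
    """Simplified keyword match: every query token must appear among the document's tokens."""
    doc_tokens = set(tokenize(document, method))
    return all(token in doc_tokens for token in tokenize(query, method))


text = "Hello, (beautiful) world"
for method in ("word", "lowercase", "whitespace", "field"):
    print(method, tokenize(text, method), "matches 'beautiful':", matches(text, "beautiful", method))
```

In this sketch, only `word` tokenization lets the query `beautiful` match, because `lowercase` and `whitespace` keep the parentheses attached to the token and `field` keeps the entire value as one token.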
Learning objectives
Here, we will cover:
- What tokenization is, and why it is required.
By the time you are finished, you will be able to:
- Identify tokenized text from raw text.
- Name different tokenization options in Weaviate.
- Select an appropriate tokenization option for a given use case.
- Name languages for which specific tokenization options are available.
Units
1. Overview of tokenization
What is tokenization, and why is it important?
2. Available tokenization options
What tokenization options are available in Weaviate?
3. Tokenization and filters
See how tokenization impacts filters.
4. Tokenization and searches
See how tokenization impacts searches.