Introduction
How do we make an AI search for information conceptually rather than relying on exact keyword matches? The answer lies in embeddings and vector databases.
What are Embeddings?
An embedding is a numerical representation of a piece of data, such as text, as a vector in a high-dimensional space. Words or sentences with similar meanings are located closer together in this space. For example, the vectors for "dog" and "puppy" will be close to each other, while the vectors for "dog" and "car" will be much farther apart.
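To make "closer together" concrete, here is a minimal sketch that measures the distance between a few vectors. The three-dimensional vectors are hand-picked for illustration and are an assumption, not the output of any real embedding model; real embeddings are produced by a trained model and usually have hundreds or thousands of dimensions.

```python
import math

# Hand-picked toy vectors (an assumption for illustration only).
# A real embedding model would produce these numbers for you.
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


dog_puppy = euclidean_distance(toy_embeddings["dog"], toy_embeddings["puppy"])
dog_car = euclidean_distance(toy_embeddings["dog"], toy_embeddings["car"])

print(f"dog <-> puppy: {dog_puppy:.3f}")
print(f"dog <-> car:   {dog_car:.3f}")
```

With these toy values, the "dog" to "puppy" distance comes out much smaller than the "dog" to "car" distance, which is the property a good embedding model aims for.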
Vector Databases
Traditional databases typically match exact values or keywords. Vector databases store embeddings and run similarity searches: they compare the query's embedding with the stored vectors using a measure such as cosine similarity and return the closest matches. This lets you find documents that are semantically relevant to a query, even if they share no exact keywords with it.
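The idea of a similarity search can be sketched in a few lines of plain Python. This is a toy in-memory version, not how ChromaDB or Pinecone is implemented: the document titles, the vectors, and the `search` helper are all hypothetical, and production vector databases generally rely on specialized indexes rather than scanning every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical "database": document text mapped to a made-up embedding.
documents = {
    "How to train a puppy": [0.9, 0.8, 0.1],
    "Choosing a family dog": [0.8, 0.9, 0.2],
    "Used car buying guide": [0.1, 0.2, 0.95],
}


def search(query_vector, store, top_k=2):
    """Score every stored vector against the query and return the best top_k."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in store.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Pretend this vector is the embedding of the query "canine obedience tips".
query_vector = [0.85, 0.85, 0.1]
results = search(query_vector, documents)

for score, text in results:
    print(f"{score:.3f}  {text}")
```

Note that the pretend query shares no words with the two documents it retrieves; the match comes entirely from the vectors pointing in a similar direction.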
Assignment
- Read this blog post on how Text Embeddings work under the hood.
- Explore the documentation for ChromaDB or Pinecone, then set up a local ChromaDB instance or a free Pinecone account.