Menu

Project: AI Knowledge Assistant

Introduction

It's time to build! In this capstone project, you will build a fully functional AI Knowledge Assistant that can answer questions about a specific dataset using Retrieval-Augmented Generation (RAG).

Requirements

  1. Data Ingestion: Write a script that reads a collection of text files or PDFs, splits them into appropriate chunk sizes, generates embeddings, and saves them to a Vector Database (e.g., ChromaDB).
  2. Retrieval Engine: Create a function that takes a user query, embeds it, and retrieves the top 3 most relevant chunks from the database.
  3. Generation: Construct a prompt that injects the retrieved context and asks an LLM (via the OpenAI or Anthropic API) to answer the user's question based strictly on the context.
  4. User Interface: Build a simple command-line interface (CLI) or a basic web UI (using Streamlit or Next.js) where a user can chat with the assistant.

Hints

  • Use LangChain's document loaders and text splitters to save time during the ingestion phase.
  • Make sure to include a system prompt that explicitly tells the LLM: "If the answer is not contained within the context, say 'I don't know.'" to prevent hallucinations.

Support me!

I am a software engineer giving back to the community - my name is Musila Peter. Join me in empowering learners around the globe by supporting SaneGenius!