GenAI RAG with LlamaIndex, Ollama and Elasticsearch
Course description
Retrieval-Augmented Generation (RAG) is the next practical step after semantic search and indexing. In this course, you will build a full-fledged local RAG pipeline that processes PDF files, splits text into chunks, stores vectors in Elasticsearch, retrieves relevant context, and generates well-grounded answers using the Mistral model running locally through Ollama.
We will work through the entire process with a concrete scenario: searching students' resumes to answer questions like "Who has worked in Ireland?" or "Who has experience with Apache Spark?" You will set up a containerized infrastructure with Docker Compose (FastAPI, Elasticsearch, Kibana, Streamlit, Ollama) and connect it all with LlamaIndex so you can focus on logic rather than boilerplate code. Along the way, you will see where RAG is truly effective and where challenges arise (accuracy, completeness, model "hallucinations") and how to design solutions for production.
By the end of the course, you will have a complete application that can be deployed locally:
PDF upload → text extraction → conversion to JSON → chunking and vectorization → indexing in Elasticsearch → interactive search via Streamlit → answer generation with Mistral.
What You Will Learn
From Search to RAG
You will build on your knowledge of semantic search and learn to apply it to RAG: first retrieving relevant fragments, then generating substantiated answers from them. You'll see how LlamaIndex connects your data to an LLM, and why chunk size and overlap matter for answer accuracy.
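For orientation, here is a minimal sketch (not the course's actual code) of how LlamaIndex wires these pieces together; the folder name and chunk settings are illustrative, and it assumes an embedding model and LLM are already configured:

```python
# Minimal LlamaIndex sketch: load documents, split into overlapping chunks,
# index them, and answer a question from the retrieved context.
# Assumes Settings.llm and Settings.embed_model are configured elsewhere.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("resumes/").load_data()  # hypothetical folder
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])

answer = index.as_query_engine(similarity_top_k=3).query("Who has worked in Ireland?")
print(answer)
```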
Creating a Pipeline
With FastAPI, you will implement PDF upload and processing: text extraction, JSON formatting, chunking, embedding, and indexing in Elasticsearch, with minimal boilerplate thanks to LlamaIndex.
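As a rough illustration, an upload endpoint might start like the sketch below; the route and response shape are assumptions, not the course's exact API:

```python
# Sketch of a PDF upload endpoint: read the file, extract text with PyMuPDF.
# The downstream steps (JSON formatting, chunking, embedding, indexing)
# would follow where the comment indicates.
import fitz  # PyMuPDF
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload")
async def upload_pdf(file: UploadFile):
    data = await file.read()
    doc = fitz.open(stream=data, filetype="pdf")   # open the PDF from bytes
    text = "\n".join(page.get_text() for page in doc)
    # ...format as JSON, chunk, embed, and index in Elasticsearch here
    return {"filename": file.filename, "characters": len(text)}
```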
Working with Elasticsearch
You will create a resume index with vectors and metadata. You'll learn to distinguish vector search from keyword search, understand how vector fields are stored, and explore documents and search results through Kibana.
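For a sense of what such an index looks like, here is a sketch using the official Python client; the index name, field names, and vector dimension (384, e.g. for a MiniLM-style embedding model) are assumptions:

```python
# Sketch of an Elasticsearch index that stores both text and vectors.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.create(
    index="resumes",
    mappings={
        "properties": {
            "content":   {"type": "text"},                       # full text for keyword search
            "name":      {"type": "keyword"},                    # metadata for filtering
            "embedding": {"type": "dense_vector", "dims": 384},  # vector for semantic search
        }
    },
)
```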
Interface on Streamlit
You will create a simple chat interface on Streamlit for natural language interaction. You'll enable a debug mode to see which chunks were used for each answer and apply metadata (e.g., filtering by name) to improve accuracy.
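A chat loop in Streamlit can be quite small; in this sketch, `query_backend` is a hypothetical stand-in for a call to the FastAPI service:

```python
# Sketch of a Streamlit chat UI with a debug view of retrieved chunks.
import requests
import streamlit as st

def query_backend(question: str) -> dict:
    # Hypothetical endpoint and payload; the real service may differ.
    return requests.post("http://localhost:8000/query", json={"q": question}).json()

st.title("Resume Search")
if question := st.chat_input("Ask about the resumes..."):
    with st.chat_message("user"):
        st.write(question)
    result = query_backend(question)
    with st.chat_message("assistant"):
        st.write(result.get("answer", ""))
        with st.expander("Debug: retrieved chunks"):
            st.json(result.get("chunks", []))
```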
Processing and Formatting JSON
You will extract text from PDFs using PyMuPDF, then produce clean JSON via Ollama (Mistral), preserving the document's structure and special characters. You'll learn to handle malformed model output and to write prompts that reliably produce valid JSON.
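One way to make this robust is to validate the model's output and retry on failure. The sketch below calls Ollama's REST API directly; the prompt and retry policy are illustrative, not the course's exact approach:

```python
# Sketch: ask Mistral (via Ollama) for JSON and retry if the output is malformed.
import json
import requests

def text_to_json(raw_text: str, retries: int = 2) -> dict:
    prompt = (
        'Convert this resume text to JSON with keys "name", "skills", '
        '"experience". Return only valid JSON.\n\n' + raw_text
    )
    for _ in range(retries + 1):
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "mistral", "prompt": prompt, "stream": False},
        )
        try:
            return json.loads(resp.json()["response"])  # fails if the model strayed
        except json.JSONDecodeError:
            continue                                    # malformed output: try again
    raise ValueError("model never returned valid JSON")
```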
Improving Response Quality
You will study practical techniques for increasing accuracy (a code sketch of these knobs follows the list):
- tuning chunk size, overlap, and top-K retrieval;
- adding metadata (role, skills, location) for hybrid filters;
- experimenting with embedding models and prompts;
- using structured responses (e.g., JSON lists).
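As a sketch of these knobs in LlamaIndex (assuming an `index` built as in the earlier sketch; the filter key and value are hypothetical):

```python
# Sketch: raise top-K and add an exact-match metadata filter to the retriever.
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

query_engine = index.as_query_engine(
    similarity_top_k=5,  # retrieve more candidate chunks
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="name", value="Jane Doe")]  # hybrid filter
    ),
)
```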
Docker Environment
You will assemble the entire stack in Docker Compose: FastAPI, Elasticsearch, Kibana, Streamlit, and Ollama (Mistral), so you can run the whole system locally with a predictable configuration.
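A compose file for this stack might look roughly like the sketch below; image tags, ports, and build paths are assumptions, not the course's exact file:

```yaml
# Sketch of a docker-compose.yml for the five services.
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports: ["9200:9200"]
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    ports: ["5601:5601"]
    depends_on: [elasticsearch]
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
  api:
    build: ./api            # hypothetical path to the FastAPI service
    ports: ["8000:8000"]
    depends_on: [elasticsearch, ollama]
  ui:
    build: ./ui             # hypothetical path to the Streamlit app
    ports: ["8501:8501"]
    depends_on: [api]
```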
Bonus: Production Patterns
You will learn how to scale the prototype to production level (a sketch of the LLM-backend switch follows the list):
- store uploads in a data lake (e.g., S3) and process them through queues (Kafka/SQS);
- automatically scale workers for chunking and embeddings;
- switch LLM backends (e.g., Bedrock or OpenAI) via a unified API;
- store chat history in MongoDB/Postgres and replace Streamlit with a React/Next.js interface.
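Because LlamaIndex abstracts the LLM behind a single setting, the backend switch can be a one-liner; the model names here are examples, not the course's exact choices:

```python
# Sketch: swap the local Ollama model for a hosted backend via LlamaIndex Settings.
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
# from llama_index.llms.openai import OpenAI  # requires OPENAI_API_KEY

Settings.llm = Ollama(model="mistral")         # local development
# Settings.llm = OpenAI(model="gpt-4o-mini")   # one-line switch for production
```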
Watch Online
All Course Lessons (21)
| # | Lesson Title | Duration |
|---|---|---|
| 1 | Introduction Demo | 02:43 |
| 2 | What We Are Going to Build | 02:02 |
| 3 | Project Architecture | 02:41 |
| 4 | GitHub Repo Explained | 02:59 |
| 5 | Step-by-Step Process | 06:06 |
| 6 | Terms You Find Often | 09:17 |
| 7 | LlamaIndex Explained | 03:47 |
| 8 | What is Ollama | 03:20 |
| 9 | Ollama Setup & Testing | 04:35 |
| 10 | Standup Infrastructure | 03:23 |
| 11 | Show Local Processing | 03:01 |
| 12 | Explain the API | 05:37 |
| 13 | Explain the API Text Extraction | 04:42 |
| 14 | Explain the Embedding | 06:55 |
| 15 | Explain Problem with JSON Creation | 02:57 |
| 16 | Streamlit Code Explained | 07:58 |
| 17 | Search with Filter by User | 06:55 |
| 18 | Do Semantic Queries | 08:33 |
| 19 | The Biggest Problem with RAG | 03:31 |
| 20 | How This Will Look in the Real World | 05:38 |
| 21 | Great YouTube Videos About Real-World Use Cases | 13:10 |
Similar courses
Advanced AI: LLMs Explained with Math (Transformers, Attention Mechanisms & More)
Full-Stack Project with Claude Code
Build a Simple Neural Network & Learn Backpropagation
Build AI-Powered Apps – An AI Course for Developers