GenAI RAG with LlamaIndex, Ollama and Elasticsearch

1h 49m 50s
English
Paid

Course description

Retrieval-Augmented Generation (RAG) is the next practical step after semantic search and indexing. In this course, you will build a complete local RAG pipeline that processes PDF files, splits text into chunks, stores vectors in Elasticsearch, retrieves relevant context, and generates well-grounded answers using the Mistral model running locally through Ollama.

We will work through the entire process using one concrete scenario: searching students' resumes to answer questions like "Who has worked in Ireland?" or "Who has experience with Apache Spark?" You will set up a containerized infrastructure with Docker Compose (FastAPI, Elasticsearch, Kibana, Streamlit, Ollama) and connect it all with LlamaIndex so you can focus on logic rather than boilerplate code. Along the way, you will see where RAG is truly effective and where challenges arise, such as problems with accuracy, completeness, and model "hallucinations", and how to design solutions for production.

By the end of the course, you will have a complete application that can be deployed locally:

PDF upload → text extraction → conversion to JSON → segmentation and vectorization → indexing in Elasticsearch → interactive search via Streamlit → generating answers using Mistral.

What You Will Learn

From Search to RAG

You will expand your knowledge of semantic search and learn to apply it to RAG: first retrieving the relevant fragments, then generating answers grounded in them. You'll see how LlamaIndex connects your data to an LLM, and why the size and overlap of "chunks" matter for accuracy.
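
To make the idea of chunk size and overlap concrete, here is a minimal sketch. It assumes a simple character-based sliding window; LlamaIndex's own splitters work differently (on tokens and sentences), and the function name `chunk_text` and its defaults are hypothetical, chosen only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters.

    Illustrative only: a character-based stand-in for a real text splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap repeats more text across neighboring chunks, which helps when a relevant sentence straddles a boundary but increases the number of vectors to store and search.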

Creating a Pipeline

With FastAPI, you will implement uploading and processing PDFs: extracting text, formatting JSON, splitting, creating embeddings, and indexing in Elasticsearch, with minimal boilerplate code thanks to LlamaIndex.

Working with Elasticsearch

You will create an index for resumes with vectors and metadata. You'll learn to distinguish between vector search and keyword search, understand how vector fields are stored, and explore documents and results in Kibana.

Interface on Streamlit

You will create a simple chat interface in Streamlit for natural language interaction. You will enable debug mode to see which fragments were used for each response, and apply metadata (e.g., filtering by name) to improve accuracy.

Processing and Formatting JSON

You will extract text from PDFs using PyMuPDF, then produce clean JSON via Ollama (Mistral), preserving structure and characters. You'll learn to handle formatting errors and apply techniques for reliable prompt engineering.
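
One common formatting error is a model that wraps its JSON in Markdown fences or adds a sentence before or after it. The following is a minimal sketch of a tolerant parser under that assumption. The function name `parse_llm_json` is hypothetical, and this is not necessarily how the course handles the problem.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this listing readable


def parse_llm_json(raw: str) -> dict:
    """Best-effort extraction of a single JSON object from model output.

    Raises ValueError if no JSON object can be recovered.
    """
    # Remove Markdown fences (with or without a "json" language tag).
    cleaned = re.sub(re.escape(FENCE) + r"(?:json)?", "", raw).strip()

    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, ignoring surrounding prose.
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        parsed = json.loads(cleaned[start:end + 1])  # may still raise a ValueError subclass

    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


if __name__ == "__main__":
    reply = "Here is the result:\n" + FENCE + 'json\n{"name": "Jane", "skills": ["Spark"]}\n' + FENCE
    print(parse_llm_json(reply))
```

When parsing still fails, a typical next step is to retry the request with a stricter prompt rather than index a broken document.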

Improving Response Quality

You will study practical techniques to increase accuracy:

  • adjusting chunk size, chunk overlap, and the top-K value;
  • adding metadata (role, skills, location) for hybrid filters;
  • experimenting with embedding models and prompts;
  • using structured responses (e.g., JSON lists).

Docker Environment

You will assemble the entire stack in Docker Compose: FastAPI, Elasticsearch, Kibana, Streamlit, and Ollama (Mistral), to deploy the system locally with a predictable configuration.

Bonus: Production Patterns

You will learn how to scale the prototype to production level:

  • store uploads in a data lake (e.g., S3) and process them through queues (Kafka/SQS);
  • automatically scale workers for chunking and embeddings;
  • switch LLM backends (e.g., Bedrock or OpenAI) via a unified API;
  • store chat history in MongoDB/Postgres and replace Streamlit with a React/Next.js interface.
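
Switching LLM backends through a unified API usually means that application code depends on one small interface rather than on a specific provider. The sketch below shows the idea with stub backends that return canned strings. All names (`LLMBackend`, `StubBackend`, `answer`) are hypothetical, and no real Ollama, Bedrock, or OpenAI client is called.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """The only thing the application needs from any model provider."""

    def complete(self, prompt: str) -> str: ...


class StubBackend:
    """Stand-in for a real client; echoes the prompt with a label instead of calling a model."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


def answer(question: str, context: list[str], llm: LLMBackend) -> str:
    """Build a RAG-style prompt from retrieved context and delegate to the chosen backend."""
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return llm.complete(prompt)


if __name__ == "__main__":
    backends = {"local": StubBackend("local"), "hosted": StubBackend("hosted")}
    # Swapping the backend is a configuration change; answer() stays the same.
    print(answer("Who has worked in Ireland?", ["Ann worked in Dublin."], backends["local"]))
```

A real backend would wrap the provider's client inside `complete`, so retrieval and prompt construction do not change when the model does.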

All Course Lessons (21)

#   Lesson Title                                     Duration  Access
1   Introduction                                     02:43     Demo
2   What We Are Going to Build                       02:02
3   Project Architecture                             02:41
4   GitHub Repo Explained                            02:59
5   Step-by-Step Process                             06:06
6   Terms You Find Often                             09:17
7   LlamaIndex Explained                             03:47
8   What is Ollama                                   03:20
9   Ollama Setup & Testing                           04:35
10  Standup Infrastructure                           03:23
11  Show Local Processing                            03:01
12  Explain the API                                  05:37
13  Explain the API Text Extraction                  04:42
14  Explain the Embedding                            06:55
15  Explain Problem with JSON Creation               02:57
16  Streamlit Code Explained                         07:58
17  Search with Filter by User                       06:55
18  Do Semantic Queries                              08:33
19  The Biggest Problem with RAG                     03:31
20  How This Will Look in the Real World             05:38
21  Great YouTube Videos About Real-World Use Cases  13:10
