
GenAI RAG with LlamaIndex, Ollama and Elasticsearch

1h 49m 50s
English
Paid

Course description

Retrieval-Augmented Generation (RAG) is the next practical step after semantic search and indexing. In this course, you will build a full-fledged local RAG pipeline that processes PDF files, splits text into fragments, stores vectors in Elasticsearch, retrieves relevant context, and generates well-reasoned answers with the Mistral model running locally through Ollama.

We will work through the entire process in a concrete scenario: searching students' resumes to answer questions like "Who has worked in Ireland?" or "Who has experience with Apache Spark?" You will set up a containerized infrastructure with Docker Compose (FastAPI, Elasticsearch, Kibana, Streamlit, Ollama) and connect it all with LlamaIndex so you can focus on logic rather than boilerplate code. Along the way, you will discover where RAG is truly effective, where challenges arise (accuracy, completeness, model "hallucinations"), and how to design solutions for production.

By the end of the course, you will have a complete application that can be deployed locally:

PDF upload → text extraction → conversion to JSON → segmentation and vectorization → indexing in Elasticsearch → interactive search via Streamlit → answer generation with Mistral.


What You Will Learn

From Search to RAG

You will build on your knowledge of semantic search and learn to apply it for RAG: first retrieving the relevant fragments, then generating substantiated answers based on them. You'll see how LlamaIndex connects your data to an LLM, and why the size and overlap of "chunks" matter for accuracy.
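
As a rough illustration of the chunking trade-off, here is a minimal sketch using LlamaIndex's SentenceSplitter (assuming a recent llama-index release; the chunk_size and chunk_overlap values are arbitrary examples, not the course's settings):

    from llama_index.core import Document
    from llama_index.core.node_parser import SentenceSplitter

    # One resume as a single Document; in the course this text would come from a PDF.
    doc = Document(text="...full resume text...")

    # Larger chunks keep more context per fragment, while the overlap helps keep
    # a sentence that answers "Who has worked in Ireland?" from being split
    # across a chunk boundary.
    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
    nodes = splitter.get_nodes_from_documents([doc])
    print(len(nodes), "chunks")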

Creating a Pipeline

With FastAPI, you will implement PDF upload and processing: extracting text, formatting it as JSON, splitting it into chunks, creating embeddings, and indexing in Elasticsearch, with minimal boilerplate code thanks to LlamaIndex.
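
A minimal sketch of the upload-and-extract step, assuming a FastAPI endpoint named /upload and in-memory handling of the file (both are illustrative choices, not necessarily the course's exact code):

    import fitz  # PyMuPDF
    from fastapi import FastAPI, UploadFile

    app = FastAPI()

    @app.post("/upload")
    async def upload_resume(file: UploadFile):
        # Read the uploaded PDF and extract plain text page by page.
        pdf_bytes = await file.read()
        with fitz.open(stream=pdf_bytes, filetype="pdf") as pdf:
            text = "\n".join(page.get_text() for page in pdf)
        # In the full pipeline this text would then be structured as JSON,
        # chunked, embedded, and indexed in Elasticsearch.
        return {"filename": file.filename, "characters": len(text)}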

Working with Elasticsearch

You will create an index for resumes with vectors and metadata. You'll learn to distinguish between vector search and keyword search, understand how vector fields are stored, and how to explore documents and results through Kibana.
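
A minimal sketch of indexing resumes with vectors and metadata into Elasticsearch through LlamaIndex; the index name, URLs, embedding model, and metadata keys are assumptions for illustration:

    from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex
    from llama_index.embeddings.ollama import OllamaEmbedding
    from llama_index.vector_stores.elasticsearch import ElasticsearchStore

    # Embeddings are produced locally through Ollama (the model name is an assumption).
    Settings.embed_model = OllamaEmbedding(
        model_name="nomic-embed-text", base_url="http://localhost:11434"
    )

    # Vector store backed by the Elasticsearch container from the Compose stack.
    vector_store = ElasticsearchStore(index_name="resumes", es_url="http://localhost:9200")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # Each resume carries metadata that can later be used for filtering.
    docs = [
        Document(text="...resume text...", metadata={"name": "Alice", "location": "Ireland"})
    ]
    index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)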

Interface on Streamlit

You will create a simple chat interface in Streamlit for natural-language interaction, enable a debug mode to see which fragments were used for a response, and apply metadata (e.g., filtering by name) to improve accuracy.
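
A minimal sketch of such a chat page; ask_rag() is a hypothetical placeholder for the call into the FastAPI backend, not the course's actual function:

    import streamlit as st

    def ask_rag(question: str):
        # Hypothetical stand-in for the call to the FastAPI/LlamaIndex backend;
        # it would return the generated answer plus the retrieved fragments.
        return "stub answer", ["stub fragment 1", "stub fragment 2"]

    st.title("Resume search")
    debug = st.sidebar.checkbox("Debug: show retrieved fragments")

    question = st.chat_input("Ask about the resumes")
    if question:
        st.chat_message("user").write(question)
        answer, fragments = ask_rag(question)
        with st.chat_message("assistant"):
            st.write(answer)
            if debug:
                for frag in fragments:
                    st.caption(frag)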

Processing and Formatting JSON

You will extract text from PDFs using PyMuPDF, then produce clean JSON via Ollama (Mistral) while preserving structure and special characters. You'll learn to handle formatting errors and apply reliable prompt-engineering techniques.
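
A minimal sketch of the extract-then-format step, assuming a recent llama-index Ollama client; the JSON keys and the prompt wording are illustrative assumptions:

    import json
    import fitz  # PyMuPDF
    from llama_index.llms.ollama import Ollama

    # 1. Pull raw text out of the PDF.
    with fitz.open("resume.pdf") as pdf:
        raw_text = "\n".join(page.get_text() for page in pdf)

    # 2. Ask the local Mistral model to reshape it as JSON.
    llm = Ollama(model="mistral", request_timeout=120.0)
    prompt = (
        "Convert the following resume into JSON with the keys "
        "'name', 'skills' and 'experience'. Return only valid JSON.\n\n" + raw_text
    )
    response = llm.complete(prompt)

    # 3. The model occasionally adds stray text around the JSON, so parse defensively.
    try:
        resume = json.loads(response.text)
    except json.JSONDecodeError:
        resume = None  # in the real pipeline: retry or run a repair prompt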

Improving Response Quality

You will study practical techniques to increase accuracy (several of them are combined in the sketch after this list):

  • adjusting chunk sizes, overlaps, and the retrieval top-K;
  • adding metadata (role, skills, location) for hybrid filters;
  • experimenting with embedding models and prompts;
  • using structured responses (e.g., JSON lists).
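
Several of these knobs together, continuing from the Elasticsearch index sketched earlier (the filter key "name" and the top-K value are assumptions):

    from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

    # `index` is the VectorStoreIndex built over the resumes (see the earlier sketch).
    filters = MetadataFilters(filters=[ExactMatchFilter(key="name", value="Alice")])
    query_engine = index.as_query_engine(
        similarity_top_k=3,   # how many retrieved chunks reach the LLM
        filters=filters,      # hybrid filtering on metadata
    )
    response = query_engine.query(
        "List this candidate's Apache Spark experience as a JSON array."
    )
    print(response)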

Docker Environment

You will assemble the entire stack with Docker Compose: FastAPI, Elasticsearch, Kibana, Streamlit, and Ollama (Mistral), so you can deploy the system locally with a predictable configuration.
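
One practical consequence worth noting: inside the Compose network, containers reach each other by service name rather than localhost. A small sketch, assuming the API container receives the service URLs via environment variables (the variable and service names are assumptions):

    import os

    # Defaults use Compose service names; running outside Docker you would
    # override these with localhost URLs.
    ES_URL = os.getenv("ELASTICSEARCH_URL", "http://elasticsearch:9200")
    OLLAMA_URL = os.getenv("OLLAMA_URL", "http://ollama:11434")

    print(f"Indexing into {ES_URL}, generating with the LLM at {OLLAMA_URL}")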

Bonus: Production Patterns

You will learn how to scale the prototype to production level:

  • store uploads in a data lake (e.g., S3) and process them through queues (Kafka/SQS);
  • automatically scale workers for chunking and embeddings;
  • switch LLM backends (e.g., Bedrock or OpenAI) via a unified API, as sketched after this list;
  • store chat history in MongoDB/Postgres and replace Streamlit with a React/Next.js interface.
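
The backend-switch point can be illustrated with LlamaIndex's global Settings object; a minimal sketch, assuming an LLM_BACKEND environment variable and illustrative model names:

    import os
    from llama_index.core import Settings
    from llama_index.llms.ollama import Ollama
    from llama_index.llms.openai import OpenAI

    # The rest of the pipeline only talks to Settings.llm, so moving from the
    # local Mistral to a hosted model is a configuration change, not a rewrite.
    if os.getenv("LLM_BACKEND") == "openai":
        Settings.llm = OpenAI(model="gpt-4o-mini")
    else:
        Settings.llm = Ollama(model="mistral", request_timeout=120.0)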


All Course Lessons (21)

1. Introduction (demo) - 02:43
2. What We Are Going to Build - 02:02
3. Project Architecture - 02:41
4. GitHub Repo Explained - 02:59
5. Step-by-Step Process - 06:06
6. Terms You Find Often - 09:17
7. LlamaIndex Explained - 03:47
8. What is Ollama - 03:20
9. Ollama Setup & Testing - 04:35
10. Standup Infrastructure - 03:23
11. Show Local Processing - 03:01
12. Explain the API - 05:37
13. Explain the API Text Extraction - 04:42
14. Explain the Embedding - 06:55
15. Explain Problem with JSON Creation - 02:57
16. Streamlit Code Explained - 07:58
17. Search with Filter by User - 06:55
18. Do Semantic Queries - 08:33
19. The Biggest Problem with RAG - 03:31
20. How This Will Look in the Real World - 05:38
21. Great YouTube Videos About Real-World Use Cases - 13:10


Similar courses

Build AI Agents with AWS
Source: zerotomastery.io, 3h 9m 7s
Learn to design, create, and deploy multiple AI agents using AWS by building your own intelligent travel assistant, ready for production. Gain practical...

The NotebookLM Guide: Your AI-Powered Productivity Assistant
Source: zerotomastery.io, 2h 3m 22s
Learn to use NotebookLM from Google to simplify research, analyze content, and boost productivity. From automatic summaries to...

AI Engineering: Customizing LLMs for Business (Fine-Tuning LLMs with QLoRA & AWS)
Source: zerotomastery.io, 7h 12m 10s
Master an in-demand skill that companies are looking for: the development and implementation of custom LLMs. In the course, you will learn how to fine-tune open...

AI Engineering Course
Source: get.interviewready.io (Gaurav Sen), 1h 36m 46s
This course is designed to help programmers and developers transition into the field of artificial intelligence engineering. You will delve into vector...