
GenAI RAG with LlamaIndex, Ollama and Elasticsearch

1h 49m 50s
English
Paid

Course description

Retrieval-Augmented Generation (RAG) is the next practical step after semantic search and indexing. In this course, you will build a full-fledged local RAG pipeline that processes PDF files, splits texts into fragments, stores vectors in Elasticsearch, retrieves relevant context, and generates well-grounded answers using the Mistral model running locally through Ollama.

We will work through the entire process on a concrete scenario: searching students' resumes to answer questions like "Who has worked in Ireland?" or "Who has experience with Apache Spark?" You will set up a containerized infrastructure with Docker Compose (FastAPI, Elasticsearch, Kibana, Streamlit, Ollama) and connect it all with LlamaIndex so you can focus on logic rather than boilerplate code. Along the way, you will discover where RAG is truly effective and where challenges arise, such as issues with accuracy, completeness, and model "hallucinations", and how to design solutions for production.

By the end of the course, you will have a complete application that can be deployed locally:

PDF upload → text extraction → conversion to JSON → segmentation and vectorization → indexing in Elasticsearch → interactive search via Streamlit → answer generation with Mistral.


What You Will Learn

From Search to RAG

You will expand your knowledge of semantic search and learn to apply it to RAG: first retrieving the relevant passages, then generating grounded responses based on them. You'll discover how LlamaIndex connects your data to an LLM, and why the size and overlap of "chunks" matter for accuracy.
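
As a rough illustration of how chunk size and overlap are set, here is a minimal sketch using LlamaIndex's SentenceSplitter. The import paths assume a recent llama-index release, and the sample text and parameter values are placeholders rather than the course's settings.

```python
# Minimal sketch: splitting a document into overlapping chunks with LlamaIndex.
# Sample text and chunk parameters are illustrative, not the course's values.
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

resume_text = "Jane Doe. Data engineer. Worked in Ireland on Apache Spark pipelines..."
doc = Document(text=resume_text, metadata={"name": "Jane Doe"})

# chunk_size / chunk_overlap trade off context completeness vs. retrieval precision.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents([doc])

for node in nodes:
    print(node.metadata.get("name"), len(node.text))
```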

Creating a Pipeline

With FastAPI, you will implement uploading and processing PDFs: extracting text, formatting JSON, splitting, creating embeddings, and indexing in Elasticsearch, with minimal boilerplate code thanks to LlamaIndex.
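As a sketch of what such an upload route can look like, the snippet below shows a minimal FastAPI endpoint; the /upload path and the ingest_pdf helper it calls are hypothetical placeholders, not the course's actual API.

```python
# Minimal sketch of a PDF upload endpoint; the /upload route and the
# ingest_pdf() helper are illustrative placeholders.
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def ingest_pdf(filename: str, data: bytes) -> int:
    """Placeholder for: extract text -> build JSON -> chunk -> embed -> index."""
    return 0  # e.g., number of chunks indexed

@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    pdf_bytes = await file.read()
    chunks_indexed = ingest_pdf(file.filename, pdf_bytes)
    return {"filename": file.filename, "chunks_indexed": chunks_indexed}
```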

Working with Elasticsearch

You will create an index for resumes with vectors and metadata. You'll learn to distinguish between vector search and keyword search, understand how vector fields are stored, and how to explore documents and results through Kibana.
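For orientation, here is a minimal sketch of such a mapping and a kNN query using the official Elasticsearch Python client (8.x). The index name, field names, and vector dimension are assumptions, not the course's exact schema.

```python
# Sketch: a resume index with a dense_vector field plus metadata, and a kNN
# query against it. Index/field names and dims are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="resumes",
    mappings={
        "properties": {
            "name": {"type": "keyword"},
            "text": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,  # must match your embedding model
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

query_vector = [0.0] * 384  # in practice: the embedded user question
hits = es.search(
    index="resumes",
    knn={"field": "embedding", "query_vector": query_vector, "k": 5, "num_candidates": 50},
)
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["name"], hit["_score"])
```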

Interface on Streamlit

You will create a simple chat interface in Streamlit for natural-language interaction. You'll enable a debug mode to see which fragments were used for a response and apply metadata (e.g., filtering by name) to improve accuracy.
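
A minimal sketch of such a chat page follows; the /ask backend URL and the response fields (answer, sources) are hypothetical and would need to match your own API.

```python
# Minimal Streamlit chat sketch; the /ask endpoint and its response shape
# ({"answer": ..., "sources": [...]}) are illustrative assumptions.
import requests
import streamlit as st

st.title("Resume Q&A")
debug = st.sidebar.checkbox("Show retrieved fragments")

question = st.chat_input("Ask about the resumes, e.g. 'Who has worked in Ireland?'")
if question:
    with st.chat_message("user"):
        st.write(question)

    resp = requests.post("http://localhost:8000/ask", json={"question": question}).json()

    with st.chat_message("assistant"):
        st.write(resp["answer"])
        if debug:
            with st.expander("Retrieved fragments"):
                for src in resp.get("sources", []):
                    st.write(src)
```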

Processing and Formatting JSON

You will extract text from PDFs using PyMuPDF, then produce clean JSON via Ollama (Mistral), preserving structure and characters. You'll learn to handle formatting errors and apply techniques for reliable prompt engineering.
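
As a rough sketch of this step, the snippet below extracts text with PyMuPDF and asks a local Mistral model, via Ollama's HTTP API, to return JSON. The prompt, the target JSON fields, and the error handling are illustrative, not the course's exact implementation.

```python
# Sketch: PDF text extraction with PyMuPDF, then structuring it as JSON via
# Ollama's /api/generate endpoint. Prompt and JSON fields are illustrative.
import json
import fitz  # PyMuPDF
import requests

doc = fitz.open("resume.pdf")
raw_text = "\n".join(page.get_text() for page in doc)

prompt = (
    "Convert this resume into JSON with the keys name, skills, locations, experience. "
    "Return only valid JSON.\n\n" + raw_text
)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": prompt, "stream": False, "format": "json"},
)
try:
    structured = json.loads(resp.json()["response"])
except json.JSONDecodeError:
    structured = None  # in practice: retry, repair, or fall back to raw text
print(structured)
```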

Improving Response Quality

You will study practical techniques for increasing accuracy (see the sketch after this list):

  • adjusting chunk sizes, overlaps, and the retrieval top-K;
  • adding metadata (role, skills, location) for hybrid filters;
  • experimenting with embedding models and prompts;
  • using structured responses (e.g., JSON lists).
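
The sketch below illustrates three of these levers in LlamaIndex: the retrieval top-K, a metadata filter, and a prompt that requests structured output. The model names, metadata keys, and sample documents are assumptions; it also requires the llama-index-llms-ollama and llama-index-embeddings-ollama integration packages and a running Ollama with those models pulled.

```python
# Sketch: tuning retrieval top-K, filtering by metadata, and asking for a
# structured answer. Models, keys, and documents are illustrative.
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="mistral")

docs = [
    Document(text="Jane Doe, data engineer, Apache Spark, Ireland.",
             metadata={"name": "Jane Doe"}),
    Document(text="John Roe, backend developer, FastAPI, Germany.",
             metadata={"name": "John Roe"}),
]
index = VectorStoreIndex.from_documents(docs)

# similarity_top_k controls how many chunks are retrieved as context;
# the metadata filter restricts retrieval to one candidate's chunks.
filters = MetadataFilters(filters=[ExactMatchFilter(key="name", value="Jane Doe")])
query_engine = index.as_query_engine(similarity_top_k=3, filters=filters)

# Asking for JSON output makes the answer easier to check programmatically.
print(query_engine.query("Return this person's skills as a JSON list."))
```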

Docker Environment

You will assemble the entire stack in Docker Compose: FastAPI, Elasticsearch, Kibana, Streamlit, and Ollama (Mistral), to deploy the system locally with a predictable configuration.
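
Once the stack is up, a small sanity check like the sketch below can ping each service. The ports shown are the common defaults (Elasticsearch 9200, Kibana 5601, Ollama 11434, FastAPI 8000, Streamlit 8501) and may differ from the course's docker-compose.yml.

```python
# Sketch: smoke-test the locally deployed stack. Ports are the usual defaults
# and are assumptions; adjust them to match your docker-compose.yml.
import requests

services = {
    "Elasticsearch": "http://localhost:9200/_cluster/health",
    "Kibana": "http://localhost:5601/api/status",
    "Ollama": "http://localhost:11434/api/tags",
    "FastAPI": "http://localhost:8000/docs",
    "Streamlit": "http://localhost:8501",
}

for name, url in services.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```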

Bonus: Production Patterns

You will learn how to scale the prototype to production level:

  • store uploads in a data lake (e.g., S3) and process them through queues (Kafka/SQS);
  • automatically scale workers for chunking and embeddings;
  • switch LLM backends (e.g., Bedrock or OpenAI) via a unified API (see the sketch after this list);
  • store chat history in MongoDB/Postgres and replace Streamlit with a React/Next.js interface.
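
To illustrate what a unified API for swapping backends can look like, here is a minimal LlamaIndex sketch. The integration packages (llama-index-llms-ollama, llama-index-llms-openai), the environment variable, and the model names are assumptions about one possible setup; Bedrock would follow the same pattern via its own integration.

```python
# Sketch: swapping LLM backends behind LlamaIndex's Settings without touching
# the rest of the pipeline. Package names and models are illustrative.
import os
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.llms.openai import OpenAI

if os.getenv("LLM_BACKEND", "ollama") == "openai":
    Settings.llm = OpenAI(model="gpt-4o-mini")   # hosted backend
else:
    Settings.llm = Ollama(model="mistral")       # local backend via Ollama

# Everything downstream (query engines, response synthesis) uses Settings.llm,
# so the retrieval and indexing code does not change when the backend does.
print(type(Settings.llm).__name__)
```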


All Course Lessons (21)

1. Introduction (demo) - 02:43
2. What We Are Going to Build - 02:02
3. Project Architecture - 02:41
4. GitHub Repo Explained - 02:59
5. Step-by-Step Process - 06:06
6. Terms You Find Often - 09:17
7. LlamaIndex Explained - 03:47
8. What is Ollama - 03:20
9. Ollama Setup & Testing - 04:35
10. Standup Infrastructure - 03:23
11. Show Local Processing - 03:01
12. Explain the API - 05:37
13. Explain the API Text Extraction - 04:42
14. Explain the Embedding - 06:55
15. Explain Problem with JSON Creation - 02:57
16. Streamlit Code Explained - 07:58
17. Search with Filter by User - 06:55
18. Do Semantic Queries - 08:33
19. The Biggest Problem with RAG - 03:31
20. How This Will Look in the Real World - 05:38
21. Great YouTube Videos About Real-World Use Cases - 13:10



Similar courses

n8n Automation: Building AI-Powered Workflows

Sources: newline (ex fullstack.io)
In this course, you will master n8n - an open platform for building workflows with artificial intelligence. We will go through key concepts, such as nodes...
49 minutes 8 seconds
Building LLMs for Production

Sources: Towards AI, Louis-François Bouchard
"Creating LLM for Production" is a practical guide spanning 470 pages (updated in October 2024), designed for developers and specialists...
Agentic AI Programming for Python Course

Sources: Talkpython
Learn how to use agentic AI to create and improve Python applications. Discover how it differs from chatbots and tailor AI to your tasks.
2 hours 38 minutes 10 seconds
Build a Reasoning Model (From Scratch)

Sources: Sebastian Raschka
Understand how LLMs reason by creating your own reasoning model from scratch. In the book "Build a Reasoning Model (From Scratch)," you will, step by step...
The Hidden Foundation of GenAI

Sources: Andreas Kretz
Generative AI is everywhere today, but few understand the fundamental concepts it is based on. "The Hidden Foundation of GenAI" is a starting point...
20 minutes 42 seconds