GenAI RAG with LlamaIndex, Ollama and Elasticsearch

1h 49m 50s
English
Paid

GenAI RAG with LlamaIndex, Ollama and Elasticsearch is a 21-lesson, self-paced course by Andreas Kretz with a total runtime of 1 hour 49 minutes. Retrieval-Augmented Generation (RAG) is the next practical step after semantic search and indexing.

Course facts

Lessons: 21
Duration: 1 hour 49 minutes
Level: All levels
Language: English
Instructor: Andreas Kretz
Price: Premium

Retrieval-Augmented Generation (RAG) is the next practical step after semantic search and indexing. In this course, you will build a complete local RAG pipeline that processes PDF files, splits the text into chunks, stores vectors in Elasticsearch, retrieves relevant context, and generates well-reasoned answers using the Mistral model running locally through Ollama.

We will work through the entire process using one concrete scenario: searching students' resumes to answer questions like "Who has worked in Ireland?" or "Who has experience with Apache Spark?" You will set up a containerized infrastructure with Docker Compose (FastAPI, Elasticsearch, Kibana, Streamlit, Ollama) and connect the pieces with LlamaIndex so you can focus on logic rather than boilerplate code. Along the way, you will see where RAG is truly effective and where challenges arise, such as problems with accuracy, completeness, and model "hallucinations", and how to design solutions for production.

By the end of the course, you will have a complete application that can be deployed locally:

PDF upload → text extraction → conversion to JSON → segmentation and vectorization → indexing in Elasticsearch → interactive search via Streamlit → generating answers using Mistral.

What You Will Learn

From Search to RAG

You will build on your knowledge of semantic search and learn to apply it to RAG: first retrieving the relevant chunks, then generating substantiated responses based on them. You'll discover how LlamaIndex connects your data to an LLM, and why the size and overlap of "chunks" matter for accuracy.
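To make chunk size and overlap concrete, here is a minimal sketch of a sliding-window splitter. It is an illustration only: it splits on characters, whereas real splitters (including those used with LlamaIndex) typically work on tokens or sentence boundaries. The function name `chunk_text` and its defaults are assumptions, not taken from the course code.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters, so a fact that straddles
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Larger chunks give the model more context per hit but dilute the embedding; more overlap reduces the chance of cutting a fact in half at the cost of storing more vectors.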

Creating a Pipeline

With FastAPI, you will implement uploading and processing PDFs: extracting text, formatting JSON, splitting, creating embeddings, and indexing in Elasticsearch, with minimal boilerplate code thanks to LlamaIndex.
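The ingestion flow described above can be sketched as a chain of small stages. This is a simplified illustration, not the course's implementation: the FastAPI endpoint is omitted, and every stage here (`extract`, `split`, `embed`, and the in-memory `store`) is a hypothetical stand-in for PDF extraction, LlamaIndex splitting, an embedding model, and Elasticsearch respectively.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IndexedChunk:
    doc_id: str
    text: str
    vector: list[float]


@dataclass
class Pipeline:
    extract: Callable[[bytes], str]       # raw upload -> plain text
    split: Callable[[str], list[str]]     # plain text -> chunks
    embed: Callable[[str], list[float]]   # chunk -> vector
    store: list[IndexedChunk] = field(default_factory=list)  # stand-in for the index

    def ingest(self, doc_id: str, raw: bytes) -> int:
        """Run one document through all stages and return the chunk count."""
        text = self.extract(raw)
        chunks = self.split(text)
        for chunk in chunks:
            self.store.append(IndexedChunk(doc_id, chunk, self.embed(chunk)))
        return len(chunks)


def demo_pipeline() -> Pipeline:
    """Wire the pipeline with toy stages so it runs without any services."""
    return Pipeline(
        extract=lambda raw: raw.decode("utf-8"),
        split=lambda text: [p for p in text.split("\n\n") if p.strip()],
        # Toy "embedding": length and space count. A real model returns
        # hundreds of dimensions that capture meaning.
        embed=lambda chunk: [float(len(chunk)), float(chunk.count(" "))],
    )


if __name__ == "__main__":
    pipeline = demo_pipeline()
    count = pipeline.ingest("resume-1", b"Worked in Ireland.\n\nKnows Apache Spark.")
    print(count, [c.text for c in pipeline.store])
```

Keeping each stage behind a plain callable makes it easy to swap a toy stage for the real one while testing the rest of the flow.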

Working with Elasticsearch

You will create an index for resumes with vectors and metadata. You'll learn to distinguish between vector search and keyword search, understand how vector fields are stored, and how to explore documents and results through Kibana.
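The difference between keyword search and vector search shows up directly in the request bodies. The sketch below builds them as plain Python dictionaries in the Elasticsearch 8 query DSL without contacting a cluster. The field names (`text`, `name`, `embedding`) and the vector dimension are assumptions for illustration; the course's actual index layout may differ.

```python
def resume_index_mapping(dims: int) -> dict:
    """Index definition with a text field, a metadata field, and a vector field."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},      # analyzed, used for keyword search
                "name": {"type": "keyword"},   # exact-match metadata
                "embedding": {
                    "type": "dense_vector",    # stores one vector per document
                    "dims": dims,              # must match the embedding model
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def keyword_query(phrase: str) -> dict:
    """Matches documents containing the words of the phrase."""
    return {"query": {"match": {"text": phrase}}}


def vector_query(query_vector: list[float], k: int = 5) -> dict:
    """Approximate nearest-neighbour search over the vector field."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": max(k * 10, 50),
        }
    }


if __name__ == "__main__":
    print(resume_index_mapping(dims=4))
    print(keyword_query("Apache Spark"))
    print(vector_query([0.1, 0.2, 0.3, 0.4], k=3))
```

A keyword query only finds resumes that literally contain the words, while the kNN query ranks documents by how close their vectors are to the question's vector, so "big data processing" can still surface a Spark resume.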

Interface on Streamlit

You will create a simple chat interface in Streamlit for natural language interaction. You will enable a debug mode to see which chunks were used for a response, and apply metadata (e.g., filtering by name) to improve accuracy.
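The idea behind filtering by name and inspecting the sources can be sketched without Streamlit or Elasticsearch. In this hypothetical example the chunks, names, and two-dimensional vectors are made up; `retrieve` first narrows the candidates by metadata and then ranks them by cosine similarity, and `debug_lines` formats what a debug panel might show.

```python
import math

# Made-up sample data: tiny vectors stand in for real embeddings.
CHUNKS = [
    {"text": "Worked in Dublin, Ireland", "vector": [1.0, 0.0], "metadata": {"name": "Alice"}},
    {"text": "Built Apache Spark jobs", "vector": [0.0, 1.0], "metadata": {"name": "Alice"}},
    {"text": "Worked in Cork, Ireland", "vector": [0.9, 0.1], "metadata": {"name": "Bob"}},
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, top_k=2, name=None):
    """Return the top_k (score, chunk) pairs, optionally restricted to one person."""
    candidates = [c for c in chunks if name is None or c["metadata"].get("name") == name]
    scored = [(cosine(query_vec, c["vector"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def debug_lines(hits):
    """One line per retrieved chunk: score, owner, and text."""
    return [f'{score:.3f} | {c["metadata"]["name"]} | {c["text"]}' for score, c in hits]


if __name__ == "__main__":
    for line in debug_lines(retrieve([1.0, 0.0], CHUNKS, top_k=2)):
        print(line)
    for line in debug_lines(retrieve([1.0, 0.0], CHUNKS, top_k=2, name="Alice")):
        print(line)
```

Filtering before ranking keeps another person's chunks out of the context entirely, which is a simple way to stop the model from mixing up candidates.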

Processing and Formatting JSON

You will extract text from PDFs using PyMuPDF, then produce clean JSON via Ollama (Mistral) while preserving structure and characters. You'll learn to handle formatting errors and apply prompt-engineering techniques that make the output more reliable.
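Models often wrap JSON in extra prose or return something that does not parse at all. One common way to cope, shown here as a sketch rather than the course's approach, is to scan the reply for the first parseable JSON object and to retry when none is found. The `generate` parameter is a hypothetical stand-in for whatever function calls the model.

```python
import json
from typing import Callable


def extract_json_object(raw: str) -> dict:
    """Return the first JSON object embedded in raw model output."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(raw):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(raw, index)
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; keep scanning
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model output")


def generate_json(generate: Callable[[str], str], prompt: str, attempts: int = 3) -> dict:
    """Call the model up to `attempts` times until its reply contains valid JSON."""
    last_error = None
    for _ in range(attempts):
        try:
            return extract_json_object(generate(prompt))
        except ValueError as err:
            last_error = err
    raise ValueError(f"no valid JSON after {attempts} attempts") from last_error


if __name__ == "__main__":
    reply = 'Sure, here is the JSON: {"name": "Alice", "skills": ["Spark"]} Hope this helps.'
    print(extract_json_object(reply))
```

When writing the parsed record back out, `json.dumps(record, ensure_ascii=False)` keeps non-ASCII characters such as accented names readable instead of escaping them.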

Improving Response Quality

You will study practical techniques to increase accuracy:

  • adjusting chunk size and overlap, and the number of retrieved chunks (top-K);
  • adding metadata (role, skills, location) for hybrid filters;
  • experimenting with embedding models and prompts;
  • using structured responses (e.g., JSON lists).
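As an illustration of the structured-response technique in the list above, the sketch below asks for a JSON array of names and then checks the reply against the names that actually occur in the retrieved chunks. The prompt wording, function names, and sample names are assumptions, not taken from the course.

```python
import json


def build_list_prompt(question: str, context_chunks: list[str]) -> str:
    """Prompt that constrains the answer to a JSON array of names."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n"
        'Reply with a JSON array of candidate names, for example ["Name One"].\n'
        "Reply with [] if the context does not contain the answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\n"
    )


def validate_names(reply: str, known_names: set[str]) -> tuple[list[str], list[str]]:
    """Split the model's names into grounded ones and unknown ones.

    Unknown names do not appear in the retrieved metadata, so they are
    likely hallucinated and should not be shown as answers.
    """
    data = json.loads(reply)  # raises ValueError on malformed JSON
    if not isinstance(data, list) or not all(isinstance(item, str) for item in data):
        raise ValueError("expected a JSON array of strings")
    grounded = [name for name in data if name in known_names]
    unknown = [name for name in data if name not in known_names]
    return grounded, unknown


if __name__ == "__main__":
    print(build_list_prompt("Who has worked in Ireland?", ["Alice worked in Dublin."]))
    print(validate_names('["Alice", "Zed"]', {"Alice", "Bob"}))
```

A list-shaped answer is easier to check programmatically than free text, which helps with both completeness ("did we get everyone?") and hallucination checks.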

Docker Environment

You will assemble the entire stack in Docker Compose: FastAPI, Elasticsearch, Kibana, Streamlit, and Ollama (Mistral), to deploy the system locally with a predictable configuration.

Bonus: Production Patterns

You will learn how to scale the prototype to production level:

  • store uploads in a data lake (e.g., S3) and process them through queues (Kafka/SQS);
  • automatically scale workers for chunking and embeddings;
  • switch LLM backends (e.g., Bedrock or OpenAI) via a unified API;
  • store chat history in MongoDB/Postgres and replace Streamlit with a React/Next.js interface.
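Switching LLM backends through a unified API usually means hiding each provider behind one small interface. The sketch below shows the shape of that idea with a `Protocol` and a registry. `FakeBackend` is a placeholder that only echoes the prompt; real backends for Ollama, Bedrock, or OpenAI would implement the same `complete` method using their own clients. All names here are hypothetical.

```python
from typing import Callable, Protocol


class LLMBackend(Protocol):
    """Anything with a complete(prompt) -> str method counts as a backend."""

    def complete(self, prompt: str) -> str: ...


class FakeBackend:
    """Placeholder backend that echoes the prompt with a label."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


# Registry: configuration picks a key, the rest of the app never changes.
BACKENDS: dict[str, Callable[[], LLMBackend]] = {
    "ollama": lambda: FakeBackend("ollama"),
    "openai": lambda: FakeBackend("openai"),
}


def make_backend(name: str) -> LLMBackend:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name}") from None


def answer(backend: LLMBackend, question: str, context: str) -> str:
    """RAG answer step that depends only on the interface, not the provider."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return backend.complete(prompt)


if __name__ == "__main__":
    print(answer(make_backend("ollama"), "Who has worked in Ireland?", "Alice worked in Dublin."))
```

Because `answer` only relies on `complete`, moving from a local model to a hosted one becomes a configuration change rather than a rewrite.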

Additional

  • https://github.com/team-data-science/GenAI-RAG
  • https://github.com/team-data-science/GenAI-RAG/blob/main/test_ollama.py
  • https://github.com/team-data-science/GenAI-RAG/blob/main/docker-compose.yml
  • https://github.com/team-data-science/GenAI-RAG/blob/main/uploader.py
  • https://github.com/team-data-science/GenAI-RAG/tree/main/fastapi-app
  • https://github.com/team-data-science/GenAI-RAG/tree/main/streamlit-app

Who teaches GenAI RAG with LlamaIndex, Ollama and Elasticsearch? Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lakehouse stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog from this source: over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS, Azure, and GCP, the Python and Scala tooling that dominates the field, and the soft-skills and career side of breaking into data engineering. The material is paid and aimed at engineers transitioning into data work and at working data engineers picking up specific tools.

What lessons are included in GenAI RAG with LlamaIndex, Ollama and Elasticsearch?

All Course Lessons (21)
#   Lesson Title                                     Duration  Access
1   Introduction                                     02:43     Demo
2   What We Are Going to Build                       02:02
3   Project Architecture                             02:41
4   GitHub Repo Explained                            02:59
5   Step-by-Step Process                             06:06
6   Terms You Find Often                             09:17
7   LlamaIndex Explained                             03:47
8   What is Ollama                                   03:20
9   Ollama Setup & Testing                           04:35
10  Standup Infrastructure                           03:23
11  Show Local Processing                            03:01
12  Explain the API                                  05:37
13  Explain the API Text Extraction                  04:42
14  Explain the Embedding                            06:55
15  Explain Problem with JSON Creation               02:57
16  Streamlit Code Explained                         07:58
17  Search with Filter by User                       06:55
18  Do Semantic Queries                              08:33
19  The Biggest Problem with RAG                     03:31
20  How This Will Look in the Real World             05:38
21  Great YouTube Videos About Real-World Use Cases  13:10

Frequently asked questions

What prerequisites are needed before taking this course?
Before enrolling, it's helpful to have a basic understanding of Docker, as the course involves setting up a containerized infrastructure with Docker Compose. Familiarity with Python and web APIs will also be beneficial since the course includes building and interacting with APIs using FastAPI and Streamlit. No specific prior knowledge of LlamaIndex, Ollama, or Elasticsearch is required, as these tools will be explained in the course.
What will I build during the course?
Throughout the course, you will build a complete local Retrieval-Augmented Generation (RAG) pipeline. This includes processing PDF files, extracting and segmenting text, storing vectors in Elasticsearch, and generating answers using the Mistral model. The application will enable interactive searches through Streamlit and can answer questions based on indexed data, such as identifying resumes with specific experience or geographic work history.
Who is the target audience for this course?
The course is designed for individuals interested in developing skills in Retrieval-Augmented Generation and text indexing technologies. It's suitable for data scientists, AI researchers, and software engineers who want to explore the application of semantic search and indexing in real-world scenarios, specifically those involving document processing and local AI models.
How does this course compare to other RAG courses?
This course focuses on building a fully local RAG pipeline using LlamaIndex, Ollama, and Elasticsearch. While many RAG courses cover cloud-based solutions, this one provides hands-on experience with local processing, containerized infrastructure, and the integration of tools like FastAPI and Streamlit. It emphasizes practical application and real-world challenges such as answer accuracy and hallucinations.
What specific tools or platforms will I learn to use?
You will gain practical experience with several tools and platforms, including Docker Compose for containerized infrastructure, Elasticsearch for vector storage and search, and Streamlit for creating interactive search interfaces. Additionally, the course covers LlamaIndex for connecting your data to the LLM with minimal boilerplate, Ollama for running the Mistral model locally, and FastAPI for API development.
What is not covered in this course?
The course does not include hands-on cloud deployment or management of RAG applications, as it focuses on local environments; production patterns are covered only as a bonus topic. It also does not provide advanced training in the underlying algorithms of the Mistral model or extensive detail on Elasticsearch beyond its use in the project. Students looking for in-depth theoretical exploration of these topics may need supplementary resources.
What is the expected time commitment for this course?
The course consists of 21 lessons with a total runtime of about 1 hour 50 minutes. Beyond watching the videos, students should plan extra time for building and running the application themselves. Those new to Docker or the specific tools introduced in the course may need additional time to ensure a thorough understanding.