Semantic Log Indexing & Search

Duration: 53m 37s
Language: English
Access: Paid

This course applies generative AI to a real-world data processing project. Building on the course The Hidden Foundation of GenAI, it puts embeddings into practice: you build a complete semantic search pipeline, from generating embeddings and storing them in a vector database to executing natural language queries.

Course Overview

This course is structured around a data observability project. You will build a pipeline that collects logs, processes them with FastAPI, and stores the embeddings in Qdrant, a high-performance vector store. You will also build a Streamlit dashboard that searches logs semantically rather than by keyword, and compare its results against conventional SQL queries in DuckDB.
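The first stage of such a pipeline, turning uploaded JSON logs into text that an embedding model can consume, can be sketched as follows. This is a minimal illustration under stated assumptions: the `LogEntry` fields, the `parse_bulk` and `embedding_text` helpers, and the sample logs are hypothetical and are not taken from the course repository.

```python
import json
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical fields; the course's actual LogEntry model may differ.
    timestamp: str
    level: str
    service: str
    message: str


def parse_bulk(payload: str) -> list[LogEntry]:
    """Parse a JSON array of log objects into LogEntry records."""
    return [LogEntry(**item) for item in json.loads(payload)]


def embedding_text(entry: LogEntry) -> str:
    """Build the string that would be handed to an embedding model."""
    return f"[{entry.level}] {entry.service}: {entry.message}"


# Invented sample logs standing in for a bulk upload body.
payload = json.dumps([
    {"timestamp": "2024-01-01T00:00:00Z", "level": "ERROR",
     "service": "loader", "message": "connection to warehouse timed out"},
    {"timestamp": "2024-01-01T00:05:00Z", "level": "INFO",
     "service": "scheduler", "message": "nightly job started"},
])

entries = parse_bulk(payload)
texts = [embedding_text(e) for e in entries]
print(texts)
```

Folding the level and service name into the text is one possible design choice: it lets the vector reflect context as well as the message itself.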

Key Course Steps

  1. From Embeddings to Search: Revisit the basics of embeddings and delve into how they enable semantic search functionality.
  2. Building a Pipeline: Implement an API with FastAPI for processing logs and generating embeddings.
  3. Working with Qdrant: Explore collections, points, and cosine similarity search, and optimize the embedding structure.
  4. Streamlit Interface: Develop a user-friendly search interface and compare the semantic search approach with traditional SQL.
  5. Improving Accuracy: Discover methods for optimizing embeddings, refining query formulations, and configuring searches.
  6. Launching in Docker: Deploy the entire stack (FastAPI, Qdrant, Streamlit, DuckDB) using Docker Compose.
  7. Bonus: Use DuckDB for analytics by implementing a write-ahead log (WAL), handling data in Docker, and contrasting SQL capabilities with vector search.
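The cosine similarity search mentioned in step 3 can be illustrated without a vector database. The sketch below is an assumption-laden toy: each point has an id, a vector, and a payload, the 3-dimensional vectors are invented by hand rather than produced by an embedding model, and the `search` helper is hypothetical and is not the Qdrant client API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Each point has an id, a vector, and a payload.
# The vectors are made up for illustration, not model output.
points = [
    {"id": 1, "vector": [0.9, 0.1, 0.0],
     "payload": {"message": "database connection timed out"}},
    {"id": 2, "vector": [0.8, 0.3, 0.1],
     "payload": {"message": "could not reach warehouse host"}},
    {"id": 3, "vector": [0.0, 0.2, 0.9],
     "payload": {"message": "nightly job finished in 42s"}},
]


def search(query_vector, limit=2):
    """Return the `limit` points most similar to the query, best first."""
    scored = [(cosine(query_vector, p["vector"]), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# A hand-made query vector pointing roughly in the "connectivity problem" direction.
hits = search([1.0, 0.2, 0.0])
for score, point in hits:
    print(round(score, 3), point["payload"]["message"])
```

With these invented vectors, the two connectivity-related messages rank above the unrelated one even though they share almost no words, which is the behavior that keyword search cannot provide.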

Course Outcomes

By the end of the course, you will understand the mechanics of semantic search and have a working project that you can adapt for your own AI-driven solutions.

Additional

https://github.com/team-data-science/GenAI-DataObservability

About the Author: Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lakehouse stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog under this source — over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. Material is paid and aimed at engineers transitioning into data work or already-working data engineers picking up specific tools.

All Course Lessons (16)

  1. Intro (00:44, demo)
  2. Getting Started: Semantic Search for Your Logs (03:08)
  3. Dissecting the Pipeline Monitor Architecture: FastAPI, Qdrant & DuckDB (03:50)
  4. Beginner’s Guide to Qdrant Collections and Similarity Search (03:28)
  5. Your First Glimpse at the Project Code Structure on GitHub (02:55)
  6. Building and Launching the Pipeline with Docker Compose (04:37)
  7. Writing JSON Logs to FastAPI: Bulk Upload Explained (01:42)
  8. How FastAPI Parses LogEntry Models and Prepares Embeddings (04:37)
  9. Embeddings 101: Turning Your Logs into Searchable Vectors (02:06)
  10. Querying Qdrant: From Playground to Streamlit Dashboard (03:55)
  11. Hands-On Embedding Tuning: Boost Your Log Search Accuracy (03:54)
  12. Deploying Improved Embeddings and Measuring Improvement (05:35)
  13. What We Built and Why It Matters (02:53)
  14. How DuckDB Fits into Your Data Observability Stack (01:28)
  15. Writing to DuckDB with a Write-Ahead Log (05:03)
  16. Docker & DuckDB: Implementing WAL to Solve File Lock Errors (03:42)
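
The last two lessons deal with a write-ahead log in front of DuckDB. The course's own implementation is not shown on this page, so the following is only a general sketch of the pattern: writers append records to a plain file, and a single process later replays that file into the database, so that only one process ever holds the database open for writing. As assumptions for the sake of a runnable example, the standard-library `sqlite3` module stands in for DuckDB, and the `append_to_wal` and `replay_wal` helpers, the `logs` table, and the file name are all hypothetical.

```python
import json
import os
import sqlite3
import tempfile


def append_to_wal(wal_path, record):
    """Append one record as a JSON line and force it to disk."""
    with open(wal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay_wal(wal_path, conn):
    """Insert all pending records in one transaction, then clear the log."""
    if not os.path.exists(wal_path):
        return 0
    with open(wal_path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO logs (level, message) VALUES (:level, :message)",
            rows,
        )
    os.remove(wal_path)  # remove the log only after the commit succeeded
    return len(rows)


# sqlite3 is a stand-in for DuckDB here, used only to keep the sketch stdlib-only.
wal_path = os.path.join(tempfile.mkdtemp(), "logs.wal")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (level TEXT, message TEXT)")

append_to_wal(wal_path, {"level": "ERROR", "message": "disk full"})
append_to_wal(wal_path, {"level": "INFO", "message": "job started"})

applied = replay_wal(wal_path, conn)
count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(applied, count)
```

Removing the log file only after the transaction commits means a crash during replay leaves the records in the file for the next attempt. A production version would also need to handle records appended while a replay is in progress.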

Frequently asked questions

What prior knowledge do I need before taking this course?
You should have foundational knowledge of generative AI, which is covered in the prerequisite course 'The Hidden Foundation of GenAI'. Familiarity with basic web development, API creation, and database management is also helpful, because the course works with FastAPI, Qdrant, and DuckDB.
What kind of projects will I work on during the course?
You will develop a data observability project: a pipeline that aggregates logs, processes them using FastAPI, and stores the generated embeddings in Qdrant. You will also create a Streamlit dashboard for semantic log searches, offering a practical alternative to traditional SQL-based searches.
Who is the target audience for this course?
The course is ideal for data engineers, software developers, and IT professionals interested in enhancing their skills in semantic search technologies and data observability. Individuals looking to apply generative AI concepts to real-world data processing projects will find the course particularly relevant.
How does this course compare in depth and scope to similar courses?
Unlike introductory courses that cover only the basics of semantic search, this course walks through the entire process, from generating embeddings to implementing a full semantic search pipeline. It also covers embedding optimization, vector storage with Qdrant, and dashboard creation with Streamlit.
What tools and platforms will I learn to use in this course?
You will gain hands-on experience with FastAPI for API development, Qdrant for vector storage and similarity search, and Streamlit for building interactive dashboards. You will also work with DuckDB to compare semantic search with traditional SQL queries, and deploy the stack with Docker Compose.
What topics are not covered in this course?
The course does not cover the foundational principles of generative AI in detail; it assumes prior knowledge from 'The Hidden Foundation of GenAI'. It also does not delve into the specifics of machine learning model training or traditional keyword-based search techniques.
What is the expected time commitment for completing the course?
The course consists of 16 lessons with a total runtime of 53 minutes 37 seconds. Plan additional time beyond the videos to run the stack, understand the material, and work through the hands-on project components.