Semantic Log Indexing & Search

Duration: 53m 37s
Language: English
Access: Paid

This course applies generative AI embeddings to a real-world data processing project. Building on the course The Hidden Foundation of GenAI, it walks through the full semantic search pipeline: generating embeddings, storing them in a vector database, and running natural language queries against them.

Course Overview

This course is built around a data observability project. You will build a pipeline that collects logs, processes them with FastAPI, and stores the resulting embeddings in Qdrant, a vector database. You will also build a Streamlit dashboard that searches logs by meaning rather than by keyword, and compare its results with conventional SQL queries in DuckDB.
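
To make the log-processing step concrete, here is a minimal sketch of how a bulk JSON upload might be parsed into log records and turned into text for an embedding model. It uses only the Python standard library in place of FastAPI and Pydantic, and the LogEntry fields, parse_bulk, and embedding_text are hypothetical names chosen for illustration, not the course's actual code.

```python
import json
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical fields; the course's actual LogEntry model may differ.
    timestamp: str
    level: str
    service: str
    message: str


def parse_bulk(payload: str) -> list[LogEntry]:
    """Parse a JSON array of log objects into LogEntry records."""
    return [LogEntry(**item) for item in json.loads(payload)]


def embedding_text(entry: LogEntry) -> str:
    """Build the string that would be handed to an embedding model."""
    return f"[{entry.level}] {entry.service}: {entry.message}"


# Made-up sample data standing in for a bulk upload request body.
payload = json.dumps([
    {"timestamp": "2024-01-01T00:00:00Z", "level": "ERROR",
     "service": "ingest", "message": "Connection to database timed out"},
    {"timestamp": "2024-01-01T00:00:05Z", "level": "INFO",
     "service": "ingest", "message": "Batch of 500 rows written"},
])

entries = parse_bulk(payload)
texts = [embedding_text(e) for e in entries]
```

Including the level and service name in the text is one possible design choice: it gives the embedding model more context than the message alone.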

Key Course Steps

  1. From Embeddings to Search: Revisit the basics of embeddings and see how they enable semantic search.
  2. Building a Pipeline: Implement an API with FastAPI that processes logs and generates embeddings.
  3. Working with Qdrant: Explore collections, points, and cosine similarity search, and optimize the embedding structure.
  4. Streamlit Interface: Develop a search interface and compare semantic search with traditional SQL.
  5. Improving Accuracy: Learn methods for optimizing embeddings, refining query formulations, and configuring searches.
  6. Launching in Docker: Deploy the entire stack (FastAPI, Qdrant, Streamlit, DuckDB) using Docker Compose.
  7. Bonus: Use DuckDB for analytics by implementing a write-ahead log (WAL), handling data in Docker, and contrasting SQL capabilities with vector search.
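
The Qdrant concepts in step 3 can be illustrated without a running database. The following sketch, which assumes hand-picked three-dimensional vectors in place of real embeddings, models a collection as a list of points (id, vector, payload) and ranks them by cosine similarity. Point, cosine, and search are illustrative names, not the Qdrant client API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    # Mirrors the idea of a stored point: an id, a vector, and a payload.
    id: int
    vector: list[float]
    payload: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(collection: list[Point], query_vector: list[float], limit: int = 2):
    """Return the `limit` points most similar to the query, best first."""
    scored = [(cosine(p.vector, query_vector), p) for p in collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy vectors chosen by hand (an assumption); real embeddings are much longer.
collection = [
    Point(1, [0.9, 0.1, 0.0], {"message": "Connection to postgres timed out"}),
    Point(2, [0.1, 0.9, 0.0], {"message": "Disk usage at 95 percent"}),
    Point(3, [0.8, 0.2, 0.1], {"message": "Database refused the connection"}),
]

# Pretend this is the embedding of the query "database is unreachable".
query = [0.85, 0.15, 0.05]
hits = search(collection, query, limit=2)
for score, point in hits:
    print(f"{score:.3f}  {point.payload['message']}")
```

Neither of the two returned messages contains the word "unreachable", which is the point of searching by vector similarity rather than by keyword.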

Course Outcomes

By the end of the course, you will understand how semantic search works and have a working project that you can adapt for your own AI-driven solutions.

Additional

https://github.com/team-data-science/GenAI-DataObservability

About the Author: Andreas Kretz

Andreas Kretz is a German data engineer and one of the most widely followed independent voices on data engineering as a career discipline. He runs the Plumbers of Data Science brand and has been publishing tutorial material continuously since the field consolidated around the modern lakehouse stack (Spark, Kafka, Snowflake, Databricks, Airflow).

His CourseFlix listing is the largest single-author catalog from this source: over thirty courses spanning data-pipeline construction, streaming architectures, the cloud-native data stack on AWS / Azure / GCP, the Python and Scala tooling that dominates the field, and the soft-skills / career side of breaking into data engineering. The material is paid and aimed at engineers transitioning into data work or working data engineers picking up specific tools.

All Course Lessons (16)

  1. Intro (00:44, Demo)
  2. Getting Started: Semantic Search for Your Logs (03:08)
  3. Dissecting the Pipeline Monitor Architecture: FastAPI, Qdrant & DuckDB (03:50)
  4. Beginner’s Guide to Qdrant Collections and Similarity Search (03:28)
  5. Your First Glimpse at the Project Code Structure on GitHub (02:55)
  6. Building and Launching the Pipeline with Docker Compose (04:37)
  7. Writing JSON Logs to FastAPI: Bulk Upload Explained (01:42)
  8. How FastAPI Parses LogEntry Models and Prepares Embeddings (04:37)
  9. Embeddings 101: Turning Your Logs into Searchable Vectors (02:06)
  10. Querying Qdrant: From Playground to Streamlit Dashboard (03:55)
  11. Hands-On Embedding Tuning: Boost Your Log Search Accuracy (03:54)
  12. Deploying Improved Embeddings and Measuring Improvement (05:35)
  13. What We Built and Why It Matters (02:53)
  14. How DuckDB Fits into Your Data Observability Stack (01:28)
  15. Writing to DuckDB with a Write-Ahead Log (05:03)
  16. Docker & DuckDB: Implementing WAL to Solve File Lock Errors (03:42)
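
The last two lessons deal with file lock errors when writing to DuckDB from Docker. One common way to apply a write-ahead log to that problem, sketched here as an assumption about the general pattern rather than the course's implementation, is to have writers append records to a log file and let a single process replay them into the database. wal_append and wal_drain are hypothetical helpers, and a plain Python list stands in for the DuckDB table.

```python
import json
import os
import tempfile


def wal_append(wal_path: str, record: dict) -> None:
    """Append one record as a JSON line; this touches only the log file."""
    with open(wal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to persist the line to disk


def wal_drain(wal_path: str, apply) -> int:
    """Replay every logged record through `apply`, then truncate the log."""
    if not os.path.exists(wal_path):
        return 0
    with open(wal_path, "r", encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    for record in records:
        apply(record)
    open(wal_path, "w", encoding="utf-8").close()  # empty the log file
    return len(records)


# Writers (for example, API workers) only ever append to the log.
wal_path = os.path.join(tempfile.mkdtemp(), "logs.wal")
wal_append(wal_path, {"level": "ERROR", "message": "timeout"})
wal_append(wal_path, {"level": "INFO", "message": "ok"})

# A single process drains the log; `table` stands in for a DuckDB table.
table = []
applied = wal_drain(wal_path, table.append)
```

The sketch ignores records appended while a drain is in progress; a real setup would need to rotate or lock the log file before replaying it.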

Related courses

  • Learning Apache Spark (updated 11mo ago)

    By: Andreas Kretz
    Master Apache Spark with this in-depth course designed for data engineers seeking to enhance their data processing capabilities.
    1h 44m, 5/5
  • Data Engineering on GCP (updated 11mo ago)

    By: Andreas Kretz
    Google Cloud Platform (GCP) is one of the most popular cloud platforms in the world, providing an extensive set of tools and services for building, managing.
    1h 17m, 5/5
  • Apache Kafka Fundamentals (updated 11mo ago)

    By: Andreas Kretz
    Master the fundamentals of Apache Kafka in this comprehensive course designed to provide you with essential knowledge for a confident start.
    1h 4m, 5/5

Frequently asked questions

Who teaches this course?
It is taught by Andreas Kretz. You can find more courses by this instructor on the corresponding source page.
How long is the course?
It contains 16 lessons with a total runtime of 53 minutes. Every lesson is available to watch online at your own pace.
Is it free to watch?
It is part of CourseFlix's premium catalog. A subscription unlocks the full video player; the course description, table of contents, and preview information are available to everyone.
Where can I watch it online?
The course is available to watch online on CourseFlix at https://courseflix.net/course/semantic-log-indexing-search. The page hosts every lesson with the integrated video player; no download is required.