RAG (Retrieval-Augmented Generation) is the architectural pattern that gives a language model access to your own documents, knowledge base, or operational data without fine-tuning. The standard pipeline is: chunk the source documents, embed each chunk into a vector representation, store the vectors in a database, embed the user's query at runtime, retrieve the most similar chunks, and pass them as context to the model.
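The pipeline above can be sketched end to end in a few dozen lines. This is a minimal illustration under loud assumptions, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the "database" is an in-memory list, and every function name here (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, size=40, overlap=10):
    """Split text into fixed-size word windows that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding (assumption): a sparse bag-of-words count vector.
    A real system would call an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs, size=40, overlap=10):
    """Chunk every document and store (chunk_text, vector) pairs.
    The list stands in for a vector database."""
    return [(c, embed(c)) for d in docs for c in chunk(d, size, overlap)]


def retrieve(index, query, k=2):
    """Embed the query and return the k most similar chunk texts."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Assemble retrieved chunks and the question into one model prompt."""
    joined = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return f"Answer using only the context below.\n\n{joined}\n\nQuestion: {query}"


docs = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
]
index = build_index(docs)
top = retrieve(index, "how long do refunds take", k=1)
prompt = build_prompt("how long do refunds take", top)
```

The numbered `[1]`, `[2]` labels in the prompt are one simple way to let the model cite which chunk it used; the string returned by `build_prompt` is what would be sent to the model.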
Building a working RAG demo is straightforward. Building one that actually outperforms keyword search on users' real questions is harder. Courses cover the parts that matter: chunking strategies (fixed-size vs. semantic, hierarchical, contextual), hybrid retrieval (vector + BM25 + reranking), evaluation harnesses, and production concerns such as latency budgets, citation surfacing, and handling queries the index doesn't cover.
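To make the hybrid retrieval idea concrete, here is a small sketch of one common way to merge a vector ranking with a BM25 ranking: reciprocal rank fusion (RRF). The text above does not name a fusion method, so RRF is an assumption chosen for illustration, as are the function name, the chunk ids, and the constant `k=60` (a conventional default). A reranker, if used, would then rescore the fused list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks
    that rank well in more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical outputs from the two retrievers, best match first.
vector_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# "c1" and "c3" lead the fused list because both retrievers returned them.
```

RRF only needs rank positions, not raw scores, which sidesteps the problem that cosine similarities and BM25 scores live on different scales.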