SELF-PACED COURSE
    NEW

    Build Your First RAG System.

    A self-paced crash course for data scientists who want to build with RAG, not just talk about it. Three hands-on projects walk you through extraction, retrieval, and evaluation end to end.

    Andres Vourakis

    Built by Andres Vourakis

    Senior Data Scientist • 8+ years experience

    SELF-PACED COURSE

    RAG Crash Course

    6 to 10 hours, self-paced
    Three hands-on projects
    Lifetime access
    $249

    One-time payment. No recurring fees.

    Most RAG content is either too shallow or built for software engineers.

    Blog posts give you a vague picture. Production-grade tutorials drown you in infrastructure you do not need yet. Almost none of them teach you how to know whether your RAG system actually works.

    This course is built for that middle. Enough theory to make the right calls. Three hands-on projects you can extend.

    What You'll Learn

    Six lessons across three arcs: extraction, retrieval, and evaluation. Three of them include a hands-on project you can extend.

    How RAG actually works

    The three phases of the pipeline (ingestion, retrieval, generation) and what "grounded answers with citations" really means. Framed for a data scientist: SQL is retrieval over rows, RAG is retrieval over text.

    When not to reach for RAG

    Three filters for whether RAG is the right tool at all. When the prompt is enough, when web search beats your own pipeline, when SQL or a semantic layer is the right answer, and where MCP servers fit in.

    Document extraction

    Bad extraction is the silent killer of RAG. The three approaches (naive text, classical OCR, VLM-based parsing) and the modern toolset: PyMuPDF, Docling, LlamaParse, or a VLM via API.

    Chunking and embeddings

    Fixed-size and structure-aware chunking, picking an embedding model, and the hidden ways context gets lost. Plus where the field is heading: late chunking and hybrid search.

    Build the pipeline with pgvector

    Query embedding, top-K retrieval, prompt assembly, generation. You ship a working ask_docs(question) function over a real corpus, no dedicated vector database required.

    Evaluation: the part most courses skip

    Retrieval eval vs generation eval. Ground truth datasets and the pitfalls of building one. Run precision@k and recall@k on your own pipeline and use the metrics to iterate on chunking, embeddings, or prompts.

    Format

    • Self-paced video lessons

      Watch in any order, on your schedule. No live sessions to miss.

    • 6 to 10 hours total

      Designed to be finishable in a focused weekend. Watch faster if you already know parts of the stack.

    • Three hands-on projects

      You build PDF extraction, a working RAG pipeline with pgvector, and an evaluation loop. The code is yours to keep and extend.

    • Lifetime access

      Pay once, keep access to all current and future updates of the course.

    Who it's for

    • Data scientists who want to understand RAG well enough to build with it, not just talk about it.
    • Data professionals from analytics, BI, or ML backgrounds who want a clear path into applied AI.
    • People who want to start learning right now, on their own schedule.

    Prerequisites

    You should be comfortable writing Python. No prior RAG or LLM experience required.

    Ready to ship your first RAG system?

    One weekend. Yours forever. $249.

    One-time payment. Lifetime access.