This mini-series is designed for developers, ML engineers, and AI enthusiasts who want to understand, build, and scale LLMs and AI systems. Across 10 hands-on classes, you will:
- Learn how LLMs serve predictions efficiently at scale.
- Understand AI infrastructure, caching, vector databases, and distributed pipelines.
- Implement retrieval, indexing, multi-modal search, and RL-based inference loops.
- Build end-to-end AI pipelines and monitoring solutions.
By the end of the course, you will have the skills to design production-ready LLM systems, optimize AI workloads, and handle multi-modal, multilingual, and retrieval-augmented pipelines.
## Class 1: Foundations of AI Infrastructure & LLMs
Topics Covered:
- What LLMs are and how inference scales: batching and speculative decoding
- Neural network fundamentals and common AI system architectures
- GPU vs TPU trade-offs for inference
Hands-on:
- Write an async batch inference server in Python (a minimal sketch follows this list)
- Implement Redis caching for the inference API (see the sketch after the goal below)
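
Here is a minimal sketch of the micro-batching pattern behind the first exercise, using only `asyncio` from the standard library. `run_model`, `MAX_BATCH_SIZE`, and `MAX_WAIT_MS` are illustrative placeholders, not part of any course starter code; a real server would swap in an actual model forward pass.

```python
import asyncio
import time

MAX_BATCH_SIZE = 8   # flush when this many requests are queued
MAX_WAIT_MS = 10     # or when the oldest request has waited this long

# Placeholder for a real model call: one fixed-cost forward pass per batch,
# which is exactly why batching raises throughput.
def run_model(prompts: list[str]) -> list[str]:
    time.sleep(0.05)  # simulate the per-batch forward-pass cost
    return [f"completion for: {p}" for p in prompts]

class BatchingServer:
    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def infer(self, prompt: str) -> str:
        """Public entry point: enqueue a request and await its result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def batch_worker(self) -> None:
        """Collect requests into micro-batches and run the model once per batch."""
        while True:
            batch = [await self.queue.get()]  # block until work arrives
            deadline = asyncio.get_running_loop().time() + MAX_WAIT_MS / 1000
            while len(batch) < MAX_BATCH_SIZE:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            prompts = [p for p, _ in batch]
            # Run the blocking model call off the event loop.
            outputs = await asyncio.to_thread(run_model, prompts)
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

async def main() -> None:
    server = BatchingServer()
    worker = asyncio.create_task(server.batch_worker())
    # Fire 20 concurrent requests; the worker groups them into batches.
    results = await asyncio.gather(*(server.infer(f"prompt {i}") for i in range(20)))
    print(results[:3])
    worker.cancel()

if __name__ == "__main__":
    asyncio.run(main())
```

The key trade-off to notice: `MAX_WAIT_MS` bounds the extra latency any single request pays, while `MAX_BATCH_SIZE` caps how much work is amortized into one forward pass.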
Goal:
Understand how LLMs serve predictions efficiently, including hardware and software trade-offs, and how caching and batching improve throughput and latency.
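
For the second exercise, here is a sketch of a write-through cache in front of the model, assuming the `redis` Python client (redis-py) and a Redis server on `localhost:6379`. `run_model` is again a placeholder for the real inference call.

```python
import hashlib
import json

import redis  # assumes the `redis` (redis-py) package is installed

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
TTL_SECONDS = 3600  # expire cached completions after an hour

def run_model(prompt: str, params: dict) -> str:
    # Placeholder for the actual model forward pass.
    return f"completion for: {prompt}"

def cache_key(prompt: str, params: dict) -> str:
    """Key on prompt + sampling params so different settings don't collide."""
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return "llm:" + hashlib.sha256(payload.encode()).hexdigest()

def cached_infer(prompt: str, params: dict) -> str:
    key = cache_key(prompt, params)
    hit = r.get(key)
    if hit is not None:
        return hit                   # cache hit: skip the model entirely
    out = run_model(prompt, params)
    r.setex(key, TTL_SECONDS, out)   # write-through with a TTL
    return out
```

Note that exact-match caching like this only pays off when identical prompts recur and decoding is deterministic (e.g. temperature 0); with sampling enabled, cached responses trade freshness for latency.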
### Class 2: Fault Tolerance & Distributed AI
### Class 3: Vector Representations & Indexing
### Class 4: Scheduling & Optimization
### Class 5: Search & Retrieval
### Class 6: Embeddings & Multilingual AI
### Class 7: Advanced AI Pipelines
### Class 8: Multi-modal & Vision-Language Models
### Class 9: End-to-End LLM Systems
### Class 10: Capstone Simulation & Review