- Week 1 Jan 23
- Lec: Introduction; Tut: HPC tutorial
- Introduction
- History of self-driving cars
- Embodied learning
- Suggested readings:
- Turing (1950) Computing Machinery and Intelligence
- Pomerleau (1988) ALVINN: An Autonomous Land Vehicle in a Neural Network
- [Video] History Channel 1998: Driverless Car Technology Overview at Carnegie Mellon University
- Smith & Gasser (2005) The Development of Embodied Cognition: Six Lessons from Babies
- Week 2 Jan 30
- Lec: Deep Learning for Structured Outputs; Tut: Simulator Tutorial
- Suggested readings:
- LeCun (2006) A Tutorial on Energy-Based Learning
- Girshick et al. (2013) Rich feature hierarchies for accurate object detection and semantic segmentation
- Long et al. (2014) Fully Convolutional Networks for Semantic Segmentation
- Zheng et al. (2015) Conditional Random Fields as Recurrent Neural Networks
- Chen et al. (2016) DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- Kingma & Dhariwal (2018) Glow: Generative Flow with Invertible 1x1 Convolutions
- Ho et al. (2020) Denoising Diffusion Probabilistic Models
- Additional readings:
- Carion et al. (2020) End-to-End Object Detection with Transformers
- Kamath et al. (2021) MDETR – Modulated Detection for End-to-End Multi-Modal Understanding
- Cheng et al. (2021) Per-Pixel Classification is Not All You Need for Semantic Segmentation
- Rombach et al. (2022) High-Resolution Image Synthesis with Latent Diffusion Models
- Kirillov et al. (2023) Segment Anything
- Bai et al. (2023) Sequential Modeling Enables Scalable Learning for Large Vision Models
- Chi et al. (2023) Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
- Week 3 Feb 6
- Lec: 3D Vision, Mapping; Tut: Video Learning Tutorial
- Suggested readings:
- Fischer et al. (2015) FlowNet: Learning Optical Flow with Convolutional Networks
- Godard et al. (2016) Unsupervised Monocular Depth Estimation with Left-Right Consistency
- Qi et al. (2016) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
- Tamar et al. (2016) Value Iteration Networks
- Parisotto et al. (2017) Neural Map: Structured Memory for Deep Reinforcement Learning
- Gupta et al. (2017) Cognitive Mapping and Planning for Visual Navigation
- Additional readings:
- Chaplot et al. (2020) Neural Topological SLAM for Visual Navigation
- Huang et al. (2022) FlowFormer: A Transformer Architecture for Optical Flow
- Wu et al. (2023) Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling
- Sun et al. (2023) Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes
- Yang et al. (2024) Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
- Wang et al. (2025) Continuous 3D Perception Model with Persistent State
- Week 4 Feb 13
- Lec: Self-Supervised Representation Learning and Object Discovery; Tut: Egocentric Video Tutorial
- Suggested readings:
- Sermanet et al. (2017) Time-Contrastive Networks: Self-Supervised Learning from Video
- Van den Oord et al. (2018) Representation Learning with Contrastive Predictive Coding
- Wu et al. (2018) Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
- Chen et al. (2020) A Simple Framework for Contrastive Learning of Visual Representations
- Grill et al. (2020) Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
- He et al. (2021) Masked Autoencoders Are Scalable Vision Learners
- Additional readings:
- Weinzaepfel et al. (2022) CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
- Wang et al. (2022) Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
- Seo et al. (2022) Masked World Models for Visual Control
- Venkataramanan et al. (2023) Is ImageNet Worth 1 Video? Learning Strong Image Encoders from 1 Long Unlabelled Video
- van Steenkiste et al. (2024) Moving Off-the-Grid: Scene-Grounded Video Representations
- Cui et al. (2024) DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
- Wang et al. (2024) PooDLe: Pooled and Dense Self-Supervised Learning from Naturalistic Videos
- Week 5 Feb 20
- Lec: World Models and Forecasting; Tut: Motion Learning Tutorial