RB-EL2 | Semester 7 | 4 (3-0-2) | Elective

Deep Learning for Robot Perception

Syllabus

Unit 1: CNN Architectures for Visual Perception

Convolutional layers and feature hierarchies, AlexNet to VGG and ResNet, Residual learning and skip connections, Depthwise separable convolutions (MobileNet), Dilated convolutions (DeepLab), Grouped convolutions (ShuffleNet), Attention mechanisms (SENet, CBAM), EfficientNet compound scaling and scaling coefficients.
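As a small illustration of why depthwise separable convolutions suit embedded robot hardware, the sketch below compares weight counts for one layer. The function names and the 64-to-128-channel example are assumptions chosen for illustration, and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(std, sep, round(std / sep, 2))  # prints: 73728 8768 8.41
```

For this layer the separable form needs roughly one eighth of the weights, which is the kind of saving MobileNet-style networks rely on.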

Unit 2: Object Detection and Segmentation

Two-stage detectors (R-CNN, Fast/Faster R-CNN), One-stage detectors (YOLOv3 to YOLOv8, SSD, RetinaNet), Anchor-free methods (CornerNet, FCOS), Instance segmentation (Mask R-CNN), Semantic segmentation (FCN, U-Net, DeepLab series), Panoptic segmentation (Panoptic FPN), Speed-accuracy trade-offs in real-time detection.
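Many of the detectors listed above score and filter boxes using intersection over union (IoU) and greedy non-maximum suppression (NMS). The following is a minimal pure-Python sketch of both. The `(x1, y1, x2, y2)` box format, the function names, and the 0.5 threshold are assumptions for illustration, not the convention of any particular detector.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the second box heavily overlaps the first.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # prints: [0, 2]
```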

Unit 3: 3D Perception and Depth Estimation

Monocular depth estimation (MiDaS, Depth Anything), Stereo vision and disparity maps, LiDAR point cloud processing (voxelization, PointNet/PointNet++), Multi-view stereo fusion (MVSNet), 3D object detection (VoxelNet, SECOND, PointRCNN), BEV representations (Lift-Splat-Shoot), Sensor fusion (camera-LiDAR-radar).
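For the stereo topic, depth follows from disparity through the rectified-stereo relation Z = f * B / d, where f is the focal length in pixels, B the baseline in meters, and d the disparity in pixels. The sketch below applies it to single values and to a small disparity map. The function names and the camera values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d.
    Non-positive disparity is treated as 'no match' and mapped to infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


def disparity_map_to_depth(disparity, focal_px, baseline_m):
    """Apply the same relation to a 2D disparity map (list of rows)."""
    return [[depth_from_disparity(d, focal_px, baseline_m) for d in row]
            for row in disparity]


# Hypothetical rig: 700 px focal length, 12 cm baseline.
depth = disparity_map_to_depth([[42.0, 84.0], [0.0, 21.0]], 700.0, 0.12)
print(depth)
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for nearby ones.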

Unit 4: Visual Odometry and SLAM

Feature-based (indirect) VO (ORB-SLAM), Direct methods (DSO) and the direct vs. indirect trade-off, Learning-based VO (DeepVO), Bundle adjustment optimization, Loop closure detection, Visual-inertial odometry (OKVIS, VINS-Mono), Neural radiance fields (NeRF) for dense reconstruction, Semantic SLAM integration.
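Bundle adjustment minimizes the sum of squared reprojection errors over camera poses and 3D points. The sketch below computes that residual for one point already expressed in the camera frame, using a pinhole model. The intrinsics, the point, and the function names are assumptions for illustration; a real pipeline would also apply the camera pose and lens distortion.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (X, Y, Z) in the camera frame."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (fx * X / Z + cx, fy * Y / Z + cy)


def reprojection_error(point_cam, observed_px, fx, fy, cx, cy):
    """Euclidean pixel distance between the projection and the observation."""
    u, v = project(point_cam, fx, fy, cx, cy)
    return math.hypot(u - observed_px[0], v - observed_px[1])


# Hypothetical intrinsics and measurement.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
err = reprojection_error((0.2, -0.1, 2.0), (373.0, 219.0), fx, fy, cx, cy)
print(round(err, 3))
```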

Unit 5: Robust Perception and Domain Adaptation

Sim-to-real transfer (CycleGAN, domain randomization), Uncertainty estimation (Bayesian NNs, Monte Carlo dropout), Test-time adaptation, Continual learning for perception, Multi-modal fusion (BEVFusion, UniFusion), Edge deployment optimization (TensorRT, ONNX Runtime), Anomaly detection in perception pipelines.
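Monte Carlo dropout estimates predictive uncertainty by keeping dropout active at inference and aggregating several stochastic forward passes. The toy sketch below does this for a single linear unit in pure Python. The weights, inputs, and function name are hypothetical, and a real perception model would apply the same idea inside a deep network.

```python
import random
import statistics


def mc_dropout_predict(weights, x, p_drop=0.5, n_samples=100, seed=0):
    """Run n_samples stochastic passes of a linear unit with input dropout.
    Returns (mean prediction, standard deviation across passes)."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = random.Random(seed)
    keep = 1.0 - p_drop
    outputs = []
    for _ in range(n_samples):
        y = 0.0
        for w, xi in zip(weights, x):
            if rng.random() < keep:    # this input survives dropout
                y += w * xi / keep     # inverted-dropout rescaling
        outputs.append(y)
    return statistics.mean(outputs), statistics.pstdev(outputs)


# Hypothetical weights and input features.
mean, std = mc_dropout_predict([0.5, -1.0, 2.0], [1.0, 2.0, 3.0], p_drop=0.5)
print(mean, std)
```

The spread across passes serves as a rough uncertainty signal. With `p_drop=0.0` every pass is identical and the spread collapses to zero.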