ML601 · Semester 6 · 3 (2-0-2) · Major

Computer Vision

Syllabus

Unit 1: Digital Image Fundamentals and Processing

Image formation and digitization (sampling, quantization), Pixel representations and color spaces (RGB, HSV, Lab, grayscale), Spatial-domain filtering (linear and nonlinear filters; smoothing and sharpening), Edge detection (Sobel, Prewitt, Canny, Laplacian of Gaussian), Histogram equalization and contrast enhancement, Morphological operations (erosion, dilation, opening, closing).
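
As a lab-style illustration of the Sobel edge detector listed above, the sketch below computes the gradient magnitude of a small grayscale image stored as nested Python lists. It assumes integer pixel values, leaves the one-pixel border at 0 instead of padding, and uses a hypothetical helper name, `sobel_magnitude`; a real pipeline would normally call an image-processing library instead.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude of a grayscale image (list of rows).

    Border pixels stay 0 instead of being padded (an assumption of this sketch).
    Kernels are applied by correlation; flipping them for true convolution
    would only flip the sign of gx and gy, not the magnitude.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = img[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * p
                    gy += SOBEL_Y[dy + 1][dx + 1] * p
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 10, 10] for _ in range(3)]
mag = sobel_magnitude(step)
print(mag[1])  # [0.0, 40.0, 40.0, 0.0]
```

Thresholding `mag` gives a crude edge map; Canny builds on this same gradient step by adding Gaussian smoothing, non-maximum suppression, and hysteresis thresholding.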

Unit 2: Frequency Domain Processing and Transforms

2D Discrete Fourier Transform (DFT) and properties, Fast Fourier Transform (FFT) implementation, High-pass/low-pass filtering in the frequency domain, Homomorphic filtering for illumination correction, Discrete Cosine Transform (DCT) for compression, Wavelet transforms (Haar, Daubechies), Multi-resolution analysis and pyramid representations.
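
For the FFT implementation topic, a minimal recursive radix-2 Cooley-Tukey FFT might look like the sketch below. It also builds a 2D DFT by transforming every row and then every column, which works because the 2D DFT is separable. The sketch assumes each dimension is a power of two and uses the unnormalized forward convention F[u][v] = Σ f[y][x]·exp(-2πi(uy/M + vx/N)); the names `fft` and `fft2` are illustrative, not a library API.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized forward transform)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(img):
    """2D DFT via separability: 1D FFT of every row, then of every column."""
    rows = [fft(r) for r in img]
    cols = [fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


F = fft2([[1, 2], [3, 4]])
print(F[0][0].real)  # 10.0
```

For this 2×2 input, F[0][0] is the DC term, which equals the sum of all pixel values. Each recursion level does O(n) work over log₂ n levels, giving the O(n log n) cost that makes frequency-domain filtering practical.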

Unit 3: Feature Detection, Extraction, and Description

Corner detection (Harris, Shi-Tomasi), Blob detection (LoG, DoG, Hessian), SIFT (scale-space extrema, keypoint description), SURF and ORB for real-time applications, HOG (Histogram of Oriented Gradients) for pedestrian detection, Feature matching (nearest neighbor, FLANN), RANSAC for robust estimation, Bag-of-visual-words model.
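
RANSAC is easiest to see on a toy problem. The sketch below fits a 2D line to points contaminated by outliers. It repeatedly samples two points, hypothesizes the line through them, counts the points within a distance threshold, and keeps the hypothesis with the most inliers. The iteration count, threshold, seed, and helper names are illustrative assumptions. In the feature-matching setting of this unit, the same loop would instead hypothesize a homography or fundamental matrix from sampled correspondences.

```python
import math
import random


def line_through(p, q):
    """Line a*x + b*y + c = 0 through p and q, normalized so a^2 + b^2 = 1.

    Returns None when p and q coincide (degenerate sample).
    """
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2  # normal vector, perpendicular to q - p
    norm = math.hypot(a, b)
    if norm == 0:
        return None
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, iters=200, thresh=0.5, seed=0):
    """Return (line, inliers) for the 2-point hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iters):
        line = line_through(*rng.sample(points, 2))
        if line is None:
            continue
        a, b, c = line
        # With a unit normal, |a*x + b*y + c| is the point-to-line distance.
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) < thresh]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# 20 points on y = 2x + 1 plus three gross outliers.
pts = [(x, 2 * x + 1) for x in range(20)] + [(5, 40), (10, -7), (15, 0)]
line, inliers = ransac_line(pts)
print(len(inliers))  # 20
```

A common refinement, omitted here for brevity, is to refit the model by least squares on the final inlier set.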

Unit 4: Camera Geometry and 3D Vision

Pinhole camera model and intrinsic/extrinsic parameters, Camera calibration (Zhang's method, checkerboard patterns), Epipolar geometry and fundamental matrix, Stereo vision (disparity maps, rectification), Structure from Motion (SfM) pipeline, SLAM fundamentals (visual odometry, bundle adjustment), Depth estimation from monocular cues.
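
The pinhole model maps a world point X to pixel coordinates through x ~ K(RX + t), where K holds the intrinsic parameters and (R, t) the extrinsic ones. The sketch below implements that projection with plain lists. It then uses a rectified stereo pair to recover depth from disparity via Z = fB/d. The intrinsic values, the 0.1 baseline, and the helper names are made-up assumptions for illustration, and the sketch ignores lens distortion and skew.

```python
def project(point, K, R, t):
    """Project a 3D world point to pixel coordinates using x ~ K (R X + t)."""
    # World -> camera coordinates (extrinsics).
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Camera -> homogeneous pixel coordinates (intrinsics), then dehomogenize.
    uvw = [sum(K[i][j] * xc[j] for j in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


def depth_from_disparity(f, baseline, disparity):
    """Depth for a rectified stereo pair: Z = f * B / d."""
    return f * baseline / disparity


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
X = (1.0, 0.5, 5.0)

# Left camera at the origin; right camera centre shifted by B along +x,
# so its translation is t = -R C = (-B, 0, 0).
B = 0.1
u_left, v_left = project(X, K, I3, [0, 0, 0])
u_right, _ = project(X, K, I3, [-B, 0, 0])
print(u_left, v_left)  # 420.0 290.0
print(round(depth_from_disparity(500, B, u_left - u_right), 6))  # 5.0
```

Because disparity is inversely proportional to depth, a fixed one-pixel matching error causes a larger depth error for distant points than for nearby ones.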

Unit 5: Deep Learning for Computer Vision

CNN architectures for vision (AlexNet, VGG, ResNet, EfficientNet), Object detection (R-CNN family, YOLO, SSD, RetinaNet), Semantic segmentation (FCN, U-Net, DeepLab), Instance segmentation (Mask R-CNN), Vision Transformers (ViT, Swin Transformer), Self-supervised learning (SimCLR, DINO), 3D vision with PointNet/PointNet++, Video analysis (optical flow, 3D CNNs).
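
Detectors in the R-CNN, YOLO, SSD, and RetinaNet families typically output many overlapping candidate boxes, which are then pruned with non-maximum suppression (NMS) based on intersection-over-union (IoU). The greedy version is sketched below in plain Python. The (x1, y1, x2, y2) box format and the 0.5 threshold are assumptions of this sketch; deep-learning frameworks provide optimized versions, for example `torchvision.ops.nms`.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Here the second box overlaps the first with IoU 81/119 ≈ 0.68, so it is suppressed, while the disjoint third box survives.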