ML501 | Semester 6 | 3 (2-0-2) | Major

Natural Language Processing (NLP)

Syllabus

01

Unit 1: NLP Fundamentals and Text Preprocessing

Language modeling and n-gram models, Regular expressions for tokenization, Sentence segmentation and normalization, Stemming, lemmatization, and part-of-speech tagging, Stopword removal and text normalization, Bag-of-words and TF-IDF representations, Character-level and subword tokenization (BPE, WordPiece), Unicode handling and multilingual text processing.
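
To illustrate the bag-of-words and TF-IDF topics in this unit, here is a minimal sketch in plain Python. The regex tokenizer, the function names, and the unsmoothed weighting tf * log(N / df) are assumptions chosen for brevity; libraries typically apply smoothed or normalized variants.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then keep runs of ASCII letters/digits (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(docs):
    # docs: list of strings. Returns one {term: weight} dict per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # the bag-of-words counts for this document
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the cat sat on the mat", "the dog sat on the log"]
weights = tf_idf(docs)
```

In this toy corpus, terms that occur in every document (such as "the") get weight zero, while terms unique to one document (such as "cat") get a positive weight.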

02

Unit 2: Word Embeddings and Sequence Models

Word2Vec (skip-gram, CBOW), GloVe and fastText embeddings, Contextual embeddings (ELMo, Flair), RNN/LSTM/GRU for sequence modeling, Bidirectional encoders, Sequence labeling tasks (NER, POS tagging, chunking), CRF layer for structured prediction, Attention mechanisms (self-attention, multi-head attention).
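
As a small illustration of the skip-gram idea listed above, the sketch below generates (center, context) training pairs from a token sequence. The function name and the fixed symmetric window are assumptions; real Word2Vec implementations add subsampling, dynamic windows, and negative sampling on top of this step.

```python
def skipgram_pairs(tokens, window=2):
    # For each position, pair the center word with every neighbor
    # at most `window` positions to its left or right.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window=1)
```

CBOW uses the same windows in the opposite direction: the context words jointly predict the center word.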

03

Unit 3: Transformer Architecture and BERT

Transformer model architecture (encoder-decoder, positional encoding), Self-attention and scaled dot-product attention, Multi-head attention and layer normalization, BERT pretraining objectives (MLM, NSP), RoBERTa, DistilBERT, and ALBERT variants, Fine-tuning strategies and domain adaptation, Sentence-BERT for semantic similarity.
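
Scaled dot-product attention, listed above, computes softmax(Q K^T / sqrt(d_k)) V. The following is a minimal single-head sketch over plain Python lists, with no masking or batching; the function names are illustrative, not taken from any library.

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    # Q: list of query vectors; K, V: lists of key and value vectors (same length).
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        attn = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(a * v[c] for a, v in zip(attn, V))
            for c in range(len(V[0]))
        ])
    return outputs

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
output = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention runs several such computations in parallel on learned linear projections of Q, K, and V, then concatenates the results.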

04

Unit 4: Large Language Models and Prompt Engineering

GPT architecture evolution (GPT-1 to GPT-4), Decoder-only transformers, In-context learning and few-shot prompting, Chain-of-thought reasoning, Retrieval-Augmented Generation (RAG) architecture, Knowledge graphs and dense retrieval, Prompt engineering techniques (zero-shot, few-shot, instruction tuning), Hallucination mitigation strategies.
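
Few-shot prompting, as listed above, places a handful of labeled demonstrations ahead of the actual query. The sketch below assembles such a prompt as a string; the "Input:"/"Output:" layout and the function name are assumptions for illustration, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    # examples: list of (input_text, expected_output) demonstrations.
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The final block leaves the output open for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life.", "positive"), ("Stopped working in a week.", "negative")],
    "The screen is gorgeous.",
)
```

Passing an empty examples list yields the zero-shot version of the same prompt.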

05

Unit 5: Advanced NLP Systems and Evaluation

Text generation evaluation (BLEU, ROUGE, BERTScore, human evaluation), Question answering systems (extractive, generative), Conversational AI (dialogue state tracking, response generation), Multilingual NLP (mBERT, XLM-R), Model deployment (TGI, vLLM), Ethical considerations (bias detection, toxicity classification, fairness evaluation), RAG evaluation frameworks.
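
To make the overlap-based metrics above concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) for a single candidate and a single reference. Whitespace tokenization and lowercasing are simplifying assumptions; standard toolkits add stemming options, multiple references, and further variants such as ROUGE-2 and ROUGE-L.

```python
from collections import Counter

def rouge1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = rouge1("the cat sat on the mat", "the cat is on the mat")
```

Here five of the six candidate unigrams also appear in the reference, so precision, recall, and F1 are all 5/6.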