Academic Catalog

C S 11B: NATURAL LANGUAGE PROCESSING

Foothill College Course Outline of Record

Effective Term: Winter 2026
Units: 4.5
Hours: 4 lecture, 2 laboratory per week (72 total per quarter)
Prerequisite: C S 3A and C S 8A.
Degree & Credit Status: Degree-Applicable Credit Course
Foothill GE: Non-GE
Transferable: CSU/UC
Grade Type: Letter Grade (Request for Pass/No Pass)
Repeatability: Not Repeatable

Student Learning Outcomes

  • Define common applications in NLP and the tradeoffs of different solutions
  • Recognize the potential for bias in NLP models and discuss strategies for overcoming these pitfalls to promote equitable model outcomes
  • Use Python packages to create NLP pipelines that include text preprocessing, feature extraction, and model analysis

Description

This course provides an introduction to the field of natural language processing (NLP), a branch of artificial intelligence that focuses on the interaction between computers and human languages. Students will explore the fundamental concepts, techniques, and tools used to process and analyze natural language data. Topics covered include text preprocessing, tokenization, part-of-speech tagging, syntactic parsing, semantic analysis, sentiment analysis, machine translation, and language generation. Throughout the course, students will gain hands-on experience with popular NLP libraries and frameworks, such as NLTK, spaCy, and scikit-learn.

Course Objectives

The student will be able to:

  1. Describe the landscape of applications related to natural language processing (NLP)
  2. Demonstrate an understanding of the linguistics foundation of NLP
  3. Use publicly available packages for text preprocessing
  4. Use publicly available packages for feature extraction
  5. Describe the traditional NLP models
  6. Describe and use different types of pre-trained models
  7. Demonstrate the use of NLP for information retrieval
  8. Recognize NLP as a tool that can reduce or amplify problems in society

Course Content

  1. Survey of applications
    1. Chatbots
    2. Sentiment analysis
    3. Machine translation
    4. Document classification
    5. Topic modeling and summarization
    6. Voice and speech recognition
  2. Linguistics foundation
    1. Phonetics and phonology
    2. Morphology and syntax
    3. Semantics and pragmatics
  3. Text preprocessing
    1. Tokenization and n-grams
    2. Stopword removal
    3. Lemmatization and stemming
    4. Part of speech tagging
    5. Named entity recognition
  4. Feature extraction
    1. Bag-of-words
    2. Term frequency-inverse document frequency
    3. Dimensionality reduction
    4. Representation of speech sounds
  5. Traditional models
    1. Naive Bayes
    2. Support vector machines
    3. Evaluation of machine learning methods
    4. Markov models in NLP
  6. Pre-trained models
    1. Word embeddings
    2. Transformers and attention
    3. Transfer learning in NLP
  7. Information retrieval
    1. Measuring text similarity
    2. Semantic analysis
    3. Ranking document relevance
  8. Ethical and responsible NLP
    1. Bias
    2. Privacy and copyright
    3. Interpretability and explainability
    4. Generative safeguards
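
The attention mechanism named in topic 6 above can be illustrated with a short, dependency-free sketch of scaled dot-product attention for a single query. The vectors below are invented toy values, not real embeddings, and the function omits the batching, masking, and learned projections found in real transformer implementations:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector: score each key
    against the query, normalize with softmax, and return the
    weight-averaged value vector."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Toy example: the query matches the first key most strongly,
# so the output leans toward the first value vector.
q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(q, K, V)
```

Because the softmax weights sum to one, the output is always a convex combination of the value vectors — the "soft lookup" intuition behind attention.
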

Lab Content

  1. Environment familiarization
    1. Navigating Jupyter notebooks (or a similar environment)
    2. Running code cells and handling error messages
    3. Installing and importing NLP libraries (e.g., NLTK, spaCy)
  2. Basic text operations
    1. Reading text data from files (plain text, CSV)
    2. Printing and examining the first lines of text documents
    3. Basic string operations (splitting, lowercasing, stripping punctuation)
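
The basic text operations above can be sketched with only the Python standard library. The CSV content below is invented sample data standing in for a labeled text file, not course material:

```python
import csv
import io
import string

# A tiny in-memory CSV standing in for a labeled data file (invented data).
csv_text = "label,text\npos,I loved this movie\nneg,Terrible and boring\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

def normalize(s):
    """Lowercase, strip punctuation, and split into word tokens."""
    s = s.lower().translate(str.maketrans("", "", string.punctuation))
    return s.split()

tokens = [normalize(r["text"]) for r in rows]
```

With a file on disk, `open(path)` would replace the `io.StringIO` wrapper; the rest of the code is unchanged.
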
  3. Tokenization and preprocessing
    1. Using tokenizers to split text into sentences and words
    2. Removing stopwords
    3. Applying lemmatization or stemming
    4. Comparing raw vs. cleaned text outputs
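
The preprocessing steps above can be sketched without external libraries. The stopword list and suffix-stripping stemmer below are toy stand-ins for what NLTK or spaCy provide — real stemmers such as Porter's are considerably more careful:

```python
import re

# Toy stopword list (NLTK ships a much larger one).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to"}

def sentence_split(text):
    """Naive sentence splitter on terminal punctuation (toy rule)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(text):
    """Word tokenizer: lowercase alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Crude suffix-stripping stemmer (illustration only)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = "The cats are chasing mice. The dog barked!"
sentences = sentence_split(text)
raw_tokens = tokenize(text)
cleaned = [stem(t) for t in raw_tokens if t not in STOPWORDS]
```

Comparing `raw_tokens` against `cleaned` makes the effect of each stage visible, which is exactly the raw-vs-cleaned comparison in step 4.
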
  4. Feature extraction
    1. Converting text into numeric features
    2. Emphasizing important terms programmatically
    3. Inspecting feature matrices (vocabulary size, top-weighted terms)
    4. Extracting vocal features
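
Bag-of-words and TF-IDF weighting, the core of the feature-extraction steps above, can be illustrated without scikit-learn (whose `CountVectorizer` and `TfidfVectorizer` cover the same ground in practice). The documents and the smoothed IDF variant below are choices made for this sketch:

```python
import math
from collections import Counter

# Invented mini-corpus for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

# Bag-of-words: raw term counts per document.
bows = [Counter(toks) for toks in tokenized]

def idf(term):
    """Inverse document frequency with add-one smoothing (one common variant)."""
    df = sum(1 for toks in tokenized if term in toks)
    return math.log(len(docs) / (1 + df)) + 1

def tfidf(doc_index, term):
    """Term frequency times inverse document frequency."""
    return bows[doc_index][term] * idf(term)

vocab = sorted({t for toks in tokenized for t in toks})
```

Rare terms like "cat" receive a higher IDF than ubiquitous ones like "the", which is how TF-IDF emphasizes terms that distinguish one document from the rest.
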
  5. Text classification setup
    1. Splitting text data into training and test sets
    2. Training a Naive Bayes classifier on labeled text data
    3. Evaluating model performance with accuracy and a confusion matrix
    4. Training and comparing a logistic regression classifier
    5. Performing speaker recognition
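
The classification workflow above might be sketched as follows. In practice scikit-learn's `MultinomialNB` would do the work; this pure-Python version, trained on invented sentences, just makes the log-probability arithmetic and add-one (Laplace) smoothing visible:

```python
import math
from collections import Counter, defaultdict

# Tiny labeled corpus (invented, for illustration only).
train = [
    ("great film loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("boring plot terrible acting", "neg"),
    ("awful film hated it", "neg"),
]

# Count words per class.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the class with the highest log posterior."""
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            # Log likelihood with add-one smoothing, so unseen words
            # contribute a small but nonzero probability.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Evaluation on a held-out test split — accuracy and a confusion matrix, as in step 3 — would compare `predict` outputs against the true labels.
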
  6. Linguistic annotation
    1. Applying POS tagging to sentences
    2. Extracting named entities
    3. Counting occurrences of specific tags or entity types
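
Counting tag occurrences, as in step 3 above, reduces to a `Counter` over tagged tokens. The rule-based tagger below is deliberately naive and purely illustrative; the course's actual labs would use statistically trained taggers such as spaCy's or NLTK's:

```python
from collections import Counter

def toy_pos_tag(tokens):
    """Toy rule-based tagger (illustration only, not a real tagger)."""
    tagged = []
    for tok in tokens:
        if tok in {"the", "a", "an"}:
            tag = "DET"
        elif tok.endswith("ing") or tok.endswith("ed"):
            tag = "VERB"
        else:
            tag = "NOUN"
        tagged.append((tok, tag))
    return tagged

tokens = "the dog chased the ball".split()
tagged = toy_pos_tag(tokens)
tag_counts = Counter(tag for _, tag in tagged)
```

The same counting pattern applies unchanged to named entities: replace the (token, tag) pairs with (span, entity-type) pairs from an NER pipeline.
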
  7. Word embeddings
    1. Loading pre-trained embeddings (e.g., GloVe)
    2. Retrieving and inspecting vector representations of words
    3. Finding the most similar words to a given term
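
The embedding exercises above reduce to vector arithmetic. The vectors below are invented three-dimensional stand-ins for real pre-trained embeddings such as GloVe; cosine similarity and nearest-neighbor lookup work the same way at 300 dimensions:

```python
import math

# Invented toy "embeddings" (real labs would load pre-trained vectors).
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
    "fruit": [0.15, 0.25, 0.85],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, k=2):
    """Rank the other vocabulary words by cosine similarity to `word`."""
    ranked = sorted(
        (w for w in embeddings if w != word),
        key=lambda w: cosine(embeddings[word], embeddings[w]),
        reverse=True,
    )
    return ranked[:k]
```

Because cosine similarity ignores vector length, words used in similar contexts cluster together regardless of frequency — the property that makes `most_similar` meaningful.
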
  8. Pre-trained language models (transformers)
    1. Building a pipeline for a task (e.g., sentiment analysis or Q&A)
    2. Running inference with a pre-trained model on sample inputs
    3. Trying different transformer tasks (e.g., fill-mask, zero-shot classification)
  9. Practical applications
    1. Implementing a keyword-based search over a small text corpus
    2. Summarizing a text document
    3. Evaluating the quality of retrieved information or generated summaries
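
A minimal keyword-based search over an in-memory corpus (the documents are invented) might look like the sketch below. Scoring by raw term overlap is the simplest possible ranking; TF-IDF weighting, covered earlier in the outline, is the natural refinement:

```python
# Invented mini-corpus for illustration.
corpus = {
    "doc1": "natural language processing with python",
    "doc2": "deep learning for computer vision",
    "doc3": "python libraries for language analysis",
}

def search(query):
    """Rank documents by how many query terms they contain (term overlap)."""
    q_terms = set(query.lower().split())
    scores = {
        name: sum(1 for t in text.split() if t in q_terms)
        for name, text in corpus.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Evaluating retrieval quality, as in step 3, would compare this ranking against a small set of hand-judged relevant documents.
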
  10. Project integration and review
    1. Combining multiple steps (preprocessing → feature extraction → classification) into a single workflow
    2. Implementing a chosen NLP mini-project (e.g., sentiment analysis with embeddings)
    3. Writing code to document and report results (e.g., plots, printouts of sample predictions)

Special Facilities and/or Equipment

1. The college will provide access to a computer laboratory with Python and an IDE installed, with sufficient privileges to allow students to install Python packages.
2. The college will provide a website or course management system with an assignment posting component (through which all lab assignments are to be submitted) and a forum component (where students can discuss course material and receive help from the instructor). This applies to all sections, including on-campus (i.e., face-to-face) offerings.
3. When taught online, the college will provide a fully functional and maintained course management system through which the instructor and students can interact.
4. When taught online, students must have currently existing email accounts and ongoing access to computers with internet capabilities.

Method(s) of Evaluation

Methods of Evaluation may include but are not limited to the following:

Tests and quizzes
Lab notebook
Written laboratory assignments which include source code, sample runs, and documentation
Reflective papers
Final examination or project

Method(s) of Instruction

Methods of Instruction may include but are not limited to the following:

Instructor-authored lectures which include mathematical foundations, theoretical motivation, and coding implementation of NLP models
Detailed review of assignments which includes model solutions and specific comments on the student submissions
Discussion which engages students and instructor in an ongoing dialog about NLP
Instructor-authored labs that rigorously assess a student's ability to implement NLP models

Representative Text(s) and Other Materials

Vasiliev, Yuli. Modern NLP with spaCy: Mastering Natural Language Processing. 2022.

Tunstall, Lewis, Leandro von Werra, and Thomas Wolf. Natural Language Processing with Transformers: Building Language Applications with Hugging Face. 2022.

Patel, Ankur A., and Ajay Uppili Arasanipalai. Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Understand, and Interpret Text. 2021.

Types and/or Examples of Required Reading, Writing, and Outside of Class Assignments

  1. Reading
    1. Textbook assigned reading averaging 30 pages per week
    2. Reading the supplied handouts and modules averaging 10 pages per week
    3. Reading online resources as directed by instructor through links pertinent to programming
    4. Reading library and reference material directed by instructor through course handouts
  2. Writing
    1. Writing technical prose documentation that supports and describes the programs that are submitted for grades

Discipline(s)

Computer Science