Academic Catalog

C S 11B: NATURAL LANGUAGE PROCESSING

Foothill College Course Outline of Record

Effective Term: Winter 2026
Units: 4.5
Hours: 4 lecture, 2 laboratory per week (72 total per quarter)
Prerequisite: C S 3A and C S 8A.
Degree & Credit Status: Degree-Applicable Credit Course
Foothill GE: Non-GE
Transferable: CSU/UC
Grade Type: Letter Grade (Request for Pass/No Pass)
Repeatability: Not Repeatable

Student Learning Outcomes

  • Define common applications in NLP and the tradeoffs of different solutions
  • Recognize the potential for bias in NLP models and discuss strategies for overcoming these pitfalls to promote equitable model outcomes
  • Use Python packages to create NLP pipelines that include text preprocessing, feature extraction, and model analysis

Description

This course provides an introduction to the field of natural language processing (NLP), a branch of artificial intelligence that focuses on the interaction between computers and human languages. Students will explore the fundamental concepts, techniques, and tools used to process and analyze natural language data. Topics covered include text preprocessing, tokenization, part-of-speech tagging, syntactic parsing, semantic analysis, sentiment analysis, machine translation, and language generation. Throughout the course, students will gain hands-on experience with popular NLP libraries and frameworks, such as NLTK, spaCy, and scikit-learn.

Course Objectives

The student will be able to:

  1. Describe the landscape of applications related to natural language processing (NLP)
  2. Demonstrate an understanding of the linguistics foundation of NLP
  3. Use publicly available packages for text preprocessing
  4. Use publicly available packages for feature extraction
  5. Describe the traditional NLP models
  6. Describe and use different types of pre-trained models
  7. Demonstrate the use of NLP for information retrieval
  8. Recognize NLP as a tool that can reduce or amplify problems in society

Course Content

  1. Survey of applications
    1. Chatbots
    2. Sentiment analysis
    3. Machine translation
    4. Document classification
    5. Topic modeling and summarization
    6. Voice and speech recognition
  2. Linguistics foundation
    1. Phonetics and phonology
    2. Morphology and syntax
    3. Semantics and pragmatics
  3. Text preprocessing
    1. Tokenization and n-grams
    2. Stopword removal
    3. Lemmatization and stemming
    4. Part of speech tagging
    5. Named entity recognition
  4. Feature extraction
    1. Bag-of-words
    2. Term frequency-inverse document frequency
    3. Dimensionality reduction
    4. Representation of speech sounds
  5. Traditional models
    1. Naive Bayes
    2. Support vector machines
    3. Evaluation of machine learning methods
    4. Markov models in NLP
  6. Pre-trained models
    1. Word embeddings
    2. Transformers and attention
    3. Transfer learning in NLP
  7. Information retrieval
    1. Measuring text similarity
    2. Semantic analysis
    3. Ranking document relevance
  8. Ethical and responsible NLP
    1. Bias
    2. Privacy and copyright
    3. Interpretability and explainability
    4. Generative safeguards
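
The attention mechanism named in topic 6 above can be illustrated with a short, dependency-free sketch of scaled dot-product attention for a single query. The vectors below are invented toy values, not real embeddings, and the function omits the batching, masking, and learned projections found in real transformer implementations:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector: score each key
    against the query, normalize with softmax, and return the
    weight-averaged value vector."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Toy example: the query matches the first key most strongly,
# so the output leans toward the first value vector.
q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(q, K, V)
```

Because the softmax weights sum to one, the output is always a convex combination of the value vectors — the "soft lookup" intuition behind attention.
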

Lab Content

  1. Environment familiarization
    1. Navigating Jupyter notebooks (or a similar environment)
    2. Running code cells and handling error messages
    3. Installing and importing NLP libraries (e.g., NLTK, spaCy)
  2. Basic text operations
    1. Reading text data from files (plain text, CSV)
    2. Printing and examining the first lines of text documents
    3. Basic string operations (splitting, lowercasing, stripping punctuation)
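
The basic text operations above can be sketched with only the Python standard library. The CSV content below is invented sample data standing in for a labeled text file, not course material:

```python
import csv
import io
import string

# A tiny in-memory CSV standing in for a labeled data file (invented data).
csv_text = "label,text\npos,I loved this movie\nneg,Terrible and boring\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

def normalize(s):
    """Lowercase, strip punctuation, and split into word tokens."""
    s = s.lower().translate(str.maketrans("", "", string.punctuation))
    return s.split()

tokens = [normalize(r["text"]) for r in rows]
```

With a file on disk, `open(path)` would replace the `io.StringIO` wrapper; the rest of the code is unchanged.
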
  3. Tokenization and preprocessing
    1. Using tokenizers to split text into sentences and words
    2. Removing stopwords
    3. Applying lemmatization or stemming
    4. Comparing raw vs. cleaned text outputs
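
The preprocessing steps above can be sketched without external libraries. The stopword list and suffix-stripping stemmer below are toy stand-ins for what NLTK or spaCy provide — real stemmers such as Porter's are considerably more careful:

```python
import re

# Toy stopword list (NLTK ships a much larger one).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to"}

def sentence_split(text):
    """Naive sentence splitter on terminal punctuation (toy rule)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(text):
    """Word tokenizer: lowercase alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Crude suffix-stripping stemmer (illustration only)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = "The cats are chasing mice. The dog barked!"
sentences = sentence_split(text)
raw_tokens = tokenize(text)
cleaned = [stem(t) for t in raw_tokens if t not in STOPWORDS]
```

Comparing `raw_tokens` against `cleaned` makes the effect of each stage visible, which is exactly the raw-vs-cleaned comparison in step 4.
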
  4. Feature extraction
    1. Converting text into numeric features
    2. Emphasizing important terms programmatically
    3. Inspecting feature matrices (vocabulary size, top-weighted terms)
    4. Extracting vocal features
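
Bag-of-words and TF-IDF weighting, the core of the feature-extraction steps above, can be illustrated without scikit-learn (whose `CountVectorizer` and `TfidfVectorizer` cover the same ground in practice). The documents and the smoothed IDF variant below are choices made for this sketch:

```python
import math
from collections import Counter

# Invented mini-corpus for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

# Bag-of-words: raw term counts per document.
bows = [Counter(toks) for toks in tokenized]

def idf(term):
    """Inverse document frequency with add-one smoothing (one common variant)."""
    df = sum(1 for toks in tokenized if term in toks)
    return math.log(len(docs) / (1 + df)) + 1

def tfidf(doc_index, term):
    """Term frequency times inverse document frequency."""
    return bows[doc_index][term] * idf(term)

vocab = sorted({t for toks in tokenized for t in toks})
```

Rare terms like "cat" receive a higher IDF than ubiquitous ones like "the", which is how TF-IDF emphasizes terms that distinguish one document from the rest.
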
  5. Text classification setup
    1. Splitting text data into training and test sets
    2. Training a Naive Bayes classifier on labeled text data
    3. Evaluating model performance with accuracy and a confusion matrix
    4. Training and comparing a logistic regression classifier
    5. Performing speaker recognition
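
The classification workflow above might be sketched as follows. In practice scikit-learn's `MultinomialNB` would do the work; this pure-Python version, trained on invented sentences, just makes the log-probability arithmetic and add-one (Laplace) smoothing visible:

```python
import math
from collections import Counter, defaultdict

# Tiny labeled corpus (invented, for illustration only).
train = [
    ("great film loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("boring plot terrible acting", "neg"),
    ("awful film hated it", "neg"),
]

# Count words per class.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the class with the highest log posterior."""
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            # Log likelihood with add-one smoothing, so unseen words
            # contribute a small but nonzero probability.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Evaluation on a held-out test split — accuracy and a confusion matrix, as in step 3 — would compare `predict` outputs against the true labels.
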
  6. Linguistic annotation
    1. Applying POS tagging to sentences
    2. Extracting named entities
    3. Counting occurrences of specific tags or entity types
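
Counting tag occurrences, as in step 3 above, reduces to a `Counter` over tagged tokens. The rule-based tagger below is deliberately naive and purely illustrative; the course's actual labs would use statistically trained taggers such as spaCy's or NLTK's:

```python
from collections import Counter

def toy_pos_tag(tokens):
    """Toy rule-based tagger (illustration only, not a real tagger)."""
    tagged = []
    for tok in tokens:
        if tok in {"the", "a", "an"}:
            tag = "DET"
        elif tok.endswith("ing") or tok.endswith("ed"):
            tag = "VERB"
        else:
            tag = "NOUN"
        tagged.append((tok, tag))
    return tagged

tokens = "the dog chased the ball".split()
tagged = toy_pos_tag(tokens)
tag_counts = Counter(tag for _, tag in tagged)
```

The same counting pattern applies unchanged to named entities: replace the (token, tag) pairs with (span, entity-type) pairs from an NER pipeline.
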
  7. Word embeddings
    1. Loading pre-trained embeddings (e.g., GloVe)
    2. Retrieving and inspecting vector representations of words
    3. Finding the most similar words to a given term
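
The embedding exercises above reduce to vector arithmetic. The vectors below are invented three-dimensional stand-ins for real pre-trained embeddings such as GloVe; cosine similarity and nearest-neighbor lookup work the same way at 300 dimensions:

```python
import math

# Invented toy "embeddings" (real labs would load pre-trained vectors).
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
    "fruit": [0.15, 0.25, 0.85],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, k=2):
    """Rank the other vocabulary words by cosine similarity to `word`."""
    ranked = sorted(
        (w for w in embeddings if w != word),
        key=lambda w: cosine(embeddings[word], embeddings[w]),
        reverse=True,
    )
    return ranked[:k]
```

Because cosine similarity ignores vector length, words used in similar contexts cluster together regardless of frequency — the property that makes `most_similar` meaningful.
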
  8. Pre-trained language models (transformers)
    1. Building a pipeline for a task (e.g., sentiment analysis or Q&A)
    2. Running inference with a pre-trained model on sample inputs
    3. Trying different transformer tasks (e.g., fill-mask, zero-shot classification)
  9. Practical applications
    1. Implementing a keyword-based search over a small text corpus
    2. Summarizing a text document
    3. Evaluating the quality of retrieved information or generated summaries
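
A minimal keyword-based search over an in-memory corpus (the documents are invented) might look like the sketch below. Scoring by raw term overlap is the simplest possible ranking; TF-IDF weighting, covered earlier in the outline, is the natural refinement:

```python
# Invented mini-corpus for illustration.
corpus = {
    "doc1": "natural language processing with python",
    "doc2": "deep learning for computer vision",
    "doc3": "python libraries for language analysis",
}

def search(query):
    """Rank documents by how many query terms they contain (term overlap)."""
    q_terms = set(query.lower().split())
    scores = {
        name: sum(1 for t in text.split() if t in q_terms)
        for name, text in corpus.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Evaluating retrieval quality, as in step 3, would compare this ranking against a small set of hand-judged relevant documents.
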
  10. Project integration and review
    1. Combining multiple steps (preprocessing → feature extraction → classification) into a single workflow
    2. Implementing a chosen NLP mini-project (e.g., sentiment analysis with embeddings)
    3. Writing code to document and report results (e.g., plots, printouts of sample predictions)

Special Facilities and/or Equipment

1. The college will provide access to a computer laboratory with Python and an IDE installed, with sufficient privileges to allow students to install Python packages.
2. The college will provide a website or course management system with an assignment posting component (through which all lab assignments are to be submitted) and a forum component (where students can discuss course material and receive help from the instructor). This applies to all sections, including on-campus (i.e., face-to-face) offerings.
3. When taught online, the college will provide a fully functional and maintained course management system through which the instructor and students can interact.
4. When taught online, students must have currently existing email accounts and ongoing access to computers with internet capabilities.

Method(s) of Evaluation

Methods of Evaluation may include but are not limited to the following:

Tests and quizzes
Lab notebook
Written laboratory assignments which include source code, sample runs, and documentation
Reflective papers
Final examination or project

Method(s) of Instruction

Methods of Instruction may include but are not limited to the following:

Instructor-authored lectures which include mathematical foundations, theoretical motivation, and coding implementation of NLP models
Detailed review of assignments which includes model solutions and specific comments on the student submissions
Discussion which engages students and instructor in an ongoing dialog about NLP
Instructor-authored labs that rigorously assess a student's ability to implement NLP models

Representative Text(s) and Other Materials

Vasiliev, Yuli. Modern NLP with spaCy: Mastering Natural Language Processing. 2022.

Tunstall, Lewis, Leandro von Werra, and Thomas Wolf. Natural Language Processing with Transformers: Building Language Applications with Hugging Face. 2022.

Patel, Ankur A., and Ajay Uppili Arasanipalai. Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Understand, and Interpret Text. 2021.

Types and/or Examples of Required Reading, Writing, and Outside of Class Assignments

  1. Reading
    1. Textbook assigned reading averaging 30 pages per week
    2. Reading the supplied handouts and modules averaging 10 pages per week
    3. Reading online resources as directed by instructor through links pertinent to programming
    4. Reading library and reference material directed by instructor through course handouts
  2. Writing
    1. Writing technical prose documentation that supports and describes the programs that are submitted for grades

Discipline(s)

Computer Science