Course Syllabus
Natural Language Processing
Semester & Location: | Spring 2025 - DIS Stockholm |
Type & Credits: | Elective Course - 3 credits |
Major Disciplines: | Computer Science, Mathematics |
Prerequisite(s): | One year of computer science at university level. A course in data structures or a course in algorithms. Knowledge of a programming language (e.g. Python, JavaScript, Java, C++, or MATLAB). |
Faculty Members: | John Rager (current students please use the Canvas Inbox) |
Program Director: | Natalia Landázuri Sáenz, Ph.D. |
Program Contact: | |
Time & Place: | TBD |
Course Description
Natural Language Processing (NLP) is the subfield of Artificial Intelligence that deals with tasks involving human languages – English, Swedish, Xhosa, etc. NLP includes question answering, sentiment analysis, summarization, and translation, among others. Recently, great excitement has been created by the part of NLP known as “Large Language Models (LLMs),” e.g. ChatGPT.
This course is an introduction to NLP that focuses on the parts of the field needed to understand Large Language Models. We will discuss and implement various algorithms needed to create an LLM. These may include tokenization, stemming, and lemmatization; word embeddings; basic neural networks; transformers and attention modules; and tuning a pretrained LLM. We will discuss ways to use an LLM once you have one. Once we understand how LLMs work, we can ask why they are good at some things and not at others. We will also think about whether LLMs can be harmful. Students will code in Python using various libraries.
Tentative Outline
CLASS 1 What is Natural Language Processing?
CLASS 2 The Preliminary Tasks: Normalization, Segmentation, Tokenization, Vectorization
(At this point in the class we will discuss what these tasks involve;
we will return to the details later.)
CLASS 3 Probability Intro, Calculating probabilities in NLP
CLASS 4 What is a language model? N-grams as a language model
CLASS 5 N-grams continued
CLASS 6 A First NLP tool: Naïve Bayes Classification
CLASS 7 Naïve Bayes, continued
CLASS 8 Introduction to Neural Networks
CLASS 9 Neural Networks/Gradient Descent, continued
CLASS 10 Neural Networks/Gradient Descent, continued
CLASS 11 Fixed length neural language models
CLASS 12 Fixed length neural language models
CLASS 13 RNNs and LSTMs (brief discussion)
CLASS 14 The Overall Structure of an LLM
CLASS 15 Attention and Transformers
CLASS 16 Attention and Transformers
CLASS 17 Attention and Transformers
CLASS 18 Fine Tuning
CLASS 19 Fine Tuning
CLASS 20 Tokenization
CLASS 21 Vectorization
CLASS 22 Project Work
CLASS 23 Project Presentations
Learning Objectives
By the end of this course, students will have
- Demonstrated understanding of natural language processing tasks, models, and techniques.
- Completed a series of projects to implement and improve NLP models.
- Used standard Python NLP libraries in the development of these solutions.
- Considered ethical concerns about NLP.
Faculty
Prof. Rager earned his Ph.D. from Northwestern University. He is the Thalheiner Professor of Computer Science at Amherst College in Amherst, Massachusetts, where he has taught since 1988. He has always been interested in languages, both human and computer. His dissertation was in the field of symbolic natural language processing, and since then his research has shifted to (among other things) natural language processing using machine learning. He has also worked on applying Artificial Intelligence to teaching English to Speakers of Other Languages. This work was motivated by the difficulties faced by English teachers in Moldova, where he was a Fulbright Scholar during the 2003-04 academic year. His teaching has often touched on language. For example, he has taught a seminar for first-year students called “Natural and Unnatural Languages.” The material in that course included “traditional” natural language processing as done in artificial intelligence, but also a discussion of rhetorical devices in Shakespeare, a reading of parts of Finnegans Wake, and a discussion of language evolution. He has also taught a course on Digital Textual Analysis. That course discussed the computer science (e.g. topic modeling, Naive Bayes classification) used in papers in digital humanities. The course included both Computer Science and Humanities students, who worked together in groups on projects.
Readings
Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin (https://stanford.edu/~jurafsky/slp3/)
Exactly what we will cover depends to some extent on the class and how things go, but I expect we will cover topics from chapters 1, 3, 4, 6, 7, 10, and 11. We will not cover all of the material in those chapters; the text is encyclopedic.
Other readings will be chosen from current literature. (This is a fast-moving field.)
Field Studies
Field studies will involve visiting firms or researchers involved in the use and/or development of LLMs.
Approach to Teaching
I have always believed in teaching students, not material, so expect the course to change in response to the needs and interests of the students in it.
Many days I will introduce some material and then we will do an exercise to support understanding the material. There will be lots of examples, and lots of discussion.
Expectations of the Students
- Come to class. You won't learn much if you do not.
- Ask and answer questions. You can reply to a question either with an answer or with a clarifying question. I ask lots of questions – it helps us all stay engaged.
Evaluation
There are several kinds of assignments in this course:
- Many classes will include exercises. Some will be individual, some group. Your solutions to them will be gathered into "portfolios." Some of these exercises will need to be finished after class. Most of these extended exercises should be thought of as programming assignments.
- There will be a group project.
- There will be short in-class checkup quizzes. These are designed to measure how well the material is being understood. They do not count as part of your grade.
- One or two people will be assigned as note-takers for each class session. This way the class together will produce a set of notes for the course.
Grading
Assignment | Percent |
---|---|
Participation – behavior that promotes learning by you and others | 20% |
Exercise Portfolios | 35% |
Note Taking (not graded, you either did them and get credit, or didn't and don't) | 10% |
Project | 35% |
Late Assignments
You need to do the assignments in order to learn the material, so I will usually be willing to consider extensions. Please talk to me well before the deadline if you think you are going to have trouble making the deadline. You should be prepared to discuss the following when requesting an extension:
- Explain why it is important to your learning to get an extension.
- Propose a new due date (it should not be far in the future).
- Explain why the proposed extension will not interfere with your ability to get the NEXT assignment done on time.
Use of AI and LLMs
You may not use ChatGPT, Copilot, or any other generative AI model or AI tool on any assignment unless the assignment specifically tells you to do so. If you think it would benefit your learning to use one somewhere I have not allowed it, come tell me why!
Use of Laptops or Phones in class
This is a programming-intensive class. You will need to use your laptop to do the programming. Please restrict your laptop use to working on coursework.
Collaboration
Talking to other students is encouraged, within limits. Please discuss ideas, not code. To help you think about the boundaries, please follow these directives:
- Do not discuss assignments with students outside of the class.
- Wait one hour after discussing a project with other students before you write code. This will help you to make sure that you understand what you are coding.
- Never show code from a project to another student at any time, whether before or after the assignment is due.
- You can only use the code given to you as starter code. You may not search for code to solve your problem. You will not learn from that.
- You may not post your code online. You may show your code to potential employers, faculty members and students not in the class.
DIS Accommodations Statement
Your learning experience in this class is important to me. If you have approved academic accommodations with DIS, please make sure I receive your DIS accommodations letter within two weeks from the start of classes. If you can think of other ways I can support your learning, please don't hesitate to talk to me. If you have any further questions about your academic accommodations, contact Academic Support at acadsupp@dis.dk.
Academic Regulations
Please make sure to read the Academic Regulations on the DIS website. There you will find regulations on:
DIS - Study Abroad in Scandinavia - www.DISabroad.org