Course Syllabus

Natural Language Processing

Semester & Location:	Spring 2025 - DIS Stockholm
Type & Credits:	Elective Course - 3 credits
Major Disciplines:	Computer Science, Mathematics
Prerequisite(s):	One year of computer science at university level. A course in data structures or a course in algorithms. Knowledge of a programming language (e.g. Python, JavaScript, Java, C++, or MATLAB).
Faculty Members:	John Rager (current students please use the Canvas Inbox)
Program Director:	Natalia Landázuri Sáenz, Ph.D.
Program Contact:	academics@disstockholm.se
Time & Place:	TBD

Course Description

Natural Language Processing (NLP) is the subfield of Artificial Intelligence that deals with tasks involving human languages – English, Swedish, Xhosa, etc. NLP includes question answering, sentiment analysis, summarization, and translation, among others. Recently, great excitement has been created by the part of NLP known as “Large Language Models (LLMs),” e.g. ChatGPT.

This course is an introduction to NLP that focuses on the parts of the field needed to understand those Large Language Models. We will discuss and implement various algorithms needed to create an LLM. These may include tokenization, stemming, and lemmatization; word embeddings; basic neural networks; transformers and attention modules; and tuning a pretrained LLM. We will discuss ways to use an LLM once you have one. Once we understand how LLMs work, we can ask why they are good at some things and not at others. We will also think about whether LLMs can be harmful. Students will code in Python using various libraries.

Tentative Outline

CLASS 1 What is Natural Language Processing?

CLASS 2 The Preliminary Tasks: Normalization, Segmentation, Tokenization, Vectorization

(At this point in the class we will discuss what these tasks involve; we will return to the details later.)

CLASS 3 Probability Intro, Calculating probabilities in NLP

CLASS 4 What is a language model? N-grams as a language model

CLASS 5 N-grams continued

CLASS 6 A First NLP tool: Naïve Bayes Classification

CLASS 7 Naïve Bayes, continued

CLASS 8 Introduction to Neural Networks

CLASS 9 Neural Networks/Gradient Descent, continued

CLASS 10 Neural Networks/Gradient Descent, continued

CLASS 11 Fixed length neural language models

CLASS 12 Fixed length neural language models

CLASS 13 RNNs and LSTMs (brief discussion)

CLASS 14 The Overall Structure of an LLM

CLASS 15 Attention and Transformers

CLASS 16 Attention and Transformers

CLASS 17 Attention and Transformers

CLASS 18 Fine Tuning

CLASS 19 Fine Tuning

CLASS 20 Tokenization

CLASS 21 Vectorization

CLASS 22 Project Work

CLASS 23 Project Presentations

Learning Objectives

By the end of this course, students will have

Demonstrated understanding of natural language processing tasks, models, and techniques.
Completed a series of projects to implement and improve NLP models.
Used standard Python NLP libraries in the development of these solutions.
Considered ethical concerns about NLP.

Faculty

Prof. Rager earned his Ph.D. from Northwestern University. He is the Thalheiner Professor of Computer Science at Amherst College in Amherst, Massachusetts, where he has taught since 1988. He has always been interested in languages, both human and computer. His dissertation was in the field of symbolic natural language processing, and since then his research has shifted to (among other things) natural language processing using machine learning. He has also worked on applying Artificial Intelligence to teaching English to Speakers of Other Languages. This work was motivated by the difficulties faced by English teachers in Moldova, where he was a Fulbright Scholar during the 2003-04 academic year. His teaching has often touched on language. For example, he has taught a seminar for first-year students called “Natural and Unnatural Languages.” The material in that course included “traditional” natural language processing as done in artificial intelligence, but also a discussion of rhetorical devices in Shakespeare, a reading of parts of Finnegans Wake, and a discussion of language evolution. He has also taught a course on Digital Textual Analysis. That course discussed the computer science (e.g. topic modeling, Naive Bayes classification) used in papers in digital humanities. The course included both Computer Science and Humanities students, who worked together in groups on projects.

Readings

Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin (https://stanford.edu/~jurafsky/slp3/)

Exactly what we will cover depends to some extent on the class and how things go, but I expect we will cover topics from chapters 1, 3, 4, 6, 7, 10, and 11. We will not cover all of the material in those chapters, because the text is encyclopedic.

Other readings will be chosen from current literature. (This is a fast-moving field.)

Field Studies

Field studies will involve visiting firms or researchers involved in the use and/or development of LLMs.

Approach to Teaching

I have always believed in teaching students, not material, so expect the course to change in response to the needs and interests of the students in it.

Many days I will introduce some material and then we will do an exercise to support understanding the material. There will be lots of examples, and lots of discussion.

Expectations of the Students

Come to class. You won't learn much if you do not.
Ask and answer questions. You can reply to a question either with an answer or with a clarifying question. I ask lots of questions – it helps us all stay engaged.

Evaluation

There are several kinds of assignments in this course:

Many classes will include exercises. Some will be individual, some group. Your solutions to them will be gathered into "portfolios." Some of these exercises will need to be finished after class. Most of these extended exercises should be thought of as programming assignments.
There will be a group project.
There will be short in-class checkup quizzes. These are designed to measure how the understanding is going. They do not count as part of your grade.
One or two people will be assigned as note-takers for each class session. This way the class together will produce a set of notes for the course.

Grading

Assignment	Percent
Participation – behavior that promotes learning by you and others	20%
Exercise Portfolios	35%
Note Taking (not graded, you either did them and get credit, or didn't and don't)	10%
Project	35%

Late Assignments

You need to do the assignments in order to learn the material, so I will usually be willing to consider extensions. Please talk to me well before the deadline if you think you are going to have trouble making the deadline. You should be prepared to discuss the following when requesting an extension:

- Explain why it is important to your learning to get an extension
- Propose a new due date (it should not be far in the future)
- Explain why the proposed extension will not interfere with your ability to get the NEXT assignment done on time.

Use of AI and LLMs

You may not use ChatGPT, Copilot, or any other generative AI model or AI tool on any assignment unless the assignment specifically tells you to do so. If you think it would benefit your learning to use one somewhere I have not allowed it, come tell me why!

Use of Laptops or Phones in class

This is a programming-intensive class. You will need to use your laptop to do the programming. Please restrict your laptop use to working on coursework.

Collaboration

Talking to other students is encouraged, within limits. Please discuss ideas, not code. To help you think about the boundaries, please follow these directives:

Do not discuss assignments with students outside of the class.
Wait one hour after discussing a project with other students before you write code. This will help you to make sure that you understand what you are coding.
Never show code from a project to another student at any time, whether before or after the assignment is due.
You can only use the code given to you as starter code. You may not search for code to solve your problem. You will not learn from that.
You may not post your code online. You may show your code to potential employers, faculty members and students not in the class.

DIS Accommodations Statement

Your learning experience in this class is important to me. If you have approved academic accommodations with DIS, please make sure I receive your DIS accommodations letter within two weeks from the start of classes. If you can think of other ways I can support your learning, please don't hesitate to talk to me. If you have any further questions about your academic accommodations, contact Academic Support at acadsupp@dis.dk.

Academic Regulations

Please make sure to read the Academic Regulations on the DIS website. There you will find regulations on:

DIS - Study Abroad in Scandinavia - www.DISabroad.org
