Course Syllabus

Natural Language Processing


Semester & Location:

Spring 2025 - DIS Stockholm

Type & Credits:

Elective Course - 3 credits

Major Disciplines:

Computer Science, Mathematics

Prerequisite(s):

One year of computer science at university level. A course in data structures or a course in algorithms. Knowledge of a programming language (e.g. Python, JavaScript, Java, C++, or MATLAB).

Faculty Members:

John Rager (current students please use the Canvas Inbox)

Program Director:

Natalia Landázuri Sáenz, Ph.D.

Program Contact:

academics@disstockholm.se

Time & Place:

TBD

Course Description

Natural Language Processing (NLP) is the subfield of Artificial Intelligence that deals with tasks involving human languages – English, Swedish, Xhosa, etc.  NLP includes question answering, sentiment analysis, summarization, and translation, among others.  Recently, great excitement has been created by the part of NLP known as “Large Language Models (LLMs),” e.g. ChatGPT.

This course is an introduction to NLP that focuses on the parts of the field needed to understand Large Language Models. We will discuss and implement various algorithms needed to create an LLM. These may include tokenization, stemming, and lemmatization; word embeddings; basic neural networks; transformers and attention modules; and fine-tuning a pretrained LLM. We will discuss ways to use an LLM once you have one. Once we understand how LLMs work, we can ask why they are good at some things and not at others. We will also think about whether LLMs can be harmful. Students will code in Python using various libraries.

Tentative Outline

CLASS 1              What is Natural Language Processing?

CLASS 2            The Preliminary Tasks: Normalization, Segmentation, Tokenization, Vectorization

                        (at this point in the class we will discuss what these tasks involve; we will return to the details later)

CLASS 3            Probability Intro, Calculating probabilities in NLP

CLASS 4            What is a language model?  N-grams as a language model

CLASS 5            N-grams continued

CLASS 6            A First NLP tool: Naïve Bayes Classification

CLASS 7            Naïve Bayes, continued

CLASS 8            Introduction to Neural Networks

CLASS 9            Neural Networks/Gradient Descent, continued

CLASS 10          Neural Networks/Gradient Descent, continued

CLASS 11          Fixed length neural language models

CLASS 12          Fixed length neural language models

CLASS 13          RNNs and LSTMs (brief discussion)

CLASS 14          The Overall Structure of an LLM

CLASS 15          Attention and Transformers

CLASS 16          Attention and Transformers

CLASS 17          Attention and Transformers

CLASS 18          Fine Tuning

CLASS 19          Fine Tuning

CLASS 20          Tokenization

CLASS 21          Vectorization

CLASS 22          Project Work

CLASS 23          Project Presentations
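
To give a flavor of the kind of exercise the early classes lead up to, here is a minimal sketch of a bigram language model (the N-gram topic from Classes 4 and 5). It is an illustration only, not a course assignment: the function name `train_bigram_model`, the toy corpus, the whitespace tokenization, and the `<s>`/`</s>` boundary markers are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next word | previous word) by counting bigrams.

    Hypothetical helper for illustration; uses naive whitespace
    tokenization and unsmoothed maximum-likelihood estimates.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Add start/end markers so sentence boundaries are modeled too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    # Normalize each row of counts into a probability distribution.
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: c / total for word, c in followers.items()}
    return model


# Toy corpus, invented for this example.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)
print(model["the"])  # probabilities of the words that follow "the"
```

In this toy corpus "the" is followed by "cat" twice and "dog" once, so the model assigns "cat" a probability of 2/3 after "the". A real model would need smoothing to handle word pairs it has never seen.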

 

Learning Objectives

By the end of this course, students will have:

  • Demonstrated understanding of natural language processing tasks, models, and techniques.
  • Completed a series of projects to implement and improve NLP models.
  • Used standard Python NLP libraries in the development of these solutions.
  • Considered ethical concerns about NLP.

Faculty

Prof. Rager earned his Ph.D. from Northwestern University. He is the Thalheiner Professor of Computer Science at Amherst College in Amherst, Massachusetts, where he has taught since 1988. He has always been interested in languages, both human and computer. His dissertation was in symbolic natural language processing; since then, his research has shifted to (among other things) natural language processing using machine learning. He has also worked on applying Artificial Intelligence to teaching English to Speakers of Other Languages. This work was motivated by the difficulties faced by English teachers in Moldova, where he was a Fulbright Scholar during the 2003-04 academic year. His teaching has often touched on language. For example, he has taught a seminar for first-year students called “Natural and Unnatural Languages.” The material in that course included “traditional” natural language processing as done in artificial intelligence, but also a discussion of rhetorical devices in Shakespeare, a reading of parts of Finnegans Wake, and a discussion of language evolution. He has also taught a course on Digital Textual Analysis. That course discussed the computer science (e.g. topic modeling, Naive Bayes classification) used in papers in digital humanities. The course included both Computer Science and Humanities students, who worked together in groups on projects.

Readings

Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin  (https://stanford.edu/~jurafsky/slp3/)

Exactly what we will cover depends to some extent on the class and how things go, but I expect we will cover topics from chapters 1, 3, 4, 6, 7, 10, and 11. We will not cover all of the material in those chapters; the text is encyclopedic.

Other readings will be chosen from current literature. (This is a fast-moving field.)

Field Studies

Field studies will involve visiting firms or researchers involved in the use and/or development of LLMs.

Approach to Teaching

I have always believed in teaching students, not material, so expect the course to change in response to the needs and interests of the students in it.

Many days I will introduce some material and then we will do an exercise to support understanding the material.  There will be lots of examples, and lots of discussion.

Expectations of the Students

  1. Come to class. You won't learn much if you do not.
  2. Ask and answer questions. You can reply to a question either with an answer or with a clarifying question. I ask lots of questions – it helps us all stay engaged.

Evaluation

There are several kinds of assignments in this course:

  1. Many classes will include exercises. Some will be individual, some group. Your solutions to them will be gathered into "portfolios." Some of these exercises will need to be finished after class. Most of these extended exercises should be thought of as programming assignments.
  2. There will be a group project.
  3. There will be short in-class checkup quizzes. These are designed to measure how well the class is understanding the material. They do not count as part of your grade.
  4. One or two people will be assigned as note-takers for each class session. This way the class together will produce a set of notes for the course.

Grading

Assignment                                                                           Percent

Participation – behavior that promotes learning by you and others                    20%

Exercise Portfolios                                                                  35%

Note Taking (not graded, you either did them and get credit, or didn't and don't)    10%

Project                                                                              35%

Late Assignments

You need to do the assignments in order to learn the material, so I will usually be willing to consider extensions.  Please talk to me well before the deadline if you think you are going to have trouble making the deadline.  You should be prepared to discuss the following when requesting an extension:

  • Explain why it is important to your learning to get an extension.
  • Propose a new due date (it should not be far in the future).
  • Explain why the proposed extension will not interfere with your ability to get the NEXT assignment done on time.

Use of AI and LLMs

You may not use ChatGPT, Copilot, or any other generative AI model or AI tool on any assignment unless the assignment specifically tells you to do so. If you think it would benefit your learning to use one somewhere I have not allowed it, come tell me why!

Use of Laptops or Phones in class 

This is a programming-intensive class.  You will need to use your laptop to do the programming.  Please restrict your laptop use to working on coursework.

Collaboration

Talking to other students is encouraged, within limits.  Please discuss ideas, not code.  To help you think about the boundaries, please follow these directives:

  • Do not discuss assignments with students outside of the class.
  • Wait one hour after discussing a project with other students before you write code. This will help you to make sure that you understand what you are coding.
  • Never show code from a project to another student at any time, whether before or after the assignment is due.
  • You can only use the code given to you as starter code. You may not search for code to solve your problem.  You will not learn from that.
  • You may not post your code online. You may show your code to potential employers, faculty members and students not in the class.

DIS Accommodations Statement 

Your learning experience in this class is important to me.  If you have approved academic accommodations with DIS, please make sure I receive your DIS accommodations letter within two weeks from the start of classes. If you can think of other ways I can support your learning, please don't hesitate to talk to me. If you have any further questions about your academic accommodations, contact Academic Support at acadsupp@dis.dk.

Academic Regulations

Please make sure to read the Academic Regulations on the DIS website. There you will find regulations on:

 

DIS - Study Abroad in Scandinavia - www.DISabroad.org
