Course Syllabus

Computational Analysis of Big Data

Semester & Location:

Spring 2024 - DIS Copenhagen

Type & Credits:

Elective Course - 3 credits

Major Disciplines:

Computer Science, Information Science, Mathematics

Prerequisite:

One year of computer science at the university level and a course in algorithms and data structures. Knowledge of at least one programming language (e.g. Python, JavaScript, Java, C++, MATLAB).

Note: We will be using Python in this course. If you have little or no experience coding in Python, you should either follow a Python tutorial before the course starts or prepare to invest some hours getting up to speed with the language once we start.

Faculty Members:

Anders Gjølbye Madsen and William Theodor Lehn-Schiøler (current students please use the Canvas Inbox)

 

Program Director:

Natalia Landázuri Sáenz, PhD

Academic Support:

csc-engr@disstockholm.se

Lecture Time & Place:

Tuesdays and Fridays, 1:15 - 2:35 pm. Location: F24-203

Course Description

Walmart started using big data even before the term became widely recognized. Today, businesses, governments, social media platforms, financial institutions, and other organizations use data and analytics to optimize sales, minimize cost, and maximize reach. The ability to do so comes from the power of knowledge-based prediction, with the main goal of turning massive amounts of data into actionable information.

In this course, we will learn about Big Data and Data Science from various perspectives and gain hands-on experience with a broad selection of tools and approaches in the context of relevant use cases.

Classes will be a mix of thematic discussions, hands-on problem-solving, and project work in groups. At the end of the course, you will be able to select and use appropriate combinations of tools and approaches to tackle typical problems that arise when working with Big Data. This is a programming-intensive course with weekly programming labs and a programming final project. We use the programming language Python.

Course Overview

The course consists of 12 sessions:

  1. Fundamentals of working with data
  2. Getting data—scraping and APIs
  3. Machine learning: Introduction
  4. Machine learning: Supervised Learning
  5. Machine learning: Unsupervised Learning
  6. Crunching Big Data with Parallelization
  7. Natural language processing
  8. Modelling Data with Complex Networks
  9. Project Proposals
  10. Lab work on project/supervision
  11. Lab work on project/supervision
  12. Project presentations

Course Elements

The following topics are covered in this course:

  • Python Programming
  • APIs and Web scraping
  • Natural language processing
  • MapReduce
  • Machine learning
  • Information Networks
  • Philosophical Considerations in Big Data
  • Ethical Considerations in Big Data

Learning Objectives

Upon completing the course, the student will be able to: 

  • Understand how Big Data fits into the context of Data Science
  • Select computational tools for performing analysis on Big Data
  • Acquire large datasets from online sources and apply Data Science tools
  • Extract knowledge and build prediction models using machine learning
  • Critically evaluate how analytical tools influence results from both a technical and ethical perspective 

 

Faculty

Anders Gjølbye Madsen

Bachelor of Science in Engineering (Artificial Intelligence and Data Analysis, Technical University of Denmark, 2019-2022), Master of Science in Engineering (Mathematical Modeling and Computation, Technical University of Denmark, 2022-2024), Master of Science (Mathematics and Computer Science, ETH Zürich, 2023-2024), Researcher at Danish Pioneer Centre for Artificial Intelligence (2022 - Present), Machine Learning Engineer at BrainCapture (2021 - Present), Board member at Copenhagen MedTech (2022 - Present), Member of the Young Academy Panel at Danish Data Science Academy (2024 - Present)

William Theodor Lehn-Schiøler

Bachelor of Science in Engineering (Artificial Intelligence and Data Analysis, Technical University of Denmark, 2019-2022), Master of Science in Engineering (Mathematical Modeling and Computation, Technical University of Denmark, 2022-2024), Master of Science (Mathematics and Informatics, École Polytechnique, 2021-2022), Data Scientist at BrainCapture (2021 - Present), Board member at Copenhagen MedTech (2022 - Present), Chairman at Copenhagen MedTech (2023 - Present)

Readings

Most of the learning will be based on the book Python Data Science Handbook written by Jake VanderPlas. We will also use the freely available book Network Science by Albert-László Barabási. Some learning will also be based on papers, blog posts, and videos available online.

Field Studies

During the course, time is allocated for two half-day field studies. Typically, we visit relevant companies in the Copenhagen area and/or watch and then discuss a recent documentary on legal and ethical issues around Big Data.

Approach to Teaching

The course is designed around the principle of constructive alignment. The two major components in the course—the assignments and the final project—implement this principle by stating clear outcome goals of every activity and the course as a whole.

Assignments: Leading up to each session, students are given a "preparation goal" and a suggested list of materials they can use to reach it. Sessions start with a short lecture (less than 1 hour) that introduces the topic of the day, and then students work through a set of technical exercises. Each week, the students will be assigned to a random group of 2-3 students. The groups will work through the technical exercises together, in class under the instructor's supervision, and afterward on their own. They have a week to hand in their solutions. Many groups complete the assignment in class. This gives the students a clear outcome goal for each session: "Show up prepared and complete the exercises". It incentivizes the students to prepare and to work in a focused way.

Final project: From the beginning of the course, the students are aware that an outcome of the course is a project that, if done well, can add value to their professional portfolio. Students will work in groups of 3. The project is a small study on some popular topic of their choosing that they can investigate with data scraped or downloaded from the Internet. Students submit the project in two parts. First, each team must deliver a proposal presentation that showcases the project idea and execution plan. Second, after completing the project, each team must communicate the results in the popular format of a blog post (and submit a link to the blog post and the code repository). Both project presentations serve as a platform for sharing ideas between groups and offering peer feedback.

Expectations of the Students

Students are expected to reach the preparation goal leading up to each session. Students who have little or no experience coding in Python should either follow a Python tutorial before the course starts or prepare to invest some hours getting up to speed with the language once we start. Students should have a working laptop computer. It is advised that each machine has at least 4 GB of RAM and a reasonable processor (if it was bought after 2017, you should be fine). A Unix-like environment is preferred (Linux, macOS, or Windows with WSL), but it is not a necessity.

Evaluation

During the course, you will hand in weekly assignments containing exercises solved in class. Furthermore, you will complete a larger project that uses tools that have been taught in the class. An acceptable project will cover, for example, data scraping and analysis. You will be allowed to define your own project, but you can also get assistance from the teacher.

Both projects and assignments are group efforts. The teacher will grade all the assignments.

During the programming projects, you are allowed to consult freely with any of the other students and the instructor. Contributions from other students, however, must be acknowledged with citations in your final report, as required by academic standards. Contributions to your presentations must similarly be acknowledged. Needless to say, the right to consult does not include the right to copy — programs, papers, and presentations must be your original work. 

The participation grade reflects a student's contributions to classes and exercises, comments on other students' questions on the Discussion boards, attendance, and engagement with guest speakers and during field studies. Inappropriate and/or unprofessional behavior (e.g., sleeping during presentations or being rude towards our hosts during field studies) results in a score of 0 for participation for the entire semester.

Grading

  • Participation (includes class, lab, project, and field studies behavior that is beneficial to the learning of others): 15%
  • Mandatory assignments (3 coding assignments containing a subset of the programming exercises from the weekly labs): 45%
  • Final project (10% proposal presentation, 30% project report and presentation): 40%
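
Since the course uses Python, the weighting above can be sketched as a small calculation. This is only an illustration: the weights come from the list above, but the 0-100 score scale, the function name `weighted_score`, and the example scores are assumptions made for the sketch, and the syllabus does not specify how component scores are converted into a final grade.

```python
# Weights as listed in the Grading section (percent of the final grade).
WEIGHTS = {
    "participation": 15,
    "assignments": 45,
    "final_project": 40,  # 10 proposal presentation + 30 report and presentation
}


def weighted_score(scores):
    """Combine per-component scores into one weighted score.

    Assumption: each component is scored on a 0-100 scale.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the components in WEIGHTS")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# The three weights should account for the whole grade.
assert sum(WEIGHTS.values()) == 100

# Hypothetical example scores, not real data.
example = weighted_score(
    {"participation": 100, "assignments": 80, "final_project": 90}
)
print(example)  # 87.0
```

With these hypothetical scores, the assignments and the final project each contribute 36 points and participation contributes 15, which shows how heavily the two programming components dominate the result.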

 

Academic Regulations  

Please make sure to read the Academic Regulations on the DIS website. There you will find regulations on: 

DIS - Study Abroad in Scandinavia - www.DISabroad.org
