Course Syllabus

Computational Analysis of Big Data

Semester & Location: Spring 2024 - DIS Copenhagen

Type & Credits: Elective Course - 3 credits

Major Disciplines: Computer Science, Information Science, Mathematics

Prerequisite: One year of computer science at university level and a course in algorithms and data structures. Knowledge of at least one programming language (e.g. Python, JavaScript, Java, C++, MATLAB).

Note: We will be using Python in this course. If you have little or no experience coding in Python, you should either follow a Python tutorial before the course starts, or prepare to invest some hours getting up to speed with the language once we start.

Faculty Members: Lucian Leahu, PhD (current students please use the Canvas Inbox)

Program Director: Natalia Landázuri Sáenz, PhD

Academic Support: csc-engr@disstockholm.se

Lecture Time & Place: Tuesdays and Fridays, 1:15 - 2:35pm. Location: F24-206

Course Description

Walmart started using big data even before the term became widely used. Today, industries, social media platforms, and governmental and non-governmental organizations alike use data and analytics to optimize their processes, minimize costs, and maximize reach. Their ability to do so comes from the power of knowledge-based prediction, whose main goal is to turn massive amounts of data into actionable information.

In this course, we will examine several technical aspects of Big Data and Data Science and gain hands-on experience with a broad selection of tools and approaches in the context of relevant use-cases.

Classes will be a mix of thematic discussions, hands-on problem solving through coding, and project work in groups. At the end of the course, you will be able to select and use appropriate combinations of tools and approaches to tackle typical problems involving Big Data. This is a programming-intensive course with weekly programming labs and a final programming project. All coding is done in Python.

Course overview

The course consists of 12 weeks (23 sessions):

  1. Fundamentals of working with data
  2. Getting data: scraping and APIs
  3. Machine learning: Introduction
  4. Machine learning: Supervised Learning
  5. Machine learning: Unsupervised Learning
  6. Crunching Big Data with Parallelization
  7. Natural language processing
  8. Modelling data with Complex Networks
  9. Project Proposals
  10. Lab work on project/supervision
  11. Lab work on project/supervision
  12. Project presentations

Course Elements

The following topics are covered in this course:

  • Python programming
  • APIs and Web scraping
  • Natural language processing
  • MapReduce
  • Machine learning
  • Information Networks
  • Philosophical considerations in Big Data
  • Ethical considerations in Big Data

Learning Objectives

Upon successfully completing the course, the student will be able to: 

  • Understand how Big Data fits into the context of Data Science 
  • Select computational tools for performing analysis on Big Data 
  • Acquire large datasets from online sources and apply Data Science tools 
  • Extract knowledge and build prediction models using machine learning 
  • Critically evaluate how analytical tools influence results from both a technical and an ethical perspective

Faculty

Lucian Leahu: PhD in Computer Science from Cornell University (2012). ERCIM Postdoctoral Fellow at the Swedish Institute of Computer Science (2012-2013), Project Leader in the Media Technology and Interaction Design Department at the Royal Institute of Technology (2014), and Assistant Professor at ITU Copenhagen (2015-2018). With DIS since 2019.

Readings

Most of the learning will be based on the book Python Data Science Handbook written by Jake VanderPlas. We will also use the freely available book Network Science by Albert-László Barabási. Some learning will also be based on papers, blog posts and videos available online.

Field Studies

Time is allocated during the course for two half-day field studies. Typically, we visit relevant companies in the Copenhagen area and/or watch and discuss a recent documentary on legal and ethical issues around Big Data.

Approach to Teaching

The course is designed around the principle of constructive alignment. The two major components of the course, the assignments and the final project, implement this principle by stating clear outcome goals for every activity and for the course as a whole.

Assignments: Leading up to each session, students are given a "preparation goal" and a suggested list of materials they can use to reach it. Sessions start with a short lecture (less than 1 hour) that introduces the topic of the day, after which students work through a set of technical exercises. Each week, students are randomly assigned to groups of 2-3. The groups work through the technical exercises together, first in class under the instructor's supervision and afterwards on their own. They have one week to hand in their solutions, and many groups complete the assignment in class. This gives the students a clear outcome goal for each session: "show up prepared and complete the exercises". It also gives them an incentive to prepare and to work in a focused way.

Final project: From the beginning of the course, students know that one outcome of the course is a project that, if done well, can add value to their professional portfolio. Students work in groups of 3. The project is a small study on a popular topic of their own choosing that they can investigate with data scraped or downloaded from the Internet. Students submit the project in two parts. First, each team delivers a proposal presentation that showcases the project idea and execution plan. Second, after completing the project, each team communicates the results in the popular format of a blog post (and submits a link to the blog post and the code repository). Both project presentations serve as platforms for sharing ideas between groups and offering peer feedback.

Expectations of the Students

Students are expected to reach the preparation goal leading up to each session. Students who have little or no experience coding in Python should either follow a Python tutorial before the course starts, or prepare to invest some hours getting up to speed with the language once we start. Students should have a working laptop computer. Each machine should have at least 4 GB of RAM and a reasonable processor (if it was bought after 2019, you should be fine). A Unix-like operating system (macOS or Linux) is preferred but not required.

Evaluation

During the course you will hand in weekly assignments containing exercises solved in class. You will also complete a larger project that uses tools taught in the class. An acceptable project will cover, for example, data scraping and analysis. You may define your own project, but you can also get assistance from the teacher.

Both the project and the assignments are group efforts. The teacher will grade all the assignments.

During the programming projects, you are allowed to consult freely with any of the other students and the instructor. Contributions from other students, however, must be acknowledged with citations in your final report, as required by academic standards. Contributions to your presentations must similarly be acknowledged. Needless to say, the right to consult does not include the right to copy: programs, papers, and presentations must be your own original work.

The participation grade reflects a student's contributions in class and to exercises, comments on other students' questions on the Discussion boards, attendance, and engagement with guest speakers and during field studies. Inappropriate and/or unprofessional behaviour (e.g., sleeping during presentations or being rude towards our hosts during field studies) results in a score of 0 for participation for the entire semester.

Grading

Participation (includes class, lab, project, and field studies behavior that is beneficial to the learning of others): 15%

Mandatory assignments (3 coding assignments containing a subset of the programming exercises from the weekly labs): 45%

Final project (10% proposal presentation, 30% project report and presentation): 40%
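
As a rough illustration of how the weights listed in this section combine, the sketch below computes a weighted course score in Python. It assumes each component is scored on a 0-100 scale, which the syllabus does not state. The function name, the dictionary keys, and the sample scores are hypothetical. The official grade calculation is whatever DIS applies.

```python
# Weights taken from the grading breakdown in this syllabus (they sum to 100%).
# The final project's 40% is split into its proposal and report parts.
WEIGHTS = {
    "participation": 0.15,
    "assignments": 0.45,
    "project_proposal": 0.10,
    "project_report": 0.30,
}


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "participation": 100,
    "assignments": 80,
    "project_proposal": 90,
    "project_report": 70,
}
print(round(weighted_score(example), 2))
```

Because the assignments carry the largest single weight, a change of ten points there moves the total by 4.5 points under these assumptions.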

 

Academic Regulations  

Please make sure to read the Academic Regulations on the DIS website. There you will find regulations on: 

DIS - Study Abroad in Scandinavia - www.DISabroad.org
