Course Syllabus

Computational Analysis of Big Data

Semester & Location:

Spring 2024 - DIS Copenhagen

Type & Credits:

Elective Course - 3 credits

Major Disciplines:

Computer Science, Information Science, Mathematics

Prerequisite:

One year of computer science at university level and a course in algorithms and data structures. Knowledge of at least one programming language (e.g. Python, JavaScript, Java, C++, MATLAB).

Note: We will be using Python in this course. If you have little or no experience coding in Python, you should either follow a Python tutorial before the course starts, or prepare to invest some hours getting up to speed with the language once we start.

Faculty Members:

 Panagiota Katsikouli, PhD (current students please use the Canvas Inbox)

 

Program Director:

Natalia Landázuri Sáenz, PhD

Academic Support:

csc-engr@disstockholm.se

Lecture Time & Place:

Tuesdays and Fridays, 1:15 - 2:35pm. Location: F24-306

 

Course Description

Walmart started using big data even before the term became recognized. Today, industries, governments, social media platforms, finance, and organizations alike use data and analytics to optimize sales, minimize cost, and maximize reach. The ability to do so comes from the power of knowledge-based prediction, with the main goal of turning massive amounts of data into actionable information.

In this course, we will learn about Big Data and Data Science from various perspectives and gain hands-on experience with a broad selection of tools and approaches in the context of relevant use-cases.

Classes will be a mix of thematic discussions, hands-on problem solving, and project work in groups. At the end of the course, you will be able to select and use appropriate combinations of tools and approaches to tackle typical problems that arise when working with Big Data. This is a programming-intensive course with weekly programming labs and a programming final project. We use the programming language Python.

Course overview

The course consists of 12 sessions:

  1. Fundamentals of working with data
  2. Getting data—scraping and APIs
  3. Machine learning: Introduction
  4. Machine learning: Supervised Learning
  5. Machine learning: Unsupervised Learning
  6. Crunching Big Data with Parallelization
  7. Natural language processing
  8. Modelling data with Complex Networks
  9. Project Proposals
  10. Lab work on project/supervision
  11. Lab work on project/supervision
  12. Project presentations

Course Elements

The following topics are covered in this course:

  • Python programming
  • APIs and Web scraping
  • Natural language processing
  • MapReduce
  • Machine learning
  • Information Networks
  • Philosophical considerations in Big Data
  • Ethical considerations in Big Data

Learning Objectives

Upon successfully completing the course, the student will be able to: 

  • Understand how Big Data fits into the context of Data Science 
  • Select computational tools for performing analysis on Big Data 
  • Acquire large datasets from online sources and apply Data Science tools 
  • Extract knowledge and build prediction models using machine learning 
  • Critically evaluate how analytical tools influence results from both a technical and an ethical perspective

Faculty

Panagiota (Yota) Katsikouli

PhD (Informatics, University of Edinburgh, 2018). Post-doctoral Researcher, INRIA Lyon, 2018-2019. Post-doctoral Researcher, University College Dublin, 2019. Post-doctoral Researcher, Technical University of Denmark, 2019-2020. Teaching and Research, University of Copenhagen, 2020-present. Faculty Member, Open Institute of Technology, 2023-present. With DIS since 2023.

 

Readings

Most of the learning will be based on the book Python Data Science Handbook written by Jake VanderPlas. We will also use the freely available book Network Science by Albert-László Barabási. Some learning will also be based on papers, blog posts and videos available online.

Field Studies

During the course there is allocated time for two half-day field studies. Typically we visit relevant companies in the Copenhagen area and/or watch and then discuss a recent documentary on legal and ethical issues around Big Data.

Approach to Teaching

The course is designed around the principle of constructive alignment. The two major components in the course—the assignments and the final project—implement this principle by stating clear outcome goals of every activity and the course as a whole.

Assignments: Leading up to each session, students are given a "preparation goal" and a suggested list of materials they can use to reach it. Sessions start with a short lecture (less than 1 hour) that introduces the topic of the day, and then students work through a set of technical exercises. Each week, the students will form groups of 2-3. The groups will work through the technical exercises together, in class under the instructor's supervision and afterwards on their own. They have a week to hand in their solutions, although many groups complete the assignment in class. This gives the students a clear outcome goal for each session: "show up prepared and complete the exercises". It incentivizes the students to prepare and to work in a focused way.

Final project: From the beginning of the course the students are aware that an outcome of the course is a project that, if done well, can add value to their professional portfolio. Students will work in groups of 3. The project is a small study on a popular topic of their own choosing that they can investigate with data scraped or downloaded from the Internet. Students submit the project in two parts: First, each team must deliver a proposal presentation which showcases the project idea and execution plan. Second, after completing the project, each team must communicate the results in the popular format of a blog post (and submit a link to the blog post and the code repository). Both project presentations serve as platforms for sharing ideas between groups and offering peer feedback.

Expectations of the Students

Students are expected to reach the preparation goal leading up to each session. Students who have little or no experience coding in Python should either follow a Python tutorial before the course starts, or prepare to invest some hours getting up to speed with the language once we start. Students should have a working laptop computer. It is advised that each machine has at least 4 GB of RAM and a reasonable processor (if it was bought after 2017 you should be fine). A Unix-like operating system (macOS or Linux) is preferred, but not a necessity.

Evaluation

During the course you will hand in weekly assignments containing exercises solved in class. Furthermore, you will complete a larger project that uses tools which have been taught in the class. An acceptable project will cover e.g. data scraping and analysis. You will be allowed to define your own project, but you can also get assistance from the teacher.

Both project and assignments are group efforts. The teacher will rate all the assignments.

During the programming projects, you are allowed to consult freely with any of the other students and the instructor. Contributions from other students, however, must be acknowledged with citations in your final report, as required by academic standards. Contributions to your presentations must similarly be acknowledged. Needless to say, the right to consult does not include the right to copy — programs, papers, and presentations must be your own original work. 

The participation grade reflects a student's contributions to classes, exercises, comments on other students' questions on the Discussion boards, attendance and engagement with guest speakers and during field studies.  Inappropriate and/or unprofessional behaviour (e.g., sleeping during presentations, being rude towards our hosts during field studies) results in a score of 0 for participation for the entire semester.

Grading

Participation (includes class, lab, project, and field studies behavior that is beneficial to the learning of others): 15%

Mandatory assignments (3 coding assignments, 15% each, containing a subset of the programming exercises from the weekly labs): 45%

Final project (10% project proposal presentation, 20% project report, and 10% project presentation): 40%
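
As a rough illustration of how the weights above combine into a single course score, here is a minimal Python sketch. The weights come from the grading breakdown; the component names, the 0-100 score scale, and the example scores are assumptions made for illustration only and do not represent official DIS grading policy.

```python
# Weights (in percent) taken from the grading breakdown above.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation": 15,
    "assignment_1": 15,
    "assignment_2": 15,
    "assignment_3": 15,
    "project_proposal_presentation": 10,
    "project_report": 20,
    "project_presentation": 10,
}


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Each component contributes its score scaled by its percentage weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical example: a student who scores 80 on every component.
example_scores = {name: 80 for name in WEIGHTS}
print(weighted_score(example_scores))
```

The weights sum to 100, so a student with the same score on every component receives that score overall.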

 

Academic Regulations  

Please make sure to read the Academic Regulations on the DIS website.

DIS - Study Abroad in Scandinavia - www.DISabroad.org
