Course Syllabus
Computational Analysis of Big Data
Semester & Location: | Fall 2024 - DIS Copenhagen |
Type & Credits: | Elective Course - 3 credits |
Major Disciplines: | Computer Science, Information Science, Mathematics |
Prerequisite: | One year of computer science at university level and a course in algorithms and data structures. Knowledge of at least one programming language (e.g. Python, JavaScript, Java, C++, MATLAB). Note: We will be using Python in this course. If you have little or no experience coding in Python, you should either follow a Python tutorial before the course starts, or prepare to invest some hours getting up to speed with the language once we start. |
Faculty Members: | Panagiota Katsikouli, PhD (current students please use the Canvas Inbox) |
Program Director: | Natalia Landázuri Sáenz, PhD |
Program Contact: | |
Lecture Time & Place: | Tuesdays and Fridays, 1:15 - 2:35 pm. Location: N7-C24 |
Course Description
Walmart started using big data even before the term became widely recognized. Today, industries, governments, social media platforms, financial institutions, and other organizations use data and analytics to optimize sales, minimize cost, and maximize reach. The ability to do so comes from the power of knowledge-based prediction, with the main goal of turning massive amounts of data into actionable information.
In this course, we will learn about Big Data and Data Science from various perspectives and gain hands-on experience with a broad selection of tools and approaches in the context of relevant use-cases.
Classes will be a mix of thematic discussions and hands-on exercises. The course is programming intensive, with group programming exercises and a final group programming project. We use the programming language Python.
At the end of the course, you will be able to select and use appropriate combinations of tools and approaches to tackle typical problems that arise when working with Big Data.
Course overview
- Fundamentals of working with data
- Getting data—scraping and APIs
- Machine learning: Introduction
- Machine learning: Supervised Learning
- Machine learning: Unsupervised Learning
- Natural language processing
- Modelling data with Complex Networks
- Crunching Big Data with Parallelization
- Project Proposals
- Lab work on project/supervision
- Lab work on project/supervision
- Project presentations
Course Elements
The following topics are covered in this course:
- Python programming
- APIs and Web scraping
- Machine learning
- Natural language processing
- Complex Networks
- Parallelization and Threading
Learning Objectives
Upon successfully completing the course, the student will be able to:
- Understand how Big Data fits into the context of Data Science
- Select computational tools for performing analysis on Big Data
- Acquire large datasets from online sources and apply Data Science tools
- Extract knowledge and build prediction models using machine learning
Faculty
Panagiota (Yota) Katsikouli
Readings
Most of the learning will be based on the book Python Data Science Handbook by Jake VanderPlas. We will also use the freely available book Network Science by Albert-László Barabási. Some learning will also be based on papers, selected chapters, and videos available online.
Field Studies
During the course, time is allocated for two half-day field studies. Typically we visit relevant companies in the Copenhagen area and/or watch and then discuss a recent documentary on legal and ethical issues around Big Data.
Approach to Teaching
The course is designed around the principle of constructive alignment. The two major components of the course, the assignments and the final project, implement this principle by stating clear outcome goals for every activity and for the course as a whole.
Lessons: Leading up to each lesson, students are given a "preparation goal" and a suggested list of materials they can use to reach it. Lessons typically include a lecture part that introduces the topic of the day, followed by programming examples on the topics covered that we will work on together in class. For every thematic module, students will also be given a notebook with extra exercises to work on in their own time. These exercises are not graded. However, the graded assignments of the course will contain a selection of these exercises.
Assignments: Students will be asked to form groups of 2-3 students. The groups can work through the extra programming exercises together. The graded assignments (which consist of selected extra exercises) are a group effort; that is, assignments are submitted by groups and not individually.
Final project: From the beginning of the course, students are aware that an outcome of the course is a project that, if done well, can add value to their professional portfolio. Students (in groups of 2-3) will work on this project during the second half of the course (the first half being dedicated to the thematic modules). The project is a small study on a popular topic of their own choosing that they can investigate with data scraped or downloaded from the Internet. Students submit the project in two parts. First, each group must deliver a proposal that showcases the project idea and execution plan. The proposals are presented in class. Second, after completing the project, each group must communicate the results in the format of a report or a blog post (and submit it along with the code for their project). Our last class is dedicated to presenting the group projects.
Expectations of the Students
Students are expected to reach the preparation goal leading up to each session. Students who have little or no experience coding in Python should either follow a Python tutorial before the course starts, or prepare to invest some hours getting up to speed with the language once we start. Students should have a working laptop computer. It is advised that each machine has at least 4 GB of RAM and a reasonable processor (if it was bought after 2017 you should be fine).
Students are expected to attend all classes and course activities (i.e., the field trips). In case of illness, the student should inform the instructor of their absence as soon as possible. In case of a longer absence (more than 2 consecutive lessons), the student should provide a written explanation (for example, if the cause is illness, the student should provide a doctor's note) and DIS will be informed of the prolonged absence. Multiple and/or unjustified absences, as well as repeated late arrivals to classes and activities, will be reflected in the student's final grade through the participation factor.
Evaluation
During the course you will hand in 3 group assignments containing exercises from the extra sets of exercises given to you for each thematic module. Furthermore, you will complete a larger project that uses tools taught in the class. An acceptable project will cover, for example, data acquisition, data exploration, and analysis. You will be allowed to define your own project, but you can also get assistance from the teacher.
Both the project and the assignments are group efforts. The teacher will grade all the assignments and give you detailed feedback.
During the programming projects, you are allowed to consult freely with any of the other students (in the other groups, that is) and the instructor. Contributions from other students, however, must be acknowledged with citations in your final report, as required by academic standards. Contributions to your presentations must similarly be acknowledged. Needless to say, the right to consult does not include the right to copy — programs, papers, and presentations must be your own original work.
The participation grade reflects a student's contributions to classes, exercises, comments on other students' questions on the Discussion boards, attendance and engagement with guest speakers and during field studies. Inappropriate and/or unprofessional behaviour (e.g., sleeping during presentations, being rude towards our hosts during field studies) results in a score of 0 for participation for the entire semester.
Grading
Participation (individual): | 10% |
Mandatory assignments (group): 3 coding assignments (15% each) containing a subset of the programming exercises. | 45% |
Final project (group): (10% project proposal presentation, 20% project report and 15% project presentation) | 45% |
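As an illustration of how the weights above combine, here is a minimal Python sketch. It is not an official grading tool: the 0-100 scale for each component, the component names, and the function name weighted_score are assumptions made for the example; only the percentages come from the table above.

```python
# Weights taken from the grading table (as fractions of the final grade).
# The component names are hypothetical labels chosen for this example.
WEIGHTS = {
    "participation": 0.10,
    "assignment_1": 0.15,
    "assignment_2": 0.15,
    "assignment_3": 0.15,
    "project_proposal": 0.10,
    "project_report": 0.20,
    "project_presentation": 0.15,
}


def weighted_score(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum: each component contributes its score times its weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    example = {name: 85 for name in WEIGHTS}
    example["project_report"] = 95
    print(round(weighted_score(example), 2))
```

Because the weights sum to 100%, a student who scores the same value on every component ends up with that value as the overall score.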
Academic Regulations
Please make sure to read the Academic Regulations on the DIS website. There you will find regulations on:
DIS - Study Abroad in Scandinavia - www.DISabroad.org