Course Syllabus

Computational Analysis of Big Data 

Semester & Location:

Fall 2019 - DIS Copenhagen

Type & Credits:

Elective Course - 3 credits

Major Disciplines:

Computer Science, Mathematics

Faculty Members:

Ulf Aslak, ulfaslak@gmail.com 

Program Director:

Iben de Neergaard, idn@dis.dk

Time & Place:

Location: V10-A12

Tuesdays 13:15 - 16:10

 

Course Description

Walmart started using big data even before the term became recognized. Today, industries, governments, social media platforms, finance, and organizations alike use data and analytics to optimize sales, minimize cost, and maximize reach. The ability to do so comes from the power of knowledge-based prediction, with the main goal of turning massive amounts of data into actionable information.

In this course, we will learn about Big Data and Data Science from various perspectives and gain hands-on experience with a broad selection of tools and approaches in the context of relevant use-cases.

Classes will be a mix of thematic discussions, hands-on problem solving, and project work in groups. At the end of the course, you will be able to select and use appropriate combinations of tools and approaches to tackle typical problems that arise when working with Big Data.

 

Prerequisites

One year of introduction to Computer Science and an introduction to probability theory, linear algebra, or statistics at university level. Practical programming experience is strongly recommended (e.g. in Python/JavaScript/Java/C++/MATLAB), and prior knowledge of algorithms and data structures is useful.

Learning Objectives

Upon successfully completing the course, the student will be able to: 

  • Understand how Big Data fits into the context of Data Science 
  • Select computational tools for performing analysis on Big Data 
  • Acquire large datasets from online sources and apply Data Science tools 
  • Extract knowledge and build prediction models using machine learning 
  • Critically evaluate how analytical tools influence results from both a technical and an ethical perspective 

 

Course overview

The course is rooted in 12 sessions:

  1. Coding with data in Python
  2. A Data Scientist's most fundamental tools
  3. Getting data—scraping and APIs
  4. Machine learning 1
  5. Machine learning 2
  6. Networks
  7. Natural language processing
  8. Crunching Big Data with MapReduce
  9. Ethical and legal considerations in Big Data
  10. Lab work on project report
  11. Lab work on project report
  12. Project presentations

Course Elements

The following topics are covered in this course:

  • Python programming
  • Web scraping
  • Natural language processing
  • MapReduce
  • Machine learning
  • Networks
  • Legal considerations in Big Data
  • Ethical considerations in Big Data

 

Teacher

Ulf Aslak holds a PhD in Social Data Science from the Copenhagen Centre for Social Data Science, University of Copenhagen, and has bachelor's and master's degrees in Physics and Digital Media Engineering from the Technical University of Denmark (DTU). He is a visiting researcher at DTU and has worked at the Uri Alon Lab in Israel and the Brockmann Lab in Berlin. He has experience working as a consultant and a Data Scientist at multiple private companies, including Trustpilot, Alfa Laval, Peergrade, and Sterlitech.

Required texts

Most of the learning will be based on the book Data Science from Scratch: First Principles with Python, 1st Edition written by Joel Grus. We will also use the freely available book Network Science by Albert-László Barabási. Some learning will also be based on papers, blog posts and videos available online.

 

Approach to Teaching

The course is designed around the principle of constructive alignment. The two major components in the course—the assignments and the final project—implement this principle by stating clear outcome goals of every activity and the course as a whole.

Assignments: Leading up to each session, students are given a "preparation goal" and a suggested list of materials they can use to reach it. Sessions start with a short lecture (less than 1 hour) that introduces the topic of the day, and then students work through a set of technical exercises. The students are required to hand in two assignments during the course (40% of their final grade, 20% each), which are composed of selected problems from the exercises they have solved in class. This gives the student a clear outcome goal for each session: "show up prepared and complete the exercises". It also gives them an incentive to prepare and to work with focus.

Final project: From the beginning of the course the students are aware that an outcome of the course is a project that, if done well, can add value to their professional portfolio. The project is a small study on some popular topic of their own choosing that they can investigate with data they have scraped or downloaded from the Internet. They submit the project in two parts: First, each team must compose a proposal video which demonstrates that they have made a plan for their project and are able to hypothesize about the outcomes. Second, after they have completed their project they must communicate the results in the popular format of a blog post. The proposal video is a fun exercise that serves as a platform for sharing ideas between groups (we view them all in class) but it also forces them to start with a very comprehensive idea of the outcome in mind.

Another small but important component of the teaching approach is peer evaluation. Each student is tasked with reviewing 2 assignments after handing in their own (with or without a group). The reviewing process is anonymous. With peer evaluation, each hand-in receives a lot of varied feedback, and students get to reflect on their own work by reviewing how others solved the same problems. High-quality feedback is incentivized: each reviewee rates the feedback they received, which produces a feedback quality score for every reviewer that, by a small fraction, influences their final grade.

 

Expectations of the Students

Students are expected to reach the preparation goal leading up to each session. Students who have little or no experience coding in Python should either follow a Python tutorial before the course starts, or prepare to invest some hours getting up to speed with the language once we start. Students should have a working laptop computer. It is advised that each machine has at least 4 GB of RAM and a reasonable processor (if it was bought after 2012, you should be fine). A Unix-based operating system (OS X or Linux) is preferred, but not a necessity.

 

Field Studies

During the course, time is allocated for two one-day field studies. In the first field study, we will hold a hackathon competition, where students compete to create the best-performing prediction model on a selected dataset from Kaggle. In the second field study, we will visit a local company that works with Big Data.

 

Assignments and Evaluation

During the course, you will hand in two assignments containing selected exercises solved in class. Furthermore, you will complete a larger project that uses tools taught in the class. An acceptable project will cover, for example, data scraping and analysis. You will be allowed to define your own project, but you can also get assistance from the teacher.

Both the project and the assignments are group efforts. The teacher will rate all the assignments, but you will also participate using the peer evaluation system Peergrade.io, where each hand-in is double-blind peer-reviewed by 3-4 students. These reviews, together with the teacher’s evaluation, serve as indicators towards the final grade. This gives each group more and fairer feedback, and makes the evaluation less sensitive to mistakes. Students’ overall feedback quality is taken into account during grade evaluation.

During the programming projects, you are allowed to consult freely with any of the other students and the instructor. Contributions from other students, however, must be acknowledged with citations in your final report, as required by academic standards. Contributions to your presentations must similarly be acknowledged. Needless to say, the right to consult does not include the right to copy — programs, papers, and presentations must be your own original work. 

When assigning the final grades, your efforts will weigh as follows: 

  • Participation: 15% (includes class/exercise/project behavior that is beneficial to the learning of others)
  • Mandatory assignments: 40%
  • Final project: 35% (10% proposal video, 25% project report and presentation)
  • Overall peer feedback quality: 10%
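
As an illustration of how these weights combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale, which is an assumption made for the example and not part of the official grading procedure. The function name and the example scores are hypothetical.

```python
# Weights taken from the list above. The 35% for the final project
# covers both the proposal video (10%) and the report/presentation (25%).
WEIGHTS = {
    "participation": 0.15,
    "assignments": 0.40,
    "final_project": 0.35,
    "peer_feedback": 0.10,
}


def weighted_score(scores):
    """Combine component scores (assumed 0-100 each) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {
    "participation": 90,
    "assignments": 80,
    "final_project": 85,
    "peer_feedback": 70,
}

# 0.15*90 + 0.40*80 + 0.35*85 + 0.10*70, rounded to two decimals.
print(round(weighted_score(example), 2))
```

With these made-up scores, the assignments contribute the most to the total, which reflects their 40% weight.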

 

Academic Regulations  

Please make sure to read the Academic Regulations on the DIS website. There you will find regulations on:

 
