Course Content

From medical decision support systems to automatic language translation, from sorting and prioritizing news on social networks to autonomous cars: machine learning is woven into the fabric of daily life. Data science applies machine learning to extract knowledge and insights from data.

The class will provide an introduction to data science and applied machine learning. For this, the programming language Python will be used (and taught). You will learn about the difference between supervised and unsupervised machine learning, and four machine learning tasks:

  1. Classification (e.g. k-NN, Decision Trees, Support Vector Machines)
  2. Regression (Linear Regression, Logistic Regression)
  3. Clustering (k-means)
  4. Dimensionality Reduction (PCA, t-SNE)
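
To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch using only the Python standard library (the course itself uses scikit-learn; the toy data and the function names `knn_predict` and `kmeans` are illustrative assumptions, not course material). The k-NN classifier needs labels for its training points, while k-means groups the same points without ever seeing a label.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority vote among the k labelled points nearest to `query`."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    assignments = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignments


# Toy data (made up for illustration): two well-separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["small", "small", "small", "large", "large", "large"]

# Supervised: the labels are required to make a prediction.
predicted = knn_predict(points, labels, query=(2.0, 2.0), k=3)
print(predicted)  # small

# Unsupervised: only the points are used; the labels never appear.
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Note that k-means returns cluster indices, not names: it finds the two groups but cannot know that one of them is called "small". Attaching meaning to a cluster is left to the analyst.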

We will explore natural language processing for text mining and computer vision. Exploratory data analysis and evaluation, as an integral part of data science, will also be taught.

This class is taught remotely. Every week, the lecturer will upload new material to this website. To succeed in this course, you have to watch the videos, do the exercises and applications, and work on your own project. Remember that these videos are not full-fledged lectures; they are a starting point for your own learning. Use material like the coursebook to learn more about the topics as the course progresses.

This is an online course, not a lecture that was filmed and put online. The course format was adapted to suit both the needs of the medium and the material.

In the course, we will mostly interact by sending each other videos. This means that you can rewatch videos, watch them at different speeds, and discuss the videos with each other.

Schedule

Below you will find the schedule of the course. Note that there are different kinds of items:

  • Inputs, which are videos in which the lecturer provides an overview of a topic.
  • Exercises, which are videos in which the lecturer shows how to implement the content of an Input in Python using iPython Notebook.
  • Applications, for which you use the content of the Input and the Exercises to implement something of your own that goes beyond them. All of these have to be uploaded to StudIP (see the schedule for dates).
  • Individual or Group Deliverables, for which you either record yourself and your screen while presenting yourself or your project (progress), or write a report. All of these have to be uploaded to StudIP (see the schedule for dates). If a video file is too big to be uploaded to StudIP directly, please upload a text file with a link to the video instead.
  • Questions & Answers Sessions, for which the lecturer will collect your questions and answer them in a separate video.
  • Video Conference, in which we as a group will video chat with a well-known machine learning expert about data science and machine learning.

Date of Session | Content
April 20
Week #1
Input: Introduction to Data Science [Video] [Slides]

Input: Course Organization and Grading Criteria [Video] [Slides]

Input: Introduction to Python [Video] [Slides]

Individual Deliverable: Self-Introduction Video [Assignment] [Upload video or link to video to StudIP - Deadline: April 27, 23:59]

April 27
Week #2
Input: Introduction to Scientific Python [Video] [Slides]

Input: Basic Statistics [Video] [Slides]

May 04
Week #3
Input: How to use Python and iPython Notebook [Video]

Input: Classification [Video] [Slides]

Exercise: Classification [Video] [Notebook]

Individual Deliverable: Classification [Dataset] [Upload iPython Notebook to StudIP - Deadline: May 25, 23:59]

May 11
Week #4
Input: Regression [Video] [Slides]

Exercise: Regression [Video] [Notebook]

May 18
Week #5
Input: Clustering [Video] [Slides]

Input: Dimensionality Reduction [Video] [Slides]

Exercise: Clustering & Dimensionality Reduction [Video] [Notebook]

May 25
Week #6
Input: Course Project Expectations [Video] [Slides]

Note: The FATE exercise is cancelled.

Input: Fairness, Accountability, Transparency & Ethics in Machine Learning [Video] [Slides]

June 08
Week #7
Input: Natural Language Processing [Video] [Slides]

Input: Topic Modelling [Video] [Slides]

Questions & Answers Session #1 [Zoom Meeting, you find the link on StudIP]

Group Deliverable: Exposé Presentation [Upload video or link to video to StudIP - Deadline: June 15, 23:59]

June 15
Week #8
Input: Exploratory Data Analysis [Video] [Slides]

Exercise: Exploratory Data Analysis [Video] [Notebook]

June 22
Week #9
Input: Computer Vision [Video] [Slides]

Input: Evaluation Metrics [Video] [Slides]

June 29
Week #10
Group Deliverable: Progress Presentation [Upload video or link to video to StudIP - Deadline: June 29, 23:59]

July 06
Week #11
Video Conference with Machine Learning Expert: Karen Ullrich, University of Amsterdam, previously at Microsoft Research and Google DeepMind [Link to Video Conference]

July 13
Week #12
Group Deliverable: Final Presentation [Upload video or link to video to StudIP - Deadline: July 13, 23:59]

Group Deliverable: Final Report [Upload to StudIP - Deadline: July 20, 23:59]

Learning Outcomes

  • Programming Skills

    You will learn how to train machine learning systems using Python and the libraries numpy, scipy, and scikit-learn.

  • Real Project

    You will work on a real data science project where you apply machine learning to answer your research questions.

  • Presentation and Report

    You will learn how to present your research findings both in videos and in a written report.