Course Content

From medical decision support systems to automatic language translation, from sorting and prioritizing news on social networks to autonomous cars: Machine learning is woven into the fabric of daily life. Data science applies machine learning to extract knowledge and insights from data.

The class will provide an introduction to data science and applied machine learning. For this, the programming language Python will be used (and taught). You will learn about the difference between supervised and unsupervised machine learning, and four machine learning tasks:

  1. Classification (e.g. k-NN, Decision Trees, Support Vector Machines)
  2. Regression (Linear Regression; Logistic Regression, which despite its name is typically used for classification)
  3. Clustering (k-means)
  4. Dimensionality Reduction (PCA, t-SNE)
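
To make the four tasks more concrete, here is a minimal sketch using scikit-learn, one of the libraries used in this course. It is an illustration only, not part of the official course material: the choice of the bundled Iris dataset, the parameter values, and the variable names are all assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example data (an assumption for this sketch): 150 flowers,
# 4 measurements each, 3 species as labels.
X, y = load_iris(return_X_y=True)

# --- Supervised learning: a known target is used during training ---
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# 1. Classification: predict the species (a category) with k-NN.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# 2. Regression: predict petal width (a number, column 3)
#    from the other three measurements.
reg = LinearRegression().fit(X_train[:, :3], X_train[:, 3])
r2 = reg.score(X_test[:, :3], X_test[:, 3])

# --- Unsupervised learning: no labels are used ---

# 3. Clustering: group the flowers into 3 clusters with k-means.
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 4. Dimensionality reduction: compress 4 features into 2 with PCA.
X_2d = PCA(n_components=2).fit_transform(X)

print(f"Classification accuracy: {accuracy:.2f}")
print(f"Regression R^2: {r2:.2f}")
print(f"Cluster ids of the first five flowers: {cluster_ids[:5]}")
print(f"Shape after PCA: {X_2d.shape}")
```

Note that the first two tasks need the labels `y` (or a numeric target) for training, while clustering and dimensionality reduction only look at `X`. That is the difference between supervised and unsupervised learning.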

We will explore natural language processing for text mining and computer vision. Exploratory data analysis and evaluation, as an integral part of data science, will also be taught.

This class is taught remotely. Every week, the lecturer will upload new material to this website. To succeed in this course, you have to watch the videos, do the exercises and applications, and work on your own project. Remember that these videos are not full-fledged lectures; they are a starting point for your own learning. Use material like the coursebook to learn more about the topics as we progress in the course.

This is an online course, not a lecture that was filmed and put online. The course format was adapted to suit both the needs of the medium and the material.

We will meet regularly, but most of the input will be provided as videos. This allows you to rewatch videos, watch them at different speeds, and discuss the videos with each other.


Below you will find the schedule of the course. Note that there are different kinds of items.

  • Meetings, where we as a group will video chat via Zoom.
  • Input, which are videos in which the lecturer provides an overview of a certain topic.
  • Tutorials, which are videos in which the lecturer shows how to implement the content from a certain Input in Python using iPython Notebook.
  • Programming Exercise, for which you have to program a classification system.
  • Project, for which you have to either write a report or present via Zoom.
  • Peer Feedback, for which your group has to provide detailed feedback on another group's project.

For meetings, we use Zoom. For file sharing, we use StudIP. For feedback (both from me and from peers), we use Conceptboard.

The parts marked in yellow are required to pass the course. The parts marked in red are required and graded.





Additional Material


Kick-off Meeting

Course Organization and Grading Criteria (live) [Video] [Slides]

Course Book: Introduction to Machine Learning with Python: A Guide for Data Scientists by Müller and Guido (throughout the course!)


Introduction to Data Science [Video] [Slides]

Introduction to Python [Video] [Slides]

Tetiana Ivanova: How to become a Data Scientist in 6 months | PyData London 2016


Townhall Meeting (new date!)

Basic Statistics [Video] [Slides]

Introduction to Scientific Python [Video] [Slides]

How to use Python and iPython Notebook [Video]

Wes McKinney: pandas in 10 minutes | Walkthrough


Townhall Meeting (moved to 2021-04-26)

Classification Input [Video] [Slides]

Classification Tutorial [Video] [Notebook]

Regression Input [Video] [Slides]

Regression Tutorial [Video] [Slides]

Jake VanderPlas: Machine Learning with Scikit Learn

Andreas Müller: Machine Learning with scikit-learn


Kick-off Project

Course Project Expectations (live)

Group Forming (live)

Classification Exercise (2021-05-10) [Dataset]

Deb Roy: The birth of a word

David Kriesel: SpiegelMining – Reverse Engineering von Spiegel-Online (33c3) [German Version]

David Kriesel: BahnMining - Pünktlichkeit ist eine Zier [German Version]


Clustering [Video] [Slides]

Dimensionality Reduction [Video] [Slides]

Clustering & Dimensionality Reduction Exercise [Video] [Notebook]

Topic Modelling [Video] [Slides]

Hendrik Heuer: Data Science for Digital Humanities: Extracting meaning from Images and Text

Matti Lyra: Evaluating Topic Models


Exposé Workshop (2021-05-27 9:00, check StudIP)

Exploratory Data Analysis Input [Video] [Slides]

Exploratory Data Analysis Tutorial [Video] [Notebook]

Evaluation Metrics [Video] [Slides]

Daniel Chen: Cleaning and Tidying Data in Pandas


Session on Machine Learning in a Digital Society

Fairness, Accountability, Transparency & Ethics in Machine Learning [Video] [Slides]

Hendrik Heuer: Rage Against The Machine Learning - Auditing YouTube and Others


Exposé Presentation

Yoshua Bengio: The Rise of Artificial Intelligence through Deep Learning

Tariq Rashid: A Gentle Introduction to Neural Networks and making your own with Python

Li et al. (Stanford CS231N): Introduction to Neural Networks


Project Workshop

Natural Language Processing [Video] [Slides]

Computer Vision [Video] [Slides]

Exposé Peer Feedback (2021-06-16)

Li et al. (Stanford CS231N): Convolutional Neural Networks

Li et al. (Stanford CS231N): Recurrent Neural Networks


Progress Presentation

Data Science and Machine Learning Questions (2021-06-23)


Interview with an ML expert


Open Workshop

Progress Peer Feedback (2021-06-28)


Final Presentation (Project Day)

Final Report

Learning Outcomes

  • Programming Skills

    You will learn how to train machine learning systems using Python and the libraries NumPy, SciPy, and scikit-learn.

  • Real Project

    You will work on a real data science project where you apply machine learning to answer your research questions.

  • Presentation and Report

    You will learn how to present your research findings both in videos and in a written report.