Course Content

From medical decision support systems to automatic language translation, from sorting and prioritizing news on social networks to autonomous cars: Machine learning is woven into the fabric of daily life. Applying machine learning, data science aims to extract knowledge or insights from data.

The class will provide an introduction to data science and applied machine learning. For this, the programming language Python will be used (and taught). You will learn about the difference between supervised and unsupervised machine learning, and four machine learning tasks:

  1. Classification (e.g. k-NN, Decision Trees, Support Vector Machines)
  2. Regression (Linear Regression, Logistic Regression)
  3. Clustering (k-means)
  4. Dimensionality Reduction (PCA, t-SNE)
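To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch in plain Python. It is not taken from the course material, where scikit-learn is used instead; the function names `knn_predict` and `kmeans` and the toy data are assumptions made for illustration only. The k-NN classifier needs labels to make a prediction (supervised), whereas k-means groups the same points without seeing any labels (unsupervised).

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority vote among the k labeled points closest to `query`."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and moving centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    assignments = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of 2D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (1.1, 1.0)))  # → small

# k-means never sees `labels`; it only receives the points and two starting centroids.
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
print(assignments)  # → [0, 0, 0, 1, 1, 1]
```

The cluster indices returned by `kmeans` carry no meaning of their own, which is the practical difference from the named classes that `knn_predict` returns.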

We will explore natural language processing for text mining and computer vision. Exploratory data analysis and evaluation, as integral parts of data science, will also be taught.

To succeed in this course, you have to watch the videos, do the exercises and applications, and work on your own project. Remember that these videos are not full-fledged lectures; they are a starting point for your own learning. Use material like the course book to learn more about the topics as we progress through the course.

This is a blended learning course. This means that we combine videos that we recorded with live tutorials in which you will be working on a real machine learning project.

All sessions in June and July will be held via Zoom. Whether we will meet in person in April or May will depend on the number of COVID-19 cases and the risk that all of us (you as students and we as instructors) are willing to take. We will survey you anonymously on your stance on these issues.

We will meet regularly (either in person or via Zoom), but most of the input will be provided as videos. This allows you to rewatch videos, watch them at different speeds, and discuss the videos with each other.


Below you will find the schedule of the course. Note that there are different kinds of items.

  • Meetings are sessions in which we video chat as a group via Zoom.
  • Inputs are videos in which the lecturer provides an overview of a certain topic.
  • Tutorials are videos in which the lecturer shows how to implement the content from a certain Input in Python using iPython Notebook.
  • The Programming Exercise is an assignment for which you have to program a classification system.
  • The Project is an assignment for which you have to either write a report or present via Zoom.

For meetings, we use Zoom. For file sharing, we use StudIP. For feedback (both from me and peers), we use Conceptboard.

Based on the poll that we conducted with all registered students, we decided to do the course fully online. Zoom links will be shared on StudIP.

The parts marked in yellow are required to pass the course. The parts marked in red are required and graded.





Additional Material

25.04.22 Kick-off Meeting
(online via Zoom)
Introduction to Data Science [Video] [Slides]

Introduction to Python [Video] [Slides]
Course Book: Introduction to Machine Learning with Python: A Guide for Data Scientists by Müller and Guido (throughout the course!)

Tetiana Ivanova: How to become a Data Scientist in 6 months | PyData London 2016
02.05.22 -- no meeting --
Basic Statistics [Video] [Slides]

Introduction to Scientific Python [Video] [Slides]

How to use Python and iPython Notebook [Video]
Wes McKinney: pandas in 10 minutes | Walkthrough
09.05.22 Kick-off Project & Group Forming
(online via Zoom)
Classification Input [Video] [Slides]

Classification Tutorial [Video] [Notebook]

Regression Input [Video] [Slides]

Regression Tutorial [Video] [Slides]
Jake VanderPlas: Machine Learning with Scikit Learn

Andreas Müller: Machine Learning with scikit-learn

Deb Roy: The birth of a word

David Kriesel: SpiegelMining – Reverse Engineering von Spiegel-Online (33c3) [German Version]

David Kriesel: BahnMining - Pünktlichkeit ist eine Zier [German Version]
16.05.22 Exposé Workshop
(online via Zoom)
Clustering [Video] [Slides]

Dimensionality Reduction [Video] [Slides]

Clustering & Dimensionality Reduction Exercise [Video] [Notebook]

Topic Modelling [Video] [Slides]
Classification Exercise (2022-05-16 23:59) [Dataset]
Hendrik Heuer: Data Science for Digital Humanities: Extracting meaning from Images and Text

Matti Lyra: Evaluating Topic Models
23.05.22 Exploratory Data Analysis
(online via Zoom)
Exploratory Data Analysis Input [Video] [Slides]

Exploratory Data Analysis Tutorial [Video] [Notebook]

Evaluation Metrics [Video] [Slides]
Daniel Chen: Cleaning and Tidying Data in Pandas
30.05.22 Exposé Presentation
(online via Zoom)
Exposé Presentation Upload (2022-05-30 23:59)
Yoshua Bengio: The Rise of Artificial Intelligence through Deep Learning

Tariq Rashid: A Gentle Introduction to Neural Networks and making your own with Python

Li et al. (Stanford CS231N): Introduction to Neural Networks
13.06.22 Data Science in a Digital Society
(online via Zoom)
Fairness, Accountability, Transparency & Ethics in Machine Learning [Video] [Slides]
Hendrik Heuer: Rage Against The Machine Learning - Auditing YouTube and Others
20.06.22 Progress Presentation
(online via Zoom)
Progress Presentation Upload (2022-06-22 23:59)
27.06.22 Advanced Topics in Natural Language Processing (Guest Lecture)
(online via Zoom)
Natural Language Processing [Video] [Slides]
Li et al. (Stanford CS231N): Recurrent Neural Networks
04.07.22 Advanced Topics in Visualization (Guest Lecture)
(online via Zoom)
Computer Vision [Video] [Slides]
Li et al. (Stanford CS231N): Convolutional Neural Networks
11.07.22 Project Workshop
(online via Zoom)
18.07.22 Final Presentation
(online via Zoom)
Final Presentation Upload (2022-07-18 23:59)
05.08.22 Final Report
Final Report (2022-08-05 23:59)

Learning Outcomes

  • Programming Skills

    You will learn how to train machine learning systems using Python and the libraries NumPy, SciPy, and scikit-learn.

  • Real Project

    You will work on a real data science project where you apply machine learning to answer your research questions.

  • Presentation and Report

    You will learn how to present your research findings both in presentations and in a research report.