Introduction to Data Science with Python

Author

Thilo Kraft & Jan Bischoff

Published

July 28, 2023

Welcome

Welcome to the course website for Introduction to Data Science with Python, offered for GRADE Brain and other GRADE Centers at Goethe University in July 2023. This website serves as the central repository for all course materials. Here, you will find all slides, lecture materials, and links to your online development environment.

Course Objective

Data analysis plays a critical role in many academic disciplines, and the Python programming language has become one of the standard tools within the Data Science community. This course introduces programming with Python and shows how to use it for data analysis. After successfully completing this course, you will understand the fundamentals of the Python programming language and be able to carry out basic data analysis: data wrangling, data visualization, and implementing simple statistical models in Python.

Our goal is to show you the scope of what is possible with Python and to leave you confident that you can implement your own empirical projects in Python.

Course Description

This course is aimed at Python beginners. Hence, we will cover the fundamentals of programming and Python, such as variables, loops, and logic statements, before we dive into the topic of Data Science. Because we focus on applied coding, this course will not cover deeper statistical or theoretical concepts.
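To give a first taste of these fundamentals, here is a minimal sketch that combines a variable, a loop, and a logic statement. The measurements and the threshold are invented purely for illustration and are not taken from the course materials.

```python
# Hypothetical reaction times in milliseconds (made-up example data)
reaction_times_ms = [312, 287, 455, 301, 520]

# A variable holding an assumed cut-off value
threshold_ms = 400

# A loop with a logic statement: collect every trial slower than the cut-off
slow_trials = []
for time_ms in reaction_times_ms:
    if time_ms > threshold_ms:
        slow_trials.append(time_ms)

# A simple summary statistic computed with built-in functions
mean_ms = sum(reaction_times_ms) / len(reaction_times_ms)

print(f"Mean reaction time: {mean_ms:.1f} ms")
print(f"Trials slower than {threshold_ms} ms: {slow_trials}")
```

You can paste this into a notebook cell and change the numbers to see how the output reacts.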

This course introduces:

  • Syntax and basics of Python
  • How to use Notebooks as a development environment
  • Data analysis, data wrangling, and data visualization using numpy, pandas and matplotlib
  • Introduction to implementing simple statistical models in Python with scikit-learn

Course Organization and Schedule

We will meet on July 28, 2023.

This workshop alternates between lecture-style presentations and application exercises. We aim to adhere to the following schedule:

Part | Content | Date | Time
1 | Course Introduction and Case Study | 28.07.2023 | 09:00 - 10:00
2 | Fundamentals of Python | 28.07.2023 | 10:00 - 12:00
3 | Data wrangling and visualizations using pandas and matplotlib | 28.07.2023 | 13:00 - 15:00
4 | Advanced visualizations with seaborn, modeling with scikit-learn | 28.07.2023 | 15:00 - 17:00

There will be a lunch break of approximately 60 minutes after the second part and additional short breaks after each part.

We want you to get your hands dirty, which means we want you to code! Just following along with fancy slides won’t magically transfer the skill of coding to you. Actively engaging with the course content in your development environment is far more likely to do just that.

That’s why we need you to prepare accordingly: Please ensure that you have access to Google Colab before the course. We will use Google Colab for the coding parts so that we can use Python without (sometimes time-consuming) pre-configuration or installation on your machine. To use Google Colab, you need a Google account (the same account that is used for Gmail, YouTube, etc.).

If you have any questions, please reach out to one of us through the e-mail addresses at the bottom of this page.

Trainers

Feel free to reach out to us by e-mail if you have any questions before, during, or after the course:

  • Thilo Kraft, Ph.D. Student in Quantitative Marketing, send me an email

  • Jan Bischoff, Course Designer and Teacher at TechAcademy e.V., Business and Economics Student, send me an email

Acknowledgements

We gratefully acknowledge funding through the project DigiTeLL at Goethe University Frankfurt (partnership Coding Intro).

Thanks to Felix Schneider (GitHub), who initially developed this course and granted permission to adapt and use the course materials.

The case study used as motivation is built upon material from datasciencebox.org by Mine Çetinkaya-Rundel.