Data scientists and machine learning practitioners spend a considerable share of their time preprocessing data, so preprocessing skills are an essential part of their toolkit.
In this training we focus specifically on the pandas library, which has grown into one of the main Python tools for data preprocessing and exploration.
We start with an introduction to preprocessing, the concept of tidy data, and useful techniques such as pivoting and missing value imputation. Next, we introduce the pandas library: its background, its data structures, and its basic features. A demo then shows concrete ways to handle data sets, from loading, subsetting, and merging to (re)sampling, applying grouped transformations, and saving results.
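As a small illustration of tidy data, pivoting, and missing value imputation, here is a minimal pandas sketch. The sales data set and its column names are hypothetical, and filling gaps with the per-month mean is just one simple imputation strategy among many; none of it is taken from the training material itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data in "wide" format: one column per month
wide = pd.DataFrame({
    "store": ["A", "B", "C"],
    "jan": [100.0, np.nan, 80.0],
    "feb": [110.0, 95.0, np.nan],
})

# Tidy ("long") format: one row per observation (store, month, sales)
long = wide.melt(id_vars="store", var_name="month", value_name="sales")

# Simple imputation (an assumed strategy): fill each gap with that month's mean,
# where the mean is computed over the non-missing values only
long["sales"] = long["sales"].fillna(
    long.groupby("month")["sales"].transform("mean")
)

# Pivot back to wide format: one row per store, one column per month
wide_again = long.pivot(index="store", columns="month", values="sales")
print(wide_again)
```

Melting to long format first keeps each observation in its own row, which makes the grouped mean per month a one-liner before pivoting back.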
The training includes theory, demos, and hands-on exercises.
After this training you will have gained knowledge of:
- The pandas library
- Data structures: DataFrame and Series
- Tidy data
- Loading and saving data
- Data exploration
- Plotting time series
- Useful transformation techniques
- Merging, selecting, sorting, sampling
- Missing value imputation
- Grouped operations
- Long/wide conversions
- Advantages and limitations of pandas
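To give a rough idea of the kind of workflow covered in the demo (loading, merging, grouped transformations, resampling, and saving), the sketch below chains these steps on a tiny data set. The sensor readings, the lookup table, and the hourly resampling rule are made-up assumptions for illustration, not the actual demo data; reading from and writing to an in-memory buffer stands in for real files.

```python
import io

import pandas as pd

# Hypothetical CSV input; in practice you would pass a file path to read_csv
csv_text = """timestamp,sensor_id,value
2024-01-01 00:00,1,10.0
2024-01-01 00:30,1,12.0
2024-01-01 01:00,1,14.0
2024-01-01 00:00,2,20.0
2024-01-01 00:30,2,22.0
2024-01-01 01:00,2,24.0
"""

# Loading: parse the timestamp column as datetimes
readings = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Hypothetical lookup table with one row per sensor
sensors = pd.DataFrame({"sensor_id": [1, 2], "location": ["north", "south"]})

# Merging: attach each sensor's location to its readings (SQL-style left join)
merged = readings.merge(sensors, on="sensor_id", how="left")

# Grouped transformation: subtract each sensor's own mean from its values
merged["centered"] = merged["value"] - merged.groupby("sensor_id")["value"].transform("mean")

# Resampling: mean value per location in 60-minute bins (needs a DatetimeIndex)
hourly = (
    merged.set_index("timestamp")
    .groupby("location")["value"]
    .resample("60min")
    .mean()
)

# Saving: write the result as CSV (to a string buffer instead of a file here)
buffer = io.StringIO()
hourly.to_csv(buffer)
print(hourly)
```

The result of the grouped resample is a Series indexed by (location, timestamp), so each location keeps its own time axis.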