In the second Apache Spark training you will be introduced to Spark’s DataFrame API and Spark SQL. These APIs are optimized for structured, tabular data and allow SQL access to very large datasets. During the practical session, participants will work with both APIs and then analyze the MovieLens dataset using Spark SQL.
The training includes theory, demos, and hands-on exercises.
After this training you will have gained knowledge of:
- Spark’s DataFrame API
- Spark SQL
- The Parquet storage format