Streaming Big Data Training

Due to COVID-19, our training courses will be taught via an online classroom.

Receive in-depth knowledge from industry professionals, test your skills with hands-on assignments & demos, and get access to valuable resources and tools.

This course is a deep dive into streaming technologies used for real-time processing applications. The lessons focus on Kafka, which provides a scalable solution for decoupling data streams; Spark's structured streaming data model; Airflow for scheduling; and data architectures (Lambda and Kappa). After this course, you will be able to design high-quality streaming applications, such as processing raw data and writing the cleaned data to a MySQL database, or transferring data from a MySQL database to a Postgres database. This course is ideal for data engineers who want to master streaming applications. Experience with a programming language such as Python or Java, as well as with Spark, is required.

Are you interested? Contact us and we will get in touch with you.


About the training & classes

The Streaming Big Data training is split into four days. Click below to see a detailed description of each class:

Spark Streaming

In this training you will be introduced to Spark’s structured streaming APIs. Participants are introduced to streaming concepts such as event time, late data, windowing, and watermarking. During the practical session participants will solve several streaming queries regarding order (sales) data using Spark and Kafka.

The training includes theory, demos, and hands-on exercises. After this training you will have gained knowledge about:

  • Previous and current streaming APIs in Spark
  • Spark structured streaming data model
  • Considerations concerning streaming query output modes
  • Event time and late data
  • Windowing and watermarking to solve late data issues
  • Hands-on solving structured streaming queries
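
To preview how windowing, watermarking, and late data fit together, here is a minimal plain-Python sketch. It is not Spark's API (in Spark you would use `withWatermark` and `window`); it only illustrates the idea that a watermark trails the maximum event time seen so far, and events older than the watermark are dropped rather than reopening windows whose state may already be finalized:

```python
from collections import defaultdict

class WatermarkedWindowCounter:
    """Counts events into fixed tumbling windows, dropping events older than
    max_event_time - delay_threshold (the watermark), mimicking the idea
    behind Spark's watermark semantics."""

    def __init__(self, window_size: int, delay_threshold: int):
        self.window_size = window_size
        self.delay_threshold = delay_threshold
        self.max_event_time = 0          # highest event time seen so far
        self.windows = defaultdict(int)  # window start -> event count

    def process(self, event_time: int) -> bool:
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.delay_threshold
        if event_time < watermark:
            return False  # too late: the window may already be finalized
        window_start = (event_time // self.window_size) * self.window_size
        self.windows[window_start] += 1
        return True

counter = WatermarkedWindowCounter(window_size=10, delay_threshold=5)
counter.process(12)             # on time -> counted in window [10, 20)
counter.process(25)             # advances the watermark to 25 - 5 = 20
accepted = counter.process(14)  # late: 14 < watermark 20, so dropped
```

In the practical session you will express the same pattern as streaming queries over order data, with Spark managing the window state for you.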

Kafka

The Kafka training aims to provide an overview of the Apache Kafka platform. Participants will learn about Kafka terminology and how Kafka provides a scalable solution for decoupling data streams. Topics such as partitioning and message guarantees will be addressed. During the practical session participants will use a Dockerized Kafka broker to explore basic consuming and producing, followed by a more complex change data capture (CDC) scenario.

The training introduces Kafka concepts and theory followed up by hands-on exercises. After this training you will have gained knowledge about:

  • The problems Kafka solves
  • Kafka terminology and internals
  • Partitioning and scaling Kafka
  • The various message guarantees provided by Kafka
  • Kafka security and ACL options
  • Schemas and schema registry
  • Basic Kafka consuming and producing
  • Change data capture and Kafka
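
The partitioning idea behind Kafka's scalability can be previewed with a small sketch. This is plain Python, not a Kafka client, and md5 stands in for the murmur2 hash used by Kafka's default partitioner; the effect illustrated is the same: equal keys always map to the same partition, which preserves per-key ordering while spreading load across partitions.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition, as Kafka's default partitioner does.

    Kafka hashes keys with murmur2; md5 stands in here for illustration.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages with the same key always land in the same partition, which is
# what gives Kafka per-key ordering while still scaling out consumers.
assert partition_for(b"customer-42", 6) == partition_for(b"customer-42", 6)
```
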

Data Architectures

Learn how to set up different (big) data architectures, the design principles behind them, and the trade-offs between them. This lesson explores the Lambda and Kappa architectures and lets students build a small-scale prototype of each.
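
As a taste of the Lambda Architecture covered in this lesson, the sketch below (plain Python, with made-up sales figures) shows its core idea: a serving layer answers queries by merging a periodically recomputed batch view with a fast, incrementally updated speed-layer view.

```python
# Toy Lambda architecture: the batch layer recomputes a complete view on a
# schedule, while the speed layer keeps an incremental view of recent events.
batch_view = {"product-a": 100, "product-b": 40}  # slow, complete, periodic
speed_view = {"product-a": 3, "product-c": 7}     # fast, recent events only

def query_sales(product: str) -> int:
    # Serving layer: merge both views so answers are complete AND current.
    return batch_view.get(product, 0) + speed_view.get(product, 0)

query_sales("product-a")  # 103: batch total plus recent stream updates
```

The Kappa Architecture removes the batch layer entirely and reprocesses history by replaying the stream; the prototypes you build in class contrast these two designs.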

Scheduling with Airflow

This training aims to give an overview of what Apache Airflow is, how it works, and how it can be used in practice.

First, we will discuss the core architecture components, including the metadata database, scheduler, executor, and worker nodes, and how they interact in single- and multi-node architectures. We then move to Airflow-specific concepts such as directed acyclic graphs (DAGs), operators, tasks, and the task lifecycle in a workflow. Finally, we discuss some additional Airflow functionality, such as hooks, connections, and XComs.

After this theoretical overview, we gain hands-on experience in a two-part lab session. In the first lab, we set up a workflow to receive raw data from a client and write the cleaned data to a MySQL database, where we can then query the data to generate sales reports. For the second lab, we use Airflow hooks and connections to transfer the data from a MySQL database to a Postgres database.

The training includes theory, demos, and hands-on exercises. After this training, you will have gained knowledge about:

  • Various Airflow use cases and applications
  • Architecture components: metadata database, scheduler, executor, workers
  • Single vs. Multi-node architectures
  • Directed Acyclic Graphs (DAGs)
  • Operators
  • Tasks and the task lifecycle
  • Hooks
  • Connections
  • XComs
  • Lab session to get hands-on experience writing DAG files and taking advantage of the Airflow web UI for scheduling data workflows
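
To make the DAG idea concrete without requiring an Airflow installation, the sketch below shows in plain Python how a scheduler derives an execution order from task dependencies. The task names mirror the first lab but are illustrative only; in real Airflow, tasks are created with operators and wired together with the `>>` operator inside a DAG file.

```python
# Toy dependency graph: task -> list of upstream tasks it must wait for.
# (Illustrative names loosely following lab 1; not Airflow's API.)
dag = {
    "extract_raw": [],               # no upstream tasks
    "clean": ["extract_raw"],        # runs after extraction
    "load_mysql": ["clean"],
    "sales_report": ["load_mysql"],
}

def execution_order(dag: dict) -> list:
    """Run tasks whose upstream tasks are all done (assumes an acyclic graph),
    which is essentially what Airflow's scheduler does each cycle."""
    order, done = [], set()
    while len(done) < len(dag):
        for task, upstream in dag.items():
            if task not in done and all(u in done for u in upstream):
                order.append(task)
                done.add(task)
    return order

execution_order(dag)
# -> ['extract_raw', 'clean', 'load_mysql', 'sales_report']
```

In the lab sessions you will write real DAG files, inspect them in the Airflow web UI, and let the scheduler handle this ordering for you.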