This training aims to give an overview of what Apache Airflow is, how it works, and how it can be used in practice.
First, we will discuss the core architecture components, including the metadata database, scheduler, executor, and worker nodes, and how they interact in single-node and multi-node architectures. We then move to Airflow-specific concepts such as directed acyclic graphs (DAGs), operators, tasks and the task lifecycle in a workflow. Finally, we discuss some additional Airflow functionality, like hooks, connections and XComs.
After this theoretical overview, we gain hands-on experience in a two-part lab session. In the first lab, we set up a workflow to receive raw data from a client and write the cleaned data to a MySQL database, where we can then query the data to generate sales reports. For the second lab, we use Airflow hooks and connections to transfer the data from a MySQL database to a Postgres database.
The training includes theory, demos and hands-on exercises. After this training, you will have gained knowledge about:
- Various Airflow use cases and applications
- Architecture components: metadata database, scheduler, executor, workers
- Single vs. Multi-node architectures
- Directed Acyclic Graphs (DAGs)
- Tasks and the task lifecycle
- Lab session to get hands-on experience writing DAG files and taking advantage of the Airflow web UI for scheduling data workflows