KB: Airflow DAG
An Airflow DAG (Directed Acyclic Graph) is the central concept in Apache Airflow for defining a workflow or pipeline. It describes the tasks to run and the dependencies between them. DAGs are defined in Python files (DAG files) that declare the structure of the workflow: its tasks (instances of operators) and their dependencies.
Here's a breakdown of the main elements of an Airflow DAG:
1. Directed Acyclic Graph (DAG)
- Directed: Each dependency has a direction, pointing from an upstream task to the downstream task that runs after it.
- Acyclic: It cannot have loops, so tasks cannot depend on each other in a circular way.
- Graph: It represents tasks (nodes) and dependencies (edges) as a graph structure.
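The acyclic property is what guarantees that a valid execution order exists. The plain-Python sketch below uses the standard library's graphlib to derive an order from a dependency map and to reject a graph with a loop. It illustrates the property only. It is not how Airflow validates DAGs internally, and the task names and the helpers execution_order and is_acyclic are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(graph):
    """Return an order in which every task comes after all of its upstream tasks.

    Raises graphlib.CycleError if the graph has a cycle (i.e., it is not a DAG).
    """
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """True if the dependency graph contains no circular dependencies."""
    try:
        execution_order(graph)
        return True
    except CycleError:
        return False


# Hypothetical tasks: each key maps to the set of upstream tasks it depends on.
pipeline = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
print(execution_order(pipeline))  # ['extract', 'transform', 'load']

# Making "extract" depend on "load" closes a loop, so this graph is not a DAG.
cyclic = {**pipeline, "extract": {"load"}}
print(is_acyclic(cyclic))  # False
```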
2. Tasks in a DAG
- Tasks are the individual steps in a DAG. Each task is an instance of an operator (such as PythonOperator or BashOperator).
- Each task performs a specific action, such as running a script, transferring data, or checking a condition.
- Tasks are defined in the DAG and assigned dependencies to set the order in which they run.
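In Airflow, operators are classes and each task is an instance of one. When the task runs, the operator's execute(context) method does the work. The sketch below models that idea in plain Python. Operator, CallableOperator, and ConditionOperator are hypothetical classes, not Airflow's API. Airflow's real base class is BaseOperator, which handles much more, including retries, templating, and dependencies.

```python
from typing import Any, Callable


class Operator:
    """Hypothetical, stripped-down operator base class: each instance is one task."""

    def __init__(self, task_id: str):
        self.task_id = task_id

    def execute(self, context: dict) -> Any:
        # Subclasses put the task's actual work here.
        raise NotImplementedError


class CallableOperator(Operator):
    """Runs a Python function, loosely modeled on what PythonOperator does."""

    def __init__(self, task_id: str, python_callable: Callable[[], Any]):
        super().__init__(task_id)
        self.python_callable = python_callable

    def execute(self, context: dict) -> Any:
        return self.python_callable()


class ConditionOperator(Operator):
    """Checks a condition and fails the task (by raising) if it does not hold."""

    def __init__(self, task_id: str, condition: Callable[[], bool]):
        super().__init__(task_id)
        self.condition = condition

    def execute(self, context: dict) -> bool:
        if not self.condition():
            raise RuntimeError(f"{self.task_id}: condition not met")
        return True


# Two tasks built from two different operator classes (task ids are hypothetical).
compute = CallableOperator("compute", python_callable=lambda: 2 + 3)
check = ConditionOperator("check", condition=lambda: compute.execute({}) == 5)
print(compute.execute({}))  # 5
print(check.execute({}))    # True
```

Keeping the work inside execute() means a task's definition (its class and arguments) is separate from the moment it runs, which is what lets a scheduler decide when to run it.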
3. Dependencies
- Tasks are connected with dependencies, indicating the order of execution.
- Airflow uses the >> and << operators to set dependencies between tasks (e.g., task1 >> task2 means task1 must run before task2).
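Airflow implements >> and << by overloading Python's shift operators on its task classes. The sketch below shows the general mechanism with a hypothetical Task class, not Airflow's implementation. In it, a >> b records that a runs before b and returns the right-hand operand so that a >> b >> c chains, and __rrshift__ handles a list on the left side, as in [a, b] >> c.

```python
def _as_list(x):
    # Accept either a single task or a list of tasks.
    return x if isinstance(x, list) else [x]


class Task:
    """Hypothetical task that only records its upstream and downstream task ids."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def _link_to(self, other):
        # Record the edge self -> other (self must run before other).
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def __rshift__(self, other):
        # self >> other: self runs before other
        for t in _as_list(other):
            self._link_to(t)
        return other  # returning the right operand makes a >> b >> c work

    def __lshift__(self, other):
        # self << other: other runs before self
        for t in _as_list(other):
            t._link_to(self)
        return other

    def __rrshift__(self, other):
        # [a, b] >> self: lists have no __rshift__, so Python calls this instead
        for t in other:
            t._link_to(self)
        return self


extract, clean, enrich, load = (Task(n) for n in ("extract", "clean", "enrich", "load"))
extract >> [clean, enrich] >> load  # fan out, then fan back in
print(sorted(extract.downstream))  # ['clean', 'enrich']
print(sorted(load.upstream))       # ['clean', 'enrich']
```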
4. DAG Parameters
- dag_id: A unique identifier for the DAG.
- schedule_interval: Defines how often the DAG should run (e.g., hourly, daily, weekly). Airflow 2.4 and later call this parameter schedule.
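- start_date: The date and time from which Airflow begins scheduling the DAG. Depending on the schedule, the first run may be triggered at the end of the first interval after start_date rather than at start_date itself.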
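- catchup: If True, the scheduler creates runs for every interval between start_date and the present that has not been run yet (for example, after the DAG was paused or first deployed with a start_date in the past). If False, only the most recent interval is scheduled.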
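To make catchup concrete, here is a simplified model of which daily runs a scheduler would create. It assumes interval-based daily scheduling, where an interval becomes runnable only once it has ended. It ignores timezones, end_date, and runs that already exist. daily_runs_to_create is a hypothetical helper, not an Airflow function.

```python
from datetime import datetime, timedelta


def daily_runs_to_create(start_date, now, catchup):
    """Logical dates of the daily intervals a scheduler would create runs for.

    Simplified model: each interval [d, d + 1 day) is runnable once it has ended.
    """
    interval = timedelta(days=1)
    completed = []
    d = start_date
    while d + interval <= now:
        completed.append(d)
        d += interval
    if catchup:
        return completed      # backfill every missed interval
    return completed[-1:]     # only the most recent completed interval


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 4, 12, 0)
print([d.date().isoformat() for d in daily_runs_to_create(start, now, catchup=True)])
# ['2024-01-01', '2024-01-02', '2024-01-03']
print([d.date().isoformat() for d in daily_runs_to_create(start, now, catchup=False)])
# ['2024-01-03']
```

The interval for 2024-01-04 is not included because, at noon on that day, it has not ended yet.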
5. DAG Execution
- The Airflow scheduler creates and manages runs of each DAG based on its schedule.
- Each execution of a DAG is a DAG Run: a specific instance of the DAG tied to a particular logical date (called the execution date in older Airflow versions).
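The relationship between a DAG, its DAG Runs, and per-task state can be sketched as follows. DagRunSketch is a hypothetical class. The state names success, failed, and upstream_failed mirror Airflow's task-instance states, and marking downstream tasks upstream_failed after a failure mimics Airflow's default all_success trigger rule. The real scheduler also runs independent tasks in parallel and handles retries, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from graphlib import TopologicalSorter


@dataclass
class DagRunSketch:
    """Hypothetical, simplified DAG Run: one execution of a DAG for one logical date."""
    dag_id: str
    logical_date: datetime
    task_states: dict = field(default_factory=dict)

    def run(self, upstream, callables):
        """Run tasks in dependency order, recording one state per task."""
        for task_id in TopologicalSorter(upstream).static_order():
            # Only run a task if every upstream task succeeded.
            if any(self.task_states.get(u) != "success" for u in upstream.get(task_id, ())):
                self.task_states[task_id] = "upstream_failed"
                continue
            try:
                callables[task_id]()
                self.task_states[task_id] = "success"
            except Exception:
                self.task_states[task_id] = "failed"
        return self.task_states


def noop():
    pass


def fail():
    raise RuntimeError("simulated task failure")


upstream = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

# Two runs of the same DAG, each with its own logical date and its own task states.
ok_run = DagRunSketch("example_dag", datetime(2024, 1, 1))
print(ok_run.run(upstream, {"extract": noop, "transform": noop, "load": noop}))
# {'extract': 'success', 'transform': 'success', 'load': 'success'}
bad_run = DagRunSketch("example_dag", datetime(2024, 1, 2))
print(bad_run.run(upstream, {"extract": noop, "transform": fail, "load": noop}))
# {'extract': 'success', 'transform': 'failed', 'load': 'upstream_failed'}
```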
Example of a Simple DAG
In this example:
- example_dag is a simple Airflow DAG that runs daily.
- hello_task is a task that prints "Hello from Airflow!".
Airflow UI documentation: https://airflow.apache.org/docs/apache-airflow/stable/ui.html