Tutorial: https://www.youtube.com/watch?v=K9AnJ9_ZAXE&list=PLwFJcsJ61oujAqYpMp1kdUBcPG0sE0QMT
- What is Airflow and why do we need it?
  - Airflow is a workflow orchestration platform that lets users programmatically create, schedule, and monitor workflows. It is often used to automate machine learning tasks and to build complex data pipelines.
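What "programmatically" means in practice: a pipeline is just Python code, so it can be versioned, tested, and reused. A minimal sketch (the DAG id and task function are made up for illustration):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_data():
    print("fetching data")  # stand-in for a real extraction step

# start_date + schedule make this run automatically once per day
with DAG("daily_fetch", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    fetch = PythonOperator(task_id="fetch", python_callable=fetch_data)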
Here's a well-structured guide for setting up Apache Airflow in a virtual environment, using the current working directory (.) as AIRFLOW_HOME. I'll also highlight the commands that should go in README.md for easy reference.
This guide covers:
✅ Installing Airflow inside a Python virtual environment
✅ Using the current directory (.
) as AIRFLOW_HOME
✅ Running Airflow webserver and scheduler
✅ Managing DAGs and users
Ensure you have:
- Python 3.11 installed
- pip, venv, and other required system packages
- Enough disk space and proper permissions
# Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-distutils python3-pip -y
# macOS (Homebrew)
brew install [email protected]
# Windows
- Download Python 3.11 from python.org.
- During installation, check the box: "Add Python to PATH".
- Open PowerShell and run:
python -m ensurepip
🚀 Step 1: Create and Activate a Virtual Environment (Add These Commands to README.md)
# Navigate to your project directory
cd ~/your-project-folder # Change this to your actual folder
# Create and activate a virtual environment
python3.11 -m venv airflow-env
source airflow-env/bin/activate # Linux/macOS
airflow-env\Scripts\activate # Windows
# Verify Python version inside the virtual environment
python --version
🚀 Step 2: Set AIRFLOW_HOME to the Current Directory (Add These Commands to README.md)
# Set Airflow to use the current directory
export AIRFLOW_HOME=$(pwd) # Linux/macOS
set AIRFLOW_HOME=%cd% # Windows
# To make it persistent, append the resolved path to your ~/.bashrc or ~/.zshrc
# (use double quotes so $(pwd) expands now, baking in the absolute project path)
echo "export AIRFLOW_HOME=$(pwd)" >> ~/.bashrc
source ~/.bashrc
🚀 Step 3: Install Apache Airflow (Add These Commands to README.md)
pip install --upgrade pip
pip install apache-airflow==2.7.1
# Recommended by the Airflow docs: pin transitive dependencies with the official
# constraints file, e.g. --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.1/constraints-3.11.txt"
✅ Verify Installation
airflow version
🚀 Step 4: Initialize the Airflow Database (Add These Commands to README.md)
airflow db init
This will create:
- airflow.cfg → Airflow configuration file
- airflow.db → SQLite database (for local use)
✅ Check if the files are created in the current directory:
ls -l | grep airflow
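The generated airflow.cfg is where Airflow's paths and connections live. An illustrative excerpt (section and key names from Airflow 2.x; the actual paths depend on your AIRFLOW_HOME):

[core]
dags_folder = /path/to/your-project-folder/dags

[database]
sql_alchemy_conn = sqlite:////path/to/your-project-folder/airflow.db

[webserver]
web_server_port = 8080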
🚀 Step 5: Start the Webserver and Scheduler (Add These Commands to README.md)
# Start the Airflow web server (Runs on port 8080 by default)
airflow webserver --port 8080
# Open in browser: http://localhost:8080
# In a separate terminal, start the scheduler
airflow scheduler
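For quick local experiments, Airflow 2.x also ships an all-in-one command that initializes the database, creates a login, and runs the webserver and scheduler in a single terminal (for development only, not production):

airflow standalone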
🚀 Step 6: Create an Admin User (Add These Commands to README.md)
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email [email protected]
# You will be prompted to set a password (or pass one with --password)
🛠 Now, log in to the Airflow UI at http://localhost:8080 with the admin credentials.
🚀 Step 7: Create a DAGs Directory (Add These Commands to README.md)
mkdir -p dags
- Place your DAG Python files inside the dags/ directory (by default Airflow looks in $AIRFLOW_HOME/dags).
- Example DAG (dags/example_dag.py):

from datetime import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator  # replaces the deprecated DummyOperator (Airflow 2.3+)

with DAG("example_dag", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    start = EmptyOperator(task_id="start")
✅ Activate DAGs in UI:
- Start the scheduler: airflow scheduler
- Enable the DAG by toggling it on in the Airflow UI (http://localhost:8080).
- For a quick check without the UI, you can run a DAG once from the CLI: airflow dags test example_dag 2024-01-01
🚀 Step 8: Stop Airflow and Clean Up (Add These Commands to README.md)
# Stop Airflow (find and kill processes; Linux/macOS — on Windows press Ctrl+C in each terminal)
pkill -f "airflow webserver"
pkill -f "airflow scheduler"
# Deactivate virtual environment
deactivate
🚀 Full Command Reference (copy into README.md)
# Install dependencies
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-distutils python3-pip -y
# Create and activate virtual environment
python3.11 -m venv airflow-env
source airflow-env/bin/activate # (Linux/macOS)
airflow-env\Scripts\activate # (Windows)
# Set AIRFLOW_HOME to current directory
export AIRFLOW_HOME=$(pwd)
echo "export AIRFLOW_HOME=$(pwd)" >> ~/.bashrc
source ~/.bashrc
# Install Apache Airflow
pip install --upgrade pip
pip install apache-airflow==2.7.1
# Initialize Airflow database
airflow db init
# Start Airflow webserver and scheduler (run in separate terminals)
airflow webserver --port 8080
airflow scheduler
# Create an admin user
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email [email protected]
# Create DAGs directory
mkdir -p dags
# Stop Airflow and deactivate environment
pkill -f "airflow webserver"
pkill -f "airflow scheduler"
deactivate
DAGs, tasks, and operators:
WORKFLOW -> DAG (when to do what) -> TASK (what to do) -> OPERATOR (how to do it)
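To make that mapping concrete, a minimal sketch (names are illustrative): the DAG carries the schedule (when), each task is one unit of work (what), and the operator class decides how that work runs:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def transform():
    print("transforming data")  # placeholder for real logic

# DAG = "when": the workflow's start date and schedule
with DAG("concepts_demo", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    # Each task = "what": one unit of work; the operator = "how":
    # BashOperator runs a shell command, PythonOperator calls a function
    download = BashOperator(task_id="download", bash_command="echo downloading")
    process = PythonOperator(task_id="process", python_callable=transform)
    download >> process  # task ordering defines the workflow's shape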