This is a simple demonstration of Apache Airflow hosted on Heroku.
This project implements a simple DAG that fetches the top Stack Overflow questions tagged "airflow" and forwards them to a specified email address.
This is admittedly over-engineered: a cron job or a short .py script could do the same thing, but I used this project to learn Apache Airflow.
To get started, you need a basic knowledge of Apache Airflow, the Heroku CLI, AWS S3 buckets, and Python.
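For orientation, here is a minimal sketch of what the core of the job does, written as plain Python outside Airflow. It is not the code in this repo: the function names are hypothetical, API version 2.3 is assumed, and "top" is assumed to mean sorted by votes. The Stack Exchange API compresses its responses, so the sketch decompresses gzip by hand.

```python
import gzip
import html
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Stack Exchange API endpoint (version 2.3 is an assumption).
API_URL = "https://api.stackexchange.com/2.3/questions"


def build_query_url(tag="airflow", page_size=5):
    """Build the request URL for the highest-voted questions with a tag."""
    params = {
        "order": "desc",
        "sort": "votes",  # assumption: "top" means most votes
        "tagged": tag,
        "site": "stackoverflow",
        "pagesize": page_size,
    }
    return API_URL + "?" + urlencode(params)


def fetch_top_questions(tag="airflow", page_size=5):
    """Call the API and return the list stored under the "items" key."""
    with urlopen(build_query_url(tag, page_size)) as response:
        raw = response.read()
        # urllib does not decompress automatically, so handle gzip here.
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))["items"]


def format_email_body(questions):
    """Turn question items into plain-text lines suitable for an email."""
    lines = []
    for rank, question in enumerate(questions, start=1):
        # Titles come back HTML-escaped, so unescape them for plain text.
        title = html.unescape(question["title"])
        lines.append(f"{rank}. {title} ({question['score']} votes)")
        lines.append(f"   {question['link']}")
    return "\n".join(lines)
```

Calling `format_email_body(fetch_top_questions())` gives the text that would go into the mail. In the DAG, each of these steps would be its own task.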
Option 1
- 🍴 Fork this repo!
Option 2
- 👯 Clone this repo to your local machine using
https://github.com/jaywonder20/apache_airflow_basics.git
- Create a Heroku app and add the PostgreSQL add-on 🔨🔨🔨
Necessary configuration for the Heroku app
Set the following from the Heroku CLI:
heroku config:set AIRFLOW_HOME=/app
Set environment variables
Set AIRFLOW__CORE__SQL_ALCHEMY_CONN in .profile to your PostgreSQL connection string.
Heroku sources .profile on dyno startup, so the variable is exported into the environment every time. If .profile builds the value from the DATABASE_URL config var, it updates automatically if/when your DB URL changes.
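The variable name above follows Airflow's `AIRFLOW__{SECTION}__{KEY}` pattern for overriding airflow.cfg from the environment. Here is a small sketch of that mapping, with an optional scheme rewrite. The helper names are hypothetical. The rewrite only matters if your SQLAlchemy version rejects `postgres://` URLs, which releases from 1.4 on do.

```python
def airflow_env_name(section, key):
    """Return the environment variable Airflow reads for a cfg option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def sqlalchemy_url(database_url):
    """Rewrite a postgres:// URL to the postgresql:// scheme name."""
    prefix = "postgres://"
    if database_url.startswith(prefix):
        return "postgresql://" + database_url[len(prefix):]
    return database_url  # already in the expected form, leave untouched


# Example with a placeholder URL (not a real database):
name = airflow_env_name("core", "sql_alchemy_conn")
value = sqlalchemy_url("postgres://user:password@host:5432/dbname")
print(f"export {name}={value}")
```

The same naming rule explains the `AIRFLOW__WEBSERVER__...` variables used later to secure the app.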
- NB: To prevent errors during this initial configuration, point "dags_folder" in the airflow.cfg file to a non-existent folder, since the Airflow instance is not configured yet
- Push the app to Heroku
Now some configuration
Configure the following in the airflow.cfg file:
sql_alchemy_conn = Postgres DB URI
smtp_user = your email address
smtp_password = password
smtp_port = 587
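Before pushing, it can help to check that none of these settings were left blank. This is a minimal sketch, not part of the repo. It assumes the options live in the `[core]` and `[smtp]` sections, as in Airflow 1.10's default airflow.cfg, and the function name is hypothetical.

```python
import configparser

# Section names assumed from Airflow 1.10's default airflow.cfg layout.
REQUIRED = {
    "core": ["sql_alchemy_conn"],
    "smtp": ["smtp_user", "smtp_password", "smtp_port"],
}


def missing_settings(cfg_text):
    """Return "section.key" names that are absent or empty in cfg_text."""
    # interpolation=None stops '%' characters in values from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            value = parser.get(section, key, fallback="")
            if not value.strip():
                missing.append(f"{section}.{key}")
    return missing
```

To use it, pass in the file contents, for example `missing_settings(open("airflow.cfg").read())`. An empty list means every setting has a value.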
Create an S3 bucket and get its access key (see https://preventdirectaccess.com/docs/amazon-s3-quick-start-guide/)
Set the following connection parameters:
s3_connection
postgres_default
- Create a Stack Overflow app
- Set the parameters in the variables.json file
- Import the variables.json file into Variables from the Airflow UI
- Run the DAG from the Airflow UI (the DAG runs successfully and sends the mail to the specified email address)
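Airflow's variable import expects the file to hold a single JSON object mapping variable names to values. Here is a rough sketch for checking variables.json before importing it. The key names used below (`stackoverflow_tag`, `email_to`, `api_key`) are hypothetical, so check the variables.json in this repo for the real ones.

```python
import json


def load_variables(text):
    """Parse variables.json content and return it as a dict."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("variables.json must contain a single JSON object")
    return data


def missing_variables(variables, required):
    """Return the required names that are absent or have an empty value."""
    return [name for name in required if not variables.get(name)]


# Example with hypothetical keys; replace them with the repo's real names.
sample = '{"stackoverflow_tag": "airflow", "email_to": ""}'
variables = load_variables(sample)
print(missing_variables(variables, ["stackoverflow_tag", "email_to", "api_key"]))
```

Anything this reports as missing would surface later as a failed variable lookup when the DAG runs.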
Secure your account
Secure the app by adding the following extra environment variables to the .profile file:
export AIRFLOW__WEBSERVER__AUTHENTICATE=True
export AIRFLOW__WEBSERVER__AUTH_BACKEND=airflow.contrib.auth.backends.password_auth
Open a Heroku bash session with the command:
heroku run bash
Start Python in the Heroku bash session and enter the following commands (copy and paste them), as also described in Airflow's official documentation.
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'new_user_name'
>>> user.email = 'new_user_email@example.com'
>>> user.password = 'set_the_password'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
If everything went well, you should see the Airflow login screen in your browser.
##### Proceed to modify the DAG for further customization
Reach out to me at one of the following places!
- Website at
jaywonder20.netlify.app
- Twitter at
@jaywonder20
- LinkedIn at
linkedin.com/in/jaywonder20
- MIT license
- Copyright 2020 © Jaywonder20.