Explain how to use uv with airflow virtualenv and make it work (#43604)
Since we are switching to ``uv`` as our main development tooling,
we should explain how to use ``uv`` with airflow and explain some
basic commands that should be used to have a working uv-managed venv.

This documentation explains some why's and initial how's with uv.
It also fixes uv to work on macOS with some default extras - such
as devel, devel-tests and --all-extras, so that it works on a wider
range of systems (including macOS). This includes not installing
plyvel on macOS, because it is next to impossible to compile
LevelDB on a modern macOS operating system and it is anyway an
optional component of the google provider.

Fixes: #43200
potiuk authored Nov 2, 2024
1 parent 17e5100 commit 229c6a3
Showing 12 changed files with 210 additions and 196 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -254,3 +254,6 @@ licenses/LICENSES-ui.txt

# airflow-build-dockerfile and corresponding ignore file
airflow-build-dockerfile*

# Temporary ignore uv.lock until we integrate it fully in our constraint preparation mechanism
/uv.lock
10 changes: 9 additions & 1 deletion airflow/settings.py
@@ -617,7 +617,15 @@ def configure_adapters():

     if SQL_ALCHEMY_CONN.startswith("mysql"):
         try:
-            import MySQLdb.converters
+            try:
+                import MySQLdb.converters
+            except ImportError:
+                raise RuntimeError(
+                    "You do not have `mysqlclient` package installed. "
+                    "Please install it with `pip install mysqlclient` and make sure you have system "
+                    "mysql libraries installed, as well as `pkg-config` system package "
+                    "installed in case you see compilation error during installation."
+                )

             MySQLdb.converters.conversions[Pendulum] = MySQLdb.converters.DateTime2literal
         except ImportError:
4 changes: 3 additions & 1 deletion airflow/utils/dot_renderer.py
@@ -27,7 +27,9 @@
     import graphviz
 except ImportError:
     warnings.warn(
-        "Could not import graphviz. Rendering graph to the graphical format will not be possible.",
+        "Could not import graphviz. Rendering graph to the graphical format will not be possible.\n"
+        "You might need to install the graphviz package and necessary system packages.\n"
+        "Run `pip install graphviz` to attempt to install it.",
         UserWarning,
         stacklevel=2,
     )
300 changes: 117 additions & 183 deletions contributing-docs/07_local_virtualenv.rst
@@ -28,11 +28,8 @@ That's why we recommend using local virtualenv for development and testing.

**The outline for this document in GitHub is available at top-right corner button (with 3-dots and 3 lines).**

Required Software Packages
--------------------------

Use system-level package managers like yum, apt-get for Linux, or
Homebrew for macOS to install required software packages:
@@ -42,8 +39,12 @@ Homebrew for macOS to install required software packages:
* libxml
* helm (only for helm chart tests)

There are also sometimes other system-level packages needed to install Python packages - especially
those that come from providers. For example, you might need to install ``pkgconf`` to be able to
install the ``mysqlclient`` package for the ``mysql`` provider, or ``graphviz`` to be able to install
the ``devel`` extra bundle.

Please refer to the `Dockerfile.ci <../Dockerfile.ci>`__ for a comprehensive list of required packages.

.. note::

@@ -61,26 +62,114 @@ of required packages.
released wheel packages.


Creating and maintaining local virtualenv with uv
-------------------------------------------------

As of November 2024, we recommend using ``uv`` for local virtualenv management for Airflow development.
The ``uv`` utility is a build front-end tool designed to manage Python versions, virtualenvs and workspaces
for development and testing of Python projects. It is a modern tool that works with PEP 517/518 compliant
projects and is much faster than the "reference" ``pip`` tool. Beyond creating development environments,
it also manages Python versions, workspaces and the Python tools used to develop Airflow (via the
``uv tool`` command) - such as ``pre-commit`` and others. You can also use ``uv tool`` to install
``breeze`` - the containerized development environment for Airflow that we use to reproduce the
CI environment locally and to run release-management and certain development tasks.

You can read more about ``uv`` in `UV Getting started <https://docs.astral.sh/uv/getting-started/>`_ but
below you will find a few typical steps to get you started with ``uv``.

Installing uv
.............

You can follow the `installation instructions <https://docs.astral.sh/uv/getting-started/installation/>`_ to install
``uv`` on your system. Once you have ``uv`` installed, you can do all the environment preparation tasks using
``uv`` commands.
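
For example, a typical installation on macOS or Linux might look like this (the standalone installer
command below is the one documented by ``uv``; ``pipx install uv`` also works if you prefer ``pipx``):

.. code:: bash

   # standalone installer, as documented by uv (macOS/Linux)
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # alternatively, if you already manage Python tools with pipx
   pipx install uv

   # verify that uv is on your PATH
   uv --version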

Installing Python versions
..........................

You can install Python versions using the ``uv python install`` command. For example, to install Python 3.9.7, you can run:

.. code:: bash

   uv python install 3.9.7

This step is optional - ``uv`` will automatically install the Python version you need when you create a virtualenv.

Creating virtualenvs with uv
............................

.. code:: bash

   uv venv

This will create a default venv in your project's ``.venv`` directory. You can also create a venv
with a specific Python version by running:

.. code:: bash

   uv venv --python 3.9.7

You can also create a venv with a different venv directory name by running:

.. code:: bash

   uv venv .my-venv

However, ``uv``'s creation/re-creation of venvs is so fast that you can easily create and delete venvs as needed.
So usually you do not need more than one venv and can recreate it as needed - for example when you
need to change the Python version.
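
Whichever tool created the venv, the resulting layout and activation work the same way. A minimal
sketch, using a throwaway directory name and the stdlib ``venv`` module as a stand-in (it produces
the same ``bin/activate`` layout as ``uv venv``):

.. code:: bash

   # create a throwaway venv (uv venv produces the same layout in .venv)
   python3 -m venv /tmp/airflow-demo-venv

   # activate it and confirm the interpreter now lives inside the venv
   source /tmp/airflow-demo-venv/bin/activate
   python -c "import sys; print(sys.prefix)"

   # leave the venv again
   deactivate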

Syncing project (including providers) with uv
.............................................

In a project like Airflow, it's important to have a consistent set of dependencies across all developers.
You can use ``uv sync`` to install all dependencies defined in the ``pyproject.toml`` file in the current directory.

.. code:: bash

   uv sync

If you also need to install development dependencies and provider dependencies, you can specify the extras for those providers:

.. code:: bash

   uv sync --extra devel --extra devel-tests --extra google

This will synchronize all extras that you need for development and testing of Airflow and the google provider
dependencies - including their runtime dependencies.

.. code:: bash

   uv sync --all-extras

This will synchronize all extras of Airflow (this might require some system dependencies to be installed).
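
A typical day-to-day loop then becomes a hypothetical two-liner (the extras below are just examples;
the guard keeps the sketch a no-op on machines where ``uv`` is not installed):

.. code:: bash

   if command -v uv >/dev/null 2>&1; then
       # recreate the venv and synchronize the extras you work on
       uv venv --python 3.9
       uv sync --extra devel --extra devel-tests
   else
       echo "uv is not installed - see the installation section above"
   fi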


Creating and installing airflow with other build-frontends
----------------------------------------------------------

While ``uv`` uses its ``workspace`` feature to synchronize both Airflow and providers in a single sync
command, you can still use other front-end tools (such as ``pip``) to install Airflow and providers
and to develop them without relying on the ``sync`` and ``workspace`` features of ``uv``. The chapters
below describe how to do it with ``pip``.

Installing Airflow with pip
...........................

Since Airflow follows the standards defined by the packaging community, we are not bound to
``uv`` as the only tool to manage virtualenvs - you can use any other compliant front-end to install
Airflow for development. The standard way of installing an environment with the dependencies necessary to
run tests is to use ``pip`` to install Airflow dependencies:

.. code:: bash

   pip install -e ".[devel,devel-tests,<OTHER EXTRAS>]" # for example: pip install -e ".[devel,devel-tests,google,postgres]"

This will install Airflow in 'editable' mode - where the sources of Airflow are used directly from the
source code rather than moved to the installation directory. You need to run this command in the virtualenv
you want to install Airflow in - and you need to have that virtualenv activated.

While you can use any virtualenv manager, we recommend using `Hatch <https://hatch.pypa.io/latest/>`__
as your development environment front-end; we already use the Hatch build backend ``hatchling`` for Airflow.

Hatchling is automatically installed when you build Airflow, but since the Airflow build system uses a
PEP-compliant ``pyproject.toml`` file, you can use any front-end build system that supports
``PEP 517`` and ``PEP 518``. You can also use ``pip`` to install Airflow in editable mode.

Extras (optional dependencies)
..............................

@@ -145,169 +234,6 @@ both runtime and development dependencies of the google provider.
The second one installs providers source code in development mode, so that modifications
to the code are automatically reflected in your installed virtualenv.

Using Hatch
-----------

Airflow uses `hatch <https://hatch.pypa.io/>`_ as its build and development tool of choice. It is one of the
popular build tools and environment managers for Python, maintained by the Python Packaging Authority.
It is an optional tool that is only really needed when you want to build packages from sources, but
it is also very convenient for managing your Python versions and virtualenvs.

The Airflow project contains some pre-defined virtualenv definitions in ``pyproject.toml`` that can be
easily used by hatch to create your local venvs. This is not necessary for you to develop and test
Airflow, but it is a convenient way to manage your local Python versions and virtualenvs.

Installing Hatch
................

You can install hatch in various ways (including GUI installers).

Example using ``pipx``:

.. code:: bash

   pipx install hatch

We recommend using ``pipx`` as you can easily manage installed Python apps and later use it
to upgrade ``hatch`` as needed with:

.. code:: bash

   pipx upgrade hatch

Using Hatch to manage your Python versions
..........................................

You can also use hatch to install and manage airflow virtualenvs and development
environments. For example, you can install Python 3.10 with this command:

.. code:: bash

   hatch python install 3.10

or install all Python versions that are used in Airflow:

.. code:: bash

   hatch python install all

Manage your virtualenvs with Hatch
..................................

Airflow has some pre-defined virtualenvs that you can use to develop and test airflow.
You can see the list of available envs with:

.. code:: bash

   hatch env show

This is what it shows currently:

+-------------+---------+---------------------------------------------------------------+
| Name | Type | Description |
+=============+=========+===============================================================+
| default | virtual | Default environment with Python 3.9 for maximum compatibility |
+-------------+---------+---------------------------------------------------------------+
| airflow-39 | virtual | Environment with Python 3.9. No devel installed. |
+-------------+---------+---------------------------------------------------------------+
| airflow-310 | virtual | Environment with Python 3.10. No devel installed. |
+-------------+---------+---------------------------------------------------------------+
| airflow-311 | virtual | Environment with Python 3.11. No devel installed |
+-------------+---------+---------------------------------------------------------------+
| airflow-312 | virtual | Environment with Python 3.12. No devel installed |
+-------------+---------+---------------------------------------------------------------+

The default env (if you have not used one explicitly) is ``default`` and it is a Python 3.9
virtualenv for maximum compatibility. After entering the environment, you can install the devel set
of dependencies by running:

.. code:: bash

   pip install -e ".[devel]"

The other environments are just bare-bones Python virtualenvs with Airflow core requirements only,
without any extras installed and without any tools. They are much faster to create than the default
environment, and you can manually install either appropriate extras or directly tools that you need for
testing or development.

.. code:: bash

   hatch env create

You can create a specific environment by passing its name to the create command:

.. code:: bash

   hatch env create airflow-310

You can install extras in the environment by running a pip command:

.. code:: bash

   hatch -e airflow-310 run -- pip install -e ".[devel,google]"

You can enter the environment by running a shell of your choice (for example zsh), where you
can run any commands:

.. code:: bash

   hatch -e airflow-310 shell

Once you are in the environment (usually indicated by an updated prompt), you can just install
the extra dependencies you need:

.. code:: bash

   [~/airflow] [airflow-310] pip install -e ".[devel,google]"

You can also see where hatch created the virtualenvs and use it in your IDE or activate it manually:

.. code:: bash

   hatch env find airflow-310

You will get a path similar to:

.. code::

   /Users/jarek/Library/Application Support/hatch/env/virtual/apache-airflow/TReRdyYt/apache-airflow

Then you will find the ``python`` binary and ``activate`` script in the ``bin`` sub-folder of this directory,
and you can configure your IDE to use this virtualenv if you want to use that environment in your IDE.
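
To activate such an environment manually in your shell, a sketch like the following works (the path
below is illustrative - substitute the output of ``hatch env find``):

.. code:: bash

   # hypothetical path - use the one printed by `hatch env find airflow-310`
   VENV_PATH="/Users/jarek/Library/Application Support/hatch/env/virtual/apache-airflow/TReRdyYt/apache-airflow"
   source "${VENV_PATH}/bin/activate"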

You can also set the default environment name with the ``HATCH_ENV`` environment variable.

You can clean the env by running:

.. code:: bash

   hatch env prune

More information about hatch can be found in `Hatch: Environments <https://hatch.pypa.io/latest/environment/>`__

Using Hatch to build your packages
..................................

You can use hatch to build an installable package from the airflow sources. Such a package will
include all the metadata configured in ``pyproject.toml`` and will be installable with ``pip``.

The packages will have pre-installed dependencies for providers that are always
installed when Airflow is installed from PyPI. By default both ``wheel`` and ``sdist`` packages are built.

.. code:: bash

   hatch build

You can also build only ``wheel`` or ``sdist`` packages:

.. code:: bash

   hatch build -t wheel
   hatch build -t sdist

Local and Remote Debugging in IDE
---------------------------------
@@ -388,11 +314,11 @@ run the command above and commit the changes to ``pyproject.toml``. Then running
install the dependencies automatically when you create or switch to a development environment.


Installing "golden" version of dependencies
-------------------------------------------

Whatever virtualenv solution you use, when you want to make sure you are using the same
version of dependencies as in main, you can install the recommended version of the dependencies by using
constraint-python<PYTHON_MAJOR_MINOR_VERSION>.txt files as the ``constraint`` file. This might be useful
to avoid the "works-for-me" syndrome, where you use a different version of dependencies than the ones
used in main, in CI tests and by other contributors.
@@ -405,6 +331,14 @@ all basic devel requirements and requirements of google provider as last success
.. code:: bash

   pip install -e ".[devel,google]" \
     --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.9.txt"

Or with ``uv``:

.. code:: bash

   uv pip install -e ".[devel,google]" \
     --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.9.txt"

Make sure to use the latest main for such an installation; those constraints are "development constraints" and they
are refreshed several times a day to make sure they are up to date with the latest changes in the main branch.
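
The constraint file name embeds the Python ``major.minor`` version, so you can derive the URL for
whatever interpreter your venv uses. A small sketch (the variable names are just illustrative):

.. code:: bash

   # derive the constraints URL matching the current interpreter
   PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
   CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-${PYTHON_VERSION}.txt"
   echo "${CONSTRAINT_URL}"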

2 changes: 1 addition & 1 deletion generated/provider_dependencies.json
@@ -909,7 +909,7 @@
         "apache-airflow-providers-common-sql>=1.17.0",
         "apache-airflow>=2.8.0",
         "mysql-connector-python>=8.0.29",
-        "mysqlclient>=1.4.0"
+        "mysqlclient>=1.4.0; sys_platform != 'darwin'"
     ],
     "devel-deps": [],
     "plugins": [],