Should Airflow keep SQLAlchemy connection pooling enabled when using PgBouncer (transaction pooling mode)? #56890
Replies: 3 comments 1 reply
-
|
This is a very good question and I think we have no good answer for now - i.e. no evidences not enough benchmarks to give definite answer. Purely teorethically - PGBouncer might not be needed any more and could be completely replaced by connection pooling in Airflow. PGBouncer used to be badly needed when each of the workers connnected directly to the DB, but since this is not happening in Airfow 3 - only scheduler, triggerer, api-server, dag processor - and you can limit a number of connections by using the SQLAlchemy pools you mentioned. So theorethically you could just skip PGBouncer and connect directly to the DB if you use SQLAlchemy pools. That would lkely give a slight boost over using PGBouncer. But I have no evidences for it. |
Beta Was this translation helpful? Give feedback.
-
|
@potiuk After trying your suggestion and removing PgBouncer, the main issue we encountered was that it became difficult to centrally manage the total number of database connections. When using PgBouncer, we only needed to monitor the PgBouncer pool count. However, without it, we now have to manage and coordinate multiple factors — such as the number of API replicas, each API server’s internal pool size, and the number of schedulers — to control overall connection usage. |
Beta Was this translation helpful? Give feedback.
-
|
Hello @archsyscall, Are you still using PgBouncer with Airflow 3.2.x? I'm considering adding PgBouncer because my PostgreSQL database is using around 2 GB of memory, and pg_stat_activity shows 60+ open idle connections. Before adding that I'd like to understand how to determine or figure out the right configuration. How did you decide on the pool size for each Airflow component (scheduler, workers, webserver, dag processor, etc.)? Also, what approach did you use to determine the appropriate max_connections and PgBouncer pool settings? Any guidance on where to start or best practices for sizing the pools would be greatly appreciated. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi everyone,
We’re running Airflow 3.1.0 (CeleryExecutor) on Kubernetes, with an external PostgreSQL (RDS) metadata database sitting behind PgBouncer configured in transaction pooling mode.
The issue we’re seeing is that when triggering very large DAGs with many tasks, RDS Performance Insights shows high AAS (Average Active Sessions) and frequent Client:ClientWrite wait events.
Interestingly, when we reduced the PgBouncer pool size, the AAS spikes decreased, and the database became noticeably more stable.
This led us to wonder:
Since PgBouncer is already handling connection pooling,
does it still make sense for Airflow to maintain its own SQLAlchemy connection pool?
In other words:
When using PgBouncer in transaction pooling mode, is it better to disable Airflow’s internal SQLAlchemy pool (AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_ENABLED=False) and rely entirely on PgBouncer for connection management?
Or does keeping a small SQLAlchemy pool still provide any meaningful benefit?
Beta Was this translation helpful? Give feedback.
All reactions