Replies: 4 comments 3 replies
-
This is a great breakdown. We've been putting a lot of effort into standardizing things in the AWS package and this will definitely help. This may be out of scope, or you may want to include it here as well: Another similar standardizing change we have in mind is to convert as many of the hooks as possible into "thin hooks". If a hook method just passes through to a boto call, then there's really no reason to have that method defined and it can/should be removed. For a good example of what we'd like to move towards, see how the |
Beta Was this translation helpful? Give feedback.
-
I agree with Dennis. This is a very good deep dive and research. Thanks for the effort. One first good step would also be to write a README about guidance/convention we want to follow in terms of style, naming, ... At least we would ask any new PR to follow these standards and for the legacy one, we can fix that iteratively |
Beta Was this translation helpful? Give feedback.
-
I converted it into a discussion. I think you can make some good set of consistency improvements for the amazon provider @ferruzzi @vincbeck @Taragolis. I propose to make a series of PRs from those ideas :) |
Beta Was this translation helpful? Give feedback.
-
Hello, this inconsistency is causing a problem for me and I think this is the right thread to post it? Please redirect me if I'm wrong. EmrBaseSensor can set "aws_conn_id" to change the region, but this is not supported as a templated field. This means if I do not know my region (aws_conn_id) at runtime, then there is no way for me to set it for any of the Sensors that inherit from that base class (I assume the same problem exists for the RDS, Sagemaker base operators, and possibly more). |
Beta Was this translation helpful? Give feedback.
-
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
main branch
Apache Airflow version
main (development)
Operating System
any
Deployment
Other
Deployment details
No response
What happened
I'm investigate amazon-provider and found that different operators/sensors and other components use different approach to do the same things
Operators/Sensors set hook during initialise (
__init__
)At that moment Operators/Sensors uses 4 different approach to get hook:
__init__
- which could cost use additional resources of scheduler/dag-processorget_hook
)hook
or similarexecute
/poke
methodI think we should avoid 1 and 2
List of components:
airflow.airflow.providers.amazon.aws.operators.batch.BatchOperator
- set during operator initialiseairflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperator
- set None during operator initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.operators.ecs.EcsOperator
- set None during operator initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator
- set during operator initialiseairflow.airflow.providers.amazon.aws.sensors.batch.BatchSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.glue_catalog_partition.GlueCatalogPartitionSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor
- attributesquicksight_hook
andsts_hook
doesn't useairflow.airflow.providers.amazon.aws.operators.rds.RdsBaseSensor
- set during sensor initialiseairflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.s3.S3KeySensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensor
- set None during sensor initialise, init hook byget_hook
airflow.airflow.providers.amazon.aws.sensors.step_function.StepFunctionExecutionSensor
- set None during sensor initialise, init hook byget_hook
region
vsregion_name
attributeAWSBaseHook expected
region_name
however some operator/sensors usesregion
.For consistency better rename to
region_name
with markregion
as deprecated fieldList of components:
airflow.airflow.providers.amazon.aws.operators.eks.EksCreateClusterOperator
airflow.airflow.providers.amazon.aws.operators.eks.EksCreateNodegroupOperator
airflow.airflow.providers.amazon.aws.operators.eks.EksCreateFargateProfileOperator
airflow.airflow.providers.amazon.aws.operators.eks.EksDeleteClusterOperator
airflow.airflow.providers.amazon.aws.operators.eks.EksDeleteNodegroupOperator
airflow.airflow.providers.amazon.aws.operators.eks.EksDeleteFargateProfileOperator
airflow.airflow.providers.amazon.aws.operators.eks.EksPodOperator
airflow.airflow.providers.amazon.aws.operators.redshift_data.RedshiftDataOperator
airflow.airflow.providers.amazon.aws.operators.quicksight.QuickSightCreateIngestionOperator
airflow.airflow.providers.amazon.aws.sensors.eks.EksClusterStateSensor
airflow.airflow.providers.amazon.aws.sensors.eks.EksFargateProfileStateSensor
airflow.airflow.providers.amazon.aws.sensors.eks.EksNodegroupStateSensor
No explicit set
region_name
Some components use region_name from connection, and doesn't have parameter/argument
region_name
Note: At that moment only glacier component, and some S3 operations non-regional, however even for this components better set region_name
List of components:
airflow.airflow.providers.amazon.aws.operators.athena.AthenaOperator
airflow.airflow.providers.amazon.aws.operators.aws_lambda.AwsLambdaInvokeFunctionOperator
airflow.airflow.providers.amazon.aws.operators.athena.CloudFormationCreateStackOperator
airflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperator
airflow.airflow.providers.amazon.aws.operators.dms.DmsCreateTaskOperator
airflow.airflow.providers.amazon.aws.operators.dms.DmsDescribeTasksOperator
airflow.airflow.providers.amazon.aws.operators.dms.DmsStartTaskOperator
airflow.airflow.providers.amazon.aws.operators.dms.DmsStopTaskOperator
airflow.airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator
airflow.airflow.providers.amazon.aws.operators.emr.EmrContainerOperator
airflow.airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator
airflow.airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator
airflow.airflow.providers.amazon.aws.operators.glue_crawler.GlueCrawlerOperator
airflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator
- and all dependenciesairflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftCreateClusterOperator
airflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftResumeClusterOperator
airflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftPauseClusterOperator
airflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftDeleteClusterOperator
airflow.airflow.providers.amazon.aws.operators.redshift_sql.RedshiftSQLOperator
airflow.airflow.providers.amazon.aws.operators.s3.S3DeleteBucketOperator
airflow.airflow.providers.amazon.aws.operators.sagemaker.SageMakerBaseOperator
- and all dependenciesairflow.airflow.providers.amazon.aws.operators.sns.SnsPublishOperator
airflow.airflow.providers.amazon.aws.operators.sqs.SqsPublishOperator
airflow.airflow.providers.amazon.aws.operators.sqs.StepFunctionStartExecutionOperator
airflow.airflow.providers.amazon.aws.operators.step_function.StepFunctionStartExecutionOperator
airflow.airflow.providers.amazon.aws.sensors.athena.AthenaSensor
airflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationCreateStackSensor
- missing docstringairflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationDeleteStackSensor
- missing docstringairflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor
airflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor
- and all dependenciesairflow.airflow.providers.amazon.aws.sensors.glue.GlacierJobOperationSensor
airflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor
airflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor
airflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor
airflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor
- and all dependenciesairflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensor
airflow.airflow.providers.amazon.aws.sensors.sqs.StepFunctionExecutionSensor
airflow.airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operator
airflow.airflow.providers.amazon.aws.transfers.hive_to_dynamodb.HiveToDynamoDBOperator
airflow.airflow.providers.amazon.aws.transfers.redshift_to_s3.RedshiftToS3Operator
- redshift_region_name ???What you think should happen instead
Try to make some generic stuff by the same way It may help to changes/contributions in the futures.
How to reproduce
No response
Anything else
I do not create PR just because in single PR it will affect almost all sensors/operators
IMHO, It is better to implement in parts
Are you willing to submit PR?
Code of Conduct
Beta Was this translation helpful? Give feedback.
All reactions