Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Amazon provider package #46498

Closed
wants to merge 8 commits into from

Conversation

o-nikolas
Copy link
Contributor

Provider tests pass but there are many issues (after fixing a handful of failures related to paths being used in tests which needed updating):

1) mypy failures:

Example:

providers/amazon/src/airflow/providers/amazon/aws/operators/bedrock.py:358: error:
Value of type "Optional[dict[str, Any]]" is not indexable  [index]
            if event["status"] != "success":

We have a helper validate_execute_complete_event(event) to check if the event is None, for this exact reason. Which was working perfectly fine before the migration but is now failing type checking locally for me on all operators.

2) docs

There are many _api docs which either seem out of date or have spelling errors that seem to already be fixed. I do not know how to generate these docs. I tried a docs build but that fails because all paths to system test examples are now incorrect. I'm not sure if this is something unique to amazon, or if the migration script failed (there were some issues related to the aws directory within the amazon provider).

3) there are many bits of code that have not been static checked for issues of spelling and words like "master". I've fixed most but some are in places that are difficult to change.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member

potiuk commented Feb 6, 2025

Yeah. Some problems are expected - especially for such huge providers as amazon. This is one of the reasons we did no move all providers at once, because rather than havin 10 people solving those things individually, we would have to have 1 person solving all 100 providers issues at once, we are basically crowd-sourcing problem solving.

But I see my role here as help to guide people (and I did that in other providers) how to solve those difficult issues. That's a good learning for everyone as well. Worked so far very well. and I am happy to guide you as well.

  1. mypy failures:

Yes. It's known that mypy is not fully deterministic and might or migh not find different errors when files are moved. Happened many times for others who moved other providers.

Seems that pretty much all the problems you see are the same -> the optional part of event is not automatically detected by MyPy now. Why exactly it has not detected it before - I think it's a good question for MyPy creators (basically at this stage we are waiting until astral will come up with their own type hint consistency checker, because MyPy is plagued by all kinds of those problems and we hit it many times

But it seems in this case (or at least when I googled it) setting allow_redefinition should solve the problem https://mypy.readthedocs.io/en/stable/config_file.html#confval-allow_redefinition -> this is one of the features of mypy that you can enable ot allow redefining type of variable. It might - of course - cause a ripple effect and MyPy complaining about something else, but it's worth trying if you do not want to fix the way things are implemented now. Because I personally think that redefining type of the variable like that is a bit fishy (that's why also it's not enabled by default by MyPY) - I would rather create another variable (coerced_event would be a good name) and use this one instead of event - but If you do not want to refactor it in all those places (should be relatively easy with IntelliJ or VSCode refactoring - just rename the variable with refactor and bring event back in the method signature and validate_execute_complete_event method call without refactoring. Yep. A bit "dull" change, and will take about 10-15 minutes of time to change it in all those places, but possibly it's worth it.

  1. docs
    There are many _api docs which either seem out of date or have spelling errors that seem to already be fixed. I do not know how to generate these docs. I tried a docs build but that fails because all paths to system test examples are now incorrect. I'm not sure if this is something unique to amazon, or if the migration script failed (there were some issues related to the aws directory within the amazon provider).

Unfortunately when autoapi generates API docs and fails, no API docs are generated and then you can have many unrelated errors about missing docs. Not much we can do about it.

This is how sphinx/autoapi works. The errors are likely coming from just a few failing docstring generation by autoapi - likely those last errors are the root cause - those classes are referred to in some docstrings you just need to fin them: tests.system.providers.amazon.aws.example_appflow_run.create_s3_to_s3_flow. The migration script does not handle all cases. It handled "most common cases" - 95% of those - simply those that could be easily automated, common for all operators and are not risky - does not require human to diagnose and fix those. The rest of fixes for providers have to be done manually. That's one of the tasks when migrating provider to identify and fix those remaing 5%.

If we could 100% migrate all providers, I would not ask others for help, I would just run the scripts myself. Unfortunately, with inter-relations between providers, doc generation and others, automating absolutely all cases would take likely far more effort of one person to track, identify and fix and especially automate those. The premise was that individuals looking at those will do it much faster (and in parallel or at different times) - also that their involvement in helping to move will make it more of a "community" effort. Which pretty much worked so far.

  1. there are many bits of code that have not been static checked for issues of spelling and words

Yep. We are well aware that things are not always fixable for those. There are exclusions in .pre-commit-config.yaml for thsoe. Likely those need to be moved. Like the exclusions should be updated - for example here: https://github.com/apache/airflow/blob/main/.pre-commit-config.yaml#L661 rather than the words removed

Other problems:

  • Breeze unit tests -> There are also some unit tests to fix in Breeze -> but those are mainly about changed path

  • Check if RST files use double backticks for code............................Failed -> looks like some of the api generated docs contain single backticks - which for some reason was excluded so far, but now we seem to see it. Those are easy to fix and I think it's good we started to detect those.

  • nohup.out is accidentally merged.

  • some system tests are failing - likely there are some imports that have not been covered by the migration script - but should be as easy as search/replace for those.

I hope we can get it green quickly :)

@potiuk
Copy link
Member

potiuk commented Feb 6, 2025

I think you need to rebase and resolve conflicts @o-nikolas - in order to trigger tests.

BTW. The mypy satisfying (I prefer that name ;) change looks good.

@o-nikolas
Copy link
Contributor Author

I think you need to rebase and resolve conflicts @o-nikolas - in order to trigger tests.

Agreed, I'm already working on the rebase but it's a bit of a nasty one! Lots of stuff touches the amazone provider package 😢

I have a busy day today as well, so I'm only making bits of progress here and there as I have time

@potiuk
Copy link
Member

potiuk commented Feb 6, 2025

I have a busy day today as well, so I'm only making bits of progress here and there as I have time

Yeah. Amazon is a beast :)

@o-nikolas o-nikolas force-pushed the onikolas/migrate_ampp branch from 9a97eaa to 62dfec2 Compare February 6, 2025 22:38
@o-nikolas
Copy link
Contributor Author

o-nikolas commented Feb 6, 2025

Updates so far:

  • I have fixed the original mypy issues and also many more that have cropped up (I'm not quite sure what's causing more to show up)
  • I have done a rebase which need some handholding
  • I have fixed most of the breeze tests (code not yet pushed). I have two left. It is tricky because I don't quite know how to run them locally, so cannot use a debugger.

@o-nikolas o-nikolas force-pushed the onikolas/migrate_ampp branch 2 times, most recently from 22d2d23 to b43c4d0 Compare February 7, 2025 21:26
@o-nikolas o-nikolas force-pushed the onikolas/migrate_ampp branch from b43c4d0 to a2944b5 Compare February 7, 2025 21:56
@potiuk
Copy link
Member

potiuk commented Feb 8, 2025

That looks way better now :) -> only small/easy things seems to be remaining :)

@potiuk
Copy link
Member

potiuk commented Feb 8, 2025

I have fixed most of the breeze tests (code not yet pushed). I have two left. It is tricky because I don't quite know how to run them locally, so cannot use a debugger.

Super easy: breeze is just standard Python package with pyproject.toml etc.

  1. cd ./dev/breeze/
  2. Run uv sync
  3. Use the .venv generated there as your project's Python venv
  4. Mark src directory (if you are using intellij) as sources and tests as tests
  5. Run tests same way as other tests
  6. CLI is uv run pytest tests when you are in dev/breeze

It's absolutley 100% standard Python project - but just placed in dev/breeze folder

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the generated RST files are checked-in by accident? I would assume they are created by Sphinx during build from code usually?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I just removed it in #46590

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to see if I can fix it now :) - those errors seemed easy to fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking it on!

@o-nikolas
Copy link
Contributor Author

I have fixed most of the breeze tests (code not yet pushed). I have two left. It is tricky because I don't quite know how to run them locally, so cannot use a debugger.

Super easy: breeze is just standard Python package with pyproject.toml etc.

  1. cd ./dev/breeze/
  2. Run uv sync
  3. Use the .venv generated there as your project's Python venv
  4. Mark src directory (if you are using intellij) as sources and tests as tests
  5. Run tests same way as other tests
  6. CLI is uv run pytest tests when you are in dev/breeze

It's absolutley 100% standard Python project - but just placed in dev/breeze folder

Thanks! I ended up figuring out a similar way of doing it after a little while, accelerated a lot of debugging!

@potiuk
Copy link
Member

potiuk commented Feb 9, 2025

Closing in favour of #46590 -> that was started from this one

@potiuk potiuk closed this Feb 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants