handle_urls decorator using a new PageObjectRegistry #16
Closed
Changes from 15 commits
36 commits
a5a8f42 meta module (ivanprado)
ec80b69 CMD for listing overrides (ivanprado)
308bd1d Refactoring with better names and structures and meta inclusion (ivanprado)
aa8000d docstring (ivanprado)
a2d5cb6 Fix url_matcher dep (ivanprado)
1f1f410 Fix CI tests (ivanprado)
bdb8987 Make mypy happy again (ivanprado)
a3e3eea Documentation fixed (ivanprado)
ef9945b Minor changes (ivanprado)
f6fdac4 url-matcher has now been released. (ivanprado)
b050d01 Merge branch 'master' into handle_urls (BurnzZ)
ba52ce0 add entry point for CLI command (BurnzZ)
ba61626 fix import which fails tests (BurnzZ)
f5cffef refactor namespace to be classes instead (BurnzZ)
c3579b9 fix failing mypy tests after refactoring (BurnzZ)
234b8d9 Merge branch 'master' of github.com:scrapinghub/web-poet into handle_… (BurnzZ)
531752f update tests to improve coverage (BurnzZ)
7495b58 add missing import for find_page_object_overrides (BurnzZ)
0a0ee12 add docs for overrides (BurnzZ)
46d40e7 refactor by removing the need for find_page_object_overrides() (BurnzZ)
495642b add docs about using multiple PageObjectRegistries (BurnzZ)
75593ed add docs regarding organizing Page Object Overrides (BurnzZ)
0a2d779 update override docs to showcase url-matcher patterns (BurnzZ)
c000cbc rename get_overrides_from_module into get_overrides_from (BurnzZ)
10dff5b fix bug where module substring paths are not filtered out correctly (BurnzZ)
daa3ff9 create consume_modules() to properly load annotations in get_overrides() (BurnzZ)
3b05c07 update get_overrides_from to accept an arbitrary number of str inputs (BurnzZ)
f626efc add more warning docs to get_overrides() to use consume_modules() (BurnzZ)
bd3a88e enable ease of combining external Page Object packages (BurnzZ)
0cbeb0b refactor get_overrides() to have a simpler interface with consume_mod… (BurnzZ)
de5563a introduce concept of 'registry_pool' to access all PageObjectRegistry… (BurnzZ)
e7cca69 implement __hash__() in OverrideRule to easily identify uniqueness (BurnzZ)
eab277a polish documentation with better examples and discussion (BurnzZ)
38e56cd add more tests when PageObjectRegistry is instantiated (BurnzZ)
bf0b3e5 update PageObjectRegistry API for manipulating rules from different r… (BurnzZ)
d5a5d75 update OverrideRule __hash__() implementation after url-matcher==0.2.… (BurnzZ)
@@ -14,6 +14,7 @@
     author='Scrapinghub',
     author_email='[email protected]',
     url='https://github.com/scrapinghub/web-poet',
+    entry_points={'console_scripts': ['web_poet = web_poet.__main__:main']},
     packages=find_packages(
         exclude=(
             'tests',
@@ -22,6 +23,8 @@
     install_requires=(
         'attrs',
         'parsel',
+        'url-matcher',
+        'tabulate',
     ),
     classifiers=(
         'Development Status :: 2 - Pre-Alpha',
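The `console_scripts` entry in this hunk binds the command name `web_poet` to the `main` callable in `web_poet.__main__`. As a rough illustration of how a `name = module:attr` string of that shape can be resolved, here is a minimal sketch. The helper `resolve_entry_point` is hypothetical and only mimics the idea; it is not part of web-poet or of the packaging tooling, and a stdlib target is used so the snippet runs without web-poet installed.

```python
import importlib
from typing import Callable, Tuple


def resolve_entry_point(spec: str) -> Tuple[str, Callable]:
    """Split a 'name = module:attr' string and import the referenced callable."""
    name, _, target = spec.partition("=")
    module_path, _, attr = target.strip().partition(":")
    module = importlib.import_module(module_path)
    return name.strip(), getattr(module, attr)


# Stand-in target from the standard library (assumption for the demo only).
command, func = resolve_entry_point("dump = json:dumps")
print(command, func({"a": 1}))
```

With the entry from the diff, the same split would yield the command name `web_poet` and the target `web_poet.__main__:main`.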
@@ -0,0 +1,43 @@
+"""
+This package is just for overrides testing purposes.
+"""
+from typing import Dict, Any, Callable
+
+from url_matcher import Patterns
+
+from web_poet import handle_urls, PageObjectRegistry
+
+
+class POBase:
+    expected_overrides: Callable
+    expected_patterns: Patterns
+    expected_meta: Dict[str, Any]
+
+
+class POTopLevelOverriden1:
+    ...
+
+
+class POTopLevelOverriden2:
+    ...
+
+
+secondary_registry = PageObjectRegistry(name="secondary")
+
+
+# This first annotation is ignored. A single annotation per registry is allowed
+@handle_urls("example.com", POTopLevelOverriden1)
+@handle_urls("example.com", POTopLevelOverriden1, exclude="/*.jpg|", priority=300)
+class POTopLevel1(POBase):
+    expected_overrides = POTopLevelOverriden1
+    expected_patterns = Patterns(["example.com"], ["/*.jpg|"], priority=300)
+    expected_meta = {}  # type: ignore
+
+
+# The second annotation is for a different registry
+@handle_urls("example.com", POTopLevelOverriden2)
+@secondary_registry.handle_urls("example.org", POTopLevelOverriden2)
+class POTopLevel2(POBase):
+    expected_overrides = POTopLevelOverriden2
+    expected_patterns = Patterns(["example.com"])
+    expected_meta = {}  # type: ignore
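The fixtures above rely on two behaviours: only one annotation per registry is kept for a class, and different registries keep independent rules. A minimal sketch of a decorator-based registry with those properties follows. `MiniRegistry`, `Rule`, and `_as_list` are hypothetical names invented for this illustration, the default priority of 500 is an assumption, and the code is not web-poet's actual `PageObjectRegistry`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Type


def _as_list(value) -> List[str]:
    """Normalise None, a single string, or an iterable of strings to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


@dataclass
class Rule:
    use: Type
    instead_of: Type
    include: List[str]
    exclude: List[str]
    priority: int
    meta: Dict[str, Any] = field(default_factory=dict)


class MiniRegistry:
    """Hypothetical stand-in for a page object registry (illustration only)."""

    def __init__(self, name: str = "default") -> None:
        self.name = name
        self.rules: Dict[Type, Rule] = {}

    def handle_urls(self, include, overrides, *, exclude=None, priority=500, **meta):
        def wrapper(cls):
            # Decorators run bottom-up: the one closest to the class registers
            # first, and any later one for the same class is skipped.
            if cls not in self.rules:
                self.rules[cls] = Rule(
                    cls, overrides, _as_list(include), _as_list(exclude), priority, meta
                )
            return cls

        return wrapper


default = MiniRegistry()
secondary = MiniRegistry("secondary")


class Old:
    ...


@default.handle_urls("example.com", Old)  # skipped: New is already registered
@default.handle_urls("example.com", Old, exclude="/*.jpg|", priority=300)
@secondary.handle_urls("example.org", Old, extra_arg="foo")
class New:
    ...


print(default.rules[New].priority)  # 300
print(secondary.rules[New].meta)    # {'extra_arg': 'foo'}
```

Because Python applies stacked decorators from the bottom up, the annotation written first in the source is the one that gets dropped, which matches the comment on `POTopLevel1`.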
@@ -0,0 +1,16 @@
+from url_matcher import Patterns
+
+from tests.po_lib import POBase
+from web_poet import handle_urls
+
+
+class POModuleOverriden:
+    ...
+
+
+@handle_urls("example.com", overrides=POModuleOverriden, extra_arg="foo")
+class POModule(POBase):
+    expected_overrides = POModuleOverriden
+    expected_patterns = Patterns(["example.com"])
+    expected_meta = {"extra_arg": "foo"}  # type: ignore
+
Empty file.
Empty file.
@@ -0,0 +1,15 @@
+from url_matcher import Patterns
+
+from tests.po_lib import POBase
+from web_poet import handle_urls
+
+
+class PONestedPkgOverriden:
+    ...
+
+
+@handle_urls(include=["example.com", "example.org"], exclude=["/*.jpg|"], overrides=PONestedPkgOverriden)
+class PONestedPkg(POBase):
+    expected_overrides = PONestedPkgOverriden
+    expected_patterns = Patterns(["example.com", "example.org"], ["/*.jpg|"])
+    expected_meta = {}  # type: ignore
@@ -0,0 +1,21 @@
+from url_matcher import Patterns
+
+from tests.po_lib import POBase, secondary_registry
+from web_poet import handle_urls
+
+
+class PONestedModuleOverriden:
+    ...
+
+
+class PONestedModuleOverridenSecondary:
+    ...
+
+
+@handle_urls(include=["example.com", "example.org"], exclude=["/*.jpg|"], overrides=PONestedModuleOverriden)
+@secondary_registry.handle_urls("example.com", PONestedModuleOverridenSecondary)
+class PONestedModule(POBase):
+    expected_overrides = PONestedModuleOverriden
+    expected_patterns = Patterns(include=["example.com", "example.org"], exclude=["/*.jpg|"])
+    expected_meta = {}  # type: ignore
+
@@ -0,0 +1,69 @@
+import pytest
+from url_matcher import Patterns
+
+from tests.po_lib import POTopLevel1, POTopLevel2, POTopLevelOverriden2
+from tests.po_lib.a_module import POModule
+from tests.po_lib.nested_package import PONestedPkg
+from tests.po_lib.nested_package.a_nested_module import PONestedModule, PONestedModuleOverridenSecondary
+from web_poet.overrides import find_page_object_overrides
+
+
+POS = {POTopLevel1, POTopLevel2, POModule, PONestedPkg, PONestedModule}
+
+
+def test_list_page_objects_from_pkg():
+    """Tests that metadata is extracted properly from the po_lib package"""
+    rules = find_page_object_overrides("tests.po_lib")
+    assert {po.use for po in rules} == POS
+
+    for rule in rules:
+        assert rule.instead_of == rule.use.expected_overrides, rule.use
+        assert rule.for_patterns == rule.use.expected_patterns, rule.use
+        assert rule.meta == rule.use.expected_meta, rule.use
+
+
+def test_list_page_objects_from_module():
+    rules = find_page_object_overrides("tests.po_lib.a_module")
+    assert len(rules) == 1
+    rule = rules[0]
+    assert rule.use == POModule
+    assert rule.for_patterns == POModule.expected_patterns
+    assert rule.instead_of == POModule.expected_overrides
+
+
+def test_list_page_objects_from_empty_module():
+    rules = find_page_object_overrides("tests.po_lib.an_empty_module")
+    assert len(rules) == 0
+
+
+def test_list_page_objects_from_empty_pkg():
+    rules = find_page_object_overrides("tests.po_lib.an_empty_package")
+    assert len(rules) == 0
+
+
+def test_list_page_objects_from_unknown_module():
+    with pytest.raises(ImportError):
+        find_page_object_overrides("tests.po_lib.unknown_module")
+
+
+def test_list_page_objects_from_imported_registry():
+    rules = find_page_object_overrides("tests.po_lib", registry_name="secondary")
+    assert len(rules) == 2
+    rule_for = {po.use: po for po in rules}
+
+    potop2 = rule_for[POTopLevel2]
+    assert potop2.for_patterns == Patterns(["example.org"])
+    assert potop2.instead_of == POTopLevelOverriden2
+
+    pones = rule_for[PONestedModule]
+    assert pones.for_patterns == Patterns(["example.com"])
+    assert pones.instead_of == PONestedModuleOverridenSecondary
+
+
+def test_list_page_objects_from_non_existing_registry():
+    assert find_page_object_overrides("tests.po_lib", registry_name="not-exist") == []
+
+
+def test_cmd():
+    from web_poet.__main__ import main
+    main(["tests.po_lib"])
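These tests expect `find_page_object_overrides` to accept either a package or a single module, to pick up rules from nested modules, and to raise `ImportError` for an unknown path. The body of that function is not part of the diff shown here, so the following is only a sketch of one way such discovery could work using the standard library. `walk_module_names` and `belongs_to` are hypothetical helpers, and `email.mime` stands in for `tests.po_lib` so the snippet runs on its own.

```python
import importlib
import pkgutil
from typing import List


def walk_module_names(root: str) -> List[str]:
    """Import `root` and every submodule below it; return the dotted names."""
    module = importlib.import_module(root)  # raises ImportError if unknown
    names = [root]
    # Only packages have __path__; a plain module has nothing to walk.
    search_path = getattr(module, "__path__", [])
    for info in pkgutil.walk_packages(search_path, prefix=root + "."):
        importlib.import_module(info.name)  # importing runs the decorators
        names.append(info.name)
    return names


def belongs_to(module_name: str, root: str) -> bool:
    """True if module_name is `root` itself or is nested under it."""
    # Comparing on the dotted boundary avoids matching sibling packages
    # that merely share a name prefix with `root`.
    return module_name == root or module_name.startswith(root + ".")


print(walk_module_names("email.mime"))
print(belongs_to("tests.po_lib.a_module", "tests.po_lib"))
print(belongs_to("tests.po_lib_other", "tests.po_lib"))
```

Importing each module matters because a decorator such as `handle_urls` only registers its rule when the module defining the class is executed.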
@@ -1,2 +1,3 @@
 from .pages import WebPage, ItemPage, ItemWebPage, Injectable
-from .page_inputs import ResponseData
+from .page_inputs import ResponseData
+from .overrides import handle_urls, PageObjectRegistry
@@ -0,0 +1,57 @@
+import argparse
+from typing import Callable
+
+import tabulate
+
+from web_poet.overrides import find_page_object_overrides
+
+
+def qualified_name(cls: Callable) -> str:
+    return f"{cls.__module__}.{cls.__name__}"
+
+
+def main(args=None):
+    parser = argparse.ArgumentParser(
+        description="Tool that list the Page Object overrides from a package or module recursively"
+    )
+    parser.add_argument(
+        "module",
+        metavar="PKG_OR_MODULE",
+        type=str,
+        help="A package or module to list overrides from",
+    )
+    parser.add_argument(
+        "--registry",
+        "-n",
+        metavar="REGISTRY_NAME",
+        type=str,
+        help="Registry name to list overrides from",
+        default="default",
+    )
+    args = parser.parse_args(args)
+    table = [
+        (
+            "Use this",
+            "instead of",
+            "for the URL patterns",
+            "except for the patterns",
+            "with priority",
+            "meta",
+        )
+    ]
+    table += [
+        (
+            qualified_name(rule.use),
+            qualified_name(rule.instead_of),
+            rule.for_patterns.include,
+            rule.for_patterns.exclude,
+            rule.for_patterns.priority,
+            rule.meta,
+        )
+        for rule in find_page_object_overrides(args.module, registry_name=args.registry)
+    ]
+    print(tabulate.tabulate(table, headers="firstrow"))
+
+
+if __name__ == "__main__":
+    main()
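`main(args=None)` forwards its argument to `parse_args`, which is what lets `test_cmd` call `main(["tests.po_lib"])` directly instead of patching `sys.argv`. A reduced sketch of that pattern, using only `argparse` and the same two arguments as the command above, is given below. `build_parser` and `parse` are hypothetical names for this illustration, and the registry lookup and table output are left out.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="List Page Object overrides")
    parser.add_argument("module", metavar="PKG_OR_MODULE")
    parser.add_argument("--registry", "-n", metavar="REGISTRY_NAME", default="default")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # With argv=None argparse reads sys.argv[1:]; an explicit list is used as given.
    return build_parser().parse_args(argv)


ns = parse(["tests.po_lib", "-n", "secondary"])
print(ns.module, ns.registry)  # tests.po_lib secondary
```

Omitting `-n` leaves `registry` at `"default"`, which is the registry name the command queries when no option is given.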