url-matcher integration with scrapy-poet #56
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -41,15 +41,15 @@ def to_item(self): | ||
| class BPBookListPage(BookListPage): | ||
| """Logic to extract listings from pages like https://bookpage.com/reviews""" | ||
| def book_urls(self): | ||
| - return self.css('.article-info a::attr(href)').getall() | ||
| + return self.css('article.post h4 a::attr(href)').getall() | ||
| | ||
| | ||
| class BPBookPage(BookPage): | ||
| """Logic to extract from pages like https://bookpage.com/reviews/25879-laird-hunt-zorrie-fiction""" | ||
| def to_item(self): | ||
| return { | ||
| 'url': self.url, | ||
| - 'name': self.css(".book-data h4::text").get().strip(), | ||
| + 'name': self.css("body div > h1::text").get().strip(), | ||
| } | ||
| | ||
| | ||
| @@ -58,16 +58,12 @@ class BooksSpider(scrapy.Spider): | ||
| start_urls = ['http://books.toscrape.com/', 'https://bookpage.com/reviews'] | ||
| # Configuring different page objects pages for different domains | ||
| custom_settings = { | ||
| - "SCRAPY_POET_OVERRIDES": { | ||
| - "toscrape.com": { | ||
| - BookListPage: BTSBookListPage, | ||
| - BookPage: BTSBookPage | ||
| - }, | ||
| - "bookpage.com": { | ||
| - BookListPage: BPBookListPage, | ||
| - BookPage: BPBookPage | ||
| - }, | ||
| - } | ||
| + "SCRAPY_POET_OVERRIDES": [ | ||
Review comment: I wonder if we should provide an example with the handle_urls decorator.
Reply: A good point! Added such an example in 0bc51b8.
| + ("toscrape.com", BTSBookListPage, BookListPage), | ||
| + ("toscrape.com", BTSBookPage, BookPage), | ||
| + ("bookpage.com", BPBookListPage, BookListPage), | ||
| + ("bookpage.com", BPBookPage, BookPage) | ||
BurnzZ marked this conversation as resolved.
| + ] | ||
| } | ||
| | ||
| def parse(self, response, page: BookListPage): | ||
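
The second hunk replaces the nested per-domain dict with a flat list of (pattern, override, base) tuples. As a rough illustration of how the two shapes relate, here is a minimal sketch in plain Python. The `dict_to_rules` helper is hypothetical and not part of scrapy-poet, and the empty classes are only placeholders for the page objects named in the diff.

```python
from typing import Dict, List, Tuple


# Placeholder classes standing in for the page objects named in the diff.
# The real ones extend web-poet page object classes; these are empty on purpose.
class BookListPage: ...
class BookPage: ...
class BTSBookListPage(BookListPage): ...
class BTSBookPage(BookPage): ...
class BPBookListPage(BookListPage): ...
class BPBookPage(BookPage): ...


OldOverrides = Dict[str, Dict[type, type]]
NewOverrides = List[Tuple[str, type, type]]


def dict_to_rules(old: OldOverrides) -> NewOverrides:
    """Hypothetical helper: flatten {pattern: {base: override}}
    into (pattern, override, base) tuples, keeping insertion order."""
    return [
        (pattern, override, base)
        for pattern, mapping in old.items()
        for base, override in mapping.items()
    ]


# The old-style setting, as removed by the diff.
old_overrides = {
    "toscrape.com": {
        BookListPage: BTSBookListPage,
        BookPage: BTSBookPage,
    },
    "bookpage.com": {
        BookListPage: BPBookListPage,
        BookPage: BPBookPage,
    },
}

# The new-style setting, in the tuple order used by the diff.
new_overrides = dict_to_rules(old_overrides)
```

Note that the tuple order puts the overriding class before the class it replaces, which is the reverse of the key/value order in the old dict.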