docs: Add guide about integrating Stagehand #1290
base: master
I had to use |
Cool tool and nice guide. I have just a few small comments about the CrawleeStagehandPage wrapper.
docs/guides/code_examples/playwright_crawler_stagehand/support_classes.py
docs/guides/code_examples/playwright_crawler_stagehand/browser_classes.py
docs/guides/code_examples/playwright_crawler_stagehand/support_classes.py
docs/guides/code_examples/playwright_crawler_stagehand/support_classes.py
Good job Max!
The integration itself is not as easy as I expected. Maybe this could show us the direction in which we could improve/simplify the browsers/Playwright-related interface.
And/or we could introduce a dedicated crawler for this directly in Crawlee, something like PlaywrightStagehandCrawler. Then the guide could focus solely on its usage, showing how to use AI-based selectors for web scraping.
Let's further discuss it with @B4nan and maybe @janbuchar once they're back from their vacations.
@@ -0,0 +1,66 @@
---
I suppose we can't use "Run on Apify" for these examples, as they consist of more than one file.
That's right.
And in one file, it would look very cumbersome.
I think the integration comes out more complicated because of the current … I hope that they will improve their API, and then the guide can be simplified.
Looks OK to me, just some minor things to address at will.
self._total_opened_pages += 1

# Wrap StagehandPage to provide Playwright Page interface
This comment seems inaccurate
pw_page = page._page  # noqa: SLF001

# Handle page close event
pw_page.on(event='close', f=self._on_page_close)

# Update internal state
self._pages.append(pw_page)
self._last_page_opened_at = datetime.now(timezone.utc)

self._total_opened_pages += 1
There is quite a bit of code copied over from PlaywrightBrowserController, isn't it? Any chance we could improve the PlaywrightBrowserController internal API so that integrating libraries that extend Playwright is easier?
Perhaps context creation should be moved into a separate public method, and the state updates as well. That would make this a bit cleaner.
But I would say that the main problem with this integration is that you have to reach into private attributes, for example pw_page = page._page.
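To make the suggestion above concrete, here is a minimal sketch of what splitting page creation and state updates into separate overridable methods could look like. It is not Crawlee's actual API: BaseController, StagehandLikeController, FakePage, FakeStagehandPage and the method names create_page, unwrap_page and register_page are all hypothetical stand-ins, used only to show that a subclass would then need to override two small hooks instead of copying the bookkeeping.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Callable


class FakePage:
    """Hypothetical stand-in for a Playwright page; only supports 'close'."""

    def __init__(self) -> None:
        self._close_handlers: list[Callable[[FakePage], None]] = []

    def on(self, event: str, f: Callable[[FakePage], None]) -> None:
        if event != 'close':
            raise ValueError(f'Unsupported event: {event}')
        self._close_handlers.append(f)

    def close(self) -> None:
        for handler in self._close_handlers:
            handler(self)


class FakeStagehandPage:
    """Hypothetical stand-in for a page type that keeps the real page private."""

    def __init__(self) -> None:
        self._page = FakePage()


class BaseController:
    """Hypothetical controller with page creation and state updates as hooks."""

    def __init__(self) -> None:
        self._pages: list[FakePage] = []
        self._total_opened_pages = 0
        self._last_page_opened_at: datetime | None = None

    def create_page(self) -> Any:
        """Extension point: subclasses decide how a page is created."""
        return FakePage()

    def unwrap_page(self, page: Any) -> FakePage:
        """Extension point: return the underlying Playwright-like page."""
        return page

    def register_page(self, pw_page: FakePage) -> None:
        """Shared bookkeeping that subclasses no longer have to copy."""
        pw_page.on(event='close', f=self._on_page_close)
        self._pages.append(pw_page)
        self._last_page_opened_at = datetime.now(timezone.utc)
        self._total_opened_pages += 1

    def new_page(self) -> Any:
        page = self.create_page()
        self.register_page(self.unwrap_page(page))
        return page

    def _on_page_close(self, page: FakePage) -> None:
        self._pages.remove(page)


class StagehandLikeController(BaseController):
    """Only the two hooks differ; the state handling is inherited."""

    def create_page(self) -> FakeStagehandPage:
        return FakeStagehandPage()

    def unwrap_page(self, page: FakeStagehandPage) -> FakePage:
        # The private attribute access is confined to this one place.
        return page._page  # noqa: SLF001
```

With such a split, the private access would still be needed, but it would live in a single overridable method rather than being mixed with copied state handling.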
docs/guides/code_examples/playwright_crawler_stagehand/stagehand_run.py
…nd_run.py Co-authored-by: Jan Buchar <[email protected]>
Description
stagehand-python v.0.4.0
Issues