UC Mode updates and refactoring #2882

Merged 7 commits on Jun 28, 2024
73 changes: 47 additions & 26 deletions Dockerfile
@@ -1,11 +1,34 @@
# SeleniumBase Docker Image
FROM ubuntu:22.04
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

#======================
# Locale Configuration
#======================
RUN apt-get update
RUN apt-get install -y --no-install-recommends tzdata locales
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
ENV TZ="America/New_York"

#======================
# Install Common Fonts
#======================
RUN apt-get update
RUN apt-get install -y \
fonts-liberation \
fonts-open-sans \
fonts-mononoki \
fonts-roboto \
fonts-lato

#============================
# Install Linux Dependencies
#============================
RUN apt-get update && apt-get install -y \
fonts-liberation \
RUN apt-get update
RUN apt-get install -y \
libasound2 \
libatk-bridge2.0-0 \
libatk1.0-0 \
@@ -17,60 +40,57 @@ RUN apt-get update && apt-get install -y \
libgtk-3-0 \
libnspr4 \
libnss3 \
libu2f-udev \
libvulkan1 \
libwayland-client0 \
libxcomposite1 \
libxdamage1 \
libxfixes3 \
libxkbcommon0 \
libxrandr2 \
libu2f-udev \
libvulkan1 \
xdg-utils
libxrandr2

#==========================
# Install useful utilities
#==========================
RUN apt-get update
RUN apt-get install -y xdg-utils

#=================================
# Install Bash Command Line Tools
#=================================
RUN apt-get update
RUN apt-get -qy --no-install-recommends install \
curl \
sudo \
unzip \
vim \
wget \
xvfb \
&& rm -rf /var/lib/apt/lists/*
xvfb

#================
# Install Chrome
#================
RUN curl -LO https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN apt-get install -y ./google-chrome-stable_current_amd64.deb
RUN apt-get update
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb
RUN apt-get -fy --no-install-recommends install
RUN rm google-chrome-stable_current_amd64.deb

#================
# Install Python
#================
RUN apt-get update -y
RUN apt-get install -y python3 python3-pip python3-setuptools python3-dev
RUN apt-get update
RUN apt-get install -y python3 python3-pip python3-setuptools python3-dev python3-tk
RUN alias python=python3
RUN echo "alias python=python3" >> ~/.bashrc
RUN apt-get -qy --no-install-recommends install python3.10
RUN rm /usr/bin/python3
RUN ln -s python3.10 /usr/bin/python3

#=============================================
# Allow Special Characters in Python Programs
#=============================================
RUN export PYTHONIOENCODING=utf8
RUN echo "export PYTHONIOENCODING=utf8" >> ~/.bashrc

#===========================
# Configure Virtual Display
#===========================
RUN set -e
RUN echo "Starting X virtual framebuffer (Xvfb) in background..."
RUN Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &
RUN export DISPLAY=:99
RUN exec "$@"
#===============
# Cleanup Lists
#===============
RUN rm -rf /var/lib/apt/lists/*

#=====================
# Set up SeleniumBase
@@ -89,6 +109,7 @@ RUN find . -name '*.pyc' -delete
RUN pip install --upgrade pip setuptools wheel
RUN cd /SeleniumBase && ls && pip install -r requirements.txt --upgrade
RUN cd /SeleniumBase && pip install .
RUN pip install pyautogui

#=======================
# Download chromedriver
3 changes: 1 addition & 2 deletions examples/raw_cdp_logging.py
@@ -5,8 +5,7 @@
try:
url = "seleniumbase.io/apps/turnstile"
driver.uc_open_with_reconnect(url, 2)
driver.switch_to_frame("iframe")
driver.uc_click("span")
driver.uc_gui_handle_cf()
driver.sleep(3)
pprint(driver.get_log("performance"))
finally:
6 changes: 3 additions & 3 deletions examples/raw_form_turnstile.py
@@ -2,7 +2,7 @@

with SB(uc=True, test=True) as sb:
url = "seleniumbase.io/apps/form_turnstile"
sb.driver.uc_open_with_reconnect(url, 2)
sb.uc_open_with_reconnect(url, 2)
sb.press_keys("#name", "SeleniumBase")
sb.press_keys("#email", "[email protected]")
sb.press_keys("#phone", "1-555-555-5555")
@@ -12,8 +12,8 @@
sb.click('span:contains("9:00 PM")')
sb.highlight_click('input[value="AR"] + span')
sb.click('input[value="cc"] + span')
sb.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.scroll_to("iframe")
sb.uc_gui_handle_cf()
sb.highlight("img#captcha-success", timeout=3)
sb.highlight_click('button:contains("Request & Pay")')
sb.highlight("img#submit-success")
18 changes: 5 additions & 13 deletions examples/raw_nopecha.py
@@ -1,19 +1,11 @@
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
sb.driver.uc_open_with_reconnect("nopecha.com/demo/turnstile", 4)
if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
if not sb.is_element_visible("circle.success-circle"):
sb.driver.uc_click("span", reconnect_time=3)
sb.switch_to_frame("#example-container0 iframe")
sb.switch_to_default_content()

sb.switch_to_frame("#example-container5 iframe")
sb.driver.uc_click("span", reconnect_time=2.5)
sb.switch_to_frame("#example-container5 iframe")
sb.assert_element("svg#success-icon", timeout=3)
sb.switch_to_parent_frame()
sb.uc_open_with_disconnect("nopecha.com/demo/turnstile", 3.5)
sb.uc_gui_press_keys("\t\t ")
sb.sleep(3.5)
sb.connect()
sb.uc_gui_handle_cf("#example-container5 iframe")

if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
20 changes: 3 additions & 17 deletions examples/raw_turnstile.py
@@ -1,23 +1,9 @@
from seleniumbase import SB


def open_the_turnstile_page(sb):
with SB(uc=True, test=True) as sb:
url = "seleniumbase.io/apps/turnstile"
sb.driver.uc_open_with_reconnect(url, reconnect_time=2)


def click_turnstile_and_verify(sb):
sb.driver.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.uc_open_with_reconnect(url, reconnect_time=2)
sb.uc_gui_handle_cf()
sb.assert_element("img#captcha-success", timeout=3)


with SB(uc=True, test=True) as sb:
open_the_turnstile_page(sb)
try:
click_turnstile_and_verify(sb)
except Exception:
open_the_turnstile_page(sb)
click_turnstile_and_verify(sb)
sb.set_messenger_theme(location="top_left")
sb.post_message("SeleniumBase wasn't detected", duration=3)
5 changes: 2 additions & 3 deletions examples/uc_cdp_events.py
@@ -13,16 +13,15 @@ def add_cdp_listener(self):
)

def click_turnstile_and_verify(sb):
sb.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.uc_gui_handle_cf()
sb.assert_element("img#captcha-success", timeout=3)
sb.highlight("img#captcha-success", loops=8)

def test_display_cdp_events(self):
if not (self.undetectable and self.uc_cdp_events):
self.get_new_driver(undetectable=True, uc_cdp_events=True)
url = "seleniumbase.io/apps/turnstile"
self.driver.uc_open_with_reconnect(url, 2)
self.uc_open_with_reconnect(url, 2)
self.add_cdp_listener()
self.click_turnstile_and_verify()
self.sleep(1)
85 changes: 32 additions & 53 deletions help_docs/uc_mode.md
@@ -20,7 +20,7 @@

* Automatically changing user agents to prevent detection.
* Automatically setting various chromium args as needed.
* Has special methods. Eg. `driver.uc_click(selector)`
* Has special `uc_*()` methods.

👤 Here's an example with the <b><code translate="no">Driver</code></b> manager:

@@ -67,22 +67,11 @@ with SB(uc=True, test=True) as sb:
```python
from seleniumbase import SB

def open_the_turnstile_page(sb):
with SB(uc=True, test=True) as sb:
url = "seleniumbase.io/apps/turnstile"
sb.driver.uc_open_with_reconnect(url, reconnect_time=2)

def click_turnstile_and_verify(sb):
sb.switch_to_frame("iframe")
sb.driver.uc_click("span")
sb.uc_open_with_reconnect(url, reconnect_time=2)
sb.uc_gui_handle_cf()
sb.assert_element("img#captcha-success", timeout=3)

with SB(uc=True, test=True) as sb:
open_the_turnstile_page(sb)
try:
click_turnstile_and_verify(sb)
except Exception:
open_the_turnstile_page(sb)
click_turnstile_and_verify(sb)
sb.set_messenger_theme(location="top_left")
sb.post_message("SeleniumBase wasn't detected", duration=3)
```
@@ -129,6 +118,27 @@ with SB(uc=True, test=True, ad_block_on=True) as sb:

<img src="https://seleniumbase.github.io/other/ttm_bypass.png" title="SeleniumBase" width="540">

👤 <b>On Linux</b>, use `sb.uc_gui_handle_cf()` to handle Cloudflare Turnstiles:

```python
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://www.virtualmanager.com/en/login"
    sb.uc_open_with_reconnect(url, 4)
    print(sb.get_page_title())
    sb.uc_gui_handle_cf()  # Ready if needed!
    print(sb.get_page_title())
    sb.assert_element('input[name*="email"]')
    sb.assert_element('input[name*="login"]')
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")
```

<a href="https://github.com/mdmintz/undetected-testing/actions/runs/9637461606/job/26576722411"><img width="540" alt="uc_gui_handle_cf on Linux" src="https://github.com/seleniumbase/SeleniumBase/assets/6788579/6aceb2a3-2a32-4521-b30a-f79446d2ce28"></a>

The 2nd `print()` should output "Virtual Manager", which means that the automation successfully passed the Turnstile.

--------

👤 In <b translate="no">UC Mode</b>, <code translate="no">driver.get(url)</code> has been modified from its original version: If anti-bot services are detected from a <code translate="no">requests.get(url)</code> call that's made before navigating to the website, then <code translate="no">driver.uc_open_with_reconnect(url)</code> will be used instead. To open a URL normally in <b translate="no">UC Mode</b>, use <code translate="no">driver.default_get(url)</code>.
@@ -144,6 +154,7 @@ with SB(uc=True, test=True, ad_block_on=True) as sb:
<img src="https://seleniumbase.github.io/other/pixelscan.jpg" title="SeleniumBase" width="540">

### 👤 Here are some UC Mode examples that bypass CAPTCHAs when clicking is required:
* [SeleniumBase/examples/raw_pyautogui.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_pyautogui.py)
* [SeleniumBase/examples/raw_turnstile.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_turnstile.py)
* [SeleniumBase/examples/raw_form_turnstile.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_form_turnstile.py)
* [SeleniumBase/examples/uc_cdp_events.py](https://github.com/seleniumbase/SeleniumBase/blob/master/examples/uc_cdp_events.py)
@@ -214,11 +225,6 @@ driver.reconnect("breakpoint")

(Note that while the special <b><code translate="no">UC Mode</code></b> breakpoint is active, you can't use <b><code translate="no">Selenium</code></b> commands in the browser, and the browser can't detect <b><code translate="no">Selenium</code></b>.)

👤 The two main causes of getting detected in <b translate="no">UC Mode</b> (which are both easily handled) are:

<li>Timing. (<b translate="no">UC Mode</b> methods let you customize default values that aren't good enough for your environment.)</li>
<li>Not using <b><code translate="no">driver.uc_click(selector)</code></b> when you need to remain undetected while clicking something.</li>

👤 On Linux, you may need to use `driver.uc_gui_handle_cf()` to successfully bypass a Cloudflare CAPTCHA. If there's more than one iframe on that website (and Cloudflare isn't the first one) then put the CSS Selector of that iframe as the first arg to `driver.uc_gui_handle_cf()`. This method uses `pyautogui`. In order for `pyautogui` to focus on the correct element, use `xvfb=True` / `--xvfb` to activate a special virtual display on Linux.
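
A minimal sketch of the Linux note above, assuming the `SB()` manager accepts `xvfb=True` as described and that `uc_gui_handle_cf()` takes an optional iframe selector as its first argument. The helper names `uc_sb_kwargs` and `pass_turnstile` are hypothetical and not part of SeleniumBase:

```python
import sys


def uc_sb_kwargs(platform=sys.platform):
    """Build keyword arguments for SB() (hypothetical helper)."""
    kwargs = {"uc": True, "test": True}
    if platform.startswith("linux"):
        # pyautogui needs a display to focus on, so request the
        # virtual display on Linux (equivalent to passing --xvfb).
        kwargs["xvfb"] = True
    return kwargs


def pass_turnstile(url, iframe_selector=None, reconnect_time=4):
    """Open url in UC Mode, handle a Turnstile, and return the page title."""
    # Imported lazily: requires seleniumbase, pyautogui, and Chrome.
    from seleniumbase import SB

    with SB(**uc_sb_kwargs()) as sb:
        sb.uc_open_with_reconnect(url, reconnect_time)
        if iframe_selector:
            # Use this when Cloudflare isn't the first iframe on the page.
            sb.uc_gui_handle_cf(iframe_selector)
        else:
            sb.uc_gui_handle_cf()
        return sb.get_page_title()
```

Calling `pass_turnstile("nopecha.com/demo/turnstile", "#example-container5 iframe")` would target a specific iframe; the `reconnect_time` default of 4 is only a starting point and may need tuning per environment.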

👤 To find out if <b translate="no">UC Mode</b> will work at all on a specific site (before adjusting for timing), load your site with the following script:
@@ -268,46 +274,15 @@ with ThreadPoolExecutor(max_workers=len(urls)) as executor:

--------

👥 <b>Double Duty:</b> Here's an example of handling two CAPTCHAs on one page:

<img src="https://seleniumbase.github.io/other/nopecha.png" title="SeleniumBase" align="center" width="630">

```python
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
sb.driver.uc_open_with_reconnect("nopecha.com/demo/turnstile", 3.4)
if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
if not sb.is_element_visible("circle.success-circle"):
sb.driver.uc_click("span", reconnect_time=3)
sb.switch_to_frame("#example-container0 iframe")
sb.switch_to_default_content()

sb.switch_to_frame("#example-container5 iframe")
sb.driver.uc_click("span", reconnect_time=2.5)
sb.switch_to_frame("#example-container5 iframe")
sb.assert_element("svg#success-icon", timeout=3)
sb.switch_to_parent_frame()

if sb.is_element_visible("#example-container0 iframe"):
sb.switch_to_frame("#example-container0 iframe")
sb.assert_element("circle.success-circle")
sb.switch_to_parent_frame()

sb.set_messenger_theme(location="top_center")
sb.post_message("SeleniumBase wasn't detected!", duration=3)
```

--------

👤 <b>What makes UC Mode work?</b>

Here are the 3 primary things that <b translate="no">UC Mode</b> does to make bots appear human:

<ul>
<li>Modifies <b><code translate="no">chromedriver</code></b> to rename <b translate="no">Chrome DevTools Console</b> variables.</li>
<li>Launches <b translate="no">Chrome</b> browsers before attaching <b><code translate="no">chromedriver</code></b> to them.</li>
<li>Disconnects <b><code translate="no">chromedriver</code></b> from <b translate="no">Chrome</b> during stealthy actions.</li>
</ul>

For example, if the <b translate="no">Chrome DevTools Console</b> variables aren't renamed, you can expect to find them easily when using <b><code translate="no">selenium</code></b> for browser automation:

@@ -321,13 +296,17 @@ While <b><code translate="no">chromedriver</code></b> is connected to <b transla

Links to those <a href="https://github.com/SeleniumHQ/selenium">raw <b>Selenium</b></a> method definitions have been provided for reference (but you don't need to call those methods directly):

<ul>
<li><b><code translate="no"><a href="https://github.com/SeleniumHQ/selenium/blob/9c6ccdbf40356284fad342f70fbdc0afefd27bd3/py/selenium/webdriver/common/service.py#L135">driver.service.stop()</a></code></b></li>
<li><b><code translate="no"><a href="https://github.com/SeleniumHQ/selenium/blob/9c6ccdbf40356284fad342f70fbdc0afefd27bd3/py/selenium/webdriver/common/service.py#L91">driver.service.start()</a></code></b></li>
<li><b><code translate="no"><a href="https://github.com/SeleniumHQ/selenium/blob/9c6ccdbf40356284fad342f70fbdc0afefd27bd3/py/selenium/webdriver/remote/webdriver.py#L284">driver.start_session(capabilities)</a></code></b></li>
</ul>

Also note that <b><code translate="no">chromedriver</code></b> isn't detectable in a browser tab if it never touches that tab. Here's a JS command that lets you open a URL in a new tab (from your current tab):

<ul>
<li><b><code translate="no">window.open("URL");</code></b> --> (Info: <a href="https://www.w3schools.com/jsref/met_win_open.asp" target="_blank">W3Schools</a>)</li>
</ul>

The above JS method is used within <b translate="no"><code>SeleniumBase</code></b> <b translate="no">UC Mode</b> methods for opening URLs in a stealthy way. Since some websites try to detect if your browser is a bot on the initial page load, this allows you to bypass detection in those situations. After a few seconds (customizable), <b translate="no">UC Mode</b> tells <b><code translate="no">chromedriver</code></b> to connect to that tab so that automated commands can now be issued. At that point, <b><code translate="no">chromedriver</code></b> could be detected if websites are looking for it (but generally websites only look for it during specific events, such as page loads, form submissions, and button clicks).
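
The sequence described above can be sketched roughly as follows. This is an illustration under stated assumptions, not SeleniumBase's actual implementation: `stealthy_open` is a hypothetical helper, `driver` is assumed to be a Selenium-style WebDriver, and `capabilities` is assumed to point at the already-running Chrome (for example through Chrome's `debuggerAddress` option) so that reconnecting attaches to the existing browser instead of launching a new one.

```python
import json
import time


def stealthy_open(driver, url, capabilities, reconnect_time=3.0):
    """Open url in a new tab while chromedriver is disconnected.

    Hypothetical helper for illustration only.
    """
    # json.dumps() produces a safely quoted JS string literal for the URL.
    driver.execute_script(f"window.open({json.dumps(url)});")
    # Stop chromedriver so it isn't attached while the page loads.
    driver.service.stop()
    time.sleep(reconnect_time)
    # Restart chromedriver and start a new session.
    driver.service.start()
    driver.start_session(capabilities)
    # Switch to the most recently opened tab.
    driver.switch_to.window(driver.window_handles[-1])
```

In practice, the built-in `uc_open_with_reconnect(url, reconnect_time)` method covers this use case, so a helper like this should only be needed for experimentation.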

4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,9 +1,9 @@
pip>=24.0;python_version<"3.8"
pip>=24.1;python_version>="3.8"
pip>=24.1.1;python_version>="3.8"
packaging>=24.0;python_version<"3.8"
packaging>=24.1;python_version>="3.8"
setuptools>=68.0.0;python_version<"3.8"
setuptools>=70.1.0;python_version>="3.8"
setuptools>=70.1.1;python_version>="3.8"
wheel>=0.42.0;python_version<"3.8"
wheel>=0.43.0;python_version>="3.8"
attrs>=23.2.0
2 changes: 1 addition & 1 deletion seleniumbase/__version__.py
@@ -1,2 +1,2 @@
# seleniumbase package
__version__ = "4.28.0"
__version__ = "4.28.1"