Doesn't work since 2/24 8:00 AM UTC #104

Open
lota09 opened this issue Feb 26, 2025 · 4 comments
lota09 commented Feb 26, 2025

from autoscraper import AutoScraper

url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'

# We can add one or multiple candidates here.
# You can also put urls here to retrieve urls.
wanted_list = ["What are metaclasses in Python?"]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)

The code above is the example from the README.

Even though the phrase "What are metaclasses in Python?" appears on that page, build() has returned an empty list ( [] ) since 2/24 8:00 AM UTC.

@lota09 lota09 changed the title Doesn't work since 2/24 10:00 AM UTC Doesn't work since 2/24 8:00 AM UTC Feb 26, 2025

lota09 commented Mar 2, 2025

Does anyone have this working correctly?

amine-bs commented Mar 2, 2025

I had the same problem. It looks like a problem with beautifulsoup4 version 4.13.x. I tested it with beautifulsoup4 4.12.3 and it worked well.
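To confirm which series is installed before downgrading, something like the following sketch may help. It reads the installed beautifulsoup4 version with the standard-library importlib.metadata and flags the 4.13.x series mentioned above. The helper names parse_version and is_suspect_bs4 are made up for this example, and whether releases after 4.13.x are affected is not known from this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.13.3' into (4, 13, 3); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_suspect_bs4(text):
    """True only for the 4.13.x series reported as failing in this thread."""
    return parse_version(text)[:2] == (4, 13)


# Look up the version installed in the current environment, if any.
try:
    installed = version("beautifulsoup4")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("beautifulsoup4 is not installed in this environment")
elif is_suspect_bs4(installed):
    print(f"beautifulsoup4 {installed}: 4.13.x series, reported here as failing")
else:
    print(f"beautifulsoup4 {installed}: not in the 4.13.x series")
```

Run it with the same interpreter that runs the scraper, otherwise it reports on a different environment.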

lota09 commented Mar 3, 2025

> I had the same problem. It looks like a problem with beautifulsoup4 version 4.13.x. I tested it with beautifulsoup4 4.12.3 and it worked well.

Thanks for sharing.

For anyone struggling with this problem:

  1. Installing beautifulsoup4 4.12.3 as shown below may fix it.
    pip install beautifulsoup4==4.12.3

  2. Alternatively, create a virtual environment, which keeps the pinned versions separate from your other projects.
    Here is a requirements.txt for the virtual environment.

autoscraper==1.1.14
beautifulsoup4==4.12.3
bs4==0.0.2
certifi==2025.1.31
charset-normalizer==3.4.1
idna==3.10
lxml==5.3.1
requests==2.32.3
soupsieve==2.6
typing_extensions==4.12.2
urllib3==2.3.0

Place it in your project directory and run python -m venv .venv, then run the command below that matches your shell to activate the venv.

✅ Windows (PowerShell)
.venv\Scripts\Activate
✅ Windows (CMD)
.venv\Scripts\activate.bat
✅ Linux/macOS (bash/zsh)
source .venv/bin/activate

Then run pip install -r requirements.txt, which installs the pinned, working versions of beautifulsoup4 and autoscraper.
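To double-check that the activated venv really has the pinned versions, a small script along these lines could be run inside it. This is only a sketch: read_pins and find_mismatches are hypothetical helper names, it handles only simple name==version lines like those in the requirements.txt above, and on older Python versions the name lookup may be sensitive to spelling such as typing_extensions versus typing-extensions.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def read_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for every pin not satisfied."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")
    if not req.exists():
        print("requirements.txt not found in the current directory")
    else:
        problems = find_mismatches(read_pins(req.read_text()))
        if not problems:
            print("all pinned packages are installed at the pinned versions")
        for name, (pinned, installed) in problems.items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

If beautifulsoup4 shows up in the mismatch list, the venv is probably not the active environment or the install step did not complete.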

Now I understand why my professors told me to use a virtual environment...

lota09 commented Mar 7, 2025

Maybe we should ask @alirezamika to update autoscraper to be compatible with the latest version of bs4,
or to pin beautifulsoup4 4.12.3 as the compatible dependency in setup.py,
for those who encounter this issue in the future.
