hilmanski/py-websites-scraper

About

Scrape the main content of multiple websites in parallel using Python.

You still need to use a proxy if access to a website is blocked; see the "More parameters" section.

Dependency

aiohttp (used to perform the HTTP requests)

How to use

pip install py-websites-scraper

Quick usage:

import asyncio
from py_websites_scraper import scrape_urls

urls = ["https://news.ycombinator.com", "https://example.com"]

# Fetch all URLs concurrently, at most 5 at a time
data = asyncio.run(scrape_urls(urls, max_concurrency=5))

for item in data:
    if item["success"]:
        print(item["url"], item.get("title"), item.get("content"))
    else:
        print(f"Failed fetching this URL: {item['url']}")

Available keys in each response item:

url
success # True/False
title   # only present when the request succeeded
content # only present when the request succeeded
error   # only present when the request failed

Always check that success is True before reading title or content.
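
For example, a failed item carries the error key instead of title and content (a minimal sketch, reusing the data result from the quick-usage call above):

for item in data:
    if not item["success"]:
        # error holds the failure reason reported for this URL
        print(item["url"], item.get("error"))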

More parameters

You can pass any extra parameters that aiohttp accepts for the request, such as headers, proxy, and more. Please check the aiohttp documentation for reference.

Example:

import asyncio
from py_websites_scraper import scrape_urls

urls = []  # your URLs
results = asyncio.run(scrape_urls(
    urls,
    proxy="YOUR_PROXY_INFO",
    headers={"User-Agent": "USER_AGENT_INFO"},
))
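
Other aiohttp request options should work the same way. For instance, assuming extra keyword arguments are forwarded to each aiohttp request, a per-request timeout might look like this (ClientTimeout is aiohttp's own option, not something this package documents):

import asyncio
import aiohttp
from py_websites_scraper import scrape_urls

urls = []  # your URLs
results = asyncio.run(scrape_urls(
    urls,
    # assumption: forwarded to aiohttp; aborts any single request after 10 seconds
    timeout=aiohttp.ClientTimeout(total=10),
))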

Limitations

  • Gated content (e.g. paywalled or login-protected pages)
  • Dynamically generated content (e.g. JavaScript-rendered pages)

How to test the package locally (for development)

Install in editable mode:

pip install -e .

Run any file that imports this package, for example:

python test_local.py
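
For reference, test_local.py can be as small as this (a minimal sketch; its actual contents are up to you):

import asyncio
from py_websites_scraper import scrape_urls

# Smoke test: scrape one page through the locally installed package
results = asyncio.run(scrape_urls(["https://example.com"]))
print(results)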
