How to use it if the examples are from different pages? #103

brenorb · 2025-01-27T20:34:32Z

Hi,

I'm trying to scrape a list of articles, so it's a two-step process.
In the first step I provide the first target URL, and it works great: I get a list of all the URLs.

In the second step I need to open each page and scrape just ONE thing: the article.

I tried it and it gave me a list with just one string: the copyright notice at the end of the page.

# /// script
# dependencies = [
#   "autoscraper",
# ]
# ///

# If you want to automatically scrape
# a website with Python, use ‘autoscraper’
# pip install autoscraper

from autoscraper import AutoScraper

# Define the URL and the wanted data
# (an example article link from the listing page)
url = "https://example-blog.com"
wanted_list = [
    "https://example-blog.com/articles/4415101547419-first-article",
]
# replace this with an actual article link to learn from

# Create an instance and build the scraper model
scraper = AutoScraper()
result = scraper.build(url, wanted_list)

# Testing the model
print(result)

# Save result to file, one link per line
with open("links.txt", "w") as f:
    for link in result:
        f.write(link + "\n")

sample_article = """Article Title
Article first paragraph. 

Article second paragraph.

Last Paragraph."""

for link in result:
    article = scraper.build(link, sample_article)
    print(article)
    break  # stop after the first article just to check whether it got what I needed
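
One possible way to structure the second step, offered as a rough sketch rather than a confirmed answer: train the scraper once on a single article page, then reuse the learned rules on every other link with `get_result_similar` instead of calling `build` again for each page. The sketch assumes that `wanted_list` should be a list of short strings (a bare multi-line string would likely be iterated character by character), and that each string is the full text of one element on the training page. The helper name `scrape_articles`, the `main` function, and the choice of sample strings are hypothetical and only for illustration.

```python
def scrape_articles(scraper, links):
    """Run an already-trained scraper over each link.

    `scraper` is any object with a get_result_similar(url=...) method
    that returns a list of strings, such as a built AutoScraper.
    Returns a dict mapping each link to the joined article text.
    """
    articles = {}
    for link in links:
        # Reuse the rules learned earlier; build() is not called per page.
        parts = scraper.get_result_similar(url=link)
        articles[link] = "\n\n".join(parts)
    return articles


def main():
    # Imported here so the helper above can be tried without the library.
    from autoscraper import AutoScraper

    # links.txt is the file written by the first step, one link per line.
    with open("links.txt") as f:
        links = [line.strip() for line in f if line.strip()]

    # Assumption: these strings appear verbatim on the first article page,
    # each as the complete text of a single element (for example one <p>).
    wanted_list = ["Article first paragraph.", "Last Paragraph."]

    scraper = AutoScraper()
    scraper.build(url=links[0], wanted_list=wanted_list)

    for link, text in scrape_articles(scraper, links).items():
        print(link)
        print(text)
```

Calling `main()` runs the whole flow against the links saved earlier. Whether the learned rules carry over depends on the article pages sharing the same HTML structure, which is assumed here and not verified.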


github-actions · 2025-02-27T02:39:34Z

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the Stale label Feb 27, 2025
