Relative URLs were mishandled, causing potential duplicate page visits #10

Open
Sachin-NK opened this issue Feb 16, 2025 · 0 comments

Comments

@Sachin-NK

The web scraper was not handling URLs correctly. It mishandled relative links (for example, a link that only says "go up one level" instead of giving the full address). As a result, it could visit the same page more than once, wasting time and possibly corrupting the collected data. The fix makes sure every URL is converted to a complete, absolute address before the scraper uses it, so each page is identified unambiguously.
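As a rough illustration of the kind of fix described above (not the project's actual code), the sketch below resolves each link against the URL of the page it was found on, using Python's standard `urllib.parse.urljoin`, and only then checks it against a set of already-visited URLs. The names `to_absolute` and `should_visit` and the `example.com` addresses are hypothetical, and dropping the `#fragment` is an extra assumption that the issue does not mention.

```python
from urllib.parse import urljoin, urldefrag


def to_absolute(page_url: str, href: str) -> str:
    """Resolve href against the page it appeared on (hypothetical helper)."""
    absolute = urljoin(page_url, href)
    # Assumption: a "#fragment" points inside the same page, so drop it.
    url, _fragment = urldefrag(absolute)
    return url


def should_visit(page_url: str, href: str, visited: set) -> bool:
    """Return True only the first time a resolved URL is seen."""
    url = to_absolute(page_url, href)
    if url in visited:
        return False
    visited.add(url)
    return True


if __name__ == "__main__":
    visited = set()
    page = "https://example.com/docs/guide/intro.html"
    # Two differently written links that point to the same page.
    for href in ["../index.html", "/docs/index.html#top"]:
        print(href, "->", to_absolute(page, href), should_visit(page, href, visited))
```

Because both links resolve to the same absolute URL, the second one is skipped rather than fetched again.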
