[SCRAPER] - Getting 403 from damndelicious #4868

Open
3 tasks done
KaiHicks opened this issue Jan 9, 2025 · 2 comments
Labels
bug Something isn't working scraper triage

Comments

KaiHicks commented Jan 9, 2025

First Check

  • I used the GitHub search to find a similar issue and didn't find it.

  • I have verified that this issue is not related to the underlying library
    hhyrsev/recipe-scrapers by 1) checking the debugger and data is returned,
    2) verifying that there are errors in the log related to application level
    code, or 3) verified that the site provides recipe data, or is otherwise
    supported by hhyrsev/recipe-scrapers

  • This issue can be replicated on the demo site (https://demo.mealie.io/)

Please provide 1-5 example URLs that are having errors

https://damndelicious.net/2023/02/13/roasted-sweet-potatoes/
https://damndelicious.net/2018/04/18/thai-red-curry-noodle-soup/

Please provide your logs for the Mealie container docker logs <container-id> > mealie.logs

INFO     2025-01-08T20:45:40 - HTTP Request: GET https://damndelicious.net/2023/02/13/roasted-sweet-potatoes/ "HTTP/1.1 403 Forbidden"

INFO     2025-01-08T20:45:40 - [192.168.7.1:0] 400 Bad Request "POST /api/recipes/create/url HTTP/1.1"

Deployment

Docker (Linux)

@KaiHicks KaiHicks added bug Something isn't working scraper triage labels Jan 9, 2025

KaiHicks commented Jan 9, 2025

I am unable to import any recipes from Damn Delicious via URL. I verified that the page has the correct recipe metadata using Google's Rich Results tool. The logs on my server show that Mealie is getting a 403 response from Damn Delicious. I do not get this response from a browser on a machine on the same network, and I even SSHed into my server and successfully fetched the page with curl. Finally, I confirmed that the issue also exists on the demo Mealie instance.

I see there are also a couple of other open issues where other websites give a 403. I ran the same tests detailed above on the sites mentioned in the following issues, and I got the same results:

Kuchenpirat (Collaborator) commented

This site is blocking scraping attempts by our user agent.
Mealie has been caught in the crossfire between content creators and AI companies.
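
To check whether a block like this depends on the User-Agent header, you can request the same URL with several different user agents and compare the status codes. This is a minimal sketch using only the Python standard library. The function names `fetch_status` and `compare_user_agents` are made up for illustration, and any user agent strings you pass in are your own choice; this is not how Mealie itself makes requests.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still
        # the value we want to compare, so return it instead of failing.
        return err.code


def compare_user_agents(url, user_agents, opener=urllib.request.urlopen):
    """Map each label in `user_agents` to the status code it receives.

    `user_agents` is a dict of {label: User-Agent string}. The `opener`
    parameter exists only so the network call can be swapped out.
    """
    return {
        label: fetch_status(url, user_agent, opener)
        for label, user_agent in user_agents.items()
    }
```

Calling `compare_user_agents` with one of the URLs from this issue and a dict such as `{"curl": "curl/8.5.0", "app": "ExampleRecipeApp/1.0"}` (both placeholder strings) returns one status code per label. If the codes differ, the site is probably filtering on the user agent rather than on the IP address.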

The easiest way to get recipes from pages that use the recipe schema but block scraping attempts is to open the page in your browser, use "Inspect" -> Copy HTML, then in Mealie go to Create -> Import from HTML or JSON -> Paste.
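
Before pasting, you can check that the copied HTML really contains recipe schema data. The sketch below looks for `application/ld+json` script blocks and returns any objects whose `@type` is `Recipe`. It assumes the site publishes its recipe as JSON-LD (pages that use microdata instead are not covered), and the names `JsonLdCollector` and `find_recipes` are illustrative, not part of Mealie or recipe-scrapers.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def _is_recipe(node):
    # "@type" may be a single string or a list of strings.
    node_type = node.get("@type")
    types = node_type if isinstance(node_type, list) else [node_type]
    return "Recipe" in types


def find_recipes(html):
    """Return every JSON-LD object of type Recipe found in `html`."""
    collector = JsonLdCollector()
    collector.feed(html)
    recipes = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        # A block may hold one object, a list of objects, or a "@graph" list.
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            nodes = item.get("@graph")
            if not isinstance(nodes, list):
                nodes = [item]
            recipes.extend(
                n for n in nodes if isinstance(n, dict) and _is_recipe(n)
            )
    return recipes
```

If `find_recipes` returns an empty list for the HTML you copied, the page most likely does not expose a JSON-LD recipe, and the problem is not just the 403.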

In the future we might develop a browser extension that does that for you, but that does not exist at this point in time.
