Can this be used to crawl a specific website and be used for a "website search" #216

drwankingstein · 2024-08-20T06:19:17Z

I was wondering whether it is possible to use this as a website-specific search, in place of the "powered by Google" search you often see. If so, what would the setup process look like? I tried looking into it, but I'm not sure how to set up the crawler to crawl specific website(s).

mikkeldenker · 2024-08-20T07:47:32Z

Stract's crawler can't be limited to specific sites, but the index is built from plain .warc files, so other crawlers such as Nutch and Heritrix should also work. I don't have experience with them, so I don't know whether they can be limited to specific sites, but they might.
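
Whichever crawler ends up producing the .warc files, limiting it to specific sites usually comes down to a scope check on every discovered URL before it is queued. The following is a minimal Python sketch of such a check. It is not taken from Stract, Nutch, or Heritrix; `in_scope` and `allowed_hosts` are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit


def in_scope(url: str, allowed_hosts: set[str]) -> bool:
    """Return True if the URL's host is an allowed host or one of its subdomains.

    Hypothetical helper: a real crawler would apply its own scoping rules.
    """
    # hostname is None for URLs without a network location (e.g. mailto:).
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


if __name__ == "__main__":
    allowed = {"example.com"}
    for candidate in (
        "https://example.com/docs",
        "https://blog.example.com/post",
        "https://notexample.com/",
    ):
        print(candidate, in_scope(candidate, allowed))
```

Matching on the parsed hostname rather than on a substring of the URL avoids accidentally accepting look-alike domains such as notexample.com.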

As far as I know, the 'powered by Google' search actually just sends a "{query} site:{site}" search to Google, which would very much be possible to build on top of Stract's API as well.
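
The "{query} site:{site}" approach can be sketched roughly as follows in Python. This is only an illustration under assumptions: the endpoint URL and the JSON payload shape (a single "query" field) are placeholders, not confirmed details of Stract's API, and `site_query` and `build_request` are hypothetical helper names. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; substitute the real search API URL.
SEARCH_ENDPOINT = "https://search.example.invalid/api/search"


def site_query(user_query: str, site: str) -> str:
    """Restrict a free-text query to one site by appending a site: operator."""
    query = user_query.strip()
    # Accept either a bare host or a full URL and keep only the host part.
    host = site.strip().removeprefix("https://").removeprefix("http://")
    host = host.split("/", 1)[0]
    if not query:
        raise ValueError("query must not be empty")
    if not host:
        raise ValueError("site must not be empty")
    return f"{query} site:{host}"


def build_request(user_query: str, site: str,
                  endpoint: str = SEARCH_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the site-restricted query.

    Assumption: the API accepts a JSON body of the form {"query": "..."}.
    """
    body = json.dumps({"query": site_query(user_query, site)}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("install guide", "https://example.com/")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

A site's search box would then pass whatever the visitor typed as `user_query` and its own domain as `site`, and render the results returned by the search backend.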

satonotdead · 2024-09-10T00:09:56Z

This will be a must! It was discussed before :)

drwankingstein · 2025-01-29T01:14:05Z

Is it possible to specify a website to start crawling from? I don't necessarily need to limit the index to just the site in question, but I would like to keep it relevant. I found the documentation a bit hard to understand. Also, is it possible to override the user agent the crawler uses?
