Distributed Web Crawler

This project is a distributed web crawler implemented in Go. It uses Kafka for URL frontier management, Redis for caching, and Cassandra for storing crawled data.
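To make the division of responsibilities concrete, here is a minimal sketch of how the three components could fit together in a crawl loop. It is not this project's actual code. The interfaces Frontier, SeenCache, and PageStore, the Fetcher type, and the Crawl function are hypothetical names, and in-memory stand-ins replace Kafka, Redis, and Cassandra so that the example runs without any services. It also assumes the cache is used to skip URLs that were already crawled, which the description above does not state.

```go
package main

import "fmt"

// Frontier is a hypothetical stand-in for the Kafka-backed URL frontier.
type Frontier interface {
	Push(url string)
	Pop() (string, bool)
}

// SeenCache is a hypothetical stand-in for the Redis cache.
// Assumption: the cache is used to deduplicate URLs.
type SeenCache interface {
	// MarkSeen records url and reports whether it was new.
	MarkSeen(url string) bool
}

// PageStore is a hypothetical stand-in for Cassandra storage.
type PageStore interface {
	Save(url, body string)
}

// memFrontier is a FIFO queue held in memory.
type memFrontier struct{ queue []string }

func (f *memFrontier) Push(url string) { f.queue = append(f.queue, url) }

func (f *memFrontier) Pop() (string, bool) {
	if len(f.queue) == 0 {
		return "", false
	}
	url := f.queue[0]
	f.queue = f.queue[1:]
	return url, true
}

// memCache remembers every URL it has been shown.
type memCache struct{ seen map[string]bool }

func (c *memCache) MarkSeen(url string) bool {
	if c.seen[url] {
		return false
	}
	c.seen[url] = true
	return true
}

// memStore keeps page bodies in a map keyed by URL.
type memStore struct{ pages map[string]string }

func (s *memStore) Save(url, body string) { s.pages[url] = body }

// Fetcher returns a page body and its outgoing links (hypothetical).
type Fetcher func(url string) (body string, links []string)

// Crawl drains the frontier and returns the number of pages stored.
func Crawl(f Frontier, c SeenCache, s PageStore, fetch Fetcher) int {
	stored := 0
	for {
		url, ok := f.Pop()
		if !ok {
			return stored
		}
		if !c.MarkSeen(url) {
			continue // already crawled, skip
		}
		body, links := fetch(url)
		s.Save(url, body)
		stored++
		for _, link := range links {
			f.Push(link)
		}
	}
}

func main() {
	// A tiny fake "web" so the sketch needs no network access.
	web := map[string][]string{
		"a": {"b", "c"},
		"b": {"a", "c"},
		"c": {},
	}
	fetch := func(url string) (string, []string) {
		return "body of " + url, web[url]
	}

	frontier := &memFrontier{}
	frontier.Push("a")
	cache := &memCache{seen: map[string]bool{}}
	store := &memStore{pages: map[string]string{}}

	fmt.Println("pages stored:", Crawl(frontier, cache, store, fetch))
}
```

Keeping each backend behind a small interface like this would let the crawl loop be exercised without running containers. Whether the real code is organized this way is not something this README confirms.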

Quickstart

Install Docker on your system.
Clone this repository: git clone https://github.com/yourusername/distributed-web-crawler.git

Change into the project directory: cd distributed-web-crawler

Run the setup script: ./scripts/setup.sh

The setup script starts all necessary Docker containers and launches the crawler. After the script has been run once, you can start the crawler directly with "go run cmd/crawler/main.go".

To stop the crawler, press Ctrl+C, and then run ./scripts/cleanup.sh to stop and remove the Docker containers.

