URL Scraper is a scalable microservice that scrapes URLs and stores their responses. It can be set up using either of the following two methods.
First Method - Using Docker Compose
The recommended way to install the URL Scraper microservice is through Docker Compose.
```bash
# Clone the repository
git clone https://github.com/spandansingh/url_scraper.git
cd url_scraper

# Build and start the services in detached mode
docker-compose up --build -d
```

Yay! Everything is now up and running. This builds and runs three services in separate Docker containers:
- Microservice (Lumen, the stunningly fast micro-framework by Laravel)
- Database Server (MySQL)
- Database Client (phpMyAdmin)
These containers can be listed with the following command:

```bash
docker-compose ps
```

Docker Compose creates a local network between these containers.
If the worker fails to scrape a URL, it retries. The retry threshold can be modified by changing the RETRIES_THRESHOLD environment variable inside docker-compose.yml; the default number of retries is 3. Other environment variables, such as the database credentials, can also be modified in docker-compose.yml.
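As an illustration, the relevant part of docker-compose.yml might look like the sketch below. The service name and the DB_* variable names are assumptions here (following Lumen's usual conventions), so check the actual file in the repository for the exact keys.

```yaml
services:
  microservice:                  # service name is an assumption; check docker-compose.yml
    environment:
      RETRIES_THRESHOLD: 3       # retries before a URL is reported as failed
      DB_HOST: mysql             # DB_* keys are assumed Lumen-style credential variables
      DB_DATABASE: moveinsync
      DB_USERNAME: root
      DB_PASSWORD: moveinsync
```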
Docker Compose automatically pulls the Docker image. However, the image can also be built locally using the Dockerfile in the root folder. Run the following command to build the Docker image locally.
```bash
docker build -t spandy/url_scraper .
```

Let's populate some URLs in the database now! Navigate to phpMyAdmin, which is running at http://localhost:8181.
Create a database named moveinsync and import the SQL file found in the root folder of this repository.
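If you prefer the command line to phpMyAdmin, something like the following should also work; the MySQL service name and the dump filename are assumptions, so adjust them to match the repository.

```bash
# Create the database inside the MySQL container
# (service name "mysql" and dump filename "dump.sql" are assumptions)
docker-compose exec mysql mysql -u root -pmoveinsync \
  -e "CREATE DATABASE IF NOT EXISTS moveinsync;"

# Import the SQL dump from the repository root
docker-compose exec -T mysql mysql -u root -pmoveinsync moveinsync < dump.sql
```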
Note: Since the worker is already running, you will see results appearing in the table.
Use the following API to get a report of the failed URLs: http://localhost:8000/urls/failed
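For example, you can query the endpoint with curl; the exact shape of the response depends on the service, so treat this as a sketch:

```bash
# Fetch the failed-URLs report from the running microservice
curl http://localhost:8000/urls/failed
```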
- Username - root
- Password - moveinsync
- Database - moveinsync

When you are done, stop and remove the containers with:

```bash
docker-compose down
```

Second Method - Using Composer
```bash
# Clone the repository
git clone https://github.com/spandansingh/url_scraper.git

# Install Composer
curl -sS https://getcomposer.org/installer | php
```

Next, run the composer command inside the /app folder.
```bash
# Install dependencies
composer install
```

Now, set the MySQL database credentials in the app/.env file to connect to the database.
The exported SQL file can be found in the root folder of this repository.
The retry threshold can be modified by changing the RETRIES_THRESHOLD environment variable inside the /app/.env file. The default number of retries is 3.
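As a sketch, the relevant entries in app/.env might look like the following. The DB_* keys follow Lumen's standard conventions and are assumptions here; the values match the credentials listed above.

```dotenv
# Database connection (DB_* keys assumed per Lumen convention; verify in the repo)
DB_HOST=127.0.0.1
DB_DATABASE=moveinsync
DB_USERNAME=root
DB_PASSWORD=moveinsync

# Number of times the worker retries a failed URL (default: 3)
RETRIES_THRESHOLD=3
```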
Now, change the directory to /app and start the worker to process URLs by running the following command.
```bash
# Start the worker
php artisan moveinsync:url_scraper
```

Now open another terminal instance and start the HTTP server.
```bash
# Start the HTTP server
php -S localhost:8000 -t public
```

To get the report of failed URLs, use the API: http://localhost:8000/urls/failed