Scrapman is a Python script that fetches a specified URL and extracts the list of elements matching a specified selector.
Clone this repository:
git clone https://github.com/anned20/scrapman.git
Install the dependencies:
pip install -r requirements.txt
You are now ready to use Scrapman:
python scrapman.py --help
You should see something like:
Usage: scrapman.py [OPTIONS]

Options:
  --debug / --no-debug           Debug mode
  --url TEXT                     URL to crawl
  --selector TEXT                Selector for the elements
  --output-type [dict|json|csv]
  --output-file TEXT             File to output the result into. Use "-" for
                                 stdout
  --help                         Show this message and exit.
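
To give a feel for the three --output-type choices listed above, here is a minimal sketch of how a list of scraped elements could be serialized as dict, json, or csv text. It is not Scrapman's actual code: the render function and the "tag" and "text" keys are hypothetical names chosen for illustration, and the real output may be shaped differently.

```python
import csv
import io
import json


def render(elements, output_type):
    """Serialize a list of element dicts as 'dict', 'json' or 'csv' text.

    Hypothetical helper for illustration only; not part of Scrapman.
    """
    if output_type == "dict":
        # Plain Python literal representation of the list.
        return repr(elements)
    if output_type == "json":
        return json.dumps(elements, indent=2)
    if output_type == "csv":
        buffer = io.StringIO()
        # Assumes every element dict has the same keys as the first one.
        fieldnames = list(elements[0].keys()) if elements else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(elements)
        return buffer.getvalue()
    raise ValueError(f"unknown output type: {output_type}")


# Made-up sample data standing in for scraped elements.
sample = [{"tag": "a", "text": "Home"}, {"tag": "a", "text": "About"}]
print(render(sample, "csv"))
```

The dict form is convenient for quick inspection in a terminal, while json and csv are easier to feed into other tools.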
The tests use pytest.
Run them by executing pytest in the project directory.
 - requests - Getting the webpage
 - click - Parsing command line options
 - BeautifulSoup - Parsing the HTML of the webpage
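
The sketch below shows one way the three libraries above could fit together. It is a simplified illustration, not Scrapman's source: select_elements and main are hypothetical names, the selector is assumed to be a CSS selector, and only the --url and --selector options from the help text are reproduced.

```python
import click
import requests
from bs4 import BeautifulSoup


def select_elements(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(selector)]


@click.command()
@click.option("--url", required=True, help="URL to crawl")
@click.option("--selector", required=True, help="Selector for the elements")
def main(url, selector):
    # Fetch the page and fail on HTTP errors rather than parsing an error page.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Print one matching element per line.
    for text in select_elements(response.text, selector):
        click.echo(text)
```

To use this as a script, call main() under an `if __name__ == "__main__":` guard. Keeping the parsing in its own function means it can be tested with a literal HTML string and no network access.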
 
This project is licensed under the MIT License. See the LICENSE.md file for details.