A simple Python script that collects and analyzes the robots.txt files of websites.
- Create a text file listing the domains you wish to search. `top-100.txt` is included as an example; larger lists can be found in the Top-Site-Lists repo.
- Run `./robotSpider.py -i YOURINPUTFILE -t NUMBEROFTHREADS`, for example `./robotSpider.py -i top-100.txt`. The number of threads defaults to 200.
- The output files are written to the current directory, prefixed with the name of your input file; a rough sketch of the fetch step is shown below.
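For orientation, here is a minimal sketch of the core idea behind the script: fetch `/robots.txt` for each listed domain with a thread pool. This is illustrative only, not the actual `robotSpider.py` code; the `fetch_robots` helper and the output handling are assumptions, while the input file name and the 200-worker default come from the usage above.

```python
# Minimal sketch (not the real robotSpider.py): fetch /robots.txt for each
# domain in the input list using a thread pool of 200 workers.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_robots(domain):
    """Return (domain, robots.txt text) or (domain, None) on failure."""
    url = f"http://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return domain, resp.read().decode("utf-8", errors="replace")
    except Exception:
        return domain, None

if __name__ == "__main__":
    domains = [line.strip() for line in open("top-100.txt") if line.strip()]
    with ThreadPoolExecutor(max_workers=200) as pool:
        for domain, body in pool.map(fetch_robots, domains):
            if body is not None:
                print(f"{domain}: {len(body.splitlines())} robots.txt lines")
```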
## Interesting One-Liners
- Find any rules containing "test": `cat top-100_rules.csv | grep "test"`
- Find any rules containing "beta": `cat top-100_rules.csv | grep "beta"`
- Find any rules containing "admin": `cat top-100_rules.csv | grep "admin"`
- Find any rules containing ".pdf", ".xls", or ".doc" (dots escaped so they match literally): `cat top-100_rules.csv | grep "\.pdf\|\.xls\|\.doc"`
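If you prefer to do the same filtering in Python, a small hedged sketch follows. It assumes only that the rules file is a plain CSV whose rows contain the rule text somewhere; the exact column layout is not assumed.

```python
# Sketch: keyword filtering of the generated rules CSV, mirroring the grep
# one-liners above. Column layout of top-100_rules.csv is not assumed.
import csv

KEYWORDS = ("test", "beta", "admin", ".pdf", ".xls", ".doc")

with open("top-100_rules.csv", newline="") as f:
    for row in csv.reader(f):
        joined = ",".join(row)
        if any(keyword in joined.lower() for keyword in KEYWORDS):
            print(joined)
```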