
Simple crawler to get pages of interest based on keywords


robinmaben/rbot


r-bot

A simple, low-ceremony crawler that finds pages of interest based on provided keywords and starting points, and reports them in a Slack message or channel.
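The README does not say how r-bot talks to Slack. As a rough sketch, assuming a Slack incoming webhook is used, the reporting step could look like the following. The names `build_slack_payload` and `report_to_slack`, the message wording, and the shape of the `pages` mapping (URL to matched keywords) are all hypothetical.

```python
import json
from urllib.request import Request, urlopen


def build_slack_payload(pages):
    """Format interesting pages (url -> matched keywords) as a webhook payload.

    Hypothetical helper: the message layout is an assumption, not r-bot's format.
    """
    lines = [
        f"- <{url}> (matched: {', '.join(keywords)})"
        for url, keywords in pages.items()
    ]
    return {"text": "Pages of interest:\n" + "\n".join(lines)}


def report_to_slack(webhook_url, pages):
    """POST the payload as JSON to a Slack incoming webhook URL.

    Assumes webhook_url is an incoming webhook configured for the target channel.
    """
    request = Request(
        webhook_url,
        data=json.dumps(build_slack_payload(pages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the payload builder separate from the network call makes the message format easy to check without posting anything to Slack.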

Algorithm:

CrawlList = []
InterestingList = []

StartingPoints = []

Launch(depth):

    for level in range(depth):
        Crawl()

    Collect()

Reset():
    # Add/Update StartingPoints to CrawlList
    # Reset all crawled pages metadata

Crawl():
    for page in copy of CrawlList:    # URLs appended below are crawled on the next level
        Save page content and metadata

        urls = FetchURLs(page)
        for url in urls:
            if url not in CrawlList:
                CrawlList.append(url)


Collect():
    for page in CrawlList:
        analysis = Analyze(page)
        if analysis.is_interesting:
            Add or update InterestingList


Analyze(page):
    # Conditions
    # key words


Relaunch():
    #
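
The outline above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the names `launch`, `fetch`, `extract_urls`, and `analyze` are hypothetical, "interesting" is assumed to mean a case-insensitive keyword match, and each page is fetched only once instead of being re-saved at every level. `Relaunch()` is left out because the outline does not describe it.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page as text (no retries, politeness delay, or robots.txt handling)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_urls(base_url, html):
    """Return absolute URLs, without fragments, for every link on the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def analyze(content, keywords):
    """Return the keywords found in the page (assumed 'interesting' criterion)."""
    text = content.lower()
    return [kw for kw in keywords if kw.lower() in text]


def launch(starting_points, keywords, depth, fetch_page=fetch):
    # Reset(): seed the crawl list; None marks a page that has not been fetched yet.
    crawl_list = {url: None for url in starting_points}

    for _ in range(depth):
        # Crawl(): iterate over a snapshot so new URLs wait for the next level.
        pending = [url for url, content in crawl_list.items() if content is None]
        for url in pending:
            try:
                content = fetch_page(url)
            except (OSError, ValueError):
                content = ""  # unreachable or unsupported URL
            crawl_list[url] = content
            for link in extract_urls(url, content):
                crawl_list.setdefault(link, None)

    # Collect(): keep pages whose content matches at least one keyword.
    interesting = {}
    for url, content in crawl_list.items():
        matched = analyze(content, keywords) if content else []
        if matched:
            interesting[url] = matched
    return interesting
```

Passing `fetch_page` as a parameter lets the crawl be exercised against an in-memory dictionary of pages, for example `launch(["http://a.test/"], ["python"], depth=2, fetch_page=pages.get)`, without touching the network.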
