Skip to content

stas-polos/github-crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GitHub Crawler

  1. Installation
  2. Configuration
  3. CLI

Installation

Clone the repository

git clone https://github.com/stas-polos/github-crawler.git

Change into the project directory

cd github-crawler

Configuration

Copy the appropriate file and rename it, removing the name ".dist"

cd .dist.env .env

Set configuration options to .env

  • Settings for docker image
    • PYTHON_IMAGE_VERSION - version of python image (default: 3.11.6-slim-bullseye)
  • Settings for logging
    • LOG_LEVEL - level of logging

CLI

For build image, run command:

docker compose build github-crawler

For run crawler need run command:

docker compose run --rm github-crawler python /app/src/cli.py searcher crawl \
  --path /app/src/storage/input/task.json

Expected result:

2024-07-11 00:00:00 [searcher] INFO: Input task: {
  "keywords": [
    "keyword1",
    "keyword2"
  ],
  "proxies": [
    "194.126.37.94:8080",
    "13.78.125.167:8080"
  ],
  "type": "repositories"
}
2024-07-11 00:00:00 [searcher] INFO: Formatted url: https://github.com/search?q=keyword1+OR+keyword2&type=repositories
2024-07-11 00:00:00 [searcher] INFO: Search results: [
  {
    "url": "https://github.com/clarkbynum/a-ORIG",
    "extra": {
      "owner": "clarkbynum",
      "language_stats": {
        "JavaScript": 99.1,
        "Shell": 0.9
      }
    }
  },
  {
    "url": "https://github.com/wecacuee/keyword2cmdline",
    "extra": {
      "owner": "wecacuee",
      "language_stats": {
        "Python": 100.0
      }
    }
  },
  {
    "url": "https://github.com/bearmi/mgrep",
    "extra": {
      "owner": "bearmi",
      "language_stats": {
        "C": 100.0
      }
    }
  },
  {
    "url": "https://github.com/zhangnn520/keyword2textgenration",
    "extra": {
      "owner": "zhangnn520",
      "language_stats": {
        "Jupyter Notebook": 76.3,
        "Python": 23.4,
        "Shell": 0.3
      }
    }
  },
  {
    "url": "https://github.com/gh-silent/Keyword1",
    "extra": {
      "owner": "gh-silent",
      "language_stats": null
    }
  },
  {
    "url": "https://github.com/ucon19/keyword2024",
    "extra": {
      "owner": "ucon19",
      "language_stats": null
    }
  },
  {
    "url": "https://github.com/AzzaSalahEldin/KEYWORD1",
    "extra": {
      "owner": "AzzaSalahEldin",
      "language_stats": {
        "Groovy": 100.0
      }
    }
  },
  {
    "url": "https://github.com/UjjwalaChaudhari/keyword1",
    "extra": {
      "owner": "UjjwalaChaudhari",
      "language_stats": {
        "HTML": 51.3,
        "Java": 35.5,
        "CSS": 8.0,
        "JavaScript": 5.2
      }
    }
  },
  {
    "url": "https://github.com/anantj-optimus/keyword1",
    "extra": {
      "owner": "anantj-optimus",
      "language_stats": null
    }
  },
  {
    "url": "https://github.com/namechk1/keyword1",
    "extra": {
      "owner": "namechk1",
      "language_stats": null
    }
  }
]

About

Crawler for github.com

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published