CrawlerBox

Description

CrawlerBox is an automated analysis framework designed for parsing emails and crawling embedded web resources. This infrastructure was developed to facilitate the study of evasive phishing emails reported by end users.

For more detailed information on CrawlerBox, its functionality, and the results obtained, please refer to our paper "A Closer Look at Modern Evasive Phishing Emails".

Figure 1: CrawlerBox Analysis Pipeline

Getting started

Installation

CrawlerBox is meant to be run on Windows.

Local installation

Local installation can be done using uv:

git clone https://github.com/AmadeusITGroup/CrawlerBox.git
cd CrawlerBox
uv venv -p python3.10
uv pip install -e .
.venv\Scripts\activate.bat

Necessary dependencies and configuration

First, install vcredist_x64.exe from the Visual C++ Redistributable Packages for Visual Studio 2013. The library that reads QR codes (QReader) requires it.

CrawlerBox relies on external services to operate (e.g., Cisco Umbrella and Shodan for data enrichment). It also connects to two external servers: one database from which it retrieves newly user-reported messages and another in which it stores the results. Before running CrawlerBox, you must configure these dependencies in the config.py file.

Please also consider rewriting the following functions in personalized_config.py: fetch_new_emails_by_date, fetch_new_emails_by_id, and url_rewrite. The first two should match your implementation for fetching newly reported emails. url_rewrite is designed to extract and return a decoded URL from a given string: if the URLs within the messages are rewritten (e.g., by Microsoft's Safe Links or Proofpoint's URL Defense), you might need to decode them before the crawler loads them.
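As an illustration, a url_rewrite for the two services mentioned above could look like the following minimal sketch. It assumes the function takes a string and returns a string, that Safe Links carries the original URL in its url query parameter, and that Proofpoint links use the v2 format (u parameter, with -XX standing for %XX and _ standing for /). Other formats, such as Proofpoint v3, are not handled, and the signature CrawlerBox actually expects should be checked in personalized_config.py.

```python
from urllib.parse import parse_qs, unquote, urlparse


def url_rewrite(text: str) -> str:
    """Return the original URL if `text` is a known rewritten link, else `text` unchanged.

    Sketch only: the signature and the handled formats are assumptions.
    """
    parsed = urlparse(text.strip())
    host = (parsed.hostname or "").lower()
    query = parse_qs(parsed.query)  # parse_qs percent-decodes the values

    # Microsoft Safe Links: the original URL is in the "url" query parameter.
    if host.endswith("safelinks.protection.outlook.com") and "url" in query:
        return query["url"][0]

    # Proofpoint URL Defense v2 (assumed encoding): "-XX" means "%XX", "_" means "/".
    if (
        host == "urldefense.proofpoint.com"
        and parsed.path.startswith("/v2/")
        and "u" in query
    ):
        encoded = query["u"][0]
        return unquote(encoded.replace("-", "%").replace("_", "/"))

    # Not a recognized rewritten link: leave it untouched.
    return text
```

Strings that do not match either pattern are returned unchanged, so the function can be applied to every extracted URL without special-casing.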

Running CrawlerBox

You can run CrawlerBox in three ways.

With the -id (--phish_id) option:

The "id" argument is the ID of the message to analyze, as stored in your input DB. Example:

run_crawlerbox -id xxxx-xxxx-xxxx-xxxxxxx

With the -d (--date) option:

The "d" argument is a date string. CrawlerBox fetches all emails reported on that date and analyzes them. Example:

run_crawlerbox -d 2025-01-01

With no options:

CrawlerBox runs continuously, fetching newly reported emails every ten minutes and automatically starting the analysis of the fetched messages. Example:

run_crawlerbox
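The three modes above can be summarized by the following sketch of an argument parser. It is not CrawlerBox's actual entry point: parse_mode and the mode labels are hypothetical, and treating the two options as mutually exclusive and the date as ISO formatted (YYYY-MM-DD) are assumptions based on the examples.

```python
import argparse
from datetime import date


def parse_mode(argv):
    """Map command-line arguments to a (mode, value) pair (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="run_crawlerbox")
    # Assumption: the two options cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-id", "--phish_id", help="ID of one message in the input DB")
    group.add_argument(
        "-d",
        "--date",
        type=date.fromisoformat,  # assumed format: YYYY-MM-DD
        help="analyze all emails reported on this date",
    )
    args = parser.parse_args(argv)

    if args.phish_id is not None:
        return ("id", args.phish_id)
    if args.date is not None:
        return ("date", args.date)
    # No option given: continuous mode, polling for newly reported emails.
    return ("poll", None)
```

For example, parse_mode(["-d", "2025-01-01"]) selects the date mode, while an empty argument list selects continuous polling.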

Citation

Please consider citing our paper if you find it useful:

@inproceedings{boulila2025,
  title = {A Closer Look at Modern Evasive Phishing Emails},
  author = {Boulila, Elyssa and Dacier, Marc and Vengadessa Peroumal, Siva Prem and Veys, Nicolas and Aonzo, Simone},
  booktitle={2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},
  year = {2025},
  organization = {IEEE}
}

Contributing

We welcome your contributions. Feel free to fork the code, experiment with it, and send us your patches as pull requests or open issues.

We have a Code of Conduct. Please read it before contributing.
