ingest domains, add web api, document configuration, add tests
Showing 32 changed files with 1,004 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
*.sqlite3* | ||
.idea | ||
.venv | ||
*.pyc | ||
.pytest_cache | ||
.coverage |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,106 @@ | ||
# Internet.nl-ct-log-subdomain-suggestions-api | ||
Internet.nl ct-log subdomain suggestions api | ||
# Internet.nl Certificate Transparency Log Subdomain Suggestions | ||
|
||
## What does this do / Intended use case | ||
The goal is to replace subdomain suggestions from crt.sh with higher uptime and faster response times. This way it can | ||
be used in other applications, such as the internet.nl dashboard, to suggest possible subdomains to end users. | ||
|
||
## How does it work | ||
This tool ingests subdomains from public certificate transparency logs using a connection from a certstream server. A | ||
web interface allows for querying the stored data, which results in a list of known subdomains. | ||
|
||
Several key optimizations reduce the number of subdomains stored in the database. The most important one is the | ||
list of allowed TLDs: only domains under these TLDs are stored. By default only domains relevant to the Kingdom of | ||
the Netherlands are stored. | ||
|
||
## What are the limits of this tool | ||
The limits have not yet been discovered and no optimizations have been performed yet, aside from a few proactive | ||
database indexes. It is expected to be able to store about a year's worth of data from the .nl zone. This means | ||
about 5 million domains with an estimated 50 million subdomains, each of which gets a new certificate about every 90 | ||
days, or roughly four times a year. In total that is about 200 million records per year. Similar volumes are expected | ||
for most EU countries. There is no expectation that this tool will work quickly on the combined com/net/org zones, | ||
although some partitioning and smarter inserting might just do the trick. For the Netherlands the observed rate of | ||
certificate renewals for subdomains seems to be much lower than this estimate of about 6 per second: between 0.5 and | ||
2 per second. | ||
|
||
The goal is to be able to run this on medium-sized virtual machines with just a few cores and a few gigabytes of | ||
RAM. That should be enough for the Netherlands and most EU countries. We've not tried to see if this solution is 'web | ||
scale'. | ||
|
||
## How to ingest data from certstream | ||
Configure `CTLSSA_CERTSTREAM_SERVER_URL` to point to a certstream-server instance. The default points to a certstream | ||
server hosted by the creator of certstream, calidog. This is great for testing and development, but don't use it for | ||
production purposes. | ||
|
||
Read more about setting up a certstream server here: https://github.com/CaliDog/certstream-server | ||
|
||
After configuration, run the following commands: | ||
```python manage.py migrate``` | ||
```python manage.py ingest``` | ||
|
||
The `ingest` command is meant to run indefinitely. If your certstream server is down, it waits until the server is | ||
back up. | ||
|
||
## How to query the results | ||
The webserver can be started with the command: | ||
```python manage.py runserver``` | ||
|
||
When you visit the web interface at http://localhost:8000/ without parameters you will see an empty JSON response. | ||
Use the `domain`, `suffix` and `period` parameters to retrieve data: | ||
`http://localhost:8000/?domain=example&suffix=nl&period=365` | ||
|
||
|
||
## Further configuration options | ||
Configuration is done via environment variables, but can also be hardcoded in the settings.py file if need be. | ||
|
||
Every setting falls back to a default when its environment variable is not set. Environment variables of the app are | ||
prefixed with `CTLSSA_`, so they stand out in your `env`. | ||
|
||
CTLSSA_ACCEPTED_TLDS: Comma-separated string with the zones you want subdomains from. | ||
The default is set to "nl,aw,cw,sr,sx,bq,frl,amsterdam,politie". Mileage will vary with .com, .net, .org zones and | ||
we expect ingestion not to be fast enough. | ||
|
||
CTLSSA_DEQUE_LENGTH: Configure this to be around the number of domains you ingest in a few hours to a day, but in a | ||
way that it doesn't hit the database limit. This value is used to deduplicate certificate renewal requests. It's very | ||
common to see certificate renewals containing the same domain for every subdomain. It's also very common to see the | ||
same request happening over and over again because the administrator made some configuration mistake and needs to | ||
repeat the process. The default is 100,000 domains. | ||
|
||
There are various database settings so any Django-supported database can be used. We recommend PostgreSQL as it has | ||
more options regarding optimization than MySQL. Either should be fine. SQLite might also work, as there is only one | ||
process that writes to the database. | ||
|
||
Database settings: | ||
|
||
- CTLSSA_DB_ENGINE | ||
- CTLSSA_DB_NAME | ||
- CTLSSA_DB_USER | ||
- CTLSSA_DB_PASSWORD | ||
- CTLSSA_DB_HOST | ||
- CTLSSA_DJANGO_DATABASE | ||
|
||
|
||
## Expectations in database size and performance | ||
|
||
This package assumes that the database can insert records faster than newly found domains arrive. This will not | ||
hold true for every zone, especially when combining .com, .net and .org. | ||
|
||
Once this assumption no longer holds, optimizations are needed. There are several options that might help: bulk | ||
inserts, parallel inserts from multiple processes, database partitioning, index ordering, reducing the number of | ||
indexes by merging domain+suffix, and so on. Other solutions might work as well. None of these have been tried yet, | ||
but you might need them. If you do, please get in touch with the repository owner so this project can be optimized | ||
for everyone. | ||
|
||
|
||
## Development | ||
This project does not have a managed virtual environment yet. This might be added in the future if need be. | ||
|
||
### Linting | ||
Run these commands before checking in. These should all pass without error. | ||
``` | ||
isort . | ||
black . | ||
pytest | ||
``` | ||
|
||
### Dependency management | ||
Run these commands to regenerate the pinned requirements files: | ||
``` | ||
pip-compile requirements.in --output-file=requirements.txt | ||
pip-compile requirements-dev.in --output-file=requirements-dev.txt | ||
``` |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
""" | ||
ASGI config for api project. | ||
It exposes the ASGI callable as a module-level variable named ``application``. | ||
For more information on this file, see | ||
https://docs.djangoproject.com/en/4.2/howto/deployment/asgi/ | ||
""" | ||
|
||
import os | ||
|
||
from django.core.asgi import get_asgi_application | ||
|
||
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "app.settings") | ||
|
||
application = get_asgi_application() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,211 @@ | ||
""" | ||
Django settings for app project. | ||
Generated by 'django-admin startproject' using Django 4.2.2. | ||
For more information on this file, see | ||
https://docs.djangoproject.com/en/4.2/topics/settings/ | ||
For the full list of settings and their values, see | ||
https://docs.djangoproject.com/en/4.2/ref/settings/ | ||
""" | ||
import os | ||
import time | ||
from pathlib import Path | ||
|
||
# Build paths inside the project like this: BASE_DIR / 'subdir'. | ||
BASE_DIR = Path(__file__).resolve().parent.parent | ||
|
||
|
||
# Quick-start development settings - unsuitable for production | ||
# See https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/ | ||
|
||
# SECURITY WARNING: keep the secret key used in production secret! | ||
SECRET_KEY = "django-insecure-6b!vz+!y)9b%8mm)=$a4wc-vh!--7l%-925o7l19asa0r$2h2a" | ||
|
||
# SECURITY WARNING: don't run with debug turned on in production! | ||
DEBUG = True | ||
|
||
ALLOWED_HOSTS = [] | ||
|
||
|
||
# Application definition | ||
|
||
INSTALLED_APPS = [ | ||
"suggestions", | ||
"django.contrib.admin", | ||
"django.contrib.auth", | ||
"django.contrib.contenttypes", | ||
"django.contrib.sessions", | ||
"django.contrib.messages", | ||
"django.contrib.staticfiles", | ||
] | ||
|
||
MIDDLEWARE = [ | ||
"django.middleware.security.SecurityMiddleware", | ||
"django.contrib.sessions.middleware.SessionMiddleware", | ||
"django.middleware.common.CommonMiddleware", | ||
"django.middleware.csrf.CsrfViewMiddleware", | ||
"django.contrib.auth.middleware.AuthenticationMiddleware", | ||
"django.contrib.messages.middleware.MessageMiddleware", | ||
"django.middleware.clickjacking.XFrameOptionsMiddleware", | ||
] | ||
|
||
ROOT_URLCONF = "app.urls" | ||
|
||
TEMPLATES = [ | ||
{ | ||
"BACKEND": "django.template.backends.django.DjangoTemplates", | ||
"DIRS": [], | ||
"APP_DIRS": True, | ||
"OPTIONS": { | ||
"context_processors": [ | ||
"django.template.context_processors.debug", | ||
"django.template.context_processors.request", | ||
"django.contrib.auth.context_processors.auth", | ||
"django.contrib.messages.context_processors.messages", | ||
], | ||
}, | ||
}, | ||
] | ||
|
||
WSGI_APPLICATION = "app.wsgi.application" | ||
|
||
|
||
# Database | ||
# https://docs.djangoproject.com/en/4.2/ref/settings/#databases | ||
DATABASE_OPTIONS = {} | ||
|
||
DB_ENGINE = os.environ.get("CTLSSA_DB_ENGINE", "postgresql") | ||
DATABASE_ENGINES = {"postgresql": "django.db.backends.postgresql"} | ||
|
||
DATABASES_SETTINGS = { | ||
# persist local database used during development | ||
"dev": { | ||
"ENGINE": "django.db.backends.sqlite3", | ||
"NAME": os.environ.get("CTLSSA_DB_NAME", "db.sqlite3"), | ||
}, | ||
# sqlite database for running tests; Django's test runner uses an in-memory database for SQLite by default | ||
"test": { | ||
"ENGINE": "django.db.backends.sqlite3", | ||
"NAME": os.environ.get("CTLSSA_DB_NAME", "db.sqlite3"), | ||
}, | ||
# for production get database settings from environment (eg: docker) | ||
"production": { | ||
"ENGINE": DATABASE_ENGINES.get(DB_ENGINE, f"django.db.backends.{DB_ENGINE}"), | ||
"NAME": os.environ.get("CTLSSA_DB_NAME", "ctlssa"), | ||
"USER": os.environ.get("CTLSSA_DB_USER", "ctlssa"), | ||
"PASSWORD": os.environ.get("CTLSSA_DB_PASSWORD", "ctlssa"), | ||
"HOST": os.environ.get("CTLSSA_DB_HOST", "postgresql"), | ||
"OPTIONS": DATABASE_OPTIONS.get(os.environ.get("CTLSSA_DB_ENGINE", "postgresql"), {}), | ||
}, | ||
} | ||
# allow database to be selected through environment variables | ||
DATABASE = os.environ.get("CTLSSA_DJANGO_DATABASE", "dev") | ||
DATABASES = {"default": DATABASES_SETTINGS[DATABASE]} | ||
|
||
|
||
# Password validation | ||
# https://docs.djangoproject.com/en/4.2/ref/settings/#auth-password-validators | ||
|
||
AUTH_PASSWORD_VALIDATORS = [ | ||
{ | ||
"NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator", | ||
}, | ||
{ | ||
"NAME": "django.contrib.auth.password_validation.MinimumLengthValidator", | ||
}, | ||
{ | ||
"NAME": "django.contrib.auth.password_validation.CommonPasswordValidator", | ||
}, | ||
{ | ||
"NAME": "django.contrib.auth.password_validation.NumericPasswordValidator", | ||
}, | ||
] | ||
|
||
|
||
# Internationalization | ||
# https://docs.djangoproject.com/en/4.2/topics/i18n/ | ||
|
||
LANGUAGE_CODE = "en-us" | ||
|
||
TIME_ZONE = "UTC" | ||
|
||
USE_I18N = True | ||
|
||
USE_TZ = True | ||
|
||
|
||
# Static files (CSS, JavaScript, Images) | ||
# https://docs.djangoproject.com/en/4.2/howto/static-files/ | ||
|
||
STATIC_URL = "static/" | ||
|
||
# Default primary key field type | ||
# https://docs.djangoproject.com/en/4.2/ref/settings/#default-auto-field | ||
|
||
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField" | ||
|
||
# .an has been dissolved, but this page lists the other options: https://en.wikipedia.org/wiki/.an | ||
# .nl is managed by the SIDN and is the domain of the Netherlands. | ||
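# .aw, .cw, .sx and .bq belong to the countries and special municipalities within the Kingdom of the Netherlands. | ||
# .sr is Suriname, a former constituent country of the Kingdom. | ||
# .frl is Friesland, a province with its own recognized language. | ||
# .amsterdam is provided by the capital city of the Netherlands. | ||
ACCEPTED_TLDS = os.environ.get("CTLSSA_ACCEPTED_TLDS", "nl,aw,cw,sr,sx,bq,frl,amsterdam,politie") | ||
# Drop empty entries: "".split(",") returns [""], which is truthy and would silently skip the warning below. | ||
ACCEPTED_TLDS = [tld.strip() for tld in ACCEPTED_TLDS.split(",") if tld.strip()] | ||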
|
||
if not ACCEPTED_TLDS: | ||
    print( | ||
        "Warning: no filter set on ACCEPTED_TLDS, will try to import all subdomains of everything to the database. " | ||
        "This tool has not been developed for this use case and might not perform well with this amount of data. " | ||
    ) | ||
    print("This script will continue in 10 seconds. We're excited how far this solution scaled for you. For science!") | ||
    time.sleep(10) | ||
|
||
# Environment variables are strings, so convert the value to an int. | ||
DEQUE_LENGTH = int(os.environ.get("CTLSSA_DEQUE_LENGTH", 100000)) | ||
|
||
CERTSTREAM_SERVER_URL = os.environ.get("CTLSSA_CERTSTREAM_SERVER_URL", "wss://certstream.calidog.io/") | ||
|
||
|
||
LOGGING = { | ||
"version": 1, | ||
"disable_existing_loggers": False, | ||
"handlers": { | ||
"console": { | ||
"class": "logging.StreamHandler", # sys.stdout | ||
"formatter": "color", | ||
}, | ||
}, | ||
"formatters": { | ||
"debug": { | ||
"format": "%(asctime)s\t%(levelname)-8s - %(filename)-20s:%(lineno)-4s - " "%(funcName)20s() - %(message)s", | ||
}, | ||
"color": { | ||
"()": "colorlog.ColoredFormatter", | ||
# to get the name of the logger a message came from, add %(name)s. | ||
"format": "%(log_color)s%(asctime)s\t%(levelname)-8s - " "%(message)s", | ||
"datefmt": "%Y-%m-%d %H:%M:%S", | ||
"log_colors": { | ||
"DEBUG": "green", | ||
"INFO": "white", | ||
"WARNING": "yellow", | ||
"ERROR": "red", | ||
"CRITICAL": "bold_red", | ||
}, | ||
}, | ||
}, | ||
"loggers": { | ||
"django": { | ||
"handlers": ["console"], | ||
"level": os.getenv("CTLSSA_DJANGO_LOG_LEVEL", "INFO"), | ||
}, | ||
"app": { | ||
"handlers": ["console"], | ||
"level": os.getenv("CTLSSA_APP_LOG_LEVEL", "DEBUG"), | ||
}, | ||
"suggestions": { | ||
"handlers": ["console"], | ||
"level": os.getenv("CTLSSA_SUGGESTIONS_LOG_LEVEL", "DEBUG"), | ||
}, | ||
}, | ||
} |
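Environment variables always arrive as strings, so list-valued and integer-valued settings such as `CTLSSA_ACCEPTED_TLDS` and `CTLSSA_DEQUE_LENGTH` need explicit parsing. The sketch below shows one way to do that; the helper names `env_list` and `env_int` are hypothetical and are not part of this commit.

```python
import os


def env_list(name: str, default: str) -> list[str]:
    """Read a comma-separated environment variable as a list without empty entries."""
    raw = os.environ.get(name, default)
    # "".split(",") yields [""], so empty items are filtered out explicitly.
    return [item.strip() for item in raw.split(",") if item.strip()]


def env_int(name: str, default: int) -> int:
    """Read an environment variable as an integer, falling back to the default."""
    return int(os.environ.get(name, default))


if __name__ == "__main__":
    print(env_list("CTLSSA_ACCEPTED_TLDS", "nl,aw,cw"))
    print(env_int("CTLSSA_DEQUE_LENGTH", 100000))
```

With this kind of parsing, an empty `CTLSSA_ACCEPTED_TLDS` produces an empty list, so a check like `if not ACCEPTED_TLDS` behaves as the warning text suggests.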
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
""" | ||
URL configuration for api project. | ||
The `urlpatterns` list routes URLs to views. For more information please see: | ||
https://docs.djangoproject.com/en/4.2/topics/http/urls/ | ||
Examples: | ||
Function views | ||
1. Add an import: from my_app import views | ||
2. Add a URL to urlpatterns: path('', views.home, name='home') | ||
Class-based views | ||
1. Add an import: from other_app.views import Home | ||
2. Add a URL to urlpatterns: path('', Home.as_view(), name='home') | ||
Including another URLconf | ||
1. Import the include() function: from django.urls import include, path | ||
2. Add a URL to urlpatterns: path('blog/', include('blog.urls')) | ||
""" | ||
# from django.contrib import admin | ||
from django.urls import include, path | ||
|
||
urlpatterns = [ | ||
path("", include("suggestions.urls")), | ||
# No administration tools have been developed for now. | ||
# path("admin/", admin.site.urls), | ||
] |
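Because the root path is routed to `suggestions.urls`, the query shown in the README goes to `/` with `domain`, `suffix` and `period` as query parameters. A minimal sketch of building such a URL with the standard library follows; the helper name `build_query_url` is hypothetical, and since the response format is not visible in this commit the sketch stops at URL construction.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, domain: str, suffix: str, period: int = 365) -> str:
    """Build the suggestions query URL from its three parameters."""
    query = urlencode({"domain": domain, "suffix": suffix, "period": period})
    # Normalise the base so exactly one slash precedes the query string.
    return f"{base_url.rstrip('/')}/?{query}"


if __name__ == "__main__":
    print(build_query_url("http://localhost:8000/", "example", "nl"))
```

Using `urlencode` rather than string concatenation keeps unusual characters in a domain label from corrupting the query string.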
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
""" | ||
WSGI config for api project. | ||
It exposes the WSGI callable as a module-level variable named ``application``. | ||
For more information on this file, see | ||
https://docs.djangoproject.com/en/4.2/howto/deployment/wsgi/ | ||
""" | ||
|
||
import os | ||
|
||
from django.core.wsgi import get_wsgi_application | ||
|
||
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "app.settings") | ||
|
||
application = get_wsgi_application() |