
Conversation

valentijnscholten (Member) commented Sep 12, 2025

Summary

As part of evaluating and improving import/reimport performance, we have selected django-pghistory as the new audit log implementation for Defect Dojo.

It is entirely based on Postgres triggers to maintain a history/audit log. This makes it faster and offers new features such as reverting a record to an old version, adding richer context to processes and events, finding events on related records, etc.
This PR doesn't introduce new features yet; it just adds django-pghistory as a replacement for django-auditlog.

This PR still allows using django-auditlog as the auditlog. There are also some builds published to Docker Hub: https://hub.docker.com/r/defectdojo/defectdojo-django/tags?name=pghistory

In a future release django-pghistory will be the auditlog implementation and django-auditlog will no longer be available for auditlogging.

HowTo

To switch an instance over to django-pghistory, some steps are needed:

  • set DD_AUDITLOG_TYPE to django-pghistory
  • run docker compose exec -it uwsgi ./manage.py pghistory backfill to load the initial snapshot of data into django-pghistory. (This is NOT a data migration from django-auditlog, but an initial snapshot of the current data in Defect Dojo.)

In the future these steps will be part of a new release which will perform the backfill automatically.
Once switched over, you cannot switch back (unless you know what you're doing).

The action history pages will always display the data from both django-pghistory and django-auditlog. As these data formats are completely different, there are two tables on the action history page.

Some notes about the implementation

  • django-pghistory works with database triggers; these are created as part of a migration regardless of which audit log is configured.
  • the entrypoint script enables/disables the triggers based on the chosen auditlog type. This cannot be done in models.py, apps.py or other Django lifecycle hooks, as those always run too early or too late.
  • models are registered with django-auditlog depending on the chosen auditlog type in settings (see the sketch below).
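
A minimal sketch of that conditional registration, with an illustrative subset of models; this is not necessarily the PR's exact code:

from django.conf import settings

# The pghistory triggers are created by migrations regardless of this setting;
# only the django-auditlog registration is conditional.
if settings.AUDITLOG_TYPE == "django-auditlog":
    from auditlog.registry import auditlog

    from dojo.models import Finding, Product  # illustrative subset of tracked models

    auditlog.register(Finding)
    auditlog.register(Product)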

Performance

I did a couple of test runs to compare both audit log types using the JFrog Unified Very Many sample scan, which contains ~14000 findings. These test runs show a 20-30% speed improvement on my laptop running docker compose. In a (production) environment with an external database the difference might be bigger due to the increased latency: django-pghistory runs inside the database, so far fewer network round trips are needed.

Scaling

django-pghistory documents two settings or design decisions that affect performance: https://django-pghistory.readthedocs.io/en/3.4.3/performance/

  • Row level vs Statement level triggers.
    Defect Dojo doesn't do (m)any bulk inserts/updates on the tracked models, so there is currently no benefit in switching to Statement level triggers.
    If needed we can switch in the future using a one-time schema-only migration.

  • Denormalizing Context
    Context is stored in its own table and events have a foreign key relation to this table. If the Context is used and updated intensively during processes/requests, and lots of parallel requests trigger pghistory events, there could be some contention on that Context table. To "solve" this we could store the context in a column on the Event tables (a sketch of this variant follows this list). I don't think the typical Defect Dojo use-case would really benefit from this; it could even be detrimental, as it would require more storage and filtering/extracting data from the context might be slower.
    If needed we can switch in the future using a one-time schema+data migration.

  • Indices
    By default django-pghistory doesn't create many indices, only a couple we don't need. Changes made (see the migration sketch after this list):

    • Remove indices on foreign key fields in the event tables (e.g. user_id, found_by). We don't query on those.

    • Add indices on fields that we query on, like pgh_created and pgh_label.

    • Add an index on the obj_id field used in joins/queries.

    • Add indices on fields from the JSON context, e.g. user, remote_addr, url.
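
A hedged sketch of the denormalized Context variant described above, assuming pghistory 3.x's ContextJSONField and an illustrative tracked model:

import pghistory
from django.db import models

# Context stored in a JSON column on the event table itself instead of a
# foreign key to the shared pghistory context table.
@pghistory.track(context_field=pghistory.ContextJSONField())
class Finding(models.Model):
    title = models.CharField(max_length=511)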
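
And a hedged sketch of how the JSON-context indices could be added; the migration name/dependency are placeholders and the table/column names assume pghistory's default Context model:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [("dojo", "0xxx_previous")]  # placeholder dependency

    operations = [
        # Expression index on a single key inside the JSON context so that
        # filtering on e.g. the acting user can use an index.
        migrations.RunSQL(
            sql="CREATE INDEX IF NOT EXISTS idx_pghistory_context_user "
                "ON pghistory_context ((metadata->>'user'))",
            reverse_sql="DROP INDEX IF EXISTS idx_pghistory_context_user",
        ),
    ]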

Flushing the Auditlog

This PR also optimizes the existing auditlog flushing (fixes #10835):

  • Use a timestamp instead of a date for filtering the entries to be deleted, so the database index can be used
  • Delete in batches to avoid removing millions of records at once (sketched below)
  • Add a management command to flush the audit log
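
A minimal sketch of the batched flush described above; the parameter names and defaults are illustrative, not the exact Defect Dojo settings:

from datetime import timedelta

from auditlog.models import LogEntry
from django.utils import timezone

def flush_auditlog(retention_days=365, batch_size=1000, max_batches=100):
    # Timestamp (not date) comparison, so the index on LogEntry.timestamp stays usable
    cutoff = timezone.now() - timedelta(days=retention_days)
    for _ in range(max_batches):
        pks = list(
            LogEntry.objects.filter(timestamp__lt=cutoff)
            .values_list("pk", flat=True)[:batch_size],
        )
        if not pks:
            break  # nothing left to delete
        # Delete one bounded batch instead of millions of rows in a single statement
        LogEntry.objects.filter(pk__in=pks).delete()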

Next steps

When testing is successful and we want to make django-pghistory the default, we need to do the following in a follow-up PR:

  • Add a migration that performs the backfill (a sketch follows this list). Leave the management command in place as it provides a way for users to perform the backfill manually. The migration code will skip records that already have an initial_import record
  • Remove registering models for django-auditlog
  • Remove the django-auditlog middleware from MIDDLEWARE
  • Remove django-auditlog from INSTALLED_APPS (not sure, it might need to remain to be able to load the LogEntry model)
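
A hedged sketch of the backfill migration from the first item; it wraps the same pghistory backfill management command from the HowTo, with placeholder names (the real migration would additionally skip records that already have an initial_import event):

from django.core.management import call_command
from django.db import migrations

def backfill_pghistory(apps, schema_editor):
    # Same command as the manual step in the HowTo above
    call_command("pghistory", "backfill")

class Migration(migrations.Migration):
    dependencies = [("dojo", "0xxx_previous")]  # placeholder dependency

    operations = [
        migrations.RunPython(backfill_pghistory, migrations.RunPython.noop),
    ]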

github-actions bot added the docker, New Migration, settings_changes, unittests and ui labels Sep 12, 2025
github-actions bot (Contributor) commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

valentijnscholten (Member, Author) commented Sep 27, 2025

> We should default to just audit-log for now, and then convert later on

That was the idea, but as mentioned in the description above, it's non-trivial to make everything work in a codebase where both django-auditlog and django-pghistory are available and we want a working toggle. Flipping the toggle to default to django-auditlog would break the unit test around delete notifications.

I spent some more time on this and found out it's caused by our post_delete signal now running *before* the django-auditlog signal. So we cannot retrieve the actor information from the auditlog event. I am not sure if that approach was even a valid one to begin with, but that's how it's done in the signal. I now changed the signal to just use the get_current_user method, which is also what our AuditLogMiddleware uses under the hood. The test now passes.
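
A minimal sketch of the reworked signal, assuming crum's get_current_user (the helper our AuditLogMiddleware uses under the hood); the sender is illustrative:

from crum import get_current_user
from django.db.models.signals import post_delete
from django.dispatch import receiver

from dojo.models import Product  # stands in for any tracked model

@receiver(post_delete, sender=Product)
def product_post_delete(sender, instance, **kwargs):
    # No longer read the actor from the auditlog entry; ask crum directly.
    # May return None inside celery tasks, e.g. with ASYNC_OBJECT_DELETE.
    user = get_current_user()
    ...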

However, I expect get_current_user to return None when ASYNC_OBJECT_DELETE is enabled. The options that I see are:

  1. Accept that when using ASYNC_OBJECT_DELETE the notifications sent by Defect Dojo do not contain the user/actor who deleted the object
  2. Change our DojoAsyncTask annotation to propagate the current user into the celery tasks (sketched after this list). I tried this, but it's probably a bigger change and would affect the call chain of tasks. I think it would be too big a change to embed in this auditlog PR.
  3. Remove django-auditlog from this branch and just make the switch to django-pghistory. We would need to add a migration and we would be good to go for a new release or for instances to use this build. We could keep the branch in sync with master for a while until we add the PR to a release.
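
For completeness, a hedged sketch of what option 2 could look like; this is not our actual DojoAsyncTask, just one way to hand the acting user to a celery task and restore it with crum:

from celery import shared_task
from crum import get_current_user, impersonate
from django.contrib.auth import get_user_model

def delay_with_user(task, *args, **kwargs):
    # Capture the acting user in the web process before handing off to celery
    user = get_current_user()
    kwargs["acting_user_pk"] = user.pk if user else None
    return task.delay(*args, **kwargs)

@shared_task
def async_delete(obj_pk, acting_user_pk=None):
    user = get_user_model().objects.filter(pk=acting_user_pk).first()
    # crum context manager: get_current_user() inside this block returns `user`,
    # so signals fired by the delete see the original actor
    with impersonate(user):
        ...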

When using django-pghistory the problem goes away, because the auditlog entry is created in-line by Postgres, so the actor/user information is already available when our post_delete signal is called and the info is retrieved from the pghistory entries as it was before.

I think option 2 is something to look at later in a separate PR. I think we will need it at some point if we push more work into the celery workers. Maybe Pro is already doing this?

Option 1 is the easiest and most flexible. There are no other signals depending on the presence of auditlog entries. Still, there's a chance of something around auditlogs not working; the chance is small since we have unit tests.
Option 3 is the safest as it would simplify the codebase a lot.

We could do option 1 for now so we can easily test on the Demo instances and maybe some dev/test/staging instances for Pro. Once everything looks good we could transform the branch or open a new PR to implement option 3.

I've updated the GHA unit tests to run the test suite against both auditlog types.

Maffooch (Contributor) commented:

> We could do option 1 for now so we can easily test on the Demo instances and maybe some dev/test/staging instances for Pro. Once everything looks good we could transform the branch or open a new PR to implement option 3.

This sounds very reasonable to me. Thank you for providing a very detailed write-up that makes the conclusion very easy to grok.

mtesauro (Contributor) left a comment

Approved

Option 1 👍


github-actions bot commented Oct 2, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.


github-actions bot commented Oct 3, 2025

Conflicts have been resolved. A maintainer will review the pull request shortly.

valentijnscholten marked this pull request as draft October 3, 2025 16:10
valentijnscholten marked this pull request as ready for review October 3, 2025 16:19

dryrunsecurity bot commented Oct 3, 2025

DryRun Security

🟡 Please give this pull request extra attention during review.

This pull request introduces multiple security and operational concerns: unsafe use of f-strings to execute raw SQL (potential SQL injection); destructive management commands (pghistory_clear, flush_auditlog) with no authorization checks; risks of information disclosure and user enumeration via notifications and audit reconstruction; audit middleware ordering that loses user context; and configurable audit-flush settings that could allow resource exhaustion.

🟡 Potential SQL Injection in dojo/management/commands/pghistory_clear.py
Vulnerability: Potential SQL Injection
Description: The command executes raw SQL via connection.cursor().execute() using f-strings that embed the database table name directly into the SQL statements (e.g. cursor.execute(f'DROP TABLE IF EXISTS "{db_table}" CASCADE') and cursor.execute(f'TRUNCATE TABLE "{db_table}" RESTART IDENTITY CASCADE')). While the table names originate from EventModel._meta.db_table (derived from installed models), if an attacker can influence that value (for example via a maliciously crafted model, dynamic model loading, or misconfiguration), or if any future change causes user-controlled input to reach db_table, this would allow SQL injection. Using parameterized queries or validating/whitelisting table names would eliminate the risk. The use of .execute() with formatted SQL makes this code potentially vulnerable.
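
For reference, a hedged sketch of the suggested mitigation, assuming the psycopg2 driver: compose the statement with sql.Identifier instead of an f-string so the table name is always quoted as an identifier:

from django.db import connection
from psycopg2 import sql

def truncate_event_table(db_table):
    with connection.cursor() as cursor:
        # sql.Identifier quotes/escapes the name; no string interpolation into the SQL
        cursor.execute(
            sql.SQL("TRUNCATE TABLE {} RESTART IDENTITY CASCADE").format(
                sql.Identifier(db_table),
            ),
        )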

"""
Management command to clear all pghistory Event tables.
This command removes all historical event data from django-pghistory tables.
Use with caution as this operation is irreversible. It's meant to be used only during development/testing.
"""
import logging
from django.apps import apps
from django.conf import settings
from django.core.management.base import BaseCommand
from django.db import connection, transaction
logger = logging.getLogger(__name__)
class Command(BaseCommand):
help = "Clear all pghistory Event tables"
def add_arguments(self, parser):
parser.add_argument(
"--dry-run",
action="store_true",
help="Show what would be cleared without actually clearing",
)
parser.add_argument(
"--force",
action="store_true",
help="Skip confirmation prompt (use with caution)",
)
parser.add_argument(
"--drop",
action="store_true",
help="Drop tables entirely instead of truncating (EXTREMELY DESTRUCTIVE)",
)
def handle(self, *args, **options):
if not settings.ENABLE_AUDITLOG or settings.AUDITLOG_TYPE != "django-pghistory":
self.stdout.write(
self.style.WARNING(
"pghistory is not enabled. Set DD_ENABLE_AUDITLOG=True and "
"DD_AUDITLOG_TYPE=django-pghistory",
),
)
return
# All pghistory Event tables based on tracked models
event_tables = [
"Cred_UserEvent",
"Dojo_UserEvent",
"EndpointEvent",
"EngagementEvent",
"Finding_GroupEvent",
"Finding_TemplateEvent",
"FindingEvent",
"Notification_WebhooksEvent",
"Product_TypeEvent",
"ProductEvent",
"Risk_AcceptanceEvent",
"TestEvent",
]
dry_run = options["dry_run"]
force = options["force"]
drop_tables = options["drop"]
if dry_run:
self.stdout.write(
self.style.WARNING("DRY RUN MODE - No data will be cleared"),
)
total_records = 0
table_counts = {}
# First, count all records
self.stdout.write("Analyzing pghistory Event tables...")
for table_name in event_tables:
try:
EventModel = apps.get_model("dojo", table_name)
count = EventModel.objects.count()
table_counts[table_name] = count
total_records += count
if count > 0:
self.stdout.write(f" {table_name}: {count:,} records")
else:
self.stdout.write(f" {table_name}: empty")
except LookupError:
self.stdout.write(
self.style.WARNING(f" {table_name}: table not found (skipping)"),
)
continue
except Exception as e:
self.stdout.write(
self.style.ERROR(f" {table_name}: error counting records - {e}"),
)
continue
if total_records == 0:
self.stdout.write(
self.style.SUCCESS("No pghistory records found. Nothing to clear."),
)
return
self.stdout.write(f"\nTotal records to clear: {total_records:,}")
if dry_run:
operation = "drop" if drop_tables else "clear"
self.stdout.write(
self.style.SUCCESS(
f"\nDRY RUN COMPLETE: Would {operation} {total_records:,} records "
f"from {len([t for t in table_counts.values() if t > 0])} tables",
),
)
return
# Confirmation prompt
if not force:
if drop_tables:
self.stdout.write(
self.style.ERROR(
f"\n🚨 EXTREMELY DESTRUCTIVE WARNING: This will DROP {len([t for t in table_counts.values() if t > 0])} "
f"pghistory Event tables entirely, deleting {total_records:,} records and the table structure! "
"You will need to recreate tables and run migrations to restore them!",
),
)
else:
self.stdout.write(
self.style.WARNING(
f"\n⚠️ WARNING: This will permanently delete {total_records:,} "
"pghistory records. This operation cannot be undone!",
),
)
operation_type = "DROP TABLES" if drop_tables else "truncate tables"
confirm = input(f"Are you sure you want to {operation_type}? Type 'yes' to continue: ")
if confirm.lower() != "yes":
self.stdout.write(self.style.ERROR("Operation cancelled."))
return
# Clear the tables using TRUNCATE or DROP
operation_verb = "Dropping" if drop_tables else "Truncating"
self.stdout.write(f"\n{operation_verb} pghistory Event tables...")
cleared_records = 0
cleared_tables = 0
for table_name in event_tables:
if table_counts.get(table_name, 0) == 0:
continue # Skip empty tables
try:
EventModel = apps.get_model("dojo", table_name)
# Use raw SQL TRUNCATE or DROP for better performance on large tables
with transaction.atomic():
count = table_counts.get(table_name, 0)
if count > 0:
# Get the actual database table name
db_table = EventModel._meta.db_table
with connection.cursor() as cursor:
if drop_tables:
# DROP TABLE - completely removes the table structure
cursor.execute(f'DROP TABLE IF EXISTS "{db_table}" CASCADE')
operation_past = "Dropped"
else:
# TRUNCATE TABLE - removes all data but keeps table structure
cursor.execute(f'TRUNCATE TABLE "{db_table}" RESTART IDENTITY CASCADE')
operation_past = "Truncated"
cleared_records += count
cleared_tables += 1
self.stdout.write(
self.style.SUCCESS(f" ✓ {operation_past} {table_name}: {count:,} records"),
)
except LookupError:
# Already handled in counting phase
continue
except Exception as e:
operation_verb_lower = "drop" if drop_tables else "truncate"
self.stdout.write(
self.style.ERROR(f" ✗ Failed to {operation_verb_lower} {table_name}: {e}"),
)
logger.error(f"Error {operation_verb_lower}ing {table_name}: {e}")
# Final success message
if drop_tables:
self.stdout.write(
self.style.SUCCESS(
f"\n🎉 DROP COMPLETE: Dropped {cleared_tables} tables with {cleared_records:,} records",
),
)
self.stdout.write(
self.style.WARNING(
"⚠️ Remember to run migrations to recreate the dropped tables!",
),
)
else:
self.stdout.write(
self.style.SUCCESS(
f"\n🎉 CLEARING COMPLETE: Cleared {cleared_records:,} records "
f"from {cleared_tables} tables",
),
)

Information Disclosure (User Enumeration) in dojo/product/signals.py
Vulnerability: Information Disclosure (User Enumeration)
Description: The product_deleted notification includes the full name and username of the user who performed the deletion. This information is derived from the Dojo_User model's __str__ method. These notifications can be sent to and viewed by non-administrative users who have 'Product_View' permissions on the deleted product, leading to potential user enumeration.

description = labels.ASSET_DELETE_WITH_NAME_WITH_USER_SUCCESS_MESSAGE % {"name": instance.name, "user": user}
create_notification(event="product_deleted",  # template does not exist, it will default to "other" but this event name needs to stay because of unit testing
                    title=_("Deletion of %(name)s") % {"name": instance.name},

Lack of Authorization on Destructive Management Command in dojo/management/commands/flush_auditlog.py
Vulnerability: Lack of Authorization on Destructive Management Command
Description: The flush_auditlog management command allows for the deletion of audit log entries. This command, and the underlying functions it calls, do not perform any authorization checks. Consequently, any user with shell access to the application environment can execute this command to delete potentially critical audit data, regardless of their application-level permissions.

class Command(BaseCommand):
    help = "Flush old audit log entries based on retention and batching settings"

    def add_arguments(self, parser):

Information Disclosure via User Attribution in Notifications in dojo/product_type/signals.py
Vulnerability: Information Disclosure via User Attribution in Notifications
Description: The __str__ representation of the Dojo_User model, which includes the user's first name, last name, and username, is used in notification messages for product_type_deleted events. These notifications are stored as Alerts in the database and are also dispatched to various channels (e.g. email, Slack, MS Teams, webhooks) based on system and user configurations. If these notifications are accessible to unauthorized users, it could lead to information disclosure or user enumeration.

description = labels.ORG_DELETE_WITH_NAME_WITH_USER_SUCCESS_MESSAGE % {"name": instance.name, "user": user}
create_notification(event="product_type_deleted",  # template does not exist, it will default to "other" but this event name needs to stay because of unit testing
                    title=_("Deletion of %(name)s") % {"name": instance.name},

Potential Resource Exhaustion via Audit Log Flush Configuration in dojo/settings/settings.dist.py
Vulnerability: Potential Resource Exhaustion via Audit Log Flush Configuration
Description: The DD_AUDITLOG_FLUSH_BATCH_SIZE and DD_AUDITLOG_FLUSH_MAX_BATCHES settings, which control the audit log flushing process, lack explicit upper bound validation. While default values are reasonable, an administrator could configure extremely large values for these settings. A very large batch_size could lead to excessive memory consumption when fetching primary keys into a Python list and could also strain the database with a DELETE ... WHERE pk IN (...) query containing a massive number of IDs. Similarly, an extremely large max_batches could cause the flush task to run for an unacceptably long time, consuming resources.

AUDITLOG_FLUSH_BATCH_SIZE = env("DD_AUDITLOG_FLUSH_BATCH_SIZE")
AUDITLOG_FLUSH_MAX_BATCHES = env("DD_AUDITLOG_FLUSH_MAX_BATCHES")

Audit Log User Context Misconfiguration in dojo/settings/settings.dist.py
Vulnerability: Audit Log User Context Misconfiguration
Description: The middleware for both django-auditlog and django-pghistory is inserted before crum.CurrentRequestUserMiddleware in dojo/settings/settings.dist.py. This is problematic because crum.CurrentRequestUserMiddleware is responsible for making the authenticated user available via request.user. As a result, audit entries generated by either django-auditlog or django-pghistory will likely have a None or anonymous user associated with them, leading to a loss of critical 'who did what' information in the audit trail.

if AUDITLOG_TYPE == "django-auditlog":
    # Insert AuditlogMiddleware before CurrentRequestUserMiddleware
    middleware_list.insert(crum_index, "dojo.middleware.AuditlogMiddleware")
elif AUDITLOG_TYPE == "django-pghistory":
    # Insert pghistory HistoryMiddleware before CurrentRequestUserMiddleware
    middleware_list.insert(crum_index, "dojo.middleware.PgHistoryMiddleware")

Information Disclosure via Object Reconstruction in Audit View in dojo/views.py
Vulnerability: Information Disclosure via Object Reconstruction in Audit View
Description: The get_object_str function reconstructs a model instance from historical data (event.pgh_data) and calls its __str__ method. For the Endpoint model, the __str__ method constructs a URL that includes the userinfo field. Since the Endpoint model is tracked by pghistory without explicitly excluding the userinfo field, any sensitive data (e.g. credentials) present in the userinfo part of an endpoint's URL at the time of an audit event would be stored in pgh_data. When get_object_str is called, this sensitive userinfo would be reconstructed and displayed on the action_history page, leading to information disclosure to any user with access to that page.

temp_instance = model_class(**event.pgh_data)
return str(temp_instance)

Lack of Authorization on Highly Destructive Management Command in dojo/management/commands/pghistory_clear.py
Vulnerability: Lack of Authorization on Highly Destructive Management Command
Description: The pghistory_clear management command, located in dojo/management/commands/pghistory_clear.py, allows for the truncation or dropping of database tables containing historical audit data. This is an extremely destructive and irreversible action. Upon reviewing the handle method of this command, it is evident that there are no authorization checks in place to verify the user's permissions (e.g. superuser status) before these operations are executed. While the command includes a confirmation prompt, this can be bypassed using the --force flag, further increasing the risk. Any user with shell access to the application environment could potentially execute this command, leading to catastrophic data loss of audit records.

class Command(BaseCommand):
    help = "Clear all pghistory Event tables"

    def add_arguments(self, parser):


All finding details can be found in the DryRun Security Dashboard.

valentijnscholten merged commit 40569ba into dev Oct 3, 2025
150 checks passed