Auditlog: Add django-pghistory as audit log (optional for now) #13169
Conversation
This pull request has conflicts, please resolve those before we can evaluate the pull request.
That was the idea, but as mentioned in the description above it's non-trivial to make everything work in a code base where both … are in use. I spent some more time on this and found out it's caused by our …. However, I expect the …
When using …, I think option 2 is something to look at later in a separate PR. I think we will need it at some point if we push more work into the celery workers. Maybe Pro is already doing this? Option 1 is the easiest and most flexible. There are no other signals depending on the presence of auditlog entries. Still, there's a chance of something not working around auditlogs. The chance is small since we have unit tests. We could do option 1 for now so we can easily test on the Demo instances and maybe some dev/test/staging instances for Pro. Once everything looks good we could transform the branch, or open a new PR, to implement option 3. I've updated the GHA unit tests to run the test suite against both auditlog types.
This sounds very reasonable to me. Thank you for providing a very detailed write-up that makes the conclusion easy to grok.
Approved
Option 1 👍
This pull request has conflicts, please resolve those before we can evaluate the pull request.

Conflicts have been resolved. A maintainer will review the pull request shortly.
🟡 Please give this pull request extra attention during review. This pull request introduces multiple security and operational concerns: unsafe use of f-strings to execute raw SQL (potential SQL injection), destructive management commands (`pghistory_clear`, `flush_auditlog`) with no authorization checks, risks of information disclosure and user enumeration via notifications and audit reconstruction, audit middleware ordering that loses user context, and configurable audit-flush settings that could allow resource exhaustion.
🟡 Potential SQL Injection in dojo/management/commands/pghistory_clear.py

| Vulnerability | Potential SQL Injection |
|---|---|
| Description | The command executes raw SQL via `connection.cursor().execute()` using f-strings that embed the database table name directly into the SQL statements (e.g. `cursor.execute(f'DROP TABLE IF EXISTS "{db_table}" CASCADE')` and `cursor.execute(f'TRUNCATE TABLE "{db_table}" RESTART IDENTITY CASCADE')`). While the table names originate from `EventModel._meta.db_table` (derived from installed models), if an attacker can influence that value (for example via a maliciously crafted model, dynamic model loading, or misconfiguration), or if any future change causes user-controlled input to reach `db_table`, this would allow SQL injection. Using parameterized queries or validating/whitelisting table names would eliminate the risk. The use of `.execute()` with formatted SQL makes this code potentially vulnerable. |
django-DefectDojo/dojo/management/commands/pghistory_clear.py
Lines 1 to 206 in 1fd90d1
""" | |
Management command to clear all pghistory Event tables. | |
This command removes all historical event data from django-pghistory tables. | |
Use with caution as this operation is irreversible. It's meant to be used only during development/testing. | |
""" | |
import logging | |
from django.apps import apps | |
from django.conf import settings | |
from django.core.management.base import BaseCommand | |
from django.db import connection, transaction | |
logger = logging.getLogger(__name__) | |
class Command(BaseCommand): | |
help = "Clear all pghistory Event tables" | |
def add_arguments(self, parser): | |
parser.add_argument( | |
"--dry-run", | |
action="store_true", | |
help="Show what would be cleared without actually clearing", | |
) | |
parser.add_argument( | |
"--force", | |
action="store_true", | |
help="Skip confirmation prompt (use with caution)", | |
) | |
parser.add_argument( | |
"--drop", | |
action="store_true", | |
help="Drop tables entirely instead of truncating (EXTREMELY DESTRUCTIVE)", | |
) | |
def handle(self, *args, **options): | |
if not settings.ENABLE_AUDITLOG or settings.AUDITLOG_TYPE != "django-pghistory": | |
self.stdout.write( | |
self.style.WARNING( | |
"pghistory is not enabled. Set DD_ENABLE_AUDITLOG=True and " | |
"DD_AUDITLOG_TYPE=django-pghistory", | |
), | |
) | |
return | |
# All pghistory Event tables based on tracked models | |
event_tables = [ | |
"Cred_UserEvent", | |
"Dojo_UserEvent", | |
"EndpointEvent", | |
"EngagementEvent", | |
"Finding_GroupEvent", | |
"Finding_TemplateEvent", | |
"FindingEvent", | |
"Notification_WebhooksEvent", | |
"Product_TypeEvent", | |
"ProductEvent", | |
"Risk_AcceptanceEvent", | |
"TestEvent", | |
] | |
dry_run = options["dry_run"] | |
force = options["force"] | |
drop_tables = options["drop"] | |
if dry_run: | |
self.stdout.write( | |
self.style.WARNING("DRY RUN MODE - No data will be cleared"), | |
) | |
total_records = 0 | |
table_counts = {} | |
# First, count all records | |
self.stdout.write("Analyzing pghistory Event tables...") | |
for table_name in event_tables: | |
try: | |
EventModel = apps.get_model("dojo", table_name) | |
count = EventModel.objects.count() | |
table_counts[table_name] = count | |
total_records += count | |
if count > 0: | |
self.stdout.write(f" {table_name}: {count:,} records") | |
else: | |
self.stdout.write(f" {table_name}: empty") | |
except LookupError: | |
self.stdout.write( | |
self.style.WARNING(f" {table_name}: table not found (skipping)"), | |
) | |
continue | |
except Exception as e: | |
self.stdout.write( | |
self.style.ERROR(f" {table_name}: error counting records - {e}"), | |
) | |
continue | |
if total_records == 0: | |
self.stdout.write( | |
self.style.SUCCESS("No pghistory records found. Nothing to clear."), | |
) | |
return | |
self.stdout.write(f"\nTotal records to clear: {total_records:,}") | |
if dry_run: | |
operation = "drop" if drop_tables else "clear" | |
self.stdout.write( | |
self.style.SUCCESS( | |
f"\nDRY RUN COMPLETE: Would {operation} {total_records:,} records " | |
f"from {len([t for t in table_counts.values() if t > 0])} tables", | |
), | |
) | |
return | |
# Confirmation prompt | |
if not force: | |
if drop_tables: | |
self.stdout.write( | |
self.style.ERROR( | |
f"\n🚨 EXTREMELY DESTRUCTIVE WARNING: This will DROP {len([t for t in table_counts.values() if t > 0])} " | |
f"pghistory Event tables entirely, deleting {total_records:,} records and the table structure! " | |
"You will need to recreate tables and run migrations to restore them!", | |
), | |
) | |
else: | |
self.stdout.write( | |
self.style.WARNING( | |
f"\n⚠️ WARNING: This will permanently delete {total_records:,} " | |
"pghistory records. This operation cannot be undone!", | |
), | |
) | |
operation_type = "DROP TABLES" if drop_tables else "truncate tables" | |
confirm = input(f"Are you sure you want to {operation_type}? Type 'yes' to continue: ") | |
if confirm.lower() != "yes": | |
self.stdout.write(self.style.ERROR("Operation cancelled.")) | |
return | |
# Clear the tables using TRUNCATE or DROP | |
operation_verb = "Dropping" if drop_tables else "Truncating" | |
self.stdout.write(f"\n{operation_verb} pghistory Event tables...") | |
cleared_records = 0 | |
cleared_tables = 0 | |
for table_name in event_tables: | |
if table_counts.get(table_name, 0) == 0: | |
continue # Skip empty tables | |
try: | |
EventModel = apps.get_model("dojo", table_name) | |
# Use raw SQL TRUNCATE or DROP for better performance on large tables | |
with transaction.atomic(): | |
count = table_counts.get(table_name, 0) | |
if count > 0: | |
# Get the actual database table name | |
db_table = EventModel._meta.db_table | |
with connection.cursor() as cursor: | |
if drop_tables: | |
# DROP TABLE - completely removes the table structure | |
cursor.execute(f'DROP TABLE IF EXISTS "{db_table}" CASCADE') | |
operation_past = "Dropped" | |
else: | |
# TRUNCATE TABLE - removes all data but keeps table structure | |
cursor.execute(f'TRUNCATE TABLE "{db_table}" RESTART IDENTITY CASCADE') | |
operation_past = "Truncated" | |
cleared_records += count | |
cleared_tables += 1 | |
self.stdout.write( | |
self.style.SUCCESS(f" ✓ {operation_past} {table_name}: {count:,} records"), | |
) | |
except LookupError: | |
# Already handled in counting phase | |
continue | |
except Exception as e: | |
operation_verb_lower = "drop" if drop_tables else "truncate" | |
self.stdout.write( | |
self.style.ERROR(f" ✗ Failed to {operation_verb_lower} {table_name}: {e}"), | |
) | |
logger.error(f"Error {operation_verb_lower}ing {table_name}: {e}") | |
# Final success message | |
if drop_tables: | |
self.stdout.write( | |
self.style.SUCCESS( | |
f"\n🎉 DROP COMPLETE: Dropped {cleared_tables} tables with {cleared_records:,} records", | |
), | |
) | |
self.stdout.write( | |
self.style.WARNING( | |
"⚠️ Remember to run migrations to recreate the dropped tables!", | |
), | |
) | |
else: | |
self.stdout.write( | |
self.style.SUCCESS( | |
f"\n🎉 CLEARING COMPLETE: Cleared {cleared_records:,} records " | |
f"from {cleared_tables} tables", | |
), | |
) |
Information Disclosure (User Enumeration) in dojo/product/signals.py
| Vulnerability | Information Disclosure (User Enumeration) |
|---|---|
| Description | The `product_deleted` notification includes the full name and username of the user who performed the deletion. This information is derived from the `Dojo_User` model's `__str__` method. These notifications can be sent to and viewed by non-administrative users who have `Product_View` permissions on the deleted product, leading to potential user enumeration. |
django-DefectDojo/dojo/product/signals.py
Lines 71 to 74 in 1fd90d1
```python
description = labels.ASSET_DELETE_WITH_NAME_WITH_USER_SUCCESS_MESSAGE % {"name": instance.name, "user": user}
create_notification(event="product_deleted",  # template does not exists, it will default to "other" but this event name needs to stay because of unit testing
                    title=_("Deletion of %(name)s") % {"name": instance.name},
```
Lack of Authorization on Destructive Management Command in dojo/management/commands/flush_auditlog.py
| Vulnerability | Lack of Authorization on Destructive Management Command |
|---|---|
| Description | The `flush_auditlog` management command allows for the deletion of audit log entries. This command, and the underlying functions it calls, do not perform any authorization checks. Consequently, any user with shell access to the application environment can execute this command to delete potentially critical audit data, regardless of their application-level permissions. |
django-DefectDojo/dojo/management/commands/flush_auditlog.py
Lines 6 to 9 in 1fd90d1
```python
class Command(BaseCommand):
    help = "Flush old audit log entries based on retention and batching settings"

    def add_arguments(self, parser):
```
Information Disclosure via User Attribution in Notifications in dojo/product_type/signals.py
| Vulnerability | Information Disclosure via User Attribution in Notifications |
|---|---|
| Description | The `__str__` representation of the `Dojo_User` model, which includes the user's first name, last name, and username, is used in notification messages for `product_type_deleted` events. These notifications are stored as Alerts in the database and are also dispatched to various channels (e.g. email, Slack, MS Teams, webhooks) based on system and user configurations. If these notifications are accessible to unauthorized users, it could lead to information disclosure or user enumeration. |
django-DefectDojo/dojo/product_type/signals.py
Lines 73 to 76 in 1fd90d1
```python
description = labels.ORG_DELETE_WITH_NAME_WITH_USER_SUCCESS_MESSAGE % {"name": instance.name, "user": user}
create_notification(event="product_type_deleted",  # template does not exists, it will default to "other" but this event name needs to stay because of unit testing
                    title=_("Deletion of %(name)s") % {"name": instance.name},
```
Potential Resource Exhaustion via Audit Log Flush Configuration in dojo/settings/settings.dist.py
| Vulnerability | Potential Resource Exhaustion via Audit Log Flush Configuration |
|---|---|
| Description | The `DD_AUDITLOG_FLUSH_BATCH_SIZE` and `DD_AUDITLOG_FLUSH_MAX_BATCHES` settings, which control the audit log flushing process, lack explicit upper-bound validation. While the default values are reasonable, an administrator could configure extremely large values for these settings. A very large batch size could lead to excessive memory consumption when fetching primary keys into a Python list, and could also strain the database with a `DELETE ... WHERE pk IN (...)` query containing a massive number of IDs. Similarly, an extremely large max-batches value could cause the flush task to run for an unacceptably long time, consuming resources. |
django-DefectDojo/dojo/settings/settings.dist.py
Lines 1935 to 1936 in 1fd90d1
```python
AUDITLOG_FLUSH_BATCH_SIZE = env("DD_AUDITLOG_FLUSH_BATCH_SIZE")
AUDITLOG_FLUSH_MAX_BATCHES = env("DD_AUDITLOG_FLUSH_MAX_BATCHES")
```
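A minimal sketch of the upper-bound validation the finding asks for, assuming hypothetical limits (the `MAX_*` constants and `clamp_setting` helper are not existing DefectDojo code; the real settings come from `env(...)`):

```python
# Hypothetical upper bounds for the flush settings discussed above.
MAX_FLUSH_BATCH_SIZE = 10_000
MAX_FLUSH_MAX_BATCHES = 1_000


def clamp_setting(raw, default, upper):
    """Parse an env-style value and clamp it into the range [1, upper]."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(1, min(value, upper))


print(clamp_setting("500", 1000, MAX_FLUSH_BATCH_SIZE))        # 500
print(clamp_setting("999999999", 1000, MAX_FLUSH_BATCH_SIZE))  # 10000
```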
Audit Log User Context Misconfiguration in dojo/settings/settings.dist.py
| Vulnerability | Audit Log User Context Misconfiguration |
|---|---|
| Description | The middleware for both `django-auditlog` and `django-pghistory` is inserted before `crum.CurrentRequestUserMiddleware` in `dojo/settings/settings.dist.py`. This is problematic because `crum.CurrentRequestUserMiddleware` is responsible for making the authenticated user available via `request.user`. As a result, audit entries generated by either `django-auditlog` or `django-pghistory` will likely have a `None` or anonymous user associated with them, leading to a loss of critical "who did what" information in the audit trail. |
django-DefectDojo/dojo/settings/settings.dist.py
Lines 2030 to 2035 in 1fd90d1
```python
if AUDITLOG_TYPE == "django-auditlog":
    # Insert AuditlogMiddleware before CurrentRequestUserMiddleware
    middleware_list.insert(crum_index, "dojo.middleware.AuditlogMiddleware")
elif AUDITLOG_TYPE == "django-pghistory":
    # Insert pghistory HistoryMiddleware before CurrentRequestUserMiddleware
    middleware_list.insert(crum_index, "dojo.middleware.PgHistoryMiddleware")
```
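If the review's ordering concern holds, the fix would be to insert the audit middleware after `crum.CurrentRequestUserMiddleware` rather than before it, so the user is already resolved when audit context is captured. A DB-free sketch (the list contents are illustrative, not the full DefectDojo middleware stack):

```python
# Minimal sketch of the ordering fix suggested by the review: place the
# audit middleware AFTER CurrentRequestUserMiddleware in the list.
middleware_list = [
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "crum.CurrentRequestUserMiddleware",
    "django.middleware.common.CommonMiddleware",
]

crum_index = middleware_list.index("crum.CurrentRequestUserMiddleware")
# insert at crum_index + 1 so the audit middleware runs with the user set
middleware_list.insert(crum_index + 1, "dojo.middleware.PgHistoryMiddleware")

print(middleware_list.index("dojo.middleware.PgHistoryMiddleware"))  # 2
```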
Information Disclosure via Object Reconstruction in Audit View in dojo/views.py
| Vulnerability | Information Disclosure via Object Reconstruction in Audit View |
|---|---|
| Description | The `get_object_str` function reconstructs a model instance from historical data (`event.pgh_data`) and calls its `__str__` method. For the `Endpoint` model, the `__str__` method constructs a URL that includes the `userinfo` field. Since the `Endpoint` model is tracked by pghistory without explicitly excluding the `userinfo` field, any sensitive data (e.g. credentials) present in the userinfo part of an endpoint's URL at the time of an audit event would be stored in `pgh_data`. When `get_object_str` is called, this sensitive userinfo would be reconstructed and displayed on the action_history page, leading to information disclosure to any user with access to that page. |
django-DefectDojo/dojo/views.py
Lines 43 to 44 in 1fd90d1
```python
temp_instance = model_class(**event.pgh_data)
return str(temp_instance)
```
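A hypothetical mitigation for the `userinfo` disclosure would be to scrub sensitive fields from the archived row before rebuilding the display string. Sketch below; the deny-list and helper name are assumptions, not DefectDojo code, and the cleaned dict would then be passed to `model_class(**cleaned)`:

```python
SENSITIVE_FIELDS = {"userinfo", "password", "token"}  # assumed deny-list


def scrub_event_data(pgh_data, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the archived row with sensitive fields removed,
    suitable for reconstructing a model instance purely for str() display."""
    return {k: v for k, v in pgh_data.items() if k not in sensitive}


event_data = {"host": "example.com", "userinfo": "admin:hunter2"}
print(scrub_event_data(event_data))  # {'host': 'example.com'}
```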
Lack of Authorization on Highly Destructive Management Command in dojo/management/commands/pghistory_clear.py
| Vulnerability | Lack of Authorization on Highly Destructive Management Command |
|---|---|
| Description | The `pghistory_clear` management command, located in `dojo/management/commands/pghistory_clear.py`, allows for the truncation or dropping of database tables containing historical audit data. This is an extremely destructive and irreversible action. Upon reviewing the `handle` method of this command, it is evident that there are no authorization checks in place to verify the user's permissions (e.g. superuser status) before these operations are executed. While the command includes a confirmation prompt, this can be bypassed using the `--force` flag, further increasing the risk. Any user with shell access to the application environment could potentially execute this command, leading to catastrophic data loss of audit records. |
django-DefectDojo/dojo/management/commands/pghistory_clear.py
Lines 17 to 20 in 1fd90d1
```python
class Command(BaseCommand):
    help = "Clear all pghistory Event tables"

    def add_arguments(self, parser):
```
All finding details can be found in the DryRun Security Dashboard.
Summary
As part of evaluating and improving import/reimport performance we have selected django-pghistory as the new auditlog for Defect Dojo.
It is entirely based on Postgres triggers to maintain a history/audit log. This makes it run faster and offers new features such as reverting a record to an old version, adding richer context to processes and events, finding events on related records, etc.
This PR doesn't introduce new features yet, it just adds `django-pghistory` as a replacement for `django-auditlog`. This PR still allows using `django-auditlog` as the auditlog. There are also some builds published to Docker Hub: https://hub.docker.com/r/defectdojo/defectdojo-django/tags?name=pghistory

In a future release `django-pghistory` will be the auditlog implementation and `django-auditlog` will no longer be available for auditlogging.

HowTo
To switch an instance over to `django-pghistory`, some steps are needed:

- Set `DD_AUDITLOG_TYPE` to `django-pghistory`
- Run `docker compose exec -it uwsgi ./manage.py pghistory backfill` to load the initial snapshot of data into `django-pghistory`. (This is NOT a data migration from `django-auditlog`, but an initial snapshot from the current data in Defect Dojo.)

In the future both will be part of a new release which will perform the backfill automatically.
Once switched over, you cannot switch back (unless you know what you're doing).
The action history pages will always display the data from both `django-pghistory` and `django-auditlog`. As these data formats are completely different, there are two tables on the action history page.

Some notes about the implementation
- `django-pghistory` works with database triggers; these are created as part of a migration, regardless of which audit log is configured.
- `django-auditlog` is enabled or disabled depending on the chosen auditlog type in settings.

Performance
depending on the chosen auditlog type in settings.Performance
I did a couple of test runs to compare both audit log types using the JFrog Unified Very Many sample scan that contains ~14000 findings. These tests runs show a 20-30% speed improvement on my laptop running docker compose. In a (production) environment with an external database the difference might be bigger due to the increased latency.
django-pghistory
runs inside the database so there are a lot less network roundtrips needed.Scaling
`django-pghistory` documents two settings or design decisions that affect performance: https://django-pghistory.readthedocs.io/en/3.4.3/performance/

Row level vs Statement level triggers
Defect Dojo doesn't do (m)any bulk inserts/updates on the tracked models. So there's no benefit now to switch to Statement level triggers.
If needed we can switch in the future using a one-time schema-only migration.
Denormalizing Context
Context is stored in its own table and events have a foreign key relation to this table. If the Context is intensively used and updated during processes/requests and lots of parallel requests are happening that trigger pghistory events, there could be some contention on that Context table. To "solve" this we could choose to store the context in a column in the Event tables. I don't think the typical use-case in Defect Dojo would really benefit from this, it could be detrimental as it would require more storage and filtering/extracting data from the context might be slower.
If needed we can switch in the future using a one-time schema+data migration.
Indices
By default `django-pghistory` doesn't create many indices, only a couple that we don't need. Changes made:

- Remove indices on foreign key fields in the event tables (i.e. user_id, found_by, etc.). We don't query on those.
- Add index on fields that we query on, like `pgh_created`, `pgh_label`.
- Add index on the `obj_id` field used in joins/queries.
- Add index on fields from the JSON context, i.e. `user`, `remote_addr`, `url`.
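An index on a key inside the JSON context boils down to a Postgres expression index. The sketch below only builds the raw SQL string behind such an index (table and column names are assumptions based on pghistory defaults; in the project this would live in a `migrations.RunSQL` operation):

```python
# Sketch of the raw SQL behind an expression index on a JSON context key.
def context_key_index_sql(table, json_column, key):
    """Build a CREATE INDEX statement for a single ->> extracted JSON key."""
    index_name = f"{table}_{key}_idx"
    return (
        f'CREATE INDEX IF NOT EXISTS "{index_name}" '
        f"ON \"{table}\" (({json_column}->>'{key}'))"
    )


print(context_key_index_sql("pghistory_context", "metadata", "user"))
```

The double parentheses around the extracted value are required by Postgres for expression indexes.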
Flushing the Auditlog
This PR also optimizes the existing auditlog flushing (fixes #10835):

- Use a `timestamp` instead of a `date` for filtering the entries to be deleted, so we can use the database index.

Next steps
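The batched, timestamp-filtered flush can be sketched without a database as follows. This is an illustrative model of the approach, not the DefectDojo implementation: the in-memory list stands in for a queryset filtered on the indexed timestamp column, and each inner loop stands in for one `DELETE ... WHERE pk IN (...)` batch bounded by the flush settings:

```python
from datetime import datetime, timedelta


def flush_old_entries(entries, cutoff, batch_size, max_batches):
    """Delete entries strictly older than `cutoff` in bounded batches.

    Returns the number of deleted entries; `entries` is mutated in place.
    `batch_size` and `max_batches` cap memory use and total runtime.
    """
    deleted = 0
    for _ in range(max_batches):
        batch = [e for e in entries if e["timestamp"] < cutoff][:batch_size]
        if not batch:
            break  # nothing old enough remains
        for e in batch:
            entries.remove(e)
        deleted += len(batch)
    return deleted


now = datetime(2025, 1, 1)
rows = [{"id": i, "timestamp": now - timedelta(days=i)} for i in range(10)]
print(flush_old_entries(rows, now - timedelta(days=5), batch_size=2, max_batches=10))  # 4
```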
When testing is successful and we want to make `django-pghistory` the default, we need to do the following in a follow-up PR:

- … the `initial_import` record
- … `django-auditlog`
- Remove `django-auditlog` from MIDDLEWARE
- Remove `django-auditlog` from installed apps (not sure, it might need to remain to be able to load the LogEntry model)