DefensePro Forensics Data Analysis & Report Generator

A professional Python-based tool that processes DefensePro forensics data from CSV files (raw or zipped) and generates comprehensive HTML and PDF reports with interactive visualizations for both technical and sales audiences.

🚀 Features

Memory-Efficient Processing: Handles files from 1MB to 1GB+ using chunked processing
Intelligent Date Parsing: Automatically detects and handles multiple date formats
Month-to-Month Trends: Analyzes complete calendar months for accurate trend analysis
Interactive Visualizations: Professional charts with configurable themes using Plotly
Comprehensive Chart Customization: 6 color palettes and 11 configurable chart types
Dual Output Formats: Generates both HTML (interactive) and PDF reports
Cross-Platform: Works on Windows, Linux, and macOS
Comprehensive Analysis: Both holistic (entire dataset) and trend (monthly) analysis
Batch Processing: Process multiple files automatically
Professional Reports: Executive summaries and detailed technical analysis

📋 Requirements

Python 3.8 or higher
8-16GB RAM recommended for large files
Internet connection for initial setup (package installation)

🛠️ Installation

Option 1: Quick Setup (Recommended)

Clone or download this repository
```
cd SE_new_report
```
Create your configuration file

The script will automatically create config.py from config_example.py on first run. Alternatively, you can create it manually:
```
# Windows PowerShell
Copy-Item config_example.py config.py

# Linux/Mac
cp config_example.py config.py
```
Note: config.py is not tracked by git, so your custom settings will remain local.
Install dependencies
```
pip install -r requirements.txt
```
Install Playwright for PDF generation (optional but recommended)
```
playwright install chromium
```

Option 2: Virtual Environment (Recommended for isolation)

Create virtual environment

python -m venv forensics_env

# Windows
forensics_env\Scripts\activate

# Linux/Mac
source forensics_env/bin/activate

Create your configuration file

# Windows PowerShell
Copy-Item config_example.py config.py

# Linux/Mac
cp config_example.py config.py

Install dependencies

pip install -r requirements.txt
playwright install chromium

Dependencies

Core dependencies automatically installed:

polars - High-performance data processing
plotly - Interactive visualizations
jinja2 - HTML templating
playwright - PDF generation (or weasyprint as fallback)
python-dateutil - Flexible date parsing
tqdm - Progress bars
click - Command-line interface
psutil - Memory monitoring

📁 Project Structure

SE_new_report/
├── analyzer.py              # Main orchestrator script
├── data_processor.py        # CSV parsing and data processing
├── report_generator.py      # HTML/PDF generation
├── visualizations.py        # Chart creation logic
├── utils.py                 # Helper functions
├── config_example.py        # Configuration template (tracked in git)
├── config.py               # Your configuration (auto-created, not tracked)
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── forensics_input/        # Input directory (place files here)
│   └── .gitkeep
└── report_files/           # Output directory
    └── .gitkeep

🎯 Quick Start

1. Prepare Input Data

Place your DefensePro forensics export files in the forensics_input/ directory:

CSV files: Direct forensics exports
ZIP files: Compressed forensics exports (will be automatically extracted)

Expected CSV columns (some may be missing depending on device):

S.No, Start Time, End Time, Device IP Address, Threat Category
Attack Name, Policy Name, Action, Attack ID, Source IP Address
Source Port, Destination IP Address, Destination Port, Direction
Protocol, Radware ID, Duration, Total Packets, Packet Type
Total Mbits, Max pps, Max bps, Physical Port, Risk, VLAN Tag
Footprint, Device Name, Device Type, Workflow Rule Process
Activation Id, Protected Object

2. Run Analysis

Basic usage (processes all files in forensics_input/):

python analyzer.py

Advanced usage:

# Specify custom directories
python analyzer.py --input-dir /path/to/files --output-dir /path/to/reports

# Generate only HTML reports
python analyzer.py --format html

# Generate only PDF reports  
python analyzer.py --format pdf

# Enable verbose logging
python analyzer.py --verbose

# Help
python analyzer.py --help

3. View Results

Reports are generated in the report_files/ directory:

{filename}_report.html - Interactive HTML report
{filename}_report.pdf - PDF version for sharing
batch_summary_{timestamp}.html - Summary when processing multiple files

📊 Report Contents

Executive Summary

Total events, date range, daily averages
Top attack types and trends
Business impact assessment

Month-to-Month Trends (when available)

Security events over time
Top attack types by month
Attack volume trends (Mbits, PPS, BPS)
Attack intensity heatmap by hour and month

Comprehensive Analysis (Entire Period)

Attack type distribution
Top source IP addresses
Protocol distribution
Daily attack timeline

Detailed Analysis

Top 10 attack types with percentages
Top 10 source IPs
Top 10 targeted destinations
Data quality and methodology notes
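
The "top 10 with percentages" tables can be illustrated with a short sketch. This is not the tool's own code: `top_n_with_share` and `EXCLUDED_SOURCES` are hypothetical names, and computing each share against the total that remains after exclusions is an assumption.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# Placeholder values that the version history says were dropped from the
# Top 10 source and destination tables (v2.0.5).
EXCLUDED_SOURCES = frozenset({"0.0.0.0", "Multiple"})


def top_n_with_share(
    values: Iterable[str],
    n: int = 10,
    exclude: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, int, float]]:
    """Return (value, count, percent) for the n most frequent values."""
    counts = Counter(v for v in values if v not in exclude)
    total = sum(counts.values())
    # most_common() is empty when total is 0, so no division by zero occurs.
    return [
        (name, count, round(100 * count / total, 1))
        for name, count in counts.most_common(n)
    ]
```

Calling `top_n_with_share(source_ips, exclude=EXCLUDED_SOURCES)` on a list of source IP strings would yield rows ready for a summary table.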

🔧 Configuration

Configuration File Setup

The tool uses config.py for all settings. On first run, this file is automatically created from config_example.py if it doesn't exist. Your config.py file is not tracked by git, allowing you to maintain custom settings without affecting version control.

To reset to default settings, simply delete config.py and run the script again, or manually copy from config_example.py.
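
The first-run behavior described above could look roughly like the following. This is a minimal sketch, not the tool's actual implementation; `ensure_config` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def ensure_config(project_dir: Path) -> bool:
    """Create config.py from config_example.py if it is missing.

    Returns True when a new config.py was created, False when one
    already existed and was left untouched.
    """
    config = project_dir / "config.py"
    template = project_dir / "config_example.py"
    if config.exists():
        return False  # never overwrite local settings
    if not template.exists():
        raise FileNotFoundError(f"Template not found: {template}")
    shutil.copyfile(template, config)
    return True
```

Because an existing config.py is never overwritten, deleting it is what triggers a fresh copy of the defaults on the next run.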

Performance Tuning

Edit config.py to adjust:

CHUNK_SIZE: Rows processed per chunk (default: 50,000)
MAX_MEMORY_USAGE_GB: Memory warning threshold (default: 2GB)

Chart Customization

The tool provides extensive chart customization options in config.py:

Color Themes: Choose from 6 professionally designed color palettes:

ACTIVE_COLOR_PALETTE: Switch between 'radware_corporate', 'professional_blue', 'modern_minimal', 'vibrant_corporate', 'high_contrast', 'colorblind_friendly'

Chart Types: Configure visualization types for each chart:

CHART_PREFERENCES: Set chart types (line, bar, pie, donut, heatmap, area, etc.)

Individual Overrides: Customize specific chart colors:

CHART_COLOR_ASSIGNMENTS: Override colors for individual charts while keeping global theme

Example - Switch to Professional Blue theme:

ACTIVE_COLOR_PALETTE = 'professional_blue'  # Instead of 'radware_corporate'

Example - Change chart types:

CHART_PREFERENCES = {
    'monthly_events_trend': {
        'default_type': 'line',  # Change from 'bar' to 'line'
        # ... other configuration
    },
    'attack_type_distribution': {
        'default_type': 'donut',  # Change from 'pie' to 'donut'
        # ... other configuration
    }
}

📈 Performance Guidance

File Size Guidelines

Small files (< 10MB): Process in seconds
Medium files (10-100MB): Process in 1-2 minutes
Large files (100MB-1GB): Process in 2-15 minutes
Very large files (> 1GB): May take 15+ minutes

Memory Management

Tool automatically uses chunked processing
Memory usage scales with chunk size, not file size
Monitor memory warnings in verbose mode
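
To illustrate why memory scales with chunk size rather than file size, here is a minimal sketch of chunked aggregation. It uses only the standard library as a stand-in for the polars-based processing the tool actually relies on; `iter_chunks` and `count_attacks` are hypothetical names.

```python
import csv
from collections import Counter
from itertools import islice
from typing import Dict, Iterator, List

CHUNK_SIZE = 50_000  # mirrors the documented default


def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def count_attacks(path: str, chunk_size: int = CHUNK_SIZE) -> Counter:
    """Aggregate attack names chunk by chunk.

    Only one chunk of rows is held in memory at a time; the running
    totals are the only state that persists between chunks.
    """
    totals: Counter = Counter()
    for chunk in iter_chunks(path, chunk_size):
        totals.update(row["Attack Name"] for row in chunk)
    return totals
```

Lowering `chunk_size` trades some speed for a smaller peak memory footprint, which is the same trade-off as reducing CHUNK_SIZE in config.py.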

Troubleshooting Performance

Reduce chunk size if memory warnings appear
Close other applications when processing large files
Use SSD storage for better I/O performance

🐛 Troubleshooting

Common Issues

1. "No CSV or ZIP files found"

Ensure files are in the forensics_input/ directory
Check file extensions (.csv or .zip)
Verify file permissions

2. "Missing required columns"

Check CSV has required columns: Start Time, Attack Name, Source IP Address, Destination IP Address
Verify CSV is a DefensePro forensics export (not traffic data)
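
To check a file before running the full analysis, a header check along these lines may help. It is a sketch only: `missing_columns` is a hypothetical helper, and the required list is taken from the item above rather than from the tool's source.

```python
import csv
import io
from typing import List, TextIO

REQUIRED_COLUMNS = [
    "Start Time",
    "Attack Name",
    "Source IP Address",
    "Destination IP Address",
]


def missing_columns(handle: TextIO) -> List[str]:
    """Return the required columns absent from the CSV header row."""
    header = next(csv.reader(handle), [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


if __name__ == "__main__":
    sample = io.StringIO("S.No,Start Time,Attack Name\n1,01.15.2025 10:00:00,SYN Flood\n")
    print(missing_columns(sample))
```

With a real export, pass an open file handle, for example `open(path, newline="", encoding="utf-8")`.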

3. "Failed to parse dates"

Check date format in CSV (common: MM.DD.YYYY HH:MM:SS)
Tool auto-detects formats; if detection picks the wrong one, set FORCE_DATE_FORMAT in config.py to override it

4. "PDF generation failed"

Install Playwright: pip install playwright && playwright install chromium
Alternative: Install WeasyPrint: pip install weasyprint
Manual fallback: Open HTML in browser and use "Print to PDF"

5. "Memory errors"

Reduce CHUNK_SIZE in config.py
Close other applications
Process files individually instead of batch

6. "Chart generation failed"

Install visualization dependencies: pip install plotly kaleido
Check internet connection for initial Plotly setup

Getting Help

Enable verbose logging: python analyzer.py --verbose
Check log output for specific error messages
Verify file formats match expected CSV structure
Test with smaller files first

📋 Input File Specifications

Supported Formats

CSV files: Direct DefensePro forensics exports
ZIP files: Compressed CSV files (auto-extracted)
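
Auto-extraction of zipped exports can be sketched with the standard library. This is an illustration under assumptions, not the tool's code: `extract_csvs` is a hypothetical name, and ignoring non-CSV archive members is an assumed behavior.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_csvs(zip_path: Path, dest_dir: Path) -> List[Path]:
    """Extract only the .csv members of a ZIP archive into dest_dir."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".csv"):
                # extract() returns the path of the file it wrote
                extracted.append(Path(archive.extract(member, dest_dir)))
    return extracted
```

Passing a directory created with `tempfile.TemporaryDirectory()` as `dest_dir` would give the automatic cleanup of temporary files mentioned elsewhere in this README.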

Date Format Handling

Tool automatically detects common formats:

MM.DD.YYYY HH:MM:SS (primary format)
DD.MM.YYYY HH:MM:SS
YYYY-MM-DD HH:MM:SS
Various delimiter combinations (., /, -)
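
One way such detection can work is to try each candidate format against a sample of values and keep the first one that parses all of them. The sketch below assumes that approach; `detect_format` and `CANDIDATE_FORMATS` are hypothetical names, and the tool's real detection logic may differ.

```python
from datetime import datetime
from typing import Iterable, Optional

# Candidate formats taken from the list above. Order sets the priority
# when a value is ambiguous (for example 01.02.2025 fits both dotted forms).
CANDIDATE_FORMATS = [
    "%m.%d.%Y %H:%M:%S",
    "%d.%m.%Y %H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
]


def detect_format(samples: Iterable[str]) -> Optional[str]:
    """Return the first candidate format that parses every sample, else None."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one sample does not fit this format
        return fmt
    return None
```

A larger sample makes ambiguity less likely: a single value with a day above 12 is enough to rule out the wrong dotted format. When every sampled day is 12 or lower, the priority order decides, which is where a manual override becomes useful.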

File Size Limits

Tested: Up to 10 million rows
Recommended: Under 1GB per file
Memory: Scales with chunk size, not file size

📈 Analysis Methodology

Month-to-Month Trends

Complete months only: Excludes partial months at dataset boundaries
Fair comparisons: Ensures accurate trend analysis
Minimum requirement: At least 1 complete month of data
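
The complete-month rule can be sketched as follows. This version is deliberately strict: a month counts only if the data range covers it from its first to its last day. The version history indicates the tool applies more lenient heuristics at the boundaries, so treat this as an illustration; `complete_months` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import List, Tuple


def _next_month(year: int, month: int) -> Tuple[int, int]:
    """Return the (year, month) that follows the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def complete_months(first_day: date, last_day: date) -> List[Tuple[int, int]]:
    """Return (year, month) pairs fully covered by [first_day, last_day]."""
    months = []
    year, month = first_day.year, first_day.month
    if first_day.day != 1:
        # The first month is partial, so start from the following one.
        year, month = _next_month(year, month)
    while True:
        next_year, next_mon = _next_month(year, month)
        month_end = date(next_year, next_mon, 1) - timedelta(days=1)
        if month_end > last_day:
            break  # this month is cut off by the end of the data
        months.append((year, month))
        year, month = next_year, next_mon
    return months
```

An empty result corresponds to the case where the dataset does not meet the minimum requirement of one complete month, so only the holistic analysis applies.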

Holistic Analysis

Entire dataset: Uses all available data regardless of month boundaries
Comprehensive statistics: Full picture of security posture
No time filtering: Maximum data utilization

Data Quality

Intelligent parsing: Handles missing columns gracefully
Validation: Verifies data integrity throughout processing
Transparency: Reports data quality issues and exclusions

🔐 Security & Privacy

Local processing: All data remains on your machine
No network transmission: Data never sent to external services
Temporary files: Automatically cleaned up after processing
Memory management: Sensitive data cleared from memory

🤝 Support & Contributing

Reporting Issues

When reporting issues, please include:

Input file characteristics (size, date range, format)
Error messages from verbose logging
System specifications (OS, Python version, RAM)
Command used and expected vs actual behavior

Performance Testing

Tool performance varies by:

Hardware: CPU, RAM, storage type
File characteristics: Size, date range, data density
System load: Other running applications

📄 License

This tool is designed for internal use with DefensePro forensics data. Please ensure compliance with your organization's data handling policies.

🔄 Version History

Version	Change/Fixes/Features
v2.0.6	- 2/27/26 - Fixed month exclusion caused by date format mismatch: detection now correctly identifies `%H:%M` (no-seconds) timestamps instead of always assuming `%H:%M:%S`. Added no-seconds variants to DATE_FORMATS. Fixed last month (e.g. December) being incorrectly excluded when data ends within last 7 days of month. Fixed duplicate log output by clearing existing handlers before setup. Improved Attack Volume Trends chart spacing.
v2.0.5	- 11/27/25 - Fixed date identification with / separator (e.g. 11/27/2025). Improved full-month identification when logged events do not start on the 1st of the month. Cosmetic change: removed "Top 5" from the "Security Events by Policy" headline. Removed 0.0.0.0 and Multiple from Top 10 Sources and Top 10 attacked destinations.
v2.0.4	- 11/7/25 - Fixed EXCLUDE_FILTER - was referenced twice in config.py example
v2.0.3	- 11/3/25 - Fixed 2 charts - Top 5 Attacks by Max Bandwidth and PPS. Added config_example.py and removed config.py from git tracking. Enhanced identification of the last full month.
v2.0.2	- 10/24/25 - Fixed CHART_PREFERENCES variables
v2.0.1	- 10/24/25 - Major UX Enhancement: Added user-configurable control for chart types, layouts, and colors. Improved config.py architecture for intuitive customization. Introduced color palettes with individual chart override support if needed. Updated documentation with customization examples.
v2.0.0	- 10/23/25 - Added new charts: 1.Top 5 attacks by Gbps 2. Top 5 attacks by PPS 3. Security Events by Policy
v1.1.9	- 10/21/25 - Enhanced identification of the first complete month (challenge with unfiltered Packet Anomalies)
v1.1.8	- 10/21/25 - Enhanced the Attack Type Distribution pie chart visualization and style to avoid overlap between categories and title
v1.1.7	- 10/21/25 - Updated height of the bar charts so the legend does not overlap the x-axis text. Added automatic font adjustment when text is too long (longest attack duration, for example). Added days to longest attack duration. Added excluded events text to the executive summary HTML.
v1.1.6	- 10/21/25 - Removed zoom from bar charts. Removed vertical zoom from Daily Attack events.
v1.1.5	- 10/21/25 - Bugfix: inconsistent statistics for Max Gbps and Max PPS in Summary statistics and Volume trends.
v1.1.4	- 10/20/25 - Enhancement to better identify complete months.
v1.1.3	- Bugfix: automatic detection of the date format.
v1.1.2	- Added configurable chart customizations.
v1.1.1	- Fixed/enhanced accuracy in automatic date identification. Added FORCE_DATE_FORMAT variable in config.py. Fixed CSV processing using lazy method. Removed Data Quality Notes section.
v1.1.0	- Added configurable output format (html, pdf, both) in config.py variable OUTPUT_FORMATS. Added support for column header name variations for both 'Total Packets' and 'Total Packets Dropped', also 'Total Mbits' and 'Total Mbits Dropped'. Added logic: if VOLUME_UNIT is MB, show Mbps; if GB, show Gbps; if TB, show Gbps. Added CDN mode, reducing HTML size from 37MB to 116KB. Added expandable details for Summary statistics.
v1.0.1	- Added filtering. Use the new EXCLUDE_FILTERS variable in config.py.
v1.0.0	- Initial release. Support for CSV and ZIP input files. HTML and PDF report generation. Interactive Plotly visualizations. Memory-efficient processing. Cross-platform compatibility. Batch processing support.

🔄 Key Improvements & Features

Core Analysis Engine

Memory-Efficient Processing: Intelligent chunked processing handles files from MB to GB+ sizes
Smart Date Detection: Automatic format recognition with manual override capabilities
Complete Month Analysis: Sophisticated algorithm identifies and analyzes complete calendar months for accurate trending
Data Quality Validation: Built-in validation and cleansing with transparent reporting

Advanced Visualization System

Professional Color Palettes: 6 scientifically designed themes including corporate branding, accessibility, and colorblind-friendly options
Flexible Chart Types: 11 fully configurable visualization types (line, bar, pie, donut, heatmap, area, stacked, horizontal)
Granular Customization: Individual chart color overrides while maintaining global theme consistency
Modern Configuration Architecture: Clean separation of settings and logic with immediate hot-reload capability

Comprehensive Security Analysis

Multi-Dimensional Trending: Month-over-month analysis of attack patterns, volumes, and intensities
Attack Profiling: Detailed breakdown by type, source, protocol, policy, and temporal patterns
Performance Metrics: Bandwidth utilization, packet rates, and volume analysis with configurable units
Executive Reporting: Professional summaries with expandable technical details

User Experience Excellence

One-Click Theming: Instantly switch color schemes across entire report suite
Cross-Platform Compatibility: Seamless operation on Windows, Linux, and macOS
Dual Output Formats: Interactive HTML and print-ready PDF with consistent styling
Batch Processing: Multiple file analysis with consolidated summary reporting
Command-Line Flexibility: Comprehensive CLI with format control and verbose logging

Technical Architecture

Configuration-Driven: All customization through centralized, well-documented configuration files
Performance Optimized: CDN-based chart delivery reduces file sizes from 37MB to 116KB
Extensible Design: Modular architecture supports easy addition of new chart types and analysis methods
Professional Deployment: Ready for enterprise environments with comprehensive troubleshooting documentation

🚀 Quick Example

# 1. Place forensics CSV files in forensics_input/
# 2. Run analysis
python analyzer.py --verbose

# 3. Open generated reports in report_files/
# HTML: Interactive charts and analysis
# PDF: Print-ready version for sharing

For additional help: python analyzer.py --help
