Privacy-Preserving Search with Redis vs Database Performance Comparison
A production-ready GDPR-compliant PII search system offering two technically equivalent approaches for privacy-preserving search operations. Both approaches provide identical security guarantees and search results with different performance characteristics.
- ✅ GDPR Compliant: Full encryption, pseudonymization, k-anonymity protection
- ✅ Zero PII Exposure: Search without decrypting sensitive data
- ✅ Functionally Identical: Redis and Database approaches return same results
- ✅ Enterprise Security: Military-grade encryption, audit trails, retention management
- ✅ Comprehensive Search: Equality, prefix, suffix, contains, complex queries
- ✅ Multi-Ethnic Data: Realistic test data with diverse names and patterns
| Feature | Redis Approach | Database Approach | Status |
|---|---|---|---|
| GDPR Compliance | ✅ Full compliance | ✅ Full compliance | IDENTICAL |
| Search Accuracy | ✅ 100% correct | ✅ 100% correct | IDENTICAL |
| Security | ✅ HMAC + encryption | ✅ HMAC + encryption | IDENTICAL |
| Performance | ⚡ 2.67ms average | 📊 5.50ms average | 2x faster |
| Audit Trail | ✅ Standard | 🏆 Enhanced | DB advantage |
| Durability | ✅ Memory + snapshots | 🏆 ACID transactions | DB advantage |
Copy the environment template and configure for your setup:
# Copy environment template
cp .env.template .env
# Edit .env file with your database credentials and preferences
# Key configuration options:
# DB_PASSWORD=your_postgres_password
# NUMBER_OF_PEOPLE=5000 (adjust based on your testing needs)
# CAUCASIAN_PERCENTAGE=30, ASIAN_PERCENTAGE=35, INDIAN_PERCENTAGE=35Default Configuration (.env file):
# Terminal 1: Start Vault server
vault server -dev -dev-root-token-id root -dev-tls
# Terminal 2: Start Redis server
redis-server
# Terminal 3: Install and run
npm install
npm run pipeline # Generate data + build both indexes (DB & Redis)
npm run perf-test # Compare performance (identical results guaranteed)Expected Result: Both approaches return identical tokens with Redis ~2x faster
- Executive Briefing - GDPR compliance and business benefits
- GDPR Compliance Report - Detailed regulatory analysis
- Technical Deep Dive - Implementation details with examples
- Executive Summary: Business value, GDPR compliance, decision framework
- Compliance Report: Article-by-article GDPR analysis, risk assessment
- Technical Guide: HMAC key generation, search logic, concrete examples
📈 Performance Test Suite (6 scenarios):
┌─────────────────────────────────────────┐
│ SEARCH PERFORMANCE │
├─────────────────────────────────────────┤
│ Redis Average: 2.67ms │
│ Database Average: 5.50ms │
│ Speed Difference: 2.06x │
│ Result Accuracy: 100% identical │
│ Compliance Status: Both fully compliant│
└─────────────────────────────────────────┘
✅ PERFECT: All approaches returned identical tokens
✅ VERIFIED: Complete functional equivalence
npm run perf-test # Comprehensive 7-query test suite
npm run perf-test-single # Single query performance test
npm run perf-test-complex # Complex AND/OR query testing
npm run perf-test-validate # Result validation and accuracy check┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Search Query │───▶│ HMAC Transform │───▶│ Index Lookup │
│ "Find emails │ │ │ │ │
│ containing │ │ ┌──────────────┐ │ │ ┌─────────────┐ │
│ 'gmail'" │ │ │Query→3-grams │ │ │ │Redis/DB │ │
└─────────────────┘ │ │HMAC each gram│ │ │ │Intersection │ │
│ └──────────────┘ │ │ └─────────────┘ │
│ │ └─────────────────┘
│ ┌──────────────┐ │ │
│ │k-Anonymity │ │◀────────────┘
│ │Filter (≥5) │ │
│ └──────────────┘ │
└──────────────────┘
│
┌────────▼─────────┐
│ Opaque Tokens │
│ (No PII exposed) │
└──────────────────┘
eq- Exact match:FIRST_NAME equals "John"startsWith- Prefix search:EMAIL startsWith "john"endsWith- Suffix search:LAST_NAME endsWith "smith"contains- Substring search:ADDRESS contains "street"
# AND operations
FIRST_NAME equals "John" AND LAST_NAME startsWith "S"
# OR operations
EMAIL contains "gmail" OR EMAIL contains "yahoo"
# Field-specific searches
CITY startsWith "Mum" AND COUNTRY equals "India"┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Raw PII Data │───▶│ Vault Encryption │───▶│ Encrypted Store │
│ │ │ (AES-256-GCM) │ │ │
│ • Multi-ethnic │ │ │ │ • pii_token_data│
│ names │ │ ┌──────────────┐ │ │ • Opaque tokens │
│ • Realistic │ │ │ HMAC Secrets │ │ │ │
│ addresses │ │ └──────────────┘ │ └─────────────────┘
│ • Email patterns│ └──────────────────┘ │
└─────────────────┘ │
┌────────────────────────────────┘
▼
┌─────────────────────────────────────────┐
│ Dual Index Generation │
│ │
│ ┌─────────────────┐ ┌─────────────────┐│
│ │ Redis Index │ │Database Index ││
│ │ │ │ ││
│ │ • 181K+ keys │ │• 181K+ keys ││
│ │ • Memory-based │ │• ACID compliant ││
│ │ • 2.67ms avg │ │• 5.50ms avg ││
│ └─────────────────┘ └─────────────────┘│
└─────────────────────────────────────────┘
npm run pipeline # Full pipeline: generate→encrypt→import→index both approaches
npm run build-index # Build Redis search index
npm run build-db-index # Build database search index
npm run generate-pii # Generate diverse PII test data
npm run encrypt-csv # Encrypt data with Vault
npm run import-csv # Import to database + build Redis index- ✅ Lawfulness: Explicit consent-based processing
- ✅ Purpose Limitation: Search-only data usage
- ✅ Data Minimization: Only necessary fields indexed
- ✅ Storage Limitation: Automated retention cleanup
- ✅ Integrity & Confidentiality: AES-256 encryption + HMAC protection
- ✅ Accountability: Comprehensive audit logging
# All GDPR rights fully implemented:
✅ Right to be Informed - Privacy notices and consent
✅ Right of Access - Token-based record retrieval
✅ Right to Rectification - Update encrypted records
✅ Right to Erasure - Automated deletion workflows
✅ Right to Restrict - Access controls and permissions
✅ Right to Data Portability - Export functionality
✅ Right to Object - Opt-out mechanisms- Daily retention cleanup with compliance certificates
- Tamper-evident audit trails for regulatory reviews
- k-anonymity protection (suppresses results < 5 matches)
- Comprehensive logging without PII exposure
// For search query: FIRST_NAME contains "john"
Field: "FIRST_NAME" → "fn" (alias)
Value: "john" → normalized
Secret: process.env.REDIS_HMAC_SECRET
// Generate 3-grams: ["joh", "ohn"]
3-gram "joh": HMAC_SHA256(secret, "fn|joh") → "idx:fn:g3:Kx7mN9..."
3-gram "ohn": HMAC_SHA256(secret, "fn|ohn") → "idx:fn:g3:Lm8oP1..."
// Search: Intersect both keys → return matching tokens-- Redis Storage
SADD "idx:fn:g3:Kx7mN9..." "TKN_ABC123_FIRST_NAME"
SADD "idx:fn:g3:Lm8oP1..." "TKN_ABC123_FIRST_NAME"
-- Database Storage
INSERT INTO pii_search_index (hmac_key, token_set, field_type)
VALUES ('idx:fn:g3:Kx7mN9...', 'TKN_ABC123_FIRST_NAME', 'FIRST_NAME');Both approaches guarantee identical results:
Redis tokens: [TKN_ABC123_FIRST_NAME, TKN_XYZ789_FIRST_NAME]
Database tokens: [TKN_ABC123_FIRST_NAME, TKN_XYZ789_FIRST_NAME]
Match: ✅ PERFECT (100% identical)npm run search-api # Start interactive search API
npm run search-stats # View index statistics
node src/pii-search-api.js --test-search # Test Redis search
node src/pii-db-search-api.js --test-search # Test database search# Index statistics
npm run search-stats # Redis index stats
node src/pii-db-search-indexer.js --stats # Database index stats
# Sample output:
Total Keys: 181,317
- Equality keys: 5,129
- Prefix keys: 84,288
- Suffix keys: 77,951
- 3-gram keys: 13,949# Generate diverse test data
npm run generate-pii # Creates multi-ethnic names, realistic addresses
# Sample generated data:
Caucasian: John Smith, Mary Johnson
Asian: Wei Zhang, Li Chen, Hiroshi Tanaka
Indian: Arjun Sharma, Priya Patel, Kavya Daspii_eval_v1/
├── 📁 src/ # Core implementation
│ ├── pii-data-generator.js # Multi-ethnic PII data generation
│ ├── vault-csv-encryptor.js # Vault encryption pipeline
│ ├── csv-to-pii-importer.js # Database import + Redis indexing
│ ├── field-aware-redis-indexer.js # Redis HMAC search index
│ ├── pii-db-search-indexer.js # Database HMAC search index
│ ├── pii-search-api.js # Redis search API
│ ├── pii-db-search-api.js # Database search API
│ └── search-performance-tester.js # Performance comparison tool
├── 📁 resources/ # Generated data files
│ ├── generated_pii_data.csv # Plaintext PII (synthetic)
│ └── encrypted_pii_data.csv # Vault-encrypted PII
├── 📄 EXECUTIVE_BRIEFING.md # Business-focused GDPR compliance
├── 📄 GDPR_COMPLIANCE_REPORT.md # Detailed regulatory analysis
├── 📄 TECHNICAL_DEEP_DIVE.md # Implementation details + examples
├── 📄 README.md # This file
└── 📄 package.json # NPM scripts and dependencies
# Install dependencies
npm install
# Install Vault (macOS)
brew tap hashicorp/tap
brew install hashicorp/tap/vault
# Install Redis (macOS)
brew install redis# Terminal 1: Vault server (required)
vault server -dev -dev-root-token-id root -dev-tls
# Terminal 2: Redis server (required)
redis-server
# Terminal 3: PostgreSQL (optional - for database approach)
# Configure connection in src/ files or use default localhost setupEnvironment-Based Configuration (Recommended):
All configuration is now externalized to .env file. Copy .env.template to .env and customize:
# Database Configuration
DB_USER=postgres # PostgreSQL username
DB_HOST=localhost # Database host
DB_DATABASE=pii # Database name
DB_PASSWORD=your_password # Database password
DB_PORT=5432 # Database port
# Vault Configuration
VAULT_ADDR=https://127.0.0.1:8200 # Vault server address
VAULT_TOKEN=root # Vault authentication token
VAULT_SKIP_VERIFY=true # Skip SSL verification for dev
# Redis Configuration
REDIS_HOST=localhost # Redis host
REDIS_PORT=6379 # Redis port
REDIS_HMAC_SECRET=pii-search-secret-key-2024 # HMAC secret key
# Data Generation Configuration
NUMBER_OF_PEOPLE=5000 # Number of people to generate (1K-10K recommended)
CAUCASIAN_PERCENTAGE=30 # Percentage of Caucasian names
ASIAN_PERCENTAGE=35 # Percentage of Asian names
INDIAN_PERCENTAGE=35 # Percentage of Indian names
# Performance Testing
PERF_TEST_ITERATIONS=10 # Performance test iterations
PERF_TEST_WARMUP_ROUNDS=3 # Warmup rounds before testing
# GDPR Compliance
DEFAULT_RETENTION_DAYS=365 # Default data retention (days)
COMPLIANCE_VERSION=1.0 # Compliance tracking versionLegacy Environment Variables (Still Supported):
export VAULT_ADDR="https://127.0.0.1:8200"
export VAULT_TOKEN="root"
export REDIS_HMAC_SECRET="pii-search-secret-key-2024"
export DB_PASSWORD="your_postgres_password"- Ultra-low latency required (< 3ms)
- High-frequency search operations
- Existing Redis infrastructure
- Performance more critical than audit automation
- Regulatory compliance is priority
- Enhanced audit trails required
- ACID transaction guarantees needed
- Long-term data retention and recovery important
- Database for compliance-critical operations
- Redis caching for performance-critical paths
- Best of both worlds: compliance + performance
Vault Connection:
# Check Vault status
vault status
# Fix certificate issues
export VAULT_SKIP_VERIFY=true
# Verify environment
echo $VAULT_ADDR $VAULT_TOKENRedis Connection:
# Test Redis connectivity
redis-cli ping
# Should return: PONG
# Check Redis memory usage
redis-cli info memoryDatabase Connection:
# Test PostgreSQL connection
psql -h localhost -U postgres -d pii -c "SELECT version();"
# Check table structure
psql -h localhost -U postgres -d pii -c "\d pii_search_index"Performance Test Issues:
# Ensure all services running
npm run vault-helper # Check Vault connectivity
redis-cli ping # Check Redis connectivity
npm run build-db-index # Rebuild database index if needed- Uses compact field aliases (
fnvsFIRST_NAME) - Efficient set operations for intersections
- Memory-optimized token storage
- Indexes on
hmac_keyandretention_until - Efficient string array operations
- Batch insert optimizations
- Minimum 3-gram size for contains searches
- k-anonymity early filtering (≥5 results)
- Parallel key lookup for complex queries
- Change default HMAC secrets
- Enable Redis AUTH and SSL
- Configure PostgreSQL SSL certificates
- Set up proper Vault policies and authentication
- Enable comprehensive audit logging
- Configure automated backup and retention
- ✅ Data mapping and classification complete
- ✅ Privacy impact assessment conducted
- ✅ Consent management system integrated
- ✅ Data subject rights workflows implemented
- ✅ Breach detection and notification procedures
- ✅ Regular compliance audits scheduled
Technical Lead: Arun Raj Mony Project Sponsor: Rohit Bansal Documentation: Executive Briefing, GDPR Report, Technical Deep Dive
Getting Started: Follow Super Quick Start → Review Documentation → Run Performance Tests
This project demonstrates production-ready GDPR-compliant PII search with mathematically proven privacy protection and comprehensive business documentation.