🚨 Wikipedia Edit War Detection System

A real-time streaming application that detects edit wars on Wikipedia using Apache Kafka, Spring Boot, React, and Docker.

Java Spring Boot Kafka PostgreSQL React Docker Tests CI

🎯 What It Does

Monitors the Wikimedia EventStreams API in real-time and detects patterns indicating edit wars - situations where multiple users repeatedly revert each other's changes on the same article.

Real Detection: Successfully detected edit wars on pages like Frederick Trump, Hans van Manen, and more! ✅

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Wikimedia API  │────▢│  Kafka Producer │────▢│   Apache Kafka  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚
                                                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  React Frontend │◀────│    REST API     │◀────│  Kafka Consumer β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚
                                                         β–Ό
                                                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                β”‚   PostgreSQL    β”‚
                                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components

| Component | Description |
|---|---|
| kafka-producer-api | Streams real-time Wikipedia edits to Kafka |
| kafka-consumer-api | Consumes events, detects edit wars, exposes REST API |
| React Frontend | Dashboard displaying real-time alerts (separate repo) |

Technologies

  • Spring Boot 3.5.6 - Application framework
  • Apache Kafka (KRaft) - Event streaming (no ZooKeeper required)
  • Spring WebFlux - Reactive programming & Server-Sent Events
  • PostgreSQL 15 - Database persistence
  • Spring Data JPA - ORM with Hibernate
  • React 18 + TypeScript - Frontend dashboard
  • Docker & Docker Compose - Containerization
  • JUnit 5 & Mockito - Testing with TDD approach

🚀 Quick Start with Docker

The fastest way to run the entire stack:

Prerequisites

  • Docker and Docker Compose installed
  • Git

1. Clone and Configure

# Clone repository
git clone https://github.com/YOUR_USERNAME/springboot-kafka-realtime.git
cd springboot-kafka-realtime

# Create environment file
cp .env.example .env

# (Optional) Edit .env to change database credentials

2. Start Everything

# Build and start all services
docker-compose up --build

# Or run in background
docker-compose up --build -d

This starts:

  • ✅ PostgreSQL - Database with schema auto-initialized
  • ✅ Apache Kafka - Message broker (KRaft mode)
  • ✅ Producer - Streams Wikipedia events to Kafka
  • ✅ Consumer - Detects edit wars, serves REST API on port 8081

3. Verify It's Working

# Check all containers are running
docker-compose ps

# View logs
docker-compose logs -f

# Test the API
curl http://localhost:8081/api/health | jq
curl http://localhost:8081/api/stats | jq
curl http://localhost:8081/api/alerts | jq

4. Stop Everything

docker-compose down

# To also remove the database volume (fresh start)
docker-compose down -v

🖥️ Local Development (Without Docker)

If you prefer running services locally:

Prerequisites

  • Java 21+
  • Apache Kafka 3.8+ (KRaft mode)
  • PostgreSQL 15+
  • Maven 3.8+

Database Setup

# Open a psql session as the postgres superuser (run from the repository root)
psql -U postgres

-- Inside psql: create the database and user
CREATE DATABASE editwars_detection;
CREATE USER editwar_user WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE editwars_detection TO editwar_user;
\c editwars_detection
-- Run the schema migration (the path is relative to the directory psql was started from)
\i kafka-consumer-api/src/main/resources/db/migration/V1__init_schema.sql
\q

Kafka Setup (KRaft Mode)

# Download and extract Kafka (older releases are served from archive.apache.org)
wget https://archive.apache.org/dist/kafka/3.8.0/kafka_2.13-3.8.0.tgz
tar -xzf kafka_2.13-3.8.0.tgz
cd kafka_2.13-3.8.0

# Generate cluster ID and format storage (first time only)
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Start Kafka
bin/kafka-server-start.sh config/kraft/server.properties

Application Setup

# Build project
./mvnw clean install

# Start Consumer (in one terminal)
cd kafka-consumer-api
../mvnw spring-boot:run

# Start Producer (in another terminal)
cd kafka-producer-api
../mvnw spring-boot:run

📡 REST API Endpoints

Base URL: http://localhost:8081/api

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /stats | System statistics |
| GET | /alerts | Get all alerts (paginated) |
| GET | /alerts/{id} | Get specific alert |
| GET | /alerts/search?q={keyword} | Search by page title |
| GET | /alerts/status/{status} | Filter by status |
| GET | /alerts/severity/{level} | Filter by severity |
| GET | /alerts/recent | Recent active alerts |
| POST | /test/simulate-edit-war | Simulate test data |

Example Responses

# Get statistics
curl http://localhost:8081/api/stats | jq
{
  "totalAlerts": 12,
  "activeAlerts": 12,
  "resolvedAlerts": 0
}

# Search for alerts
curl "http://localhost:8081/api/alerts/search?q=trump" | jq

# Get high severity alerts
curl http://localhost:8081/api/alerts/severity/HIGH | jq
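
The same search call can also be made from Java. The following is a minimal sketch using the JDK's built-in `java.net.http` client; the class name `AlertApiClient` and its methods are hypothetical and not part of this repository, and the base URL assumes the consumer is running locally on port 8081 as described above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a class from this project.
final class AlertApiClient {
    // Assumes the consumer API is reachable at the base URL from this README.
    static final String BASE_URL = "http://localhost:8081/api";

    /** Builds a GET request for /alerts/search with a URL-encoded keyword. */
    static HttpRequest searchRequest(String keyword) {
        String q = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/alerts/search?q=" + q))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Sends the request and returns the raw response body, failing on non-200 responses. */
    static String fetch(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

With the stack running, `AlertApiClient.fetch(AlertApiClient.searchRequest("trump"))` should return the same JSON as the curl search example. The response is left as a raw string because the alert schema is not documented here.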

🔍 Edit War Detection Algorithm

Criteria

An edit war is detected when:

  • ✅ 5+ edits on the same article within 1 hour
  • ✅ 2-3 distinct human editors (bots excluded)
  • ✅ Main namespace only (articles, not talk pages)
  • ✅ 50%+ conflict ratio (reverts or opposing changes)
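
The four criteria above can be combined into a single check. The sketch below is an illustration under stated assumptions, not the project's actual `EditWarDetectionService`: the `EditEvent` record, its field names, and the precomputed `conflicting` flag are all hypothetical, and the list passed in is assumed to contain edits for one article only. Namespace `0` is MediaWiki's main (article) namespace.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, already-parsed view of one edit; field names are illustrative.
// `conflicting` is assumed to be set by an earlier step (revert or opposing edit).
record EditEvent(String title, String user, boolean bot, int namespace,
                 Instant timestamp, boolean conflicting) {}

final class EditWarCriteria {
    static final int MIN_EDITS = 5;
    static final int MIN_EDITORS = 2;
    static final int MAX_EDITORS = 3;
    static final double MIN_CONFLICT_RATIO = 0.5;
    static final Duration WINDOW = Duration.ofHours(1);

    /** Returns true when the edits to one article meet all four criteria as of `now`. */
    static boolean isEditWar(List<EditEvent> editsForPage, Instant now) {
        Instant cutoff = now.minus(WINDOW);
        // Keep only human edits in the main namespace that fall inside the window.
        List<EditEvent> recent = editsForPage.stream()
                .filter(e -> !e.bot() && e.namespace() == 0)
                .filter(e -> !e.timestamp().isBefore(cutoff) && !e.timestamp().isAfter(now))
                .toList();
        if (recent.size() < MIN_EDITS) {
            return false;
        }
        Set<String> editors = recent.stream()
                .map(EditEvent::user)
                .collect(Collectors.toSet());
        if (editors.size() < MIN_EDITORS || editors.size() > MAX_EDITORS) {
            return false;
        }
        long conflicts = recent.stream().filter(EditEvent::conflicting).count();
        return (double) conflicts / recent.size() >= MIN_CONFLICT_RATIO;
    }
}
```

Filtering bots and namespaces before counting means excluded edits cannot push a page over the 5-edit threshold. Whether the real implementation applies the filters in this order is an assumption.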

Conflict Types

| Type | Description |
|---|---|
| Pure Reverts | Edit returns article to a previous length |
| Opposing Edits | One user adds content, another removes it |
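
One way to read these two conflict types in code is to compare article lengths before and after each edit. The sketch below is only an interpretation of the table: `LengthChange`, `ConflictClassifier`, and the rule that an opposing edit is judged against the immediately preceding edit are assumptions, not the repository's logic.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record: article length in bytes before and after one edit.
record LengthChange(String user, int oldLength, int newLength) {
    int delta() {
        return newLength - oldLength;
    }
}

final class ConflictClassifier {
    enum Conflict { PURE_REVERT, OPPOSING_EDIT, NONE }

    /** Classifies each edit, in chronological order, against the edits before it. */
    static List<Conflict> classify(List<LengthChange> edits) {
        List<Conflict> result = new ArrayList<>();
        Set<Integer> seenLengths = new HashSet<>(); // lengths the article has had so far
        LengthChange previous = null;
        for (LengthChange edit : edits) {
            seenLengths.add(edit.oldLength());
            Conflict conflict;
            if (edit.delta() != 0 && seenLengths.contains(edit.newLength())) {
                // The article is back at a length it had earlier.
                conflict = Conflict.PURE_REVERT;
            } else if (previous != null
                    && !previous.user().equals(edit.user())
                    && (long) previous.delta() * edit.delta() < 0) {
                // A different user moved the length in the opposite direction.
                conflict = Conflict.OPPOSING_EDIT;
            } else {
                conflict = Conflict.NONE;
            }
            result.add(conflict);
            seenLengths.add(edit.newLength());
            previous = edit;
        }
        return result;
    }
}
```

Length is only a proxy for content: two different revisions can share a byte count, so a length-based revert check can produce false positives.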

Severity Levels

| Level | Score | Description |
|---|---|---|
| CRITICAL | ≥0.8 | Intense, rapid conflict |
| HIGH | ≥0.6 | Significant edit war |
| MEDIUM | ≥0.4 | Moderate conflict |
| LOW | <0.4 | Minor disagreement |
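
The thresholds in this table map directly to a small lookup. The sketch below assumes the score is a `double` in the range 0 to 1; how the score itself is computed is not described in this README, and the enum and method names are hypothetical.

```java
// Hypothetical enum mirroring the severity table; not taken from the project source.
enum Severity {
    CRITICAL, HIGH, MEDIUM, LOW;

    /** Maps a conflict score to a level using the thresholds from the table above. */
    static Severity fromScore(double score) {
        if (score >= 0.8) {
            return CRITICAL;
        }
        if (score >= 0.6) {
            return HIGH;
        }
        if (score >= 0.4) {
            return MEDIUM;
        }
        return LOW;
    }
}
```

Checking the highest threshold first keeps each branch to a single comparison and makes the boundaries inclusive on the lower end, matching the "≥" in the table.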

🧪 Testing

Test-Driven Development (TDD) approach with comprehensive coverage:

# Run all tests
./mvnw test

# Run specific test suites
./mvnw test -Dtest=AlertServiceTest
./mvnw test -Dtest=AlertControllerTest
./mvnw test -Dtest=EditWarDetectionServiceTest
./mvnw test -Dtest=PageEditWindowTest

Test Coverage

  • ✅ Unit tests for services, repositories, mappers
  • ✅ Integration tests with H2 in-memory database
  • ✅ REST API tests with WebTestClient
  • ✅ Edit war detection algorithm tests

📊 Project Structure

springboot-kafka-realtime/
├── docker-compose.yml           # Container orchestration
├── .env.example                 # Environment template
├── kafka-producer-api/          # Wikimedia → Kafka
│   ├── Dockerfile
│   ├── src/main/java/.../
│   │   ├── ApiRealTimeChangesProducer.java
│   │   ├── ApiRealTimeChangesHandler.java
│   │   └── KafkaTopicConfig.java
│   └── src/main/resources/
│       ├── application.properties
│       └── application-docker.properties
├── kafka-consumer-api/          # Kafka → Detection → API
│   ├── Dockerfile
│   ├── src/main/java/.../
│   │   ├── controller/          # REST endpoints
│   │   ├── service/             # Business logic
│   │   ├── entity/              # Domain models
│   │   └── persistence/         # Database layer
│   └── src/main/resources/
│       ├── application.properties
│       ├── application-docker.properties
│       └── db/migration/        # SQL schemas
└── README.md

🐳 Docker Configuration

Services

| Service | Image | Port | Description |
|---|---|---|---|
| postgres | postgres:15-alpine | 5433:5432 | Database |
| kafka | apache/kafka:latest | 9092:9092 | Message broker |
| producer | Custom build | - | Wikimedia streamer |
| consumer | Custom build | 8081:8081 | API server |

Environment Variables

Create a .env file (see .env.example):

POSTGRES_DB=editwars_detection
POSTGRES_USER=editwar_user
POSTGRES_PASSWORD=your_secure_password
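
If a service fails to start because of a missing variable, a quick check for the three keys above can help. This is a hypothetical helper sketched for illustration; the `EnvCheck` class is not part of the project, and how the services actually consume these variables is defined by docker-compose.yml and the Spring property files.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; the variable names come from the .env example above.
final class EnvCheck {
    static final List<String> REQUIRED =
            List.of("POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD");

    /** Returns the required variables that are absent or blank in the given environment. */
    static List<String> missing(Map<String, String> env) {
        return REQUIRED.stream()
                .filter(key -> env.getOrDefault(key, "").isBlank())
                .toList();
    }
}
```

Calling `EnvCheck.missing(System.getenv())` inside a container returns an empty list when all three variables are set.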

Useful Commands

# View logs for specific service
docker-compose logs -f consumer

# Rebuild single service
docker-compose up --build consumer

# Access PostgreSQL
docker exec -it editwars-postgres psql -U editwar_user -d editwars_detection

# Check Kafka topics
docker exec -it editwars-kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

🎯 Key Features

  • ✅ Real-time processing - Processes Wikipedia edits as they happen
  • ✅ Pattern recognition - Sophisticated conflict detection algorithm
  • ✅ Reactive architecture - Non-blocking I/O with Spring WebFlux
  • ✅ Database persistence - PostgreSQL with JPA/Hibernate
  • ✅ RESTful API - Comprehensive endpoints with pagination
  • ✅ Containerized - One-command deployment with Docker Compose
  • ✅ Test-driven - Extensive test coverage
  • ✅ Production-ready - Error handling, logging, health checks

πŸ› Troubleshooting

Port already in use?

# PostgreSQL conflict (a local PostgreSQL is already using the host port)
# Map the container to a free host port in docker-compose.yml, e.g. "5433:5432" instead of "5432:5432"

# Or stop local PostgreSQL
sudo systemctl stop postgresql

No events appearing?

# Check producer logs
docker-compose logs -f producer

# Verify Kafka is receiving messages
docker exec -it editwars-kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic wikimedia-stream-api \
  --from-beginning

Database schema issues?

# Reset database (removes all data)
docker-compose down -v
docker-compose up --build

No alerts appearing?

This is normal! Real edit wars are rare (~0.01% of edits). Use the test endpoint to simulate one:

curl -X POST http://localhost:8081/api/test/simulate-edit-war | jq

📝 Technical Highlights

Design Patterns

  • Repository Pattern (data access)
  • Mapper Pattern (DTO conversion)
  • Observer Pattern (event-driven)
  • Builder Pattern (object construction)

Architecture Principles

  • Clean Architecture / Layered Architecture
  • Separation of Concerns
  • Dependency Inversion
  • Single Responsibility

Best Practices

  • Test-Driven Development (TDD)
  • Spring Profiles for environment configuration
  • Docker multi-stage builds
  • Health checks for container orchestration

📄 License

MIT License - See LICENSE file for details

👤 Author

Eugene Paitoo

LinkedIn


⭐ Star this repo if you find it useful!

This project demonstrates real-time stream processing, event-driven architecture, containerization, and production-grade Java development practices.
