SmartDrive-Platform/smartdrive-file-indexing-service

🔄 SMARTDRIVE - File Indexing Service

Real-time File Indexing & Search Pipeline Service

🎯 Overview

The SMARTDRIVE File Indexing Service is a high-performance, event-driven service that processes file upload events and indexes them in Elasticsearch for fast search capabilities. It acts as the bridge between file storage and search services, ensuring that all uploaded files are properly indexed and searchable.

Key Capabilities

  • 📥 SQS Message Processing: Asynchronous processing of file upload events
  • 🔍 Elasticsearch Indexing: Real-time indexing of file metadata
  • 🔗 AI Service Integration: Trigger AI metadata generation
  • 📊 Data Pipeline: Reliable event processing with error handling
  • 🔍 Search Integration: Seamless integration with search service

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   File Storage  │    │   AWS SQS       │    │   File Indexing │
│   Service       │───►│   (Message      │───►│   Service       │
│   (Upload)      │    │   Queue)        │    │   (Consumer)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   AI Service    │◄───│   HTTP Client   │◄───│   AI Integration│
│   (Metadata)    │    │   (Async)       │    │   (Trigger)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
                                              ┌─────────────────┐
                                              │   Elasticsearch │
                                              │   (Index)       │
                                              └─────────────────┘

Service Components

  • SQS Message Listener: Processes file upload events from SQS
  • File Document Repository: Manages Elasticsearch operations
  • AI Service Integration: Triggers metadata generation
  • Index Management: Handles Elasticsearch index operations
  • Health Monitoring: Service health and status monitoring

✨ Features

📥 Event Processing

  • SQS Message Consumption: Reliable message processing with acknowledgment
  • Event Deserialization: JSON message parsing and validation
  • Error Handling: Dead letter queue for failed messages
  • Retry Logic: Automatic retry for transient failures
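
The retry and dead-letter behaviour listed above could look roughly like the following plain-Java sketch. The class `RetryingProcessor`, the `TransientException` marker, and the in-memory `deadLetters` list are hypothetical names used only for illustration; they are not taken from this repository. With SQS, redelivery and dead-lettering are commonly configured on the queue itself through a redrive policy, so this in-process version only illustrates the control flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: retries a handler, then parks the message in a dead-letter list. */
class RetryingProcessor {
    /** Marker for failures worth retrying (for example a timeout); the name is illustrative. */
    static class TransientException extends RuntimeException {
        TransientException(String message) { super(message); }
    }

    private final int maxAttempts;
    private final long baseDelayMillis;
    final List<String> deadLetters = new ArrayList<>(); // stand-in for a real dead letter queue

    RetryingProcessor(int maxAttempts, long baseDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    /** Returns true if the handler succeeded, false if the message was dead-lettered. */
    boolean process(String messageBody, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(messageBody);
                return true; // success: the caller can acknowledge (delete) the message
            } catch (TransientException e) {
                if (attempt == maxAttempts) break;
                sleep(baseDelayMillis << (attempt - 1)); // exponential backoff: 1x, 2x, 4x, ...
            } catch (RuntimeException e) {
                break; // non-transient failure: retrying will not help
            }
        }
        deadLetters.add(messageBody);
        return false;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would wrap the indexing step in `process(...)` and delete the SQS message only when it returns `true`.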

🔍 Indexing Capabilities

  • Real-time Indexing: Immediate indexing of uploaded files
  • Metadata Storage: Comprehensive file metadata storage
  • Search Optimization: Optimized index mapping for fast queries
  • Index Management: Automatic index creation and mapping
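
As a rough illustration of the metadata storage step, the sketch below maps an upload event to the document that would be written to the index. The class `FileDocumentMapper` and the derived `extension` and `indexedAt` fields are assumptions; only `contentId`, `fileName`, and `contentType` appear in this README's sample message. Reusing the content ID as the document ID is one possible way to keep indexing idempotent when SQS delivers a message more than once.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical mapping from an upload event to the document stored in the index. */
class FileDocumentMapper {
    /** Field names beyond the three event fields are illustrative assumptions. */
    static Map<String, Object> toDocument(String contentId, String fileName,
                                          String contentType, Instant indexedAt) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("contentId", contentId);
        doc.put("fileName", fileName);
        doc.put("contentType", contentType);
        // Derived field so queries can filter by extension without parsing file names.
        doc.put("extension", extensionOf(fileName));
        doc.put("indexedAt", indexedAt.toString()); // ISO-8601 timestamp
        return doc;
    }

    /** With the content ID as document ID, a re-delivered event overwrites instead of duplicating. */
    static String documentId(String contentId) {
        return contentId;
    }

    /** Returns the lower-cased extension, or an empty string if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return (dot < 0 || dot == fileName.length() - 1)
                ? ""
                : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```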

🔗 AI Integration

  • Metadata Generation: Trigger AI service for content analysis
  • Async Processing: Non-blocking AI service calls
  • Result Integration: Merge AI metadata with file data
  • Error Resilience: Graceful handling of AI service failures
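
One way to combine non-blocking calls, result merging, and graceful failure handling is a `CompletableFuture` chain, sketched below. The class `AiEnrichment`, the `aiCall` function (a stand-in for the HTTP request to the AI service), and the `aiMetadata` / `aiEnriched` field names are all hypothetical.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical enrichment step: the file document survives even if the AI call fails. */
class AiEnrichment {
    /**
     * @param aiCall stand-in for the HTTP request to the AI service (contentId -> metadata)
     */
    static CompletableFuture<Map<String, Object>> enrich(
            Map<String, Object> fileDoc,
            Function<String, Map<String, Object>> aiCall,
            Duration timeout) {
        String contentId = (String) fileDoc.get("contentId");
        return CompletableFuture
                .supplyAsync(() -> aiCall.apply(contentId))   // runs off the caller's thread
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                .exceptionally(error -> Map.of())             // degrade to "no AI metadata"
                .thenApply(aiMetadata -> {
                    Map<String, Object> merged = new HashMap<>(fileDoc);
                    merged.put("aiMetadata", aiMetadata);
                    merged.put("aiEnriched", !aiMetadata.isEmpty());
                    return merged;
                });
    }
}
```

Because failures and timeouts collapse into an empty metadata map, the caller can index the merged document either way and use `aiEnriched` to decide whether to retry enrichment later.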

📊 Monitoring & Observability

  • Message Processing Metrics: Track processing rates and errors
  • Indexing Performance: Monitor indexing speed and success rates
  • Health Checks: Comprehensive service health monitoring
  • OpenTelemetry Integration: Distributed tracing and metrics

🛠️ Tech Stack

Core Framework

  • Java 17: Modern Java with enhanced performance
  • Spring Boot 3.2: Rapid application development framework
  • Spring Cloud AWS: AWS service integration
  • Spring Data Elasticsearch: Elasticsearch integration

Messaging & Storage

  • AWS SQS: Message queuing service
  • Elasticsearch 8.x: Search and analytics engine
  • Jackson: JSON serialization/deserialization

Resilience & Monitoring

  • OpenTelemetry: Observability framework
  • Micrometer: Application metrics
  • Spring Boot Actuator: Health checks and monitoring
  • Resilience4j: Circuit breaker patterns

Documentation & Testing

  • OpenAPI 3: API documentation
  • Swagger UI: Interactive API documentation
  • JUnit 5: Unit testing framework
  • Testcontainers: Integration testing

🚀 Quick Start

Prerequisites

  • Java 17 or higher
  • Docker and Docker Compose
  • AWS SQS queue configured
  • Elasticsearch 8.x

Local Development Setup

  1. Clone the Repository

    git clone <repository-url>
    cd smartdrive-file-indexing-service

  2. Environment Configuration

    # Copy environment template
    cp .env.example .env

    # Then set these variables in .env
    AWS_ACCESS_KEY_ID=your-access-key
    AWS_SECRET_ACCESS_KEY=your-secret-key
    AWS_REGION=us-east-1
    SQS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/your-queue
    ELASTICSEARCH_HOST=elasticsearch
    ELASTICSEARCH_PORT=9200
    AI_SERVICE_URL=http://ai-service:8082

  3. Start Dependencies

    # Start Elasticsearch
    docker-compose up -d elasticsearch

  4. Run the Application

    # Using Gradle
    ./gradlew bootRun

    # Or using Docker
    docker-compose up file-indexing-service

  5. Verify Installation

    # Health check
    curl http://localhost:8083/actuator/health

    # API documentation (open is macOS-specific; use xdg-open on Linux)
    open http://localhost:8083/swagger-ui.html

⚙️ Configuration

Application Properties

spring:
  application:
    name: file-indexing-service
  
  elasticsearch:
    uris: http://${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}
  
  cloud:
    aws:
      credentials:
        access-key: ${AWS_ACCESS_KEY_ID}
        secret-key: ${AWS_SECRET_ACCESS_KEY}
      region:
        static: ${AWS_REGION:us-east-1}

app:
  sqs:
    queue-url: ${SQS_QUEUE_URL}
  
  ai-service:
    url: ${AI_SERVICE_URL:http://ai-service:8082}
  
  elasticsearch:
    index-name: file_metadata

management:
  endpoints:
    web:
      exposure:
        include: health,metrics,info
  endpoint:
    health:
      show-details: always

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| AWS_ACCESS_KEY_ID | AWS access key | Required |
| AWS_SECRET_ACCESS_KEY | AWS secret key | Required |
| AWS_REGION | AWS region | us-east-1 |
| SQS_QUEUE_URL | SQS queue URL | Required |
| ELASTICSEARCH_HOST | Elasticsearch host | elasticsearch |
| ELASTICSEARCH_PORT | Elasticsearch port | 9200 |
| AI_SERVICE_URL | AI service URL | http://ai-service:8082 |
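
To make the required and default values concrete, here is a minimal sketch of resolving these variables in plain Java. The record `IndexingSettings` and its `fromEnv` method are hypothetical; in the actual service, Spring's property binding would normally do this work. The AWS credentials are omitted for brevity.

```java
import java.util.Map;

/** Hypothetical settings holder mirroring the environment variables documented here. */
record IndexingSettings(String awsRegion, String sqsQueueUrl,
                        String elasticsearchUri, String aiServiceUrl) {

    /** Builds settings from an environment map such as System.getenv(). */
    static IndexingSettings fromEnv(Map<String, String> env) {
        String host = env.getOrDefault("ELASTICSEARCH_HOST", "elasticsearch");
        String port = env.getOrDefault("ELASTICSEARCH_PORT", "9200");
        return new IndexingSettings(
                env.getOrDefault("AWS_REGION", "us-east-1"),
                required(env, "SQS_QUEUE_URL"),
                "http://" + host + ":" + port,
                env.getOrDefault("AI_SERVICE_URL", "http://ai-service:8082"));
    }

    /** Fails fast with a readable message when a required variable is absent or blank. */
    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }
}
```

Calling `IndexingSettings.fromEnv(System.getenv())` at startup would surface a missing `SQS_QUEUE_URL` immediately rather than on the first poll.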

📊 Monitoring & Observability

OpenTelemetry Integration

  • Distributed Tracing: Track message processing across services
  • Metrics Collection: Processing rates and error metrics
  • Log Correlation: Link logs with trace IDs

Health Checks

  • Elasticsearch Health: Connection and index status
  • SQS Health: Queue connectivity and permissions
  • AI Service Health: Integration service status
  • Application Health: Overall service status
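
The relationship between the per-dependency checks and the overall application status can be sketched as below. `HealthAggregator` is a hypothetical class, not the Spring Boot Actuator API; the boolean suppliers stand in for real connectivity probes. Whether an AI service outage should mark the whole service as DOWN is a design choice this sketch does not settle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical aggregate of dependency checks into one overall status. */
class HealthAggregator {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    HealthAggregator register(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    /** Runs every check; a check that throws counts as DOWN. */
    Map<String, String> report() {
        Map<String, String> result = new LinkedHashMap<>();
        boolean allUp = true;
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean up;
            try {
                up = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                up = false;
            }
            allUp &= up;
            result.put(entry.getKey(), up ? "UP" : "DOWN");
        }
        // The application is UP only when every registered dependency is UP.
        result.put("application", allUp ? "UP" : "DOWN");
        return result;
    }
}
```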

Metrics

  • Message Processing: Throughput and error rates
  • Indexing Performance: Indexing speed and success rates
  • AI Integration: Metadata generation success rates
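
Throughput and error rate reduce to two counters. The sketch below uses JDK `LongAdder` counters to show the arithmetic; `ProcessingMetrics` is a hypothetical class, and a real deployment would more likely register equivalent meters with Micrometer, which is listed in the tech stack.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-memory counters for message processing outcomes. */
class ProcessingMetrics {
    private final LongAdder processed = new LongAdder();
    private final LongAdder failed = new LongAdder();

    void recordSuccess() {
        processed.increment();
    }

    void recordFailure() {
        processed.increment();
        failed.increment();
    }

    /** Total messages seen, successful or not. */
    long total() {
        return processed.sum();
    }

    /** Fraction of messages that failed, in [0, 1]; 0 when nothing was processed yet. */
    double errorRate() {
        long total = processed.sum();
        return total == 0 ? 0.0 : (double) failed.sum() / total;
    }
}
```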

🔒 Security

AWS Security

  • IAM Roles: Least privilege access to SQS and S3
  • Credentials Management: Secure credential handling
  • Network Security: VPC and security group configuration

Data Security

  • Elasticsearch Security: TLS encryption and authentication
  • Message Security: SQS encryption in transit and at rest
  • Input Validation: Comprehensive message validation

🧪 Testing

Unit Tests

# Run unit tests
./gradlew test

# Run with coverage
./gradlew test jacocoTestReport

Integration Tests

# Run integration tests
./gradlew integrationTest

Message Processing Tests

# Test SQS message processing
aws sqs send-message \
  --queue-url $SQS_QUEUE_URL \
  --message-body '{"contentId":"test123","fileName":"test.pdf","contentType":"application/pdf"}'
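
The message body above suggests an event with three string fields. A minimal validation sketch for that shape follows; the record `FileUploadEvent` and its rules are assumptions, and the real event may carry more fields. JSON deserialization (Jackson, per the tech stack) is left out so that the example stays JDK-only.

```java
import java.util.ArrayList;
import java.util.List;

/** Event shape inferred from the sample message; any further fields are unknown here. */
record FileUploadEvent(String contentId, String fileName, String contentType) {

    /** Returns a list of problems; an empty list means the event passed these checks. */
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        if (isBlank(contentId)) problems.add("contentId is required");
        if (isBlank(fileName)) problems.add("fileName is required");
        // Illustrative rule: expect a "type/subtype" media type such as application/pdf.
        if (isBlank(contentType) || !contentType.matches("[\\w.+-]+/[\\w.+-]+")) {
            problems.add("contentType must look like type/subtype");
        }
        return problems;
    }

    private static boolean isBlank(String value) {
        return value == null || value.isBlank();
    }
}
```

An event that fails validation is a natural candidate for the dead letter queue, since retrying it cannot succeed.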

🚀 Deployment

Docker Deployment

# Build Docker image
docker build -t smartdrive-file-indexing-service .

# Run container
docker run -p 8083:8083 \
  -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
  -e SQS_QUEUE_URL=$SQS_QUEUE_URL \
  smartdrive-file-indexing-service

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-indexing-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: file-indexing-service
  template:
    metadata:
      labels:
        app: file-indexing-service
    spec:
      containers:
      - name: file-indexing-service
        image: smartdrive/file-indexing-service:latest
        ports:
        - containerPort: 8083
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: secret-access-key
        - name: SQS_QUEUE_URL
          # Replace with your queue URL, or load it from a ConfigMap or Secret
          value: "https://sqs.us-east-1.amazonaws.com/your-queue"

Production Considerations

  • High Availability: Multiple replicas across zones
  • Auto-scaling: Horizontal pod autoscaling based on queue depth
  • Resource Limits: CPU and memory constraints
  • Monitoring: Prometheus and Grafana integration
  • Logging: Centralized log aggregation
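
For autoscaling on queue depth, the underlying arithmetic is a clamped ceiling division. The sketch below is a hypothetical sizing rule, not a Kubernetes or AWS API; the per-replica target and the bounds are assumptions that would need tuning against measured throughput.

```java
/** Hypothetical sizing rule: one replica per N queued messages, within fixed bounds. */
class QueueDepthScaler {
    static int desiredReplicas(long queueDepth, long messagesPerReplica,
                               int minReplicas, int maxReplicas) {
        if (messagesPerReplica <= 0 || minReplicas > maxReplicas) {
            throw new IllegalArgumentException("invalid scaling parameters");
        }
        long depth = Math.max(queueDepth, 0);
        // Ceiling division without floating point.
        long needed = (depth + messagesPerReplica - 1) / messagesPerReplica;
        // Clamp the result into [minReplicas, maxReplicas].
        return (int) Math.max(minReplicas, Math.min(maxReplicas, needed));
    }
}
```

With a floor of 3 replicas (matching the manifest above), a ceiling of 10, and a target of 100 messages per replica, a backlog of 450 messages yields 5 replicas.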

🤝 Contributing

We welcome contributions to the SMARTDRIVE File Indexing Service! Please follow these guidelines:

Development Workflow

  1. Fork the Repository
  2. Create a Feature Branch: git checkout -b feature/amazing-feature
  3. Make Your Changes: Follow coding standards and add tests
  4. Run Tests: Ensure all tests pass
  5. Submit a Pull Request: Provide clear description and context

Code Standards

  • Java Code Style: Follow Google Java Style Guide
  • Documentation: Update README and API docs
  • Testing: Maintain >80% code coverage
  • Security: Follow security best practices

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🆘 Support

Getting Help

  • Documentation: Check this README and API docs
  • Issues: Report bugs and feature requests on GitHub
  • Discussions: Join community discussions
  • Email: [email protected]


Built with ❤️ by the SMARTDRIVE Team
Empowering intelligent file indexing and search