- Overview
- Architecture
- Features
- Tech Stack
- Quick Start
- Configuration
- Monitoring & Observability
- Security
- Testing
- Deployment
- Contributing
- License
- Support
The SMARTDRIVE File Indexing Service is a high-performance, event-driven service that processes file upload events and indexes them in Elasticsearch for fast search capabilities. It acts as the bridge between file storage and search services, ensuring that all uploaded files are properly indexed and searchable.
- SQS Message Processing: Asynchronous processing of file upload events
- Elasticsearch Indexing: Real-time indexing of file metadata
- AI Service Integration: Trigger AI metadata generation
- Data Pipeline: Reliable event processing with error handling
- Search Integration: Seamless integration with search service
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  File Storage   │     │     AWS SQS     │     │  File Indexing  │
│     Service     │────►│    (Message     │────►│     Service     │
│    (Upload)     │     │     Queue)      │     │   (Consumer)    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   AI Service    │◄────│   HTTP Client   │◄────│ AI Integration  │
│   (Metadata)    │     │     (Async)     │     │    (Trigger)    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │  Elasticsearch  │
                                                │     (Index)     │
                                                └─────────────────┘
- SQS Message Listener: Processes file upload events from SQS
- File Document Repository: Manages Elasticsearch operations
- AI Service Integration: Triggers metadata generation
- Index Management: Handles Elasticsearch index operations
- Health Monitoring: Service health and status monitoring
- SQS Message Consumption: Reliable message processing with acknowledgment
- Event Deserialization: JSON message parsing and validation
- Error Handling: Dead letter queue for failed messages
- Retry Logic: Automatic retry for transient failures
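The retry and dead letter behavior listed above can be sketched with plain JDK code. This is a minimal illustration, not the service's actual implementation: `RetryingProcessor`, its parameters, and the `deadLetterSink` callback are hypothetical names. In a real SQS setup the dead letter queue is usually configured through the queue's redrive policy rather than by the consumer, so the callback here only stands in for that handoff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: retry a message handler, then hand off to a dead-letter sink. */
class RetryingProcessor {
    private final int maxAttempts;
    private final long baseDelayMillis;
    private final Consumer<String> deadLetterSink; // assumption: stands in for the DLQ

    RetryingProcessor(int maxAttempts, long baseDelayMillis, Consumer<String> deadLetterSink) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
        this.deadLetterSink = deadLetterSink;
    }

    /** Returns true if the handler succeeded, false if the message was dead-lettered. */
    boolean process(String messageBody, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(messageBody);
                return true; // success: the caller can acknowledge the message
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // no attempts left, fall through to the dead-letter sink
                }
                sleep(baseDelayMillis << (attempt - 1)); // backoff: base, 2x base, 4x base, ...
            }
        }
        deadLetterSink.accept(messageBody);
        return false;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    public static void main(String[] args) {
        List<String> deadLetters = new ArrayList<>();
        RetryingProcessor processor = new RetryingProcessor(3, 10, deadLetters::add);
        int[] calls = {0};
        // The handler fails twice with a simulated transient error, then succeeds.
        boolean ok = processor.process("{\"contentId\":\"test123\"}", body -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
        });
        System.out.println("processed=" + ok + ", attempts=" + calls[0]
                + ", deadLettered=" + deadLetters.size());
    }
}
```

Doubling the delay between attempts gives a briefly unavailable dependency time to recover, and the attempt cap keeps a bad message from blocking the queue indefinitely.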
- Real-time Indexing: Immediate indexing of uploaded files
- Metadata Storage: Comprehensive file metadata storage
- Search Optimization: Optimized index mapping for fast queries
- Index Management: Automatic index creation and mapping
- Metadata Generation: Trigger AI service for content analysis
- Async Processing: Non-blocking AI service calls
- Result Integration: Merge AI metadata with file data
- Error Resilience: Graceful handling of AI service failures
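One way to picture the non-blocking call and graceful failure handling described above is a `CompletableFuture` chain that falls back to the unenriched document. This is only a sketch under assumptions: `AiEnrichment`, the `aiMetadata` field name, and the `aiClient` function (standing in for the HTTP call) are illustrative and not taken from the service's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch: enrich a file document with AI metadata without failing indexing. */
class AiEnrichment {

    /**
     * Calls the AI client asynchronously and merges its result into a copy of the base document.
     * On failure or timeout, a copy of the base document is returned unchanged.
     */
    static CompletableFuture<Map<String, Object>> enrich(
            Map<String, Object> baseDocument,
            Function<String, Map<String, Object>> aiClient, // assumption: wraps the HTTP call
            long timeoutMillis) {
        String contentId = String.valueOf(baseDocument.get("contentId"));
        return CompletableFuture
                .supplyAsync(() -> aiClient.apply(contentId))
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .thenApply(aiMetadata -> {
                    Map<String, Object> merged = new HashMap<>(baseDocument);
                    merged.put("aiMetadata", aiMetadata); // hypothetical field name
                    return merged;
                })
                // Degrade gracefully: index the file even when the AI service is unavailable.
                .exceptionally(error -> new HashMap<String, Object>(baseDocument));
    }

    public static void main(String[] args) {
        Map<String, Object> base = Map.of("contentId", "test123", "fileName", "test.pdf");

        Map<String, Object> enriched =
                enrich(base, id -> Map.of("summary", "stub summary for " + id), 1000).join();
        Map<String, Object> degraded =
                enrich(base, id -> { throw new IllegalStateException("AI service unavailable"); }, 1000).join();

        System.out.println("enriched has aiMetadata: " + enriched.containsKey("aiMetadata"));
        System.out.println("degraded has aiMetadata: " + degraded.containsKey("aiMetadata"));
    }
}
```

The fallback keeps the file searchable by its basic metadata even if AI enrichment never arrives, which matches the error-resilience goal in the list above.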
- Message Processing Metrics: Track processing rates and errors
- Indexing Performance: Monitor indexing speed and success rates
- Health Checks: Comprehensive service health monitoring
- OpenTelemetry Integration: Distributed tracing and metrics
- Java 17: Modern Java with enhanced performance
- Spring Boot 3.2: Rapid application development framework
- Spring Cloud AWS: AWS service integration
- Spring Data Elasticsearch: Elasticsearch integration
- AWS SQS: Message queuing service
- Elasticsearch 8.x: Search and analytics engine
- Jackson: JSON serialization/deserialization
- OpenTelemetry: Observability framework
- Micrometer: Application metrics
- Spring Boot Actuator: Health checks and monitoring
- Resilience4j: Circuit breaker patterns
- OpenAPI 3: API documentation
- Swagger UI: Interactive API documentation
- JUnit 5: Unit testing framework
- Testcontainers: Integration testing
- Java 17 or higher
- Docker and Docker Compose
- AWS SQS queue configured
- Elasticsearch 8.x
- Clone the Repository
git clone <repository-url>
cd file-indexing-service
- Environment Configuration
# Copy environment template
cp .env.example .env
# Configure your environment variables
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=us-east-1
SQS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/your-queue
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
AI_SERVICE_URL=http://ai-service:8082
- Start Dependencies
# Start Elasticsearch
docker-compose up -d elasticsearch
- Run the Application
# Using Gradle
./gradlew bootRun
# Or using Docker
docker-compose up file-indexing-service
- Verify Installation
# Health check
curl http://localhost:8083/actuator/health
# API documentation
open http://localhost:8083/swagger-ui.html
spring:
  application:
    name: file-indexing-service
  elasticsearch:
    uris: http://elasticsearch:9200
  cloud:
    aws:
      credentials:
        access-key: ${AWS_ACCESS_KEY_ID}
        secret-key: ${AWS_SECRET_ACCESS_KEY}
      region:
        static: ${AWS_REGION}
app:
  sqs:
    queue-url: ${SQS_QUEUE_URL}
  ai-service:
    url: ${AI_SERVICE_URL}
  elasticsearch:
    index-name: file_metadata
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,info
  endpoint:
    health:
      show-details: always
Variable | Description | Default |
---|---|---|
AWS_ACCESS_KEY_ID | AWS access key | Required |
AWS_SECRET_ACCESS_KEY | AWS secret key | Required |
AWS_REGION | AWS region | us-east-1 |
SQS_QUEUE_URL | SQS queue URL | Required |
ELASTICSEARCH_HOST | Elasticsearch host | elasticsearch |
ELASTICSEARCH_PORT | Elasticsearch port | 9200 |
AI_SERVICE_URL | AI service URL | http://ai-service:8082 |
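The defaults in the table can be illustrated with a small resolver that reads a map of environment variables. This is a hedged sketch: the `IndexingConfig` record and its field names are hypothetical, and in the actual service this resolution would presumably be handled by Spring's property placeholders. The AWS credentials are left out on purpose, since they are normally consumed by the AWS SDK rather than held in application objects.

```java
import java.util.Map;

/** Hypothetical sketch: resolve service settings from environment variables with defaults. */
record IndexingConfig(String awsRegion, String sqsQueueUrl, String elasticsearchUri, String aiServiceUrl) {

    static IndexingConfig fromEnvironment(Map<String, String> env) {
        return new IndexingConfig(
                env.getOrDefault("AWS_REGION", "us-east-1"),
                required(env, "SQS_QUEUE_URL"),
                "http://" + env.getOrDefault("ELASTICSEARCH_HOST", "elasticsearch")
                        + ":" + env.getOrDefault("ELASTICSEARCH_PORT", "9200"),
                env.getOrDefault("AI_SERVICE_URL", "http://ai-service:8082"));
    }

    /** Fails fast when a variable marked "Required" in the table is missing or blank. */
    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // System.getenv() would be passed in practice; a fixed map keeps the example reproducible.
        IndexingConfig config = fromEnvironment(
                Map.of("SQS_QUEUE_URL", "https://sqs.us-east-1.amazonaws.com/your-queue"));
        System.out.println(config);
    }
}
```

Failing at startup on a missing required value tends to be easier to diagnose than a connection error that surfaces later during message processing.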
- Distributed Tracing: Track message processing across services
- Metrics Collection: Processing rates and error metrics
- Log Correlation: Link logs with trace IDs
- Elasticsearch Health: Connection and index status
- SQS Health: Queue connectivity and permissions
- AI Service Health: Integration service status
- Application Health: Overall service status
- Message Processing: Throughput and error rates
- Indexing Performance: Indexing speed and success rates
- AI Integration: Metadata generation success rates
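As a rough illustration of the throughput and error-rate metrics above, the following JDK-only sketch keeps thread-safe counters. It is a stand-in, not the service's instrumentation: the README lists Micrometer and OpenTelemetry for real metrics, and `ProcessingMetrics` with its method names is hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: thread-safe counters for processed and failed messages. */
class ProcessingMetrics {
    private final LongAdder processed = new LongAdder();
    private final LongAdder failed = new LongAdder();

    void recordSuccess() {
        processed.increment();
    }

    void recordFailure() {
        processed.increment(); // a failed message still counts as processed
        failed.increment();
    }

    long processedCount() {
        return processed.sum();
    }

    /** Fraction of processed messages that failed, or 0.0 when nothing was processed. */
    double errorRate() {
        long total = processed.sum();
        return total == 0 ? 0.0 : (double) failed.sum() / total;
    }

    public static void main(String[] args) {
        ProcessingMetrics metrics = new ProcessingMetrics();
        metrics.recordSuccess();
        metrics.recordSuccess();
        metrics.recordSuccess();
        metrics.recordFailure();
        System.out.println("processed=" + metrics.processedCount() + ", errorRate=" + metrics.errorRate());
    }
}
```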
- IAM Roles: Least privilege access to SQS and S3
- Credentials Management: Secure credential handling
- Network Security: VPC and security group configuration
- Elasticsearch Security: TLS encryption and authentication
- Message Security: SQS encryption in transit and at rest
- Input Validation: Comprehensive message validation
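Message validation could look roughly like the sketch below. The three fields (`contentId`, `fileName`, `contentType`) come from the sample message in this README's testing commands; the `FileUploadEvent` record and the specific rules are assumptions made for illustration, and the real event may carry more fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event shape based on the README's sample message fields. */
record FileUploadEvent(String contentId, String fileName, String contentType) {

    /** Returns a list of validation problems; an empty list means the event is acceptable. */
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        if (isBlank(contentId)) {
            problems.add("contentId is required");
        }
        if (isBlank(fileName)) {
            problems.add("fileName is required");
        } else if (fileName.contains("/") || fileName.contains("\\")) {
            problems.add("fileName must not contain path separators");
        }
        if (isBlank(contentType)) {
            problems.add("contentType is required");
        } else if (!contentType.matches("[\\w.+-]+/[\\w.+-]+")) {
            problems.add("contentType must look like type/subtype");
        }
        return problems;
    }

    private static boolean isBlank(String value) {
        return value == null || value.isBlank();
    }

    public static void main(String[] args) {
        FileUploadEvent valid = new FileUploadEvent("test123", "test.pdf", "application/pdf");
        FileUploadEvent invalid = new FileUploadEvent("", "a/b.pdf", "pdf");
        System.out.println("valid event problems: " + valid.validate());
        System.out.println("invalid event problems: " + invalid.validate());
    }
}
```

Collecting every problem instead of stopping at the first one makes a rejected message easier to diagnose when it is inspected later.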
# Run unit tests
./gradlew test
# Run with coverage
./gradlew test jacocoTestReport
# Run integration tests
./gradlew integrationTest
# Test SQS message processing
aws sqs send-message \
  --queue-url "$SQS_QUEUE_URL" \
  --message-body '{"contentId":"test123","fileName":"test.pdf","contentType":"application/pdf"}'
# Build Docker image
docker build -t smartdrive-file-indexing-service .
# Run container
docker run -p 8083:8083 \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e SQS_QUEUE_URL="$SQS_QUEUE_URL" \
  smartdrive-file-indexing-service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-indexing-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: file-indexing-service
  template:
    metadata:
      labels:
        app: file-indexing-service
    spec:
      containers:
        - name: file-indexing-service
          image: smartdrive/file-indexing-service:latest
          ports:
            - containerPort: 8083
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: access-key-id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: secret-access-key
            - name: SQS_QUEUE_URL
              # Placeholder: substitute the real queue URL at deploy time.
              # Kubernetes leaves an unresolved $(VAR) reference as literal text.
              value: "$(SQS_QUEUE_URL)"
- High Availability: Multiple replicas across zones
- Auto-scaling: Horizontal pod autoscaling based on queue depth
- Resource Limits: CPU and memory constraints
- Monitoring: Prometheus and Grafana integration
- Logging: Centralized log aggregation
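Scaling on queue depth usually comes down to dividing the backlog by a per-replica target and clamping the result. The sketch below shows that arithmetic only; it is a simplification, not the autoscaler's actual algorithm, and `QueueDepthScaler`, the target of 100 messages per replica, and the replica bounds are all assumed values for illustration.

```java
/** Hypothetical sketch: derive a replica count from the SQS backlog. */
class QueueDepthScaler {

    /**
     * Returns ceil(queueDepth / targetMessagesPerReplica), clamped to [minReplicas, maxReplicas].
     */
    static int desiredReplicas(long queueDepth, int targetMessagesPerReplica,
                               int minReplicas, int maxReplicas) {
        if (targetMessagesPerReplica <= 0 || minReplicas < 0 || maxReplicas < minReplicas) {
            throw new IllegalArgumentException("invalid scaling parameters");
        }
        long backlog = Math.max(0, queueDepth);
        // Ceiling division without floating point.
        long needed = (backlog + targetMessagesPerReplica - 1) / targetMessagesPerReplica;
        return (int) Math.max(minReplicas, Math.min(maxReplicas, needed));
    }

    public static void main(String[] args) {
        // Assumed values: 100 messages per replica, between 3 and 20 replicas.
        System.out.println(desiredReplicas(0, 100, 3, 20));
        System.out.println(desiredReplicas(450, 100, 3, 20));
        System.out.println(desiredReplicas(2500, 100, 3, 20));
    }
}
```

The lower bound keeps the three replicas from the deployment manifest available for high availability, while the upper bound caps cost and load on Elasticsearch during large backlogs.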
We welcome contributions to the SMARTDRIVE File Indexing Service! Please follow these guidelines:
- Fork the Repository
- Create a Feature Branch:
git checkout -b feature/amazing-feature
- Make Your Changes: Follow coding standards and add tests
- Run Tests: Ensure all tests pass
- Submit a Pull Request: Provide clear description and context
- Java Code Style: Follow Google Java Style Guide
- Documentation: Update README and API docs
- Testing: Maintain >80% code coverage
- Security: Follow security best practices
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check this README and API docs
- Issues: Report bugs and feature requests on GitHub
- Discussions: Join community discussions
- Email: [email protected]
- GitHub: SMARTDRIVE File Indexing Service
- Discord: Join our community server
- Blog: Latest updates and tutorials
Empowering intelligent file indexing and search