This Java application provides functionality to clean old objects from an AWS S3 bucket or copy/sync objects between S3 buckets based on a time threshold.
It supports two main modes:
- Cleaning old objects
- Copying recent objects (with optional metadata synchronization)
The project is built with Gradle, containerized using Docker, and includes a GitHub Actions workflow for CI/CD.
- `S3Cleaner`: Deletes objects from an S3 bucket older than a specified threshold (in seconds).
- `S3Copier`: Copies recent objects from a source S3 bucket to a target bucket, or synchronizes metadata for objects newer than the threshold.
- Configurable via environment variables.
- Handles pagination for large S3 buckets.
- Robust error handling and logging using Java's `java.util.logging`.
- Supports folder-specific operations within buckets.
- Built with Gradle and containerized with Docker.
- Automated CI/CD pipeline with GitHub Actions for building, testing, and deploying Docker images.
- Java: JDK 17
- Gradle: Version compatible with the `build.gradle` configuration (uses the Gradle wrapper)
- Docker: For building and running the containerized application
- AWS SDK for Java: Version 2.20.0 (managed by Gradle)
- AWS Credentials: Configured via environment variables for the source bucket
- Target Bucket Credentials (if copying): Access key, secret key, and endpoint URL for the target bucket
- GitHub Actions: For CI/CD (optional)
- `software.amazon.awssdk:s3:2.20.0`
- `software.amazon.awssdk:apache-client:2.20.0`
- `org.slf4j:slf4j-jdk14:1.7.30` (runtime)
- Test dependencies: `junit-jupiter:5.8.1`, `mockito-core:3.6.0`
```bash
git clone <repository-url>
cd <repository-directory>
```
The project uses:
- Java 17 toolchain
- Dependencies for AWS SDK, SLF4J, JUnit, and Mockito
- Shadow plugin for creating a fat JAR (`app-all.jar`)
- Application plugin with main class `com.procure.thg.cockroachdb.App`
To sync dependencies:
```bash
./gradlew build
```
Set the required environment variables based on the operation mode (cleaning or copying).
```bash
export AWS_ENDPOINT_URL="https://s3.eu-west-1.amazonaws.com"
export BUCKET_NAME="my-source-bucket"
export THRESHOLD_SECONDS="86400" # 1 day
export FOLDER="my-folder/"       # optional
```
Cleaning mode requires no additional variables. For copying mode, additionally set:
```bash
export ENABLE_MOVE="true"
export TARGET_AWS_ACCESS_KEY_ID="your-target-access-key"
export TARGET_AWS_SECRET_ACCESS_KEY="your-target-secret-key"
export TARGET_AWS_ENDPOINT_URL="https://target-s3-endpoint.com"
export TARGET_BUCKET_NAME="my-target-bucket"
export TARGET_FOLDER="target-folder/" # optional
export COPY_METADATA="true"           # optional
```
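The application presumably reads and validates these variables at startup. A minimal sketch of what that parsing might look like (the class, method, and helper names here are illustrative assumptions, not the project's actual code):

```java
import java.time.Duration;
import java.util.function.Function;

// Hypothetical sketch of startup configuration parsing; the real
// App.java may differ in names and structure.
public class EnvConfig {
    // Reads a required variable, failing fast as the README describes.
    static String require(Function<String, String> env, String name) {
        String value = env.apply(name);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Missing required environment variable: " + name);
        }
        return value;
    }

    // THRESHOLD_SECONDS is an age cutoff, e.g. "86400" for one day.
    static Duration threshold(String raw) {
        long seconds = Long.parseLong(raw);
        if (seconds <= 0) {
            throw new IllegalArgumentException("THRESHOLD_SECONDS must be positive");
        }
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        String bucket = require(System::getenv, "BUCKET_NAME");
        Duration cutoffAge = threshold(require(System::getenv, "THRESHOLD_SECONDS"));
        System.out.println("Cleaning " + bucket + ", keeping objects newer than " + cutoffAge);
    }
}
```

Taking the environment lookup as a `Function` rather than calling `System.getenv` directly keeps the validation logic unit-testable.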
```bash
./gradlew shadowJar
```

The output JAR will be located at `build/libs/app-all.jar`.
The provided Dockerfile creates a container based on `openjdk:17-jdk-slim`:
- Copies the fat JAR (`app-all.jar`) and an `entrypoint.sh` script
- Sets the working directory to `/app`
- Makes the entrypoint script executable
```bash
docker build -t s3-object-manager .
```
```bash
docker run --rm \
  -e AWS_ENDPOINT_URL="https://s3.eu-west-1.amazonaws.com" \
  -e BUCKET_NAME="my-source-bucket" \
  -e THRESHOLD_SECONDS="86400" \
  -e ENABLE_MOVE="true" \
  -e TARGET_AWS_ACCESS_KEY_ID="your-target-access-key" \
  -e TARGET_AWS_SECRET_ACCESS_KEY="your-target-secret-key" \
  -e TARGET_AWS_ENDPOINT_URL="https://target-s3-endpoint.com" \
  -e TARGET_BUCKET_NAME="my-target-bucket" \
  -e TARGET_FOLDER="target-folder/" \
  -e COPY_METADATA="true" \
  s3-object-manager
```
Note: Ensure `entrypoint.sh` exists and contains:

```bash
#!/bin/bash
java -jar /app/app.jar
```
You can run the application using one of the following methods:

- With Gradle: `./gradlew run`
- With the fat JAR: `java -jar build/libs/app-all.jar`
- With Docker: see the `docker run` command above.
- Cleaning Mode (`ENABLE_MOVE=false` or unset):
  - Deletes objects older than `THRESHOLD_SECONDS` from the source bucket (optionally within `FOLDER`)
- Copying Mode (`ENABLE_MOVE=true`):
  - If `COPY_METADATA=true`: Synchronizes metadata for objects newer than `THRESHOLD_SECONDS`
  - If `COPY_METADATA=false`: Copies objects newer than `THRESHOLD_SECONDS` to the target bucket
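The cleaning mode described above plausibly boils down to a paginated list-and-delete loop. A hedged AWS SDK v2 sketch (class and method names other than the SDK's own are assumptions, not the project's actual code):

```java
import java.time.Instant;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

// Illustrative sketch only: lists the bucket page by page and deletes
// objects whose last-modified time is older than the threshold.
public class CleanerSketch {
    static void clean(S3Client s3, String bucket, String folder, long thresholdSeconds) {
        Instant cutoff = Instant.now().minusSeconds(thresholdSeconds);
        ListObjectsV2Request request = ListObjectsV2Request.builder()
                .bucket(bucket)
                .prefix(folder == null ? "" : folder) // optional FOLDER scoping
                .build();
        // listObjectsV2Paginator transparently follows continuation tokens,
        // which is how buckets with more than 1000 objects are handled.
        for (S3Object object : s3.listObjectsV2Paginator(request).contents()) {
            if (object.lastModified().isBefore(cutoff)) {
                s3.deleteObject(DeleteObjectRequest.builder()
                        .bucket(bucket)
                        .key(object.key())
                        .build());
            }
        }
    }
}
```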
- `App.java`: Main entry point, initializes S3 clients, and orchestrates cleaning or copying
- `S3Cleaner.java`: Deletes old objects from the source bucket
- `S3Copier.java`: Copies objects or syncs metadata between buckets
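The metadata-sync path relies on the fact that S3 has no in-place metadata update: metadata is rewritten via `CopyObject` with `MetadataDirective.REPLACE`. A hedged sketch of the per-object request `S3Copier` might issue (names other than the SDK's are assumptions):

```java
import java.util.Map;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;
import software.amazon.awssdk.services.s3.model.MetadataDirective;

// Illustrative: copying an object onto its destination with REPLACE
// rewrites its user metadata in a single server-side operation.
public class MetadataSyncSketch {
    static void syncMetadata(S3Client target, String sourceBucket, String targetBucket,
                             String key, Map<String, String> metadata) {
        target.copyObject(CopyObjectRequest.builder()
                .sourceBucket(sourceBucket)
                .sourceKey(key)
                .destinationBucket(targetBucket)
                .destinationKey(key)
                .metadataDirective(MetadataDirective.REPLACE) // REPLACE, not the default COPY
                .metadata(metadata)
                .build());
    }
}
```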
The project includes a GitHub Actions workflow (Java CI with Gradle) for automated building, testing, and deployment:
- Push/pull requests to the `main` branch
- Tags like `v*.*.*`
- Build: Sets up JDK 17, builds the project with Gradle, creates a Docker image, and pushes it to GitHub Container Registry (GHCR) for non-PR events
- Dependency Submission: Generates and submits a dependency graph for Dependabot alerts
- Permissions: `contents: read`, `packages: write`, `id-token: write` (for image signing and registry access)
- `REGISTRY`: `ghcr.io`
- `IMAGE_NAME`: Derived from `github.repository`
- `GITHUB_TOKEN`: For registry authentication
To use the workflow:
- Ensure your repository is set up on GitHub
- Store the `entrypoint.sh` and `Dockerfile` in the app directory (as referenced by the workflow)
- Push changes to trigger the workflow
- Uses `java.util.logging` with the SLF4J binding (`slf4j-jdk14`)
- Logs key events (client initialization, object processing, errors) to the console
- Customize logging via a `logging.properties` file or system properties
- Validates environment variables, throwing `IllegalArgumentException` for missing/invalid values
- Handles S3 operation exceptions (e.g., `NoSuchKeyException`, network errors) with appropriate logging
- Ensures S3 clients are closed in a `finally` block to prevent resource leaks
- Region: Hardcoded to `EU_WEST_1` in `App.java`. Update if needed.
- Timeouts: S3 clients use 6000-second socket/connection timeouts.
- Path Style: Enabled for compatibility with non-AWS S3 endpoints (e.g., MinIO, Ceph)
- Metadata Sync: Uses `CopyObject` with `MetadataDirective.REPLACE` for metadata updates
- Docker: Assumes an `entrypoint.sh` script exists. Example:

  ```bash
  #!/bin/bash
  java -jar /app/app.jar
  ```
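Pulling the client-related notes together, a builder configured this way would look roughly like the following (a sketch under the assumptions above, not the project's literal code; the method name `build` is illustrative):

```java
import java.net.URI;
import java.time.Duration;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Configuration;

// Illustrative builder matching the notes: hardcoded region, long
// timeouts, and path-style addressing for non-AWS endpoints.
public class ClientSketch {
    static S3Client build(String endpoint, String accessKey, String secretKey) {
        return S3Client.builder()
                .region(Region.EU_WEST_1)
                .endpointOverride(URI.create(endpoint))
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create(accessKey, secretKey)))
                .serviceConfiguration(S3Configuration.builder()
                        .pathStyleAccessEnabled(true) // MinIO/Ceph compatibility
                        .build())
                .httpClient(ApacheHttpClient.builder() // from the apache-client dependency
                        .socketTimeout(Duration.ofSeconds(6000))
                        .connectionTimeout(Duration.ofSeconds(6000))
                        .build())
                .build();
    }
}
```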
| Issue | Solution |
|---|---|
| Missing environment variables | Verify all required variables are set |
| Permission issues | Ensure AWS credentials have the appropriate permissions (`s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`) |
| Docker build fails | Check `entrypoint.sh` and ensure the directory structure is correct |
| Workflow errors | Verify `GITHUB_TOKEN` and GHCR setup |
This project is licensed under the MIT License. See the `LICENSE` file for details.