Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 writer #434

Merged
merged 4 commits into from
Aug 7, 2024
Merged

S3 writer #434

merged 4 commits into from
Aug 7, 2024

Conversation

chesswithmihir
Copy link
Contributor

@chesswithmihir chesswithmihir commented Aug 2, 2024

Feature for Singer ability to write to S3

Description

Giving Singer the ability to write log messages directly to S3 with the below configurations. Singer should expose a new set of configurations that provide users the ability to send their logs to S3 through Singer. There are two thresholds that trigger the upload to S3: time-based uploads (every X seconds) and file size limits (once we have seen X megabytes of log files) whichever occurs first.

Configuration for S3 writer

writer.type=s3
writer.s3.maxFileSizeMB=50 ← max(50, maxFileSizeMB), default 50
writer.s3.minUploadTimeInSeconds=5 ← max(30, minUploadTimeInSeconds), default 30
writer.s3.fileNameFormat=myFileNameFormat ← suffix right before a timestamp required
writer.s3.bucket=myBucket ← AWS bucket name required
writer.s3.keyPrefix=myKeyPrefix ← AWS keyPrefix name required
writer.s3.maxRetries=3 ← retry logic (if uploads unsuccessful), default 5
writer.s3.bufferDir="tmp/singer/s3" ← directory for buffered files that will be uploaded to s3, default"tmp/singer/s3"

@chesswithmihir chesswithmihir requested a review from a team as a code owner August 2, 2024 04:16
Copy link
Contributor

@vahidhashemian vahidhashemian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this feature.
Lgtm overall. Just a few small comments.
Also, please add necessary documentation and quickstart guide.

@jeffxiang jeffxiang dismissed their stale review August 2, 2024 15:08

I mean to request changes.

config.setMinUploadTimeInSeconds(MIN_UPLOAD_TIME_IN_SECONDS);
}

// maxRetries = 5 if inputted value is less than 0, else inputted value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inputted -> input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, thanks Vahid!

The current design employs two thresholds, a timer-based (in seconds) and file-sized threshold (in megabytes). The writer will upload the log file to S3 when either of the thresholds is met first.
For example, if 50MB of log messages is reached before 5 seconds, the writer will upload the log file to S3 because the file size threshold has been exceeded. If 5 seconds is reached before 50MB, the writer will upload the log file to S3 because the timer threshold has expired.

## Example writer configuration in Singer Log Configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is "Singer Log Configuration" more accurate or "Singer Logstream Configuration"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch Vahid, singer Log Stream Configuration is more descriptive. Thank you!

@chesswithmihir chesswithmihir merged commit 31d5da8 into pinterest:master Aug 7, 2024
1 check passed
@chesswithmihir chesswithmihir deleted the s3_writer branch August 7, 2024 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants