S3 GetObject Accelerator

Get large objects from S3 by using parallel byte-range fetches/parts without the AWS SDK to improve performance.

We measured a throughput of 6.5 Gbit/s on an m5zn.6xlarge instance in eu-west-1 using this library with these settings: {concurrency: 64}.

Installation

npm i s3-getobject-accelerator

Examples

Compact

const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream');
const {download} = require('s3-getobject-accelerator');

pipeline(
  download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4}).readStream(),
  createWriteStream('/tmp/test'),
  (err) => {
    if (err) {
      console.error('something went wrong', err);
    } else {
      console.log('done');
    }
  }
);

More verbose

Get insight into the individual part downloads, and write directly to a file (without a stream) if the object is smaller than 1 TiB:

const {download} = require('s3-getobject-accelerator');

const d = download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4});

d.on('part:downloading', ({partNo}) => {
  console.log('start downloading part', partNo);
});
d.on('part:downloaded', ({partNo}) => {
  console.log('part downloaded, write to disk next in correct order', partNo);
});
d.on('part:writing', ({partNo}) => {
  console.log('start writing part to disk', partNo);
});
d.on('part:done', ({partNo}) => {
  console.log('part written to disk', partNo);
});

d.meta((err, metadata) => {
  if (err) {
    console.error('something went wrong', err);
  } else {
    if (metadata.lengthInBytes > 1024 * 1024 * 1024 * 1024) {
      console.error('file is larger than 1 TiB');
    } else {
      d.file('/tmp/test', (err) => {
        if (err) {
          console.error('something went wrong', err);
        } else {
          console.log('done');
        }
      });
    }
  }
});

API

download(s3source, options)

  • s3source <Object>
    • bucket <string>
    • key <string>
    • version <string> (optional)
  • options <Object>
    • partSizeInMegabytes <number> (optional, defaults to uploaded part size)
    • concurrency <number>
    • requestTimeoutInMilliseconds <number> Maximum time for a request to complete from start to finish (optional, defaults to 300,000, 0 := no timeout)
    • resolveTimeoutInMilliseconds <number> Maximum time for a DNS query to resolve (optional, defaults to 3,000, 0 := no timeout)
    • connectionTimeoutInMilliseconds <number> Maximum time for a socket to connect (optional, defaults to 3,000, 0 := no timeout)
    • readTimeoutInMilliseconds <number> Maximum time to read the response body (optional, defaults to 300,000, 0 := no timeout)
    • dataTimeoutInMilliseconds <number> Maximum time between two data events while reading the response body (optional, defaults to 3,000, 0 := no timeout)
    • writeTimeoutInMilliseconds <number> Maximum time to write the request body (optional, defaults to 300,000, 0 := no timeout)
    • v2AwsSdkCredentials <AWS.Credentials> (optional)
    • endpointHostname <string> (optional, defaults to s3.${region}.amazonaws.com)
    • agent <https.Agent> (optional)
  • Returns: a download object exposing readStream(), meta(callback), and file(path, callback), and emitting part:downloading, part:downloaded, part:writing, and part:done events (see the examples above)
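As a sketch of how the timeout and agent options above fit together (the bucket, key, timeout values, and agent settings here are illustrative placeholders, not recommendations):

```javascript
const https = require('node:https');
const {download} = require('s3-getobject-accelerator');

// Illustrative values only: bucket, key, and timeouts are placeholders.
const d = download(
  {bucket: 'my-bucket', key: 'my-key'},
  {
    concurrency: 8,
    partSizeInMegabytes: 16,
    requestTimeoutInMilliseconds: 120000, // fail a part request after 2 minutes
    dataTimeoutInMilliseconds: 5000,      // fail if no data arrives for 5 seconds
    agent: new https.Agent({keepAlive: true}) // reuse TCP connections across part requests
  }
);
```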

AWS credentials & region

AWS credentials are fetched in the following order:

  1. options.v2AwsSdkCredentials
  2. Environment variables
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_SESSION_TOKEN (optional)
  3. IMDSv2
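For example, credentials from a local profile can be passed in explicitly via options.v2AwsSdkCredentials (a sketch: assumes the aws-sdk v2 package is installed; the bucket, key, and profile name are placeholders):

```javascript
// Sketch: pass explicit v2 credentials instead of relying on env vars or IMDSv2.
// Assumes the aws-sdk (v2) package is installed; values are placeholders.
const AWS = require('aws-sdk');
const {download} = require('s3-getobject-accelerator');

const d = download(
  {bucket: 'my-bucket', key: 'my-key'},
  {
    concurrency: 4,
    v2AwsSdkCredentials: new AWS.SharedIniFileCredentials({profile: 'my-profile'})
  }
);
```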

AWS region is fetched in the following order:

  1. Environment variable AWS_REGION
  2. IMDSv2
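Outside of EC2 (where IMDSv2 is unavailable), the region therefore has to come from the environment. A minimal sketch, assuming eu-west-1 as a placeholder region:

```javascript
// Sketch: pin the region via the environment so no IMDSv2 lookup is needed.
// 'eu-west-1' is a placeholder; set this before calling download().
process.env.AWS_REGION = process.env.AWS_REGION || 'eu-west-1';
console.log('region:', process.env.AWS_REGION);
```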

Considerations