
repeated "Read timed out" errors when recovering a large sized shards from S3 repository #149

Open
asileon opened this issue Nov 27, 2014 · 21 comments

Comments

@asileon

asileon commented Nov 27, 2014

When restoring a large index (150GB split into 5 shards) from S3, "Read timed out" errors are raised repeatedly from the S3 input stream.

This issue is somewhat of a duplicate of elastic/elasticsearch#8280,
which led me to test the recovery process using Elasticsearch 1.4.0 & AWS plugin 2.4.1.
The test failed across a wide range of 'max_retries' values.
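
For reference, a minimal sketch of how the restore was triggered and monitored (repository and snapshot names match the log below; host and everything else are placeholders):

# start the restore from the S3 repository
curl -XPOST 'localhost:9200/_snapshot/s3_repository/2014_11_27/_restore'

# watch the shards fail and restart recovery
curl -XGET 'localhost:9200/_cat/recovery?v'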

error log:

[2014-11-27 13:51:10,337][WARN ][indices.cluster          ] [NODE] [INDEX][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [INDEX][2] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] restore failed
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
    ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] failed to restore snapshot [2014_11_27]
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
    ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] Failed to recover index
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
    at sun.security.ssl.InputRecord.read(InputRecord.java:509)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at java.security.DigestInputStream.read(DigestInputStream.java:161)
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
    at java.io.InputStream.read(InputStream.java:101)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:833)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
    ... 6 more
[2014-11-27 13:51:10,346][WARN ][cluster.action.shard     ] [NODE] [INDEX][2] sending failed shard for [INDEX][2], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][2] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][2] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][2] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][2] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:10,346][WARN ][cluster.action.shard     ] [NODE] [INDEX][2] received shard failed for [INDEX][2], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][2] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][2] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][2] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][2] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:47,128][WARN ][indices.cluster          ] [NODE] [INDEX][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [INDEX][1] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] restore failed
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
    ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] failed to restore snapshot [2014_11_27]
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
    ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] Failed to recover index
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
    at sun.security.ssl.InputRecord.read(InputRecord.java:509)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at java.security.DigestInputStream.read(DigestInputStream.java:161)
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
    at java.io.InputStream.read(InputStream.java:101)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:833)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
    ... 6 more
[2014-11-27 13:51:47,130][WARN ][cluster.action.shard     ] [NODE] [INDEX][1] sending failed shard for [INDEX][1], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][1] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][1] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][1] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][1] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:47,130][WARN ][cluster.action.shard     ] [NODE] [INDEX][1] received shard failed for [INDEX][1], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][1] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][1] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][1] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][1] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
@tlrx
Member

tlrx commented Dec 2, 2014

With ES 1.4.x and plugin AWS 2.4.1, we have retries at the Snapshot/Restore level in ES, at the AWS plugin level (max_retries), and internally in the AWS SDK used by the plugin (max_retries if set, otherwise it tries each request 3 times). Setting a large number of retries ends up with max_retries * max_retries maximum tries for each request. With such a number of retries, if read timeouts still happen, I think this could be a network or DNS issue.
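
The plugin-level knob mentioned above is the max_retries repository setting; a minimal sketch of where it is set (bucket name and value are examples only):

curl -XPUT 'localhost:9200/_snapshot/s3_repository' -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "max_retries": 5
  }
}'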

@peillis

peillis commented Dec 4, 2014

I've experienced the same problem, and very often, but with much smaller snapshots, around 15MB. With ES 1.4.1 and plugin AWS 2.4.1 the timeout just seems to take longer to arise, but it's the same:

[2014-12-03 23:33:28,753][WARN ][indices.cluster          ] [Morgan Le Fay] [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] restore failed
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
    ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] failed to restore snapshot [147d7c391b022b844221cf146b1a53f4_ts20141203022859724080]
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
    ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] Failed to recover index
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
    at sun.security.ssl.InputRecord.read(InputRecord.java:509)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at java.security.DigestInputStream.read(DigestInputStream.java:161)
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
    at java.io.InputStream.read(InputStream.java:101)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:834)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
    ... 6 more
[2014-12-03 23:33:28,755][WARN ][cluster.action.shard     ] [Morgan Le Fay] [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] sending failed shard for [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0], node[OjXfmbZyQoS_4gjAtnbzZQ], [P], restoring[10-0-2-11:147d7c391b022b844221cf146b1a53f4_ts20141203022859724080], s[INITIALIZING], indexUUID [i3RzXa6WQ3KLTFfdhyWQAA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] failed recovery]; nested: IndexShardRestoreFailedException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] restore failed]; nested: IndexShardRestoreFailedException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] failed to restore snapshot [147d7c391b022b844221cf146b1a53f4_ts20141203022859724080]]; nested: IndexShardRestoreFailedException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-12-03 23:33:28,755][WARN ][cluster.action.shard     ] [Morgan Le Fay] [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] received shard failed for [147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0], node[OjXfmbZyQoS_4gjAtnbzZQ], [P], restoring[10-0-2-11:147d7c391b022b844221cf146b1a53f4_ts20141203022859724080], s[INITIALIZING], indexUUID [i3RzXa6WQ3KLTFfdhyWQAA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] failed recovery]; nested: IndexShardRestoreFailedException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] restore failed]; nested: IndexShardRestoreFailedException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] failed to restore snapshot [147d7c391b022b844221cf146b1a53f4_ts20141203022859724080]]; nested: IndexShardRestoreFailedException[[147d7c391b022b844221cf146b1a53f4_ts20141203022504796377][0] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]

@JoeZ99

JoeZ99 commented Dec 4, 2014

elastic/elasticsearch#8280

for the sake of reference

@grantr

grantr commented Feb 27, 2015

I encountered this yesterday while restoring a 20TB index with 128 shards from S3. S3 was plenty fast, but frequent timeouts (probably due to saturating our AWS Direct Connect link) caused shard recovery to reset immediately. No shards ever recovered successfully in several hours. We were using ES 1.4.2 and cloud-aws 2.4.1.

@tlrx
Member

tlrx commented Apr 3, 2015

elasticsearch-cloud-aws 2.4.1 uses AWS SDK version 1.7.13. In AWS SDK v1.8.10.2, a bug causing socket timeouts was fixed, as indicated in the changelog:

Fix the bug where a service request is not properly reset before passing to the request signer. This bug would cause the SDK to hang (until socket timeout) whenever a service request is retried.

Version 2.5.0 of the plugin uses AWS SDK version 1.9.3, so please let us know whether the bug still exists once you upgrade Elasticsearch & the plugin to a newer version.

@dadoonet
Member

dadoonet commented Apr 3, 2015

@tlrx The plan is also to release a new version of the AWS plugin for 1.4 containing this fix.

@grantr

grantr commented Apr 15, 2015

Still seeing timeouts with ES 1.5.1 and cloud-aws 2.5.0.

[2015-04-15 16:25:17,545][WARN ][cluster.action.shard     ] [node1] [index1][7] received shard failed for [index1][7], node[fTHoEW4FRs-i7nZlFf4g6g], [P], restoring[repo1:snapshot1], s[INITIALIZING], indexUUID [CBV4O9eVRW2bkMMW2QStEQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[index1][7] failed recovery]; nested: IndexShardRestoreFailedException[[index1][7] restore failed]; nested: IndexShardRestoreFailedException[[index1][7] failed to restore snapshot [index1]]; nested: IndexShardRestoreFailedException[[index1][7] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]

These are coming in pretty regularly, same as with 1.4.4 and 2.4.1. This is a 20TB index with 128 shards. The restore is ongoing, but so far there have been 4 timeouts and zero recovered shards.

@grantr

grantr commented Apr 16, 2015

All the primary shards did eventually recover after about 19 hours. During this time there were 90 socket timeouts.

@asileon
Author

asileon commented Apr 20, 2015

Tested restore with ES 1.5.0 & AWS plugin 2.5.0.
While restoring a 1TB index with 10 shards, I got 3 read timed out errors, but eventually the cluster restored successfully.

@tlrx
Member

tlrx commented Apr 20, 2015

@grantr @asileon thanks a lot for your feedback! I really think that AWS SDK 1.8.10 deals better with those request timeouts. However, we are thinking of implementing a way to throttle the number of S3 requests when snapshotting indices. Nothing concrete right now, but it is in our plans.
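
Until then, the existing per-repository byte-rate throttles can at least reduce the load on the link; a minimal sketch (bucket name and values are examples only):

curl -XPUT 'localhost:9200/_snapshot/s3_repository' -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "max_snapshot_bytes_per_sec": "20mb",
    "max_restore_bytes_per_sec": "20mb"
  }
}'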

@esetnik

esetnik commented May 5, 2015

@tlrx are you sure AWS plugin 2.5.0 uses AWS SDK version 1.9.3? I just installed the plugin with sudo bin/plugin -i elasticsearch/elasticsearch-cloud-aws/2.5.0 and looked at the jars:

$ ls plugins/cloud-aws/aws*
plugins/cloud-aws/aws-java-sdk-core-1.9.23.jar  plugins/cloud-aws/aws-java-sdk-kms-1.9.23.jar
plugins/cloud-aws/aws-java-sdk-ec2-1.9.23.jar   plugins/cloud-aws/aws-java-sdk-s3-1.9.23.jar

The latest version available from Amazon, however, is 1.9.33.

I'm having the same issues as everyone else: running ES 1.5.2 and AWS plugin 2.5.0, and facing persistent read timeouts from S3 while restoring a 350GB index for the last 24 hours.

@tlrx
Member

tlrx commented May 5, 2015

@esetnik it is version 1.9.23, there's a typo in my comment.

@esetnik

esetnik commented May 5, 2015

Do you have any suggested workarounds or other ways to recover from S3? I can download the files manually, but I'm not sure how to serve them locally to the recovery process.

@tlrx
Member

tlrx commented May 6, 2015

@esetnik I did not try it, but I think you can download the files (including all the snapshot-* and metadata-* files and all folders) from S3 and store them on a local disk. Then you can try to create a new FS repository that points to the root folder.
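
A minimal sketch of that workaround, assuming the snapshot files have already been copied to a local directory (repository name and path are examples only; on 1.6+ the path also has to be whitelisted with path.repo in elasticsearch.yml):

# register a filesystem repository over the local copy of the bucket contents
curl -XPUT 'localhost:9200/_snapshot/local_copy' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots/s3_copy"
  }
}'

# then restore from it as usual
curl -XPOST 'localhost:9200/_snapshot/local_copy/2014_11_27/_restore'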

@asileon
Author

asileon commented May 6, 2015

I tried that workaround and it works.
You can use aws s3 sync (http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html) to download the files; see the sketch below.
The only limitation is that you can run this process only on a cluster with a single node.
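
For example, something along these lines before registering the FS repository described above (bucket name and paths are placeholders):

aws s3 sync s3://my-snapshot-bucket/elasticsearch/snapshots /mnt/snapshots/s3_copy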

@tlrx
Member

tlrx commented May 6, 2015

the only limit is that you can run this process only on a cluster with a single node

Can you elaborate please?

@asileon
Author

asileon commented May 6, 2015

Sorry, I should have been clearer, and I could be wrong.
But since we don't know how the files are distributed between multiple nodes when stored locally, all the snapshot data should be copied to each node.

So what I should have written is: I think the limitation is that every node should be able to store the whole snapshot data.

@dadoonet
Member

dadoonet commented Dec 1, 2015

I wonder if we should try to add more options to the client we are using.

http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/section-client-configuration.html

Like what we are doing in elastic/elasticsearch#15080 for Azure.

@imacube

imacube commented Dec 2, 2015

I had a problem with Elasticsearch 1.7.1 using AWS plugin 2.7.1 while restoring about 300 GB of snapshots. My cluster runs in a VPC and was accessing S3 via a t1.micro NAT instance. The restore was going to take a few days.

The NAT instance was a bottleneck on my network bandwidth. I created a separate cluster in the same VPC but put its nodes into a public subnet, so S3 was accessed through an Internet Gateway, bypassing the NAT instance. Then I saw network I/O and restore times in line with what I would expect from S3: about 3-4 hours to restore the data.

@dadoonet
Member

dadoonet commented Dec 2, 2015

@imacube That sounds like a good thing to add to the docs. Do you want to contribute it?

@ppf2
Member

ppf2 commented Apr 18, 2018

I wonder if we should try to add more options to the client we are using.
http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/section-client-configuration.html
Like what we are doing in elastic/elasticsearch#15080 for azure.

Closing the loop here (timeout settings for AWS, please track this ticket): elastic/elasticsearch#15854
