repeated "Read timed out" errors when recovering a large sized shards from S3 repository #149
With ES 1.4.x and plugin AWS 2.4.1, we have retries at the Snapshot/Restore level in ES, at the AWS plugin level (max_retries), and internally in the AWS SDK used by the plugin (max_retries if set, otherwise it will try each request 3 times). Setting a large number of retries therefore ends up with up to max_retries * max_retries tries for each request. If read timeouts still happen with that many retries, I think this could be a network or DNS issue. |
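For reference, a minimal sketch of where the plugin-level retry knob lives, via the snapshot repository API (the repository name, bucket, and the value 3 are placeholders):

```sh
# Register an S3 repository with an explicit plugin-level retry count.
# Each request can additionally be retried inside the AWS SDK, which is
# how the effective ceiling becomes roughly max_retries * max_retries.
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo' -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "max_retries": 3
  }
}'
```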
I've experienced the same problem, and very often, but with much smaller snapshots, in fact around 15 MB. And with ES 1.4.1 and plugin AWS 2.4.1 it looks like the timeout just takes longer to arise, but it's the same: |
for the sake of reference |
I encountered this yesterday while restoring a 20TB index with 128 shards from S3. S3 was plenty fast, but frequent timeouts (probably due to saturating our AWS Direct Connect link) caused shard recovery to reset immediately. No shards ever recovered successfully in several hours. We were using ES 1.4.2 and cloud-aws 2.4.1. |
elasticsearch-cloud-aws 2.4.1 uses AWS SDK version 1.7.13. In AWS SDK v1.8.10.2 a bug causing socket timeouts has been fixed as indicated in the changelog:
Version 2.5.0 of the plugin uses AWS SDK version 1.9.3, so please let us know if the bug still exists once you upgrade elasticsearch & the plugin to a newer version. |
@tlrx The plan is also to release a new version of the AWS plugin for 1.4 containing this fix. |
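A hedged sketch of the upgrade path on a 1.x node (the `--remove`/`--install` flags of the old plugin manager varied slightly across 1.x releases, so adjust to your version):

```sh
# Swap the plugin for the release that bundles the newer AWS SDK.
bin/plugin --remove cloud-aws
bin/plugin --install elasticsearch/elasticsearch-cloud-aws/2.5.0
```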
Still seeing timeouts with ES 1.5.1 and cloud-aws 2.5.0.
These are coming in pretty regularly, same as with 1.4.4 and 2.4.1. This is a 20TB index with 128 shards. The restore is ongoing, but so far there have been 4 timeouts and zero recovered shards. |
All the primary shards did eventually recover after about 19 hours. During this time there were 90 socket timeouts. |
Tested restore with ES 1.5.0 & AWS plugin 2.5.0. |
@tlrx are you sure AWS plugin 2.5.0 uses AWS SDK version 1.9.3? I just installed the plugin, and the bundled SDK jar doesn't match that version, whereas the latest available from Amazon is newer still. |
I'm having the same issues as everyone else. Running ES 1.5.2 and AWS plugin 2.5.0, and facing persistent read timeouts from S3 while restoring a 350GB index for the last 24 hrs. |
@esetnik it is version 1.9.23, there's a typo in my comment. |
Do you have any suggested workarounds for other ways to recover from S3? I can download the files manually but I'm not sure how to serve them locally to the recovery process. |
@esetnik I did not try it, but I think you can download the files (including the snapshot-* and metadata-* files and all the folders) from S3 and store them on a local disk. Then you can try to create a new FS repository that points to the root folder, as in the sketch below. |
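A minimal sketch of that workaround, assuming the repository contents are copied to `/mnt/snapshots` and the snapshot is named `snapshot_1` (both placeholders; depending on the ES version you may also need the directory whitelisted via `path.repo` in elasticsearch.yml):

```sh
# Pull the whole repository layout down from S3 (snapshot-*, metadata-*, indices/...).
aws s3 sync s3://my-snapshot-bucket/ /mnt/snapshots/

# Register a filesystem repository pointing at the local copy.
curl -XPUT 'localhost:9200/_snapshot/local_copy' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots"
  }
}'

# Restore from the local repository instead of going over the network to S3.
curl -XPOST 'localhost:9200/_snapshot/local_copy/snapshot_1/_restore'
```

Note the caveat raised below: the FS repository location has to be reachable from every data node.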
I tried that workaround and it works. |
Can you elaborate please? |
Sorry, I should have been clearer, and I could be wrong. What I should have written is: I think the limitation is that every node should be able to store the whole snapshot data. |
I wonder if we should try to add more options to the client we are using. http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/section-client-configuration.html Like what we are doing in elastic/elasticsearch#15080 for azure. |
I had a problem with elasticsearch 1.7.1 using AWS plugin 2.7.1 restoring about 300 GB of snapshots. My cluster is running in a VPC and was accessing S3 via a t1.micro NAT instance. The restore was going to take a few days. The NAT instance was a bottleneck on my network bandwidth. I created a separate cluster in the same VPC but put them into a public subnet. Now S3 was accessed through an Internet Gateway, thus bypassing the NAT instance. Then I saw network I/O and restore times inline with what I would have expected from S3--about 3-4 hours to restore the data. |
Closing the loop here (timeout settings for AWS, please track this ticket): elastic/elasticsearch#15854 |
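For anyone tracking this later, a hedged sketch of what a node-level timeout override might look like once that ticket lands (the `cloud.aws.read_timeout` setting name and the value are assumptions based on the linked discussion, not a confirmed released API):

```sh
# Append an AWS client read-timeout override to the node config and restart.
# The setting name below is an assumption taken from the linked ticket;
# verify it against the release notes of your version before relying on it.
cat <<'EOF' >> config/elasticsearch.yml
cloud.aws.read_timeout: 120s
EOF
```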
When restoring a large index (150GB split into 5 shards) from S3, "Read timed out" errors are raised from the S3 input stream repeatedly.
This issue is somewhat of a duplicate of elastic/elasticsearch#8280,
which led me to test the recovery process using elasticsearch 1.4.0 & AWS plugin 2.4.1.
The test failed across a wide range of 'max_retries' values.
error log: