Open
Description
When restoring a large index (150 GB, split across 5 shards) from S3, the S3 input stream repeatedly raises "Read timed out" errors.
This issue appears to be closely related to, and possibly a duplicate of, elastic/elasticsearch#8280,
which led me to test the recovery process using Elasticsearch 1.4.0 and AWS plugin 2.4.1.
The restore failed across a wide range of 'max_retries' values.
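For anyone trying to reproduce this, the setup can be sketched roughly as below. This is only an illustration, not the exact commands that were run: it assumes 'max_retries' is passed as a setting when registering the S3 repository, and the node URL, bucket name, and the specific retry values are hypothetical placeholders. The repository and snapshot names are the ones that appear in the log. The script only builds and prints the requests; it does not send them.

```python
import json

ES_URL = "http://localhost:9200"  # assumption: a node reachable locally
REPO = "s3_repository"            # repository name as it appears in the log
SNAPSHOT = "2014_11_27"           # snapshot name as it appears in the log


def repository_request(bucket, max_retries):
    """Build the request that registers the S3 repository.

    Assumes max_retries is accepted as a repository setting.
    """
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "max_retries": max_retries},
    }
    return "PUT", f"{ES_URL}/_snapshot/{REPO}", json.dumps(body)


def restore_request(index):
    """Build the request that restores a single index from the snapshot."""
    body = {"indices": index}
    url = f"{ES_URL}/_snapshot/{REPO}/{SNAPSHOT}/_restore"
    return "POST", url, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical retry values; the actual range tested is not listed here.
    for retries in (3, 10, 50):
        requests_to_send = (
            repository_request("my-bucket", retries),  # hypothetical bucket
            restore_request("INDEX"),
        )
        for method, url, body in requests_to_send:
            print(method, url, body)
```

Between attempts the partially restored index would need to be deleted or closed before restoring again.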
error log:
[2014-11-27 13:51:10,337][WARN ][indices.cluster ] [NODE] [INDEX][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [INDEX][2] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] restore failed
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] failed to restore snapshot [2014_11_27]
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] Failed to recover index
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
at sun.security.ssl.InputRecord.read(InputRecord.java:509)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at java.security.DigestInputStream.read(DigestInputStream.java:161)
at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
at java.io.InputStream.read(InputStream.java:101)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:833)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
... 6 more
[2014-11-27 13:51:10,346][WARN ][cluster.action.shard ] [NODE] [INDEX][2] sending failed shard for [INDEX][2], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][2] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][2] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][2] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][2] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:10,346][WARN ][cluster.action.shard ] [NODE] [INDEX][2] received shard failed for [INDEX][2], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][2] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][2] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][2] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][2] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:47,128][WARN ][indices.cluster ] [NODE] [INDEX][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [INDEX][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] restore failed
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] failed to restore snapshot [2014_11_27]
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] Failed to recover index
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
at sun.security.ssl.InputRecord.read(InputRecord.java:509)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at java.security.DigestInputStream.read(DigestInputStream.java:161)
at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
at java.io.InputStream.read(InputStream.java:101)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:833)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
... 6 more
[2014-11-27 13:51:47,130][WARN ][cluster.action.shard ] [NODE] [INDEX][1] sending failed shard for [INDEX][1], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][1] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][1] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][1] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][1] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:47,130][WARN ][cluster.action.shard ] [NODE] [INDEX][1] received shard failed for [INDEX][1], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][1] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][1] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][1] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][1] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]