Repeated "Read timed out" errors when recovering large shards from S3 repository #149

Open
@asileon

Description

When restoring a large index (150GB split into 5 shards) from S3, the S3 input stream repeatedly raises "Read timed out" errors.

This issue is something of a duplicate of elastic/elasticsearch#8280,
which led me to test the recovery process using Elasticsearch 1.4.0 and AWS plugin 2.4.1.
The test failed across a wide range of 'max_retries' values.
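
For anyone trying to reproduce this, the following is a minimal sketch of how the repository registration body might look when varying 'max_retries'. It assumes 'max_retries' is passed as a setting of the S3 repository at registration time. The repository name is taken from the log below; the bucket name and the retry values are hypothetical, since the report does not list the values that were tested.

```python
import json


def s3_repository_body(bucket, max_retries):
    """Build a JSON body for registering an S3 snapshot repository.

    Assumption: 'max_retries' is accepted as a repository-level setting.
    """
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,            # hypothetical bucket name
            "max_retries": max_retries,  # value under test
        },
    }


if __name__ == "__main__":
    # Illustrative values only; not the ones used in the original test.
    for retries in (3, 10, 50):
        body = s3_repository_body("my-snapshot-bucket", retries)
        print("PUT /_snapshot/s3_repository")
        print(json.dumps(body, indent=2))
```

Each printed body would be sent to the cluster before re-running the restore, so that a failure can be tied to a specific 'max_retries' value.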

error log:

[2014-11-27 13:51:10,337][WARN ][indices.cluster          ] [NODE] [INDEX][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [INDEX][2] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] restore failed
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
    ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] failed to restore snapshot [2014_11_27]
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
    ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][2] Failed to recover index
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
    at sun.security.ssl.InputRecord.read(InputRecord.java:509)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at java.security.DigestInputStream.read(DigestInputStream.java:161)
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
    at java.io.InputStream.read(InputStream.java:101)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:833)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
    ... 6 more
[2014-11-27 13:51:10,346][WARN ][cluster.action.shard     ] [NODE] [INDEX][2] sending failed shard for [INDEX][2], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][2] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][2] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][2] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][2] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:10,346][WARN ][cluster.action.shard     ] [NODE] [INDEX][2] received shard failed for [INDEX][2], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][2] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][2] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][2] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][2] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:47,128][WARN ][indices.cluster          ] [NODE] [INDEX][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [INDEX][1] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] restore failed
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
    ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] failed to restore snapshot [2014_11_27]
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
    ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [INDEX][1] Failed to recover index
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
    at sun.security.ssl.InputRecord.read(InputRecord.java:509)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at java.security.DigestInputStream.read(DigestInputStream.java:161)
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
    at java.io.InputStream.read(InputStream.java:101)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:833)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
    ... 6 more
[2014-11-27 13:51:47,130][WARN ][cluster.action.shard     ] [NODE] [INDEX][1] sending failed shard for [INDEX][1], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][1] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][1] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][1] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][1] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-11-27 13:51:47,130][WARN ][cluster.action.shard     ] [NODE] [INDEX][1] received shard failed for [INDEX][1], node[txxoLNwnSmWM1tb6o2bvdw], [P], restoring[s3_repository:2014_11_27], s[INITIALIZING], indexUUID [Jvd3cMsHRdevp0JWyb5Iag], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[INDEX][1] failed recovery]; nested: IndexShardRestoreFailedException[[INDEX][1] restore failed]; nested: IndexShardRestoreFailedException[[INDEX][1] failed to restore snapshot [2014_11_27]]; nested: IndexShardRestoreFailedException[[INDEX][1] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
