When restoring a cluster with sstableloader, there seem to be some issues with the FQDN.
Our cluster is EC2-based and each node has two network interfaces:
one for internal node traffic (provided by default by AWS)
one that we attach and use for cluster communication (so that if a node is terminated, its replacement is attached to the same network interface)
When running a restore using sstableloader via this command:
nohup medusa restore-cluster --backup-name=dev-14-05-2020-full --table=whatever.customer_190416_bob --seed-target=ip-172-29-180-216.eu-west-1.compute.internal
we get the following output:
[2020-05-28 09:54:49,508] INFO: Monitoring provider is noop
[2020-05-28 09:54:49,508] INFO: system_auth keyspace will be overwritten with the backup on target nodes
[2020-05-28 09:54:50,090] INFO: Ensuring the backup is found and is complete
[2020-05-28 09:54:50,124] INFO: Restore will happen "In-Place", no new hardware is involved
[2020-05-28 09:54:50,154] INFO: Tokenmap is differently distributed. Extra items: {'[truncated list of tokens]'}
[2020-05-28 09:54:50,156] INFO: Starting cluster restore...
[2020-05-28 09:54:50,156] INFO: Working directory for this execution: /opt/cassandra/data/tmp/medusa-job-4c6dbf69-a756-4b98-b477-962a36980139
[2020-05-28 09:54:50,156] INFO: About to restore on ip-172-29-180-216.eu-west-1.compute.internal using {'source': ['ip-172-29-181-241.eu-west-1.compute.internal'], 'seed': False} as backup source
[2020-05-28 09:54:50,156] INFO: About to restore on ip-172-29-182-200.eu-west-1.compute.internal using {'source': ['ip-172-29-181-163.eu-west-1.compute.internal'], 'seed': False} as backup source
[2020-05-28 09:54:50,156] INFO: About to restore on ip-172-29-182-219.eu-west-1.compute.internal using {'source': ['ip-172-29-180-194.eu-west-1.compute.internal'], 'seed': False} as backup source
[2020-05-28 09:54:50,156] INFO: About to restore on ip-172-29-181-207.eu-west-1.compute.internal using {'source': ['ip-172-29-180-186.eu-west-1.compute.internal'], 'seed': False} as backup source
[2020-05-28 09:54:50,156] INFO: About to restore on ip-172-29-180-190.eu-west-1.compute.internal using {'source': ['ip-172-29-182-156.eu-west-1.compute.internal'], 'seed': False} as backup source
[2020-05-28 09:54:50,157] INFO: About to restore on ip-172-29-181-208.eu-west-1.compute.internal using {'source': ['ip-172-29-182-138.eu-west-1.compute.internal'], 'seed': False} as backup source
[2020-05-28 09:54:50,157] INFO: This will delete all data on the target nodes and replace it with backup dev-14-05-2020-full.
[2020-05-28 09:54:50,157] INFO: target seeds : []
[2020-05-28 09:54:50,157] INFO: Restoring schema on the target cluster
[2020-05-28 10:09:03,865] INFO: (Re)creating schema for keyspace whatever
[2020-05-28 10:09:26,094] INFO: Restoring data on ip-172-29-180-216.eu-west-1.compute.internal...
[2020-05-28 10:09:26,094] INFO: Restoring data on ip-172-29-182-200.eu-west-1.compute.internal...
[2020-05-28 10:09:26,094] INFO: Restoring data on ip-172-29-182-219.eu-west-1.compute.internal...
[2020-05-28 10:09:26,094] INFO: Restoring data on ip-172-29-181-207.eu-west-1.compute.internal...
[2020-05-28 10:09:26,095] INFO: Restoring data on ip-172-29-180-190.eu-west-1.compute.internal...
[2020-05-28 10:09:26,095] INFO: Restoring data on ip-172-29-181-208.eu-west-1.compute.internal...
[2020-05-28 10:09:26,097] INFO: Executing "nohup sh -c "mkdir /opt/cassandra/data/tmp/medusa-job-4c6dbf69-a756-4b98-b477-962a36980139; cd /opt/cassandra/data/tmp/medusa-job-4c6dbf69-a756-4b98-b477-962a36980139 && medusa-wrapper sudo medusa --fqdn=%s -vvv restore-node --in-place %s --no-verify --backup-name dev-14-05-2020-full --temp-dir /opt/cassandra/data/tmp --use-sstableloader --table whatever.customer_190416_bob"" on all nodes.
[2020-05-28 10:11:08,607] INFO: Job executing "nohup sh -c "mkdir /opt/cassandra/data/tmp/medusa-job-4c6dbf69-a756-4b98-b477-962a36980139; cd /opt/cassandra/data/tmp/medusa-job-4c6dbf69-a756-4b98-b477-962a36980139 && medusa-wrapper sudo medusa --fqdn=%s -vvv restore-node --in-place %s --no-verify --backup-name dev-14-05-2020-full --temp-dir /opt/cassandra/data/tmp --use-sstableloader --table whatever.customer_190416_bob"" ran and finished with errors on following nodes: ['ip-172-29-180-190.eu-west-1.compute.internal', 'ip-172-29-180-216.eu-west-1.compute.internal', 'ip-172-29-181-207.eu-west-1.compute.internal', 'ip-172-29-181-208.eu-west-1.compute.internal', 'ip-172-29-182-200.eu-west-1.compute.internal', 'ip-172-29-182-219.eu-west-1.compute.internal']
[2020-05-28 10:11:08,608] INFO: [ip-172-29-180-216.eu-west-1.compute.internal] nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,608] INFO: ip-172-29-180-216.eu-west-1.compute.internal-stdout: nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,608] INFO: [ip-172-29-182-200.eu-west-1.compute.internal] nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,608] INFO: ip-172-29-182-200.eu-west-1.compute.internal-stdout: nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,608] INFO: [ip-172-29-182-219.eu-west-1.compute.internal] nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,608] INFO: ip-172-29-182-219.eu-west-1.compute.internal-stdout: nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,608] INFO: [ip-172-29-181-207.eu-west-1.compute.internal] nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,608] INFO: ip-172-29-181-207.eu-west-1.compute.internal-stdout: nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,609] INFO: [ip-172-29-180-190.eu-west-1.compute.internal] nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,609] INFO: ip-172-29-180-190.eu-west-1.compute.internal-stdout: nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,609] INFO: [ip-172-29-181-208.eu-west-1.compute.internal] nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,609] INFO: ip-172-29-181-208.eu-west-1.compute.internal-stdout: nohup: ignoring input and appending output to ‘nohup.out’
[2020-05-28 10:11:08,610] ERROR: Some nodes failed to restore. Exiting
[2020-05-28 10:11:08,610] ERROR: This error happened during the cluster restore: Some nodes failed to restore. Exiting
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/cassandra_medusa-0.7.0.dev0-py3.7.egg/medusa/restore_cluster.py", line 72, in orchestrate
restore.execute()
File "/usr/local/lib/python3.7/site-packages/cassandra_medusa-0.7.0.dev0-py3.7.egg/medusa/restore_cluster.py", line 146, in execute
self._restore_data()
File "/usr/local/lib/python3.7/site-packages/cassandra_medusa-0.7.0.dev0-py3.7.egg/medusa/restore_cluster.py", line 350, in _restore_data
raise Exception(err_msg)
Exception: Some nodes failed to restore. Exiting
On each node, we can find the following log in /opt/cassandra/data/tmp/medusa-job-4c6dbf69-a756-4b98-b477-962a36980139/stderr:
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-266-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-289-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-308-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-313-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-318-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-319-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-320-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-321-big-Data.db /opt/apache-cassandra-3.11.2/data/tmp/medusa-restore-15a118af-88e7-4bbe-a73a-adee71a43150/whatever/customer_190416_bob-4bf290b057b011ea8398d99fe6b0f187/mc-342-big-Data.db to [/172.29.181.207, /172.29.180.190, /172.29.182.200, /172.29.182.219, /172.29.181.208, ip-172-29-180-160.eu-west-1.compute.internal/172.29.180.160]
ERROR 10:15:51,429 [Stream #345045f0-a0cc-11ea-b7b4-f76b1362a6ed] Streaming error occurred on session with peer 172.29.180.160
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_252]
at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_252]
at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_252]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:645) ~[na:1.8.0_252]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_252]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:282) ~[apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:269) ~[apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:263) [apache-cassandra-3.11.2.jar:3.11.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_252]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.2.jar:3.11.2]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_252]
[Then streaming occurs and finishes]
[When reaching 100%, it fails with]
WARN 10:21:34,756 [Stream #345045f0-a0cc-11ea-b7b4-f76b1362a6ed] Stream failed
Streaming to the following hosts failed:
[ip-172-29-180-160.eu-west-1.compute.internal/172.29.180.160]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:98)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:682)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:532)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" org.apache.cassandra.tools.BulkLoadException: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:114)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:98)
... 1 more
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:682)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:532)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317)
at java.lang.Thread.run(Thread.java:748)
This happens because of how the sstableloader command is run locally: it uses the local internal FQDN instead of the FQDN provided in the node's medusa configuration file.
If we run the command manually and provide the FQDN that is in the config, it runs flawlessly.
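The mismatch can be sketched as follows (the helper name below is hypothetical; medusa's actual code differs): when the configured FQDN is not passed down to the sstableloader invocation, Python's `socket.getfqdn()` resolves the host's default name, which on EC2 points at the AWS-provided internal interface rather than the interface dedicated to cluster communication.

```python
import socket


def loader_fqdn(config_fqdn=None):
    """Pick the name the sstableloader invocation should use.

    `config_fqdn` stands in for the `fqdn` setting from the node's
    medusa configuration file; this helper is an illustrative sketch,
    not medusa's real implementation.
    """
    if config_fqdn:
        # Honour the config: this is the cluster-communication interface
        return config_fqdn
    # Fallback: the host's default name, which on EC2 resolves to the
    # AWS-internal interface -- the source of the streaming failures above
    return socket.getfqdn()
```

With the config value supplied, the loader would target the correct interface, which matches the observation that running the command manually with the configured FQDN works.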
I'm wondering if that could be fixed in 12e0532.
Otherwise, I have opened a PR, linked below.
┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MED-80