[xmatch] Timeout can cause the job to crash #318

JulienPeloton · 2023-06-27T06:48:17Z

Job aborted due to stage failure: Task 7 in stage 2.0 failed 4 times, most recent failure: Lost task 7.3 in stage 2.0 (TID 83) (vm-75183.lal.in2p3.fr executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/anaconda/lib/python3.7/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/anaconda/lib/python3.7/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/anaconda/lib/python3.7/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/anaconda/lib/python3.7/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/opt/anaconda/lib/python3.7/http/client.py", line 966, in send
    self.connect()
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/connection.py", line 184, in connect
    conn = self._new_conn()
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f8d284f6350>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.7/site-packages/requests/adapters.py", line 499, in send
    timeout=timeout,
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/opt/anaconda/lib/python3.7/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='cdsxmatch.u-strasbg.fr', port=80): Max retries exceeded with url: /xmatch/api/v1/sync (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8d284f6350>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/worker.py", line 604, in main
    process()
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/worker.py", line 596, in process
    serializer.dump_stream(out_iter, outfile)
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 273, in dump_stream
    return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream)
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 81, in dump_stream
    for batch in iterator:
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 266, in init_stream_yield_batches
    for series in iterator:
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/worker.py", line 450, in mapper
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/worker.py", line 450, in <genexpr>
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/worker.py", line 105, in <lambda>
    verify_result_type(f(*a)), len(a[0])), arrow_return_type)
  File "/opt/spark-3/python/lib/pyspark.zip/pyspark/util.py", line 73, in wrapper
    return f(*args, **kwargs)
  File "/opt/anaconda/lib/python3.7/site-packages/fink_science/xmatch/processor.py", line 131, in cdsxmatch
    files={'cat1': table}
  File "/opt/anaconda/lib/python3.7/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/opt/anaconda/lib/python3.7/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/opt/anaconda/lib/python3.7/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='cdsxmatch.u-strasbg.fr', port=80): Max retries exceeded with url: /xmatch/api/v1/sync (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8d284f6350>: Failed to establish a new connection: [Errno 110] Connection timed out'))

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:517)
	at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:99)
	at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:49)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:470)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.ContextAwareIterator.hasNext(ContextAwareIterator.scala:39)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.execution.python.BatchIterator.hasNext(ArrowEvalPythonExec.scala:39)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:88)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:103)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)

The text was updated successfully, but these errors were encountered:

JulienPeloton added this to Fink science May 13, 2024

JulienPeloton moved this to Bug in Fink science May 13, 2024

JulienPeloton closed this as completed by moving to Bug in Fink science May 13, 2024

JulienPeloton reopened this Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[xmatch] Timeout can cause the job to crash #318

[xmatch] Timeout can cause the job to crash #318

JulienPeloton commented Jun 27, 2023

[xmatch] Timeout can cause the job to crash #318

[xmatch] Timeout can cause the job to crash #318

Comments

JulienPeloton commented Jun 27, 2023