API call from deployment to deployment hangs forever

Hi,

I'm hitting an issue that I can't reproduce locally. It happens in the following scenario:

- I have two cog models deployed on Replicate (as `Deployments`)
- one of them at some point calls the other (see snippet below)
- they were built and deployed using `cog==0.13.7`, `replicate==1.0.4`, `cog CLI 0.14.3`, `python 3.11`, and `ubuntu==22.04`


Here's how I call one deployment from the other:

```python
import replicate
from replicate.helpers import base64_encode_file

# VECTORIZER_DEPLOYMENT, img_path and logger are defined elsewhere in the model code
vectorizer_deployment = replicate.deployments.get(VECTORIZER_DEPLOYMENT)

# Encode the input image as base64
with open(img_path, "rb") as f:
    b64 = base64_encode_file(f)

prediction = vectorizer_deployment.predictions.create(
    input={"images": [b64]},
)
logger.debug(f"{prediction.id=}")
prediction.wait()  # this line hangs forever after 30-ish GET requests
```

The called deployment does complete the inference, and I can see the status as `succeeded` on Replicate.
In the logs of the calling deployment, I can see about 30-ish GET requests, all looking like `INFO:httpx:HTTP Request: GET https://api.replicate.com/v1/predictions/7atmc23wmsrga0cp7ag9y5s6pm "HTTP/1.1 200 OK"`

I have looked into the replicate Python client source code: the `prediction.wait()` method calls the `.reload()` method, which itself performs the GET requests.
I've tried increasing the env var `REPLICATE_POLL_INTERVAL`, but to no effect.
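
As a possible stopgap while this is investigated, a bounded polling loop could stand in for `prediction.wait()`, so the caller at least fails loudly instead of hanging. This is only a sketch: `wait_with_timeout`, its parameters and `TERMINAL_STATUSES` are names I made up, and it assumes nothing more than an object exposing `.id`, `.status` and `.reload()`, which the prediction object does.

```python
import time

# Statuses after which a prediction will not change anymore
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_with_timeout(prediction, timeout_s=300.0, poll_interval_s=1.0, log=print):
    """Poll `prediction` until it reaches a terminal status or the timeout expires.

    Hypothetical helper, not part of the replicate client.
    """
    deadline = time.monotonic() + timeout_s
    while prediction.status not in TERMINAL_STATUSES:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"prediction {prediction.id} still {prediction.status!r} "
                f"after {timeout_s}s"
            )
        time.sleep(poll_interval_s)
        prediction.reload()  # one GET request per call
        # Logging the status on every poll shows what the client actually sees
        log(f"{prediction.id}: status={prediction.status}")
    return prediction
```

Calling `wait_with_timeout(prediction, timeout_s=120)` in place of `prediction.wait()` would also show, through the per-poll log line, whether the calling deployment ever observes `succeeded`.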

The strange thing is that, as mentioned above, it works locally, i.e.:
- when I call the main endpoint locally from Python (I run it like `predictor.predict(...)`), everything works well
- when I run locally with `cog predict -i ...`, inference goes through, but after it completes I get this error log:
`{"logger": "cog.server.worker", "timestamp": "2025-04-15T19:11:52.878929Z", "exception": "Traceback (most recent call last):\n  File \"/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py\", line 299, in _consume_events\n    self._consume_events_inner()\n  File \"/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py\", line 337, in _consume_events_inner\n    ev = self._events.recv()\n         ^^^^^^^^^^^^^^^^^^^\n  File \"/root/.pyenv/versions/3.11.10/lib/python3.11/multiprocessing/connection.py\", line 251, in recv\n    return _ForkingPickler.loads(buf.getbuffer())\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTypeError: URLPath.__init__() missing 3 required keyword-only arguments: 'source', 'filename', and 'fileobj'", "severity": "ERROR", "message": "unhandled error in _consume_events"}`
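
For context on that traceback: the `TypeError` is raised while unpickling an event received over the multiprocessing connection. I don't know what cog does internally, but as a general illustration, pickle produces this kind of error when an object's `__reduce__` asks for the class to be called again without the keyword-only arguments its `__init__` requires. The class below is a hypothetical stand-in, not cog's `URLPath`.

```python
import pickle


class KeywordOnlyThing:
    """Hypothetical stand-in; NOT cog's URLPath."""

    def __init__(self, *, source, filename):
        self.source = source
        self.filename = filename

    def __reduce__(self):
        # Asks pickle to rebuild the object by calling KeywordOnlyThing()
        # with no arguments, which the keyword-only __init__ rejects.
        return (self.__class__, ())


payload = pickle.dumps(KeywordOnlyThing(source="s", filename="f"))  # pickling succeeds

try:
    pickle.loads(payload)  # unpickling calls KeywordOnlyThing() and fails
    unpickle_error = None
except TypeError as exc:
    unpickle_error = exc

print(unpickle_error)
```

If something similar happens in cog, the object would pickle fine in the child process and only fail when the parent process receives it, which would match the error showing up after inference completes.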

So far I have no idea why everything suddenly hangs, which makes my whole project unusable. My guess is that it's related to the deployed environment.


@zeke @erbridge @meatballhat @aron @mattt 

thanks for your help






API call from deployment to deployment hangs forever #424