bug/v0.25.5 504 Gateway Timeout Error #158

Open
JOSHMT0744 opened this issue Aug 21, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@JOSHMT0744

Describe the bug
When using v0.25.5 of unstructured-client in VS Code to process PDFs of more than one page with the "hi_res" strategy, I consistently get INFO: Failed to process a request due to API server error with status code 504. followed by:

INFO: Server message - <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>

To Reproduce

import os
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError

os.environ['UNSTRUCTURED_API_KEY'] = "<MY_API_KEY>"
os.environ['UNSTRUCTURED_API_URL'] = "<MY_API_URL>"

client_obj = UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
    server_url=os.getenv("UNSTRUCTURED_API_URL"),
)

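filename = "./data/kenwood_en.pdf"
# Read the PDF inside a context manager so the file handle is closed.
with open(filename, "rb") as file:
    content = file.read()

req = shared.PartitionParameters(
    # Note that this currently only supports a single file
    files=shared.Files(
        content=content,
        file_name=filename,
    ),
    chunking_strategy="by_title",
    max_characters=1024,
    # Split the PDF client-side and send the page sets concurrently
    split_pdf_page=True,
    # Keep going when a page set fails; its elements are omitted from the result
    split_pdf_allow_failed=True,
)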

try:
    res = client_obj.general.partition(request=req)
    print(res.elements[0])
except SDKError as e:
    print(e)

Expected behavior
After about 2 minutes, the call always fails with this output:

INFO: Preparing to split document for partition.
INFO: Starting page number set to 1
INFO: Allow failed set to 1
INFO: Concurrency level set to 5
INFO: Splitting pages 1 to 40 (40 total)
INFO: Determined optimal split size of 8 pages.
INFO: Partitioning 5 files with 8 page(s) each.
INFO: Partitioning set #1 (pages 1-8).
INFO: Partitioning set #2 (pages 9-16).
INFO: Partitioning set #3 (pages 17-24).
INFO: Partitioning set #4 (pages 25-32).
INFO: Partitioning set #5 (pages 33-40).
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 25
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 17
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 9
INFO: HTTP Request: POST <MY_API_URL> "HTTP/1.1 504 Gateway Time-out"
ERROR: Failed to send request for page 1
WARNING: Failed to partition set #1, its elements will be omitted in the final result.
WARNING: Failed to partition set #2, its elements will be omitted in the final result.
WARNING: Failed to partition set #3, its elements will be omitted in the final result.
WARNING: Failed to partition set #4, its elements will be omitted in the final result.
WARNING: Failed to partition set #5, its elements will be omitted in the final result.
INFO: Failed to process a request due to API server error with status code 504. Attempting retry number 1 after sleep.
INFO: Server message - <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
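
The split plan in the log above (40 pages, concurrency 5, 8 pages per set) is consistent with dividing the page count by the concurrency level and rounding up. Below is a minimal sketch of that arithmetic, reconstructed from the log rather than taken from the SDK source. `plan_splits`, `min_pages` and `max_pages` are hypothetical names and bounds used only for illustration.

```python
import math


def plan_splits(total_pages, concurrency, min_pages=2, max_pages=20):
    """Return 1-indexed, inclusive (first_page, last_page) ranges.

    Assumption: the split size is ceil(total_pages / concurrency), clamped
    to [min_pages, max_pages]. The bounds are illustrative, not confirmed
    SDK constants.
    """
    split_size = math.ceil(total_pages / concurrency)
    split_size = max(min_pages, min(max_pages, split_size))

    ranges = []
    for first in range(1, total_pages + 1, split_size):
        last = min(first + split_size - 1, total_pages)
        ranges.append((first, last))
    return ranges


# Same inputs as the log: 40 pages, concurrency level 5.
for number, (first, last) in enumerate(plan_splits(40, 5), start=1):
    print(f"set #{number}: pages {first}-{last}")
```

Under these assumptions each request carries 8 pages, so every set has to finish within whatever time limit the gateway applies.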

After this output, the client goes through its retry strategy, which I presume is the one defined in general.py.
The same cycle of 504s then repeats on every retry.
I have tried adjusting the RetryConfig both on my client and in the general.partition call, but it makes no difference to when or how my program fails.
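
To illustrate what a backoff-style retry configuration controls, here is a minimal standard-library sketch of an exponential backoff schedule. It is a simplification, not the SDK's implementation: `backoff_schedule` and its parameters are hypothetical, the values are illustrative rather than the library's defaults, and real clients usually add jitter and count request time toward the elapsed budget.

```python
def backoff_schedule(initial_ms, max_interval_ms, exponent, max_elapsed_ms):
    """Return the sleep intervals (in ms) a simple exponential backoff would use.

    Each interval is initial_ms * exponent**attempt, capped at max_interval_ms.
    Intervals are added until the next one would push the total sleep time
    past max_elapsed_ms.
    """
    if initial_ms <= 0 or max_interval_ms <= 0 or exponent < 1:
        raise ValueError("intervals must be positive and exponent must be >= 1")

    schedule = []
    elapsed = 0.0
    attempt = 0
    while True:
        interval = min(initial_ms * exponent ** attempt, max_interval_ms)
        if elapsed + interval > max_elapsed_ms:
            break
        schedule.append(interval)
        elapsed += interval
        attempt += 1
    return schedule


# Illustrative values only.
print(backoff_schedule(500, 60_000, 1.5, 10_000))
```

A schedule like this only governs how long the client waits between attempts and when it gives up. If the 504 comes from the nginx proxy's own timeout, each retried request would still hit the same limit after about 2 minutes, which would fit with the configuration changes having no visible effect.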

Environment Info
I am running this in a Jupyter notebook in VSCode, within a venv.

Additional Info
The PDF I used to reproduce this example is here
Does anyone have a solution, or could someone help me work out whether this is a problem on my end or a bug?

@JOSHMT0744 JOSHMT0744 added the bug Something isn't working label Aug 21, 2024