How to upload dataset using hub_sdk? #963
Comments
👋 Hello @anata404, thank you for raising an issue about Ultralytics HUB 🚀! It seems you’re encountering an issue while using the SDK for uploading a dataset. An Ultralytics engineer will assist you soon 😊! If this is a 🐛 Bug Report, could you please provide a minimum reproducible example (MRE) including the following details to help us debug?
If this is a ❓ Question, sharing more context, including your dataset, model details, or anything specific that you're trying to achieve, would help us provide the most relevant response. We appreciate your patience while we review this. Thank you for using Ultralytics HUB 🚀!
Hello! It sounds like you're on the right track with using the Ultralytics HUB-SDK to upload a dataset, and it's great that you've already verified the dataset with

Potential Cause of the Issue

The error message indicates that the

Steps to Resolve

1. Verify Dataset ID

Ensure the

# List all datasets
dataset_list = client.dataset_list(page_size=10)
for dataset in dataset_list.results:
    print(dataset)

2. Verify File Path

Double-check that the

import os

file_path = "<Dataset File>"
if os.path.isfile(file_path):
    print("File exists.")
else:
    print("File does not exist. Check your file path.")

3. Update HUB-SDK

Ensure you're using the latest version of the HUB-SDK to avoid bugs that might have already been resolved. You can upgrade it using:

pip install --upgrade ultralytics-hub

4. Updated Code Example

Here’s how you can upload the dataset with additional logging to help debug any issues:

from hub_sdk import HUBClient
# Authenticate with your API key
credentials = {"api_key": "<YOUR-API-KEY>"}
client = HUBClient(credentials)

# Replace with your actual Dataset ID and file path
dataset_id = "<Dataset ID>"
file_path = "<Dataset File>"

# Verify file existence
import os

if not os.path.isfile(file_path):
    raise FileNotFoundError(f"The file {file_path} does not exist. Please check your file path.")

# Select and upload the dataset
dataset = client.dataset(dataset_id)
response = dataset.upload_dataset(file=file_path)

# Check response
if response:
    print("Dataset uploaded successfully:", response.json())
else:
    print("Dataset upload failed.")

5. Debugging the Error

If the issue persists, enable logging for more detailed information:

import logging

logging.basicConfig(level=logging.DEBUG)

This will provide additional context to pinpoint the issue.

If the Issue Persists

If you've verified the dataset ID, file path, and SDK version but still encounter the issue, it might be a server-side or SDK-specific problem. In such cases:

Feel free to follow up here if you need further assistance. The Ultralytics community and team are here to help! 🚀
@pderrenger Thanks for your reply. 🙏 After applying your updated code, I got these logs:
And this is my full code:
Thank you for sharing your complete code and debugging logs! Based on the information provided, it seems like the issue lies in the

Potential Causes and Resolutions

1. File Upload Issue

The

2. Server-Side or API Error

The error message indicates a failure in the upload process. This might be related to:

To confirm, try creating a new dataset and uploading the file to it:

# Create a new dataset
dataset_metadata = {"meta": {"name": "Test Dataset"}}
new_dataset = client.dataset()
new_dataset.create_dataset(dataset_metadata)

# Upload the dataset file to the newly created dataset
response = new_dataset.upload_dataset(file=file_path)
if response:
    print("Dataset uploaded successfully:", response.json())
else:
    print("Dataset upload failed.")

3. SDK Version

Make sure you are using the latest version of the HUB-SDK, as older versions might have unresolved bugs. Update it with:

pip install --upgrade ultralytics-hub

4. Debugging the Upload Process

Enable debug logging (as you've done) and inspect the full trace to see if there are additional details about the failure. You can also modify the

# Manually edit `hub_sdk/modules/datasets.py` (if possible)
def upload_dataset(self, file: str = None) -> Optional[Response]:
    try:
        # Existing code
    except Exception as e:
        print(f"Error during upload: {e}")  # Add this for more context
        self.logger.error(f"Failed to upload dataset for {self.name}({self.id}): {str(e)}")

5. Alternative Method for Upload

If the issue persists, consider using the Ultralytics HUB Web Interface to manually upload the dataset. If it succeeds, this confirms the SDK-specific issue.

Next Steps

If none of the above resolves the issue, please:

Feel free to follow up with the results, and we’ll continue troubleshooting! The Ultralytics team and community are here to support you. 🚀
Upon examining the source code, I discovered the specific endpoint for uploading datasets.
I know now it's because of file size:
My zip file is 8.98 MB. Could you please specify the maximum file size allowed? This information isn't in the documentation. How can I upload larger files? While all operations via the Web UI succeed, I'd like to manage the entire process through the API/SDK.
Thank you for the detailed follow-up! It's great to see you've identified the root cause of the issue. The

1. Maximum File Size Limit

The current file size limit for uploading datasets via the API is typically 10 MB. However, this can vary depending on server configurations. Your file size of 8.98 MB is close to the limit, and some additional overhead during the upload process (e.g., encoding or metadata) might push it over the limit, leading to the error.

2. Uploading Larger Files

For dataset files larger than the limit, there are alternative methods to handle the upload:

Option 1: Use the Web Interface

The Ultralytics HUB Web UI allows for uploading larger files without encountering the same restrictions as the API. Since you've confirmed it works for your file, this is a quick and reliable solution if you're okay with using the Web UI for this specific step.

Option 2: Use a Pre-Signed URL

For larger files, the HUB-SDK supports uploading via a pre-signed URL, which bypasses the API's direct upload limits. Here's how you can do it programmatically:

Option 3: Split the Dataset

If pre-signed URL uploads are not feasible, you could split your dataset into smaller parts, upload them separately, and then merge them on the server or via the Web UI. However, this is more complex and less ideal.

3. Improving Documentation

You're correct that the maximum file size limit is not explicitly mentioned in the current documentation. I'll pass this feedback to the Ultralytics team to ensure this information is included in future updates. Clear documentation on file size limits and pre-signed URL uploads would certainly help users like yourself!

4. Next Steps

Here’s what I recommend:

Let me know if you need further assistance with the pre-signed URL or any other part of the process. The Ultralytics team and community are here to support you! 🚀
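Since the failure in this thread was traced to file size, a pre-flight size check can flag an archive before the upload is attempted. The following is a minimal sketch, not part of `hub_sdk`: `check_upload_size` is a hypothetical helper, and `limit_bytes` is deliberately a required argument because the actual server limit is not documented (the 10 MB figure quoted above is unconfirmed).

```python
import os

MB = 1024 * 1024  # bytes per mebibyte


def check_upload_size(path, limit_bytes):
    """Return (size_in_bytes, fits) for the file at `path`.

    `limit_bytes` is an assumed upload limit supplied by the caller;
    the real HUB limit is not documented in this thread.
    """
    size = os.path.getsize(path)
    return size, size <= limit_bytes


# Example usage (hypothetical file name and limit):
# size, fits = check_upload_size("dataset.zip", limit_bytes=10 * MB)
# if not fits:
#     print(f"{size / MB:.2f} MB exceeds the assumed limit; upload may be rejected")
```

Passing the check does not guarantee the upload succeeds, since any request overhead is added on top of the raw file size.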
@pderrenger Thanks for the informative reply. A pre-signed URL is a great approach. However, when I tried I got the error:
My hub_sdk version is:
I searched the whole Ultralytics organization on GitHub but couldn't find the function:
Thank you for pointing this out! It seems that the

Current Status of Pre-Signed URL Uploads

In the current version of the

Alternative Solutions

1. Uploading Large Files via the Web UI

The Ultralytics HUB Web Interface supports uploading larger files without encountering file size restrictions. While you mentioned a preference for SDK-based workflows, using the Web UI for this specific step ensures seamless uploads for larger datasets.

2. Using the API Directly

Although not available in the SDK, the Ultralytics API supports pre-signed URL uploads. You can leverage the API directly to request a pre-signed URL and upload the file. Here’s an example approach:

Unfortunately, as there’s no explicit documentation for this in the SDK or API docs currently, I recommend reaching out directly via Ultralytics HUB Discussions to confirm the exact endpoint for pre-signed URL generation.

3. Split Your Dataset

If you prefer using the SDK and your dataset is close to the file size limit, you can split your dataset into smaller parts, upload them individually, and then merge them on the server. While this is more cumbersome, it can be a temporary workaround.

Future Improvements

Your feedback about missing functionality is valuable! I’ll ensure this is flagged with the Ultralytics team for potential inclusion in future SDK updates. A

Next Steps

For now, I recommend:

Feel free to follow up here if you have further questions or need additional clarification. The Ultralytics team and community are always here to help! 🚀
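The "split your dataset" workaround is only described in words above. As a rough sketch of the packing step alone, the hypothetical helper below greedily groups files into several zip archives whose uncompressed contents stay under a caller-supplied `max_bytes`. It uses only the Python standard library. Whether HUB can merge separately uploaded parts is not confirmed anywhere in this thread, and a part produced this way is not necessarily a valid dataset on its own.

```python
import os
import zipfile


def split_into_archives(files, root, out_dir, max_bytes):
    """Pack `files` into part_000.zip, part_001.zip, ... under `out_dir`.

    Files are grouped greedily in the given order so that the summed
    uncompressed size of each part stays within `max_bytes` (a single
    file larger than `max_bytes` still gets its own part). Paths inside
    each archive are stored relative to `root`.
    """
    os.makedirs(out_dir, exist_ok=True)

    # Group files into batches by uncompressed size.
    batches, current, current_size = [], [], 0
    for path in files:
        size = os.path.getsize(path)
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)

    # Write one archive per batch.
    archives = []
    for i, batch in enumerate(batches):
        archive = os.path.join(out_dir, f"part_{i:03d}.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                zf.write(path, arcname=os.path.relpath(path, root))
        archives.append(archive)
    return archives
```

Uncompressed size is used as an approximate bound because it is known before writing; the resulting archives are usually smaller, though zip metadata adds a little overhead.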
@pderrenger Thanks a lot for your patient reply. You said
and
I'm not quite following. Isn't there an existing API endpoint already? Perhaps something like:
I tried using url from dataset:
But I got a 403 error:
Apparently this URL is intended for downloading, not uploading. What additional actions can I take?
Thank you for your thoughtful follow-up and for experimenting with potential approaches! You're correct that the URL retrieved through

Clarifying the Pre-Signed URL for Uploads

Currently, the Ultralytics HUB-SDK ( The error

Recommended Actions

1. Web UI for Dataset Uploads

For now, the easiest and most reliable way to upload larger datasets is via the Ultralytics HUB Web Interface. This bypasses file size limits and ensures successful uploads. While I understand your preference for a programmatic solution via the SDK, this remains the best option until upload-specific pre-signed URL functionality is added.

2. Feedback for SDK Enhancement

As you’ve correctly identified a gap in the SDK, I recommend submitting a feature request on the Ultralytics HUB GitHub Discussions or Issues page. This will allow the Ultralytics team to prioritize adding functionality for generating pre-signed URLs for uploads in a future release. Here’s an example of how you might phrase the feature request:

3. Alternative Solutions

While waiting for SDK enhancements, here’s how you can manage your workflow programmatically:

4. Verify SDK Updates

Keep your SDK updated using:

pip install --upgrade ultralytics-hub

Future updates may include enhancements for dataset uploads.

Next Steps

Since the current SDK doesn’t support upload-specific pre-signed URLs, I suggest:

If you have further questions or need assistance with any of these steps, feel free to ask! The Ultralytics team and community are always here to help. 🚀
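For readers unfamiliar with the term, the general shape of a pre-signed upload is an HTTP PUT of the file bytes to a URL that a service has signed for writing. The sketch below shows only that generic pattern using the standard library. It is not a HUB API: as this thread establishes, the dataset URL exposed by the SDK is download-only (hence the 403), and no upload endpoint is documented. `build_presigned_put` and its parameters are hypothetical names.

```python
import urllib.request


def build_presigned_put(upload_url, file_path, content_type="application/zip"):
    """Build (but do not send) a PUT request carrying the file's bytes.

    `upload_url` must be a URL signed for *uploading* by whatever service
    issues it; a URL signed for download will be rejected.
    """
    with open(file_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        upload_url,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Sending the request (requires a real, upload-capable URL):
# with urllib.request.urlopen(build_presigned_put(url, "dataset.zip")) as resp:
#     print(resp.status)
```

Some services sign specific headers into the URL, in which case the request headers must match exactly what was signed.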
@pderrenger Thanks again for your kind reply. I opened feature request #971.
You're very welcome, and thank you for taking the initiative to create a feature request at #971! 🎉 This will help the Ultralytics team and community prioritize adding support for pre-signed URL generation or other solutions to handle larger dataset uploads via the SDK. In the meantime, if you have any additional questions or need further clarification on current workflows, feel free to ask. I'm here to help. 🚀 Thanks again for contributing to improving the Ultralytics ecosystem!
Search before asking
Question
I want to go through the whole process using the Python SDK. But when I use the code from the official docs: https://docs.ultralytics.com/hub/sdk/dataset/#upload-dataset
I get this error:
Additional
I checked the zip file using the following code, it's OK
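The checking code referred to here did not survive in this copy of the thread. As a stand-in, and not the author's original check, the sketch below verifies only that the archive is structurally sound using Python's standard `zipfile` module. It does not validate the dataset layout that HUB expects. `check_zip` is a hypothetical helper name.

```python
import zipfile


def check_zip(path):
    """Return (ok, message) describing the archive's integrity."""
    if not zipfile.is_zipfile(path):
        return False, "not a zip archive"
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first bad name, or None.
        bad = zf.testzip()
        if bad is not None:
            return False, f"corrupt member: {bad}"
        return True, f"{len(zf.namelist())} entries"
```

A passing result rules out a corrupted archive as the cause, which is consistent with the file size explanation reached later in the thread.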