Description
By default, the requests package buffers the entire response body in memory, which can exhaust system RAM, so I think the API should pass stream=True to allow chunked downloads. The current implementation hardcodes stream=None (equivalent to False), and this can make the user's system unstable when downloading large datasets.
settings = self._session.merge_environment_settings(http_request.url, {}, None, None, None)
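As a minimal sketch of what the third positional argument does, the snippet below calls the same requests method on a standalone session, once with stream=None and once with stream=True. The URL is a placeholder that is never contacted, and trust_env is disabled only to keep the result independent of the local environment; neither is taken from the Kaggle code.

```python
import requests

session = requests.Session()
# Assumption for this sketch: ignore proxy/CA environment variables so the
# merged settings depend only on the arguments passed below.
session.trust_env = False

url = "https://example.com/large-dataset.zip"  # placeholder, never contacted

# Same call shape as the line above: stream=None falls back to the
# session default, which is False.
default_settings = session.merge_environment_settings(url, {}, None, None, None)

# Passing True explicitly is kept as-is in the merged settings.
streamed_settings = session.merge_environment_settings(url, {}, True, None, None)

print(default_settings["stream"])   # False
print(streamed_settings["stream"])  # True
```

In requests itself, the merged settings are normally forwarded as keyword arguments to Session.send(), so changing the third argument would be one possible way to opt in to streaming.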
The download_file method in the KaggleApi class tries to support chunked downloads, but I am not sure this code works as expected, because without streaming the download would already be complete by the time this loop runs.
kaggle-api/src/kaggle/api/kaggle_api_extended.py
Line 2181 in b97668b
for data in response.iter_content(chunk_size):
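The concern about the loop above can be illustrated without any network access. The sketch below builds requests.Response objects by hand around an in-memory buffer (a testing trick, not something the Kaggle client does) and compares how much of the body has been read before iter_content is called. Touching response.content first mimics what requests does when a request is sent without stream=True. The helper name make_response is hypothetical.

```python
import io

import requests


def make_response(payload: bytes) -> requests.Response:
    """Build an in-memory Response for illustration; no network is involved."""
    response = requests.Response()
    response.status_code = 200
    response.raw = io.BytesIO(payload)
    return response


payload = b"x" * 10_000

# Non-streamed case: the whole body is read into memory before the loop starts.
eager = make_response(payload)
_ = eager.content  # mimics a request sent without stream=True
eager_read_before_loop = eager.raw.tell()
eager_chunks = list(eager.iter_content(4096))  # slices of the buffered body

# Streamed case: nothing has been read until iteration begins.
lazy = make_response(payload)
lazy_read_before_loop = lazy.raw.tell()
lazy_chunks = list(lazy.iter_content(4096))  # read from raw chunk by chunk

print(eager_read_before_loop, lazy_read_before_loop)
```

Both loops yield the same bytes, but only the streamed case keeps memory use bounded by the chunk size.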
I also think the current usage of kaggle.http_client() outside of the with self.build_kaggle_client() as kaggle: statement is not recommended, because resources managed by the kaggle object might already be closed once the with statement exits.
with self.build_kaggle_client() as kaggle:
    ...
download_file(..., kaggle.http_client(), ...)
For example:
kaggle-api/src/kaggle/api/kaggle_api_extended.py
Lines 1187 to 1196 in b97668b
with self.build_kaggle_client() as kaggle:
    request = ApiDownloadDataFileRequest()
    request.competition_name = competition
    request.file_name = file_name
    response = kaggle.competitions.competition_api_client.download_data_file(request)
url = response.history[0].url
outfile = os.path.join(effective_path, url.split('?')[0].split('/')[-1])
if force or self.download_needed(response, outfile, quiet):
    self.download_file(response, outfile, kaggle.http_client(), quiet, not force)
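To make the lifetime concern concrete, here is a small sketch with stand-in classes. FakeKaggleClient and FakeHttpClient are hypothetical and are not the real Kaggle client; in particular, the assumption that the HTTP client is closed on exit is exactly the "might be closed" case raised above, not confirmed behavior of the library.

```python
from contextlib import AbstractContextManager


class FakeHttpClient:
    """Hypothetical stand-in for the object returned by http_client()."""

    def __init__(self) -> None:
        self.closed = False

    def get(self, url: str) -> str:
        if self.closed:
            raise RuntimeError("HTTP client used after it was closed")
        return f"response for {url}"


class FakeKaggleClient(AbstractContextManager):
    """Hypothetical client that closes its HTTP client on exit (an assumption)."""

    def __init__(self) -> None:
        self._http = FakeHttpClient()

    def http_client(self) -> FakeHttpClient:
        return self._http

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._http.closed = True
        return False  # do not suppress exceptions


def download_inside(url: str) -> str:
    # The HTTP client is used while the context is still open.
    with FakeKaggleClient() as kaggle:
        return kaggle.http_client().get(url)


def download_outside(url: str) -> str:
    # The HTTP client is used after the context has exited.
    with FakeKaggleClient() as kaggle:
        pass
    return kaggle.http_client().get(url)
```

Under that assumption, download_inside succeeds while download_outside raises RuntimeError, which is why moving the download_file call inside the with block would be the safer arrangement.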