diff --git a/docs/community/faq.rst b/docs/community/faq.rst
index 2804b1d3de..842f1e000d 100644
--- a/docs/community/faq.rst
+++ b/docs/community/faq.rst
@@ -8,14 +8,26 @@ This part of the documentation answers common questions about Requests.
 Encoded Data?
 -------------
 
-Requests automatically decompresses gzip-encoded responses, and does
-its best to decode response content to unicode when possible.
+Requests automatically decompresses gzip-encoded responses when you access
+``Response.text`` or ``Response.content``::
+
+    >>> r = requests.get('https://httpbin.org/gzip')
+    >>> r.content  # Automatically decompressed
+    b'{"gzipped": true, ...}'
 
 When either the `brotli `_ or `brotlicffi `_
 package is installed, requests also decodes Brotli-encoded responses.
 
+However, when using ``stream=True`` and accessing ``Response.raw``, automatic
+decompression is **not** performed by default. If you need decompressed data
+when streaming, you must explicitly enable it::
+
+    >>> r = requests.get('https://httpbin.org/gzip', stream=True)
+    >>> r.raw.decode_content = True
+    >>> data = r.raw.read()  # Now decompressed
+
 You can get direct access to the raw response (and even the socket),
-if needed as well.
+if needed. See :ref:`body-content-workflow` for more details on streaming responses.
 
 
 Custom User-Agents?
diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index 2ff0c7dfbf..329120984e 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -321,6 +321,34 @@ Alternatively, you can read the undecoded body from the underlying
 urllib3 :class:`urllib3.HTTPResponse ` at
 :attr:`Response.raw `.
 
+.. note:: **Automatic decompression with streaming**
+
+   When using ``stream=True``, responses with ``Content-Encoding: gzip`` or ``deflate``
+   are **not** automatically decompressed when accessing ``Response.raw``. This differs
+   from the normal behavior where ``Response.content`` and ``Response.text`` provide
+   decompressed data automatically.
+
+   To enable automatic decompression when reading from ``Response.raw``::
+
+       >>> r = requests.get('https://httpbin.org/gzip', stream=True)
+       >>> r.raw.decode_content = True  # Enable automatic decompression
+       >>> data = r.raw.read()  # Returns decompressed data
+
+   This is particularly important when passing ``Response.raw`` to parsers that expect
+   decompressed data::
+
+       >>> import json
+       >>> r = requests.get('https://api.example.com/data.json.gz', stream=True)
+       >>> r.raw.decode_content = True
+       >>> data = json.load(r.raw)  # Works correctly with decompressed stream
+
+       # Or with XML parsers:
+       >>> import lxml.etree
+       >>> r = requests.get('https://api.example.com/large.xml.gz', stream=True)
+       >>> r.raw.decode_content = True
+       >>> for event, elem in lxml.etree.iterparse(r.raw):
+       ...     process(elem)
+
 If you set ``stream`` to ``True`` when making a request, Requests cannot
 release the connection back to the pool unless you consume all the data or call
 :meth:`Response.close `. This can lead to