A more memory-efficient `HttpResponseProvider` #95

https://github.com/scrapinghub/scrapy-poet/blob/master/scrapy_poet/page_input_providers.py#L165-L180

Currently, the `HttpResponseProvider` creates a new `HttpResponse` instance each time it's called:

```python
class HttpResponseProvider(PageObjectInputProvider, CacheDataProviderMixin):
    """This class provides ``web_poet.page_inputs.HttpResponse`` instances."""

    provided_classes = {HttpResponse}
    name = "response_data"

    def __call__(self, to_provide: Set[Callable], response: Response):
        """Builds a ``HttpResponse`` instance using a Scrapy ``Response``"""
        return [
            HttpResponse(
                url=response.url,
                body=response.body,
                status=response.status,
                headers=HttpResponseHeaders.from_bytes_dict(response.headers),
            )
        ]
```

From [another thread](https://github.com/scrapinghub/web-poet/pull/103#discussion_r1027635862):

> Suppose the average HTML size for a particular website is 256 KB. Let's also suppose that we have 12 POs that we need to support in our MultiLayoutPage subclass. This means that for every multi layout PO instance, it holds at least 256 KB * 12 = 3 MB in memory. Assuming that we're parsing at a rate of 10 pages per second, then we're holding at least 3 MB * 10 pages = 30 MB of memory per second.

This isn't a crucial issue for now, but the provider could be made more efficient by returning the same `HttpResponse` instance for a given response identifier. `HttpResponseProvider` already inherits from `CacheDataProviderMixin`, so perhaps an in-memory cache could be used to decide whether an existing instance can be returned instead of creating a new one.
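
To make the idea concrete, here is a minimal sketch that uses only the standard library. It is not the scrapy-poet implementation: `FakeScrapyResponse`, `FakeHttpResponse`, and `CachingResponseProvider` are hypothetical stand-ins for `scrapy.http.Response`, `web_poet.HttpResponse`, and the provider. It also assumes that the identity of the Scrapy response object is an acceptable "response identifier" and that the real response objects are hashable and weak-referenceable.

```python
from dataclasses import dataclass
from weakref import WeakKeyDictionary


@dataclass(eq=False)  # eq=False keeps identity-based hashing, needed for dict keys
class FakeScrapyResponse:
    """Hypothetical stand-in for ``scrapy.http.Response``."""

    url: str
    body: bytes
    status: int = 200


@dataclass(frozen=True)
class FakeHttpResponse:
    """Hypothetical stand-in for ``web_poet.HttpResponse``."""

    url: str
    body: bytes
    status: int


class CachingResponseProvider:
    """Hypothetical provider returning one shared instance per source response."""

    def __init__(self) -> None:
        # Keys are held weakly: an entry disappears once the source response
        # is garbage-collected, so the cache does not keep bodies alive.
        self._instances = WeakKeyDictionary()

    def __call__(self, response: FakeScrapyResponse):
        cached = self._instances.get(response)
        if cached is None:
            # Build the page input only the first time this response is seen.
            cached = FakeHttpResponse(
                url=response.url,
                body=response.body,
                status=response.status,
            )
            self._instances[response] = cached
        return [cached]


provider = CachingResponseProvider()
response = FakeScrapyResponse(url="https://example.com", body=b"<html></html>")
first = provider(response)[0]
second = provider(response)[0]
print(first is second)
```

The cached value deliberately stores only the URL, body, and status, not the source response itself; holding a strong reference to the key from the value would prevent the weak entry from ever being released. Whether this fits with how `CacheDataProviderMixin` is meant to be used is an open question.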
