This repository was archived by the owner on Dec 28, 2023. It is now read-only.

Rebranding #17

Open
wants to merge 6 commits into master
Changes from all commits
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2020 Scrapinghub
Copyright (c) 2021 Zyte Group Ltd

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
68 changes: 34 additions & 34 deletions README.md
@@ -1,11 +1,11 @@
# Scrapy Middleware for Crawlera Simple Fetch API
[![actions](https://github.com/scrapy-plugins/scrapy-crawlera-fetch/workflows/Build/badge.svg)](https://github.com/scrapy-plugins/scrapy-crawlera-fetch/actions)
[![codecov](https://codecov.io/gh/scrapy-plugins/scrapy-crawlera-fetch/branch/master/graph/badge.svg)](https://codecov.io/gh/scrapy-plugins/scrapy-crawlera-fetch)
# Scrapy Middleware for Zyte Smart Proxy Manager Simple Fetch API
[![actions](https://github.com/scrapy-plugins/scrapy-zyte-proxy-fetch/workflows/Build/badge.svg)](https://github.com/scrapy-plugins/scrapy-zyte-proxy-fetch/actions)
[![codecov](https://codecov.io/gh/scrapy-plugins/scrapy-zyte-proxy-fetch/branch/master/graph/badge.svg)](https://codecov.io/gh/scrapy-plugins/scrapy-zyte-proxy-fetch)

This package provides a Scrapy
[Downloader Middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html)
to transparently interact with the
[Crawlera Fetch API](https://doc.scrapinghub.com/crawlera-fetch-api.html).
[Zyte Smart Proxy Manager Fetch API](https://docs.zyte.com/smart-proxy-manager/fetch-api.html).


## Requirements
@@ -18,70 +18,70 @@ to transparently interact with the

Not yet available on PyPI. However, it can be installed directly from GitHub:

`pip install git+ssh://git@github.com/scrapy-plugins/scrapy-crawlera-fetch.git`
`pip install git+ssh://git@github.com/scrapy-plugins/scrapy-zyte-proxy-fetch.git`

or

`pip install git+https://github.com/scrapy-plugins/scrapy-crawlera-fetch.git`
`pip install git+https://github.com/scrapy-plugins/scrapy-zyte-proxy-fetch.git`


## Configuration

Enable the `CrawleraFetchMiddleware` via the
Enable the `SmartProxyManagerFetchMiddleware` via the
[`DOWNLOADER_MIDDLEWARES`](https://docs.scrapy.org/en/latest/topics/settings.html#downloader-middlewares)
setting:

```
DOWNLOADER_MIDDLEWARES = {
"crawlera_fetch.CrawleraFetchMiddleware": 585,
"zyte_proxy_fetch.SmartProxyManagerFetchMiddleware": 585,
}
```

Please note that the middleware needs to be placed before the built-in `HttpCompressionMiddleware`
middleware (which has a priority of 590), otherwise incoming responses will be compressed and the
Crawlera middleware won't be able to handle them.
Smart Proxy Manager middleware won't be able to handle them.

### Settings

* `CRAWLERA_FETCH_ENABLED` (type `bool`, default `False`). Whether or not the middleware will be enabled,
i.e. requests should be downloaded using the Crawlera Fetch API. The `crawlera_fetch_enabled` spider
* `ZYTE_PROXY_FETCH_ENABLED` (type `bool`, default `False`). Whether or not the middleware will be enabled,
i.e. requests should be downloaded using the Smart Proxy Manager Fetch API. The `zyte_proxy_fetch_enabled` spider
attribute takes precedence over this setting.

* `CRAWLERA_FETCH_APIKEY` (type `str`). API key to be used to authenticate against the Crawlera endpoint
* `ZYTE_PROXY_FETCH_APIKEY` (type `str`). API key to be used to authenticate against the Smart Proxy Manager endpoint
(mandatory if enabled)

* `CRAWLERA_FETCH_URL` (Type `str`, default `"http://fetch.crawlera.com:8010/fetch/v2/"`).
The endpoint of a specific Crawlera instance
* `ZYTE_PROXY_FETCH_URL` (Type `str`, default `"http://fetch.crawlera.com:8010/fetch/v2/"`).
The endpoint of a specific Smart Proxy Manager instance

* `CRAWLERA_FETCH_RAISE_ON_ERROR` (type `bool`, default `True`). Whether or not the middleware will
* `ZYTE_PROXY_FETCH_RAISE_ON_ERROR` (type `bool`, default `True`). Whether or not the middleware will
raise an exception if an error occurs while downloading or decoding a request. If `False`, a
warning will be logged and the raw upstream response will be returned upon encountering an error.

* `CRAWLERA_FETCH_DOWNLOAD_SLOT_POLICY` (type `enum.Enum` - `crawlera_fetch.DownloadSlotPolicy`,
* `ZYTE_PROXY_FETCH_DOWNLOAD_SLOT_POLICY` (type `enum.Enum` - `zyte_proxy_fetch.DownloadSlotPolicy`,
default `DownloadSlotPolicy.Domain`).
Possible values are `DownloadSlotPolicy.Domain`, `DownloadSlotPolicy.Single`,
`DownloadSlotPolicy.Default` (Scrapy default). If set to `DownloadSlotPolicy.Domain`, please
consider setting `SCHEDULER_PRIORITY_QUEUE="scrapy.pqueues.DownloaderAwarePriorityQueue"` to
make better usage of concurrency options and avoid delays.

* `CRAWLERA_FETCH_DEFAULT_ARGS` (type `dict`, default `{}`)
Default values to be sent to the Crawlera Fetch API. For instance, set to `{"device": "mobile"}`
* `ZYTE_PROXY_FETCH_DEFAULT_ARGS` (type `dict`, default `{}`)
Default values to be sent to the Smart Proxy Manager Fetch API. For instance, set to `{"device": "mobile"}`
to render all requests with a mobile profile.
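
A minimal sketch of a project's `settings.py` combining the settings above. The API key value is a placeholder, and importing `DownloadSlotPolicy` from the top-level package is an assumption based on the type given for `ZYTE_PROXY_FETCH_DOWNLOAD_SLOT_POLICY`:

```python
# Illustrative sketch only; values are placeholders, not recommended defaults.
from zyte_proxy_fetch import DownloadSlotPolicy  # import location assumed

ZYTE_PROXY_FETCH_ENABLED = True
ZYTE_PROXY_FETCH_APIKEY = "<your API key>"  # placeholder
ZYTE_PROXY_FETCH_DOWNLOAD_SLOT_POLICY = DownloadSlotPolicy.Domain
ZYTE_PROXY_FETCH_DEFAULT_ARGS = {"device": "mobile"}

DOWNLOADER_MIDDLEWARES = {
    "zyte_proxy_fetch.SmartProxyManagerFetchMiddleware": 585,
}

# Suggested above when using the per-domain slot policy
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.DownloaderAwarePriorityQueue"
```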

### Spider attributes

* `crawlera_fetch_enabled` (type `bool`, default `False`). Whether or not the middleware will be enabled.
Takes precedence over the `CRAWLERA_FETCH_ENABLED` setting.
* `zyte_proxy_fetch_enabled` (type `bool`, default `False`). Whether or not the middleware will be enabled.
Takes precedence over the `ZYTE_PROXY_FETCH_ENABLED` setting.
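
A hypothetical spider enabling the middleware for itself through this attribute (spider name and URL are placeholders):

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    # Takes precedence over the ZYTE_PROXY_FETCH_ENABLED setting
    zyte_proxy_fetch_enabled = True

    start_urls = ["https://example.org"]

    def parse(self, response):
        self.logger.info("Downloaded %s", response.url)
```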

### Log formatter

Since the URL for outgoing requests is modified by the middleware, by default the logs will show
the URL for the Crawlera endpoint. To revert this behaviour you can enable the provided
the URL for the Smart Proxy Manager endpoint. To revert this behaviour you can enable the provided
log formatter by overriding the [`LOG_FORMATTER`](https://docs.scrapy.org/en/latest/topics/settings.html#log-formatter)
setting:

```
LOG_FORMATTER = "crawlera_fetch.CrawleraFetchLogFormatter"
LOG_FORMATTER = "zyte_proxy_fetch.SmartProxyManagerLogFormatter"
```

Note that the ability to override the error messages for spider and download errors was added
@@ -92,7 +92,7 @@ to the `Request.flags` attribute, which is shown in the logs by default.
## Usage

If the middleware is enabled, by default all requests will be redirected to the specified
Crawlera Fetch endpoint, and modified to comply with the format expected by the Crawlera Fetch API.
Smart Proxy Manager Fetch endpoint, and modified to comply with the format expected by the Smart Proxy Manager Fetch API.
The three basic processed arguments are `method`, `url` and `body`.
For instance, the following request:

@@ -103,7 +103,7 @@ Request(url="https://httpbin.org/post", method="POST", body="foo=bar")
will be converted to:

```python
Request(url="<Crawlera Fetch API endpoint>", method="POST",
Request(url="<Smart Proxy Manager Fetch API endpoint>", method="POST",
body='{"url": "https://httpbin.org/post", "method": "POST", "body": "foo=bar"}',
headers={"Authorization": "Basic <derived from APIKEY>",
"Content-Type": "application/json",
@@ -112,12 +112,12 @@ Request(url="<Crawlera Fetch API endpoint>", method="POST",

### Additional arguments

Additional arguments could be specified under the `crawlera_fetch.args` `Request.meta` key. For instance:
Additional arguments could be specified under the `zyte_proxy_fetch.args` `Request.meta` key. For instance:

```python
Request(
url="https://example.org",
meta={"crawlera_fetch": {"args": {"region": "us", "device": "mobile"}}},
meta={"zyte_proxy_fetch": {"args": {"region": "us", "device": "mobile"}}},
)
```

@@ -127,26 +127,26 @@ is translated into the following body:
'{"url": "https://example.org", "method": "GET", "body": "", "region": "us", "device": "mobile"}'
```

Arguments set for a specific request through the `crawlera_fetch.args` key override those
set with the `CRAWLERA_FETCH_DEFAULT_ARGS` setting.
Arguments set for a specific request through the `zyte_proxy_fetch.args` key override those
set with the `ZYTE_PROXY_FETCH_DEFAULT_ARGS` setting.

### Accessing original request and raw Crawlera response
### Accessing original request and raw Zyte Smart Proxy Manager response

The `url`, `method`, `headers` and `body` attributes of the original request are available under
the `crawlera_fetch.original_request` `Response.meta` key.
the `zyte_proxy_fetch.original_request` `Response.meta` key.

The `status`, `headers` and `body` attributes of the upstream Crawlera response are available under
the `crawlera_fetch.upstream_response` `Response.meta` key.
The `status`, `headers` and `body` attributes of the upstream Smart Proxy Manager response are available under
the `zyte_proxy_fetch.upstream_response` `Response.meta` key.
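
For illustration, a callback could inspect both keys as sketched below; the exact dictionary layout of each entry is an assumption based on the attribute names listed above:

```python
def parse(self, response):
    # Original request as recorded by the middleware (assumed dict layout)
    original = response.meta["zyte_proxy_fetch"]["original_request"]
    # Raw upstream Smart Proxy Manager response (assumed dict layout)
    upstream = response.meta["zyte_proxy_fetch"]["upstream_response"]
    self.logger.info("Original URL: %s", original["url"])
    self.logger.info("Upstream status: %s", upstream["status"])
```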

### Skipping requests

You can instruct the middleware to skip a specific request by setting the `crawlera_fetch.skip`
You can instruct the middleware to skip a specific request by setting the `zyte_proxy_fetch.skip`
[Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta)
key:

```python
Request(
url="https://example.org",
meta={"crawlera_fetch": {"skip": True}},
meta={"zyte_proxy_fetch": {"skip": True}},
)
```
2 changes: 0 additions & 2 deletions crawlera_fetch/__init__.py

This file was deleted.

12 changes: 6 additions & 6 deletions setup.py
@@ -6,15 +6,15 @@


setuptools.setup(
name="scrapy-crawlera-fetch",
name="scrapy-zyte-proxy-fetch",
version="0.0.1",
license="BSD",
description="Scrapy downloader middleware to interact with Crawlera Simple Fetch API",
description="Scrapy downloader middleware to interact with Zyte Smart Proxy Manager Fetch API",
long_description=long_description,
author="Scrapinghub",
author_email="info@scrapinghub.com",
url="https://github.com/scrapy-plugins/scrapy-crawlera-fetch",
packages=["crawlera_fetch"],
author="Zyte",
author_email="opensource@zyte.com",
url="https://github.com/scrapy-plugins/scrapy-zyte-proxy-fetch",
packages=["zyte_proxy_fetch"],
classifiers=[
"Development Status :: 1 - Planning",
"License :: OSI Approved :: BSD License",
8 changes: 4 additions & 4 deletions tests/data/__init__.py
@@ -1,6 +1,6 @@
SETTINGS = {
"CRAWLERA_FETCH_ENABLED": True,
"CRAWLERA_FETCH_URL": "https://example.org",
"CRAWLERA_FETCH_APIKEY": "secret-key",
"CRAWLERA_FETCH_APIPASS": "secret-pass",
"ZYTE_PROXY_FETCH_ENABLED": True,
"ZYTE_PROXY_FETCH_URL": "https://example.org",
"ZYTE_PROXY_FETCH_APIKEY": "secret-key",
"ZYTE_PROXY_FETCH_APIPASS": "secret-pass",
}
18 changes: 9 additions & 9 deletions tests/data/requests.py
@@ -15,7 +15,7 @@ def get_test_requests():
url="https://httpbin.org/anything",
method="GET",
meta={
"crawlera_fetch": {
"zyte_proxy_fetch": {
"args": {
"render": "no",
"region": "us",
@@ -26,19 +26,19 @@
},
)
expected1 = Request(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
callback=foo_spider.foo_callback,
method="POST",
headers={
"Authorization": basic_auth_header(
SETTINGS["CRAWLERA_FETCH_APIKEY"], SETTINGS["CRAWLERA_FETCH_APIPASS"]
SETTINGS["ZYTE_PROXY_FETCH_APIKEY"], SETTINGS["ZYTE_PROXY_FETCH_APIPASS"]
),
"Content-Type": "application/json",
"Accept": "application/json",
"X-Crawlera-JobId": "1/2/3",
},
meta={
"crawlera_fetch": {
"zyte_proxy_fetch": {
"args": {
"render": "no",
"region": "us",
@@ -72,22 +72,22 @@
original2 = FormRequest(
url="https://httpbin.org/post",
callback=foo_spider.foo_callback,
meta={"crawlera_fetch": {"args": {"device": "desktop"}}},
meta={"zyte_proxy_fetch": {"args": {"device": "desktop"}}},
formdata={"foo": "bar"},
)
expected2 = FormRequest(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
method="POST",
headers={
"Authorization": basic_auth_header(
SETTINGS["CRAWLERA_FETCH_APIKEY"], SETTINGS["CRAWLERA_FETCH_APIPASS"]
SETTINGS["ZYTE_PROXY_FETCH_APIKEY"], SETTINGS["ZYTE_PROXY_FETCH_APIPASS"]
),
"Content-Type": "application/json",
"Accept": "application/json",
"X-Crawlera-JobId": "1/2/3",
},
meta={
"crawlera_fetch": {
"zyte_proxy_fetch": {
"args": {"device": "desktop"},
"original_request": request_to_dict(original2, spider=foo_spider),
"timing": {"start_ts": mocked_time()},
@@ -116,7 +116,7 @@ def get_test_requests():
"original": Request(
url="https://example.org",
method="HEAD",
meta={"crawlera_fetch": {"skip": True}},
meta={"zyte_proxy_fetch": {"skip": True}},
),
"expected": None,
}
24 changes: 12 additions & 12 deletions tests/data/responses.py
@@ -15,7 +15,7 @@
test_responses.append(
{
"original": HtmlResponse(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
status=200,
headers={
"Content-Type": "application/json",
@@ -26,9 +26,9 @@
"Connection": "close",
},
request=Request(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
meta={
"crawlera_fetch": {
"zyte_proxy_fetch": {
"timing": {"start_ts": mocked_time()},
"original_request": request_to_dict(
Request("https://fake.host.com"),
@@ -51,7 +51,7 @@
test_responses.append(
{
"original": HtmlResponse(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
status=200,
headers={
"Content-Type": "application/json",
@@ -62,9 +62,9 @@
"Connection": "close",
},
request=Request(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
meta={
"crawlera_fetch": {
"zyte_proxy_fetch": {
"timing": {"start_ts": mocked_time()},
"original_request": request_to_dict(
Request("https://httpbin.org/get"),
@@ -97,7 +97,7 @@
test_responses.append(
{
"original": HtmlResponse(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
status=200,
headers={
"Content-Type": "application/json",
@@ -108,9 +108,9 @@
"Connection": "close",
},
request=Request(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
meta={
"crawlera_fetch": {
"zyte_proxy_fetch": {
"timing": {"start_ts": mocked_time()},
"original_request": request_to_dict(
Request("https://example.org"),
@@ -164,17 +164,17 @@
test_responses.append(
{
"original": HtmlResponse(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
status=200,
headers={
"Content-Type": "application/json",
"Content-Encoding": "gzip",
"Date": "Fri, 24 Apr 2020 18:22:10 GMT",
},
request=Request(
url=SETTINGS["CRAWLERA_FETCH_URL"],
url=SETTINGS["ZYTE_PROXY_FETCH_URL"],
meta={
"crawlera_fetch": {
"zyte_proxy_fetch": {
"timing": {"start_ts": mocked_time()},
"original_request": request_to_dict(
Request("http://httpbin.org/ip"),