Commit

Add user agent strings to crawl recipes page
eliasdabbas committed May 3, 2024
1 parent 34fba64 commit 247a56b
Showing 11 changed files with 351 additions and 3 deletions.
92 changes: 92 additions & 0 deletions advertools/code_recipes/spider_strategies.py

Large diffs are not rendered by default.

Binary file not shown.
Binary file modified docs/_build/doctrees/advertools.spider.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/_modules/advertools/spider.html
@@ -1030,7 +1030,7 @@ <h1>Source code for advertools.spider</h1><div class="highlight"><pre>
<span class="n">custom_settings</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="p">):</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
- <span class="sd"> Crawl a website of a list of URLs based on the supplied options.</span>
+ <span class="sd"> Crawl a website or a list of URLs based on the supplied options.</span>

<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
1 change: 1 addition & 0 deletions docs/_build/html/advertools.code_recipes.html
@@ -144,6 +144,7 @@ <h2>Submodules<a class="headerlink" href="#submodules" title="Link to this headi
<li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-do-i-use-a-proxy-while-crawling">How do I use a proxy while crawling?</a></li>
<li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-can-i-change-the-default-request-headers">How can I change the default request headers?</a></li>
<li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#xpath-expressions-for-custom-extraction">XPath expressions for custom extraction</a></li>
+ <li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#user-agent-strings-for-use-in-crawling">User-agent strings for use in crawling</a></li>
</ul>
</li>
</ul>
252 changes: 252 additions & 0 deletions docs/_build/html/advertools.code_recipes.spider_strategies.html

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/_build/html/advertools.html
@@ -146,6 +146,7 @@ <h2>Subpackages<a class="headerlink" href="#subpackages" title="Link to this hea
<li class="toctree-l4"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-do-i-use-a-proxy-while-crawling">How do I use a proxy while crawling?</a></li>
<li class="toctree-l4"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-can-i-change-the-default-request-headers">How can I change the default request headers?</a></li>
<li class="toctree-l4"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#xpath-expressions-for-custom-extraction">XPath expressions for custom extraction</a></li>
+ <li class="toctree-l4"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#user-agent-strings-for-use-in-crawling">User-agent strings for use in crawling</a></li>
</ul>
</li>
</ul>
2 changes: 1 addition & 1 deletion docs/_build/html/advertools.spider.html
@@ -581,7 +581,7 @@ <h2>Spider Custom Settings and Additional Functionality<a class="headerlink" hre
<dl class="py function">
<dt class="sig sig-object py" id="advertools.spider.crawl">
<span class="sig-name descname"><span class="pre">crawl</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">url_list</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_file</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">follow_links</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">allowed_domains</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exclude_url_params</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">include_url_params</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exclude_url_regex</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">include_url_regex</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">css_selectors</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">xpath_selectors</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">custom_settings</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/advertools/spider.html#crawl"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#advertools.spider.crawl" title="Link to this definition"></a></dt>
- <dd><p>Crawl a website of a list of URLs based on the supplied options.</p>
+ <dd><p>Crawl a website or a list of URLs based on the supplied options.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
2 changes: 2 additions & 0 deletions docs/_build/html/index.html
@@ -237,6 +237,7 @@ <h2>Online marketing productivity and analysis tools<a class="headerlink" href="
<li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-do-i-use-a-proxy-while-crawling">How do I use a proxy while crawling?</a></li>
<li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-can-i-change-the-default-request-headers">How can I change the default request headers?</a></li>
<li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#xpath-expressions-for-custom-extraction">XPath expressions for custom extraction</a></li>
+ <li class="toctree-l2"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#user-agent-strings-for-use-in-crawling">User-agent strings for use in crawling</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="advertools.crawlytics.html">Crawl Analytics</a><ul>
@@ -353,6 +354,7 @@ <h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Li
<li class="toctree-l7"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-do-i-use-a-proxy-while-crawling">How do I use a proxy while crawling?</a></li>
<li class="toctree-l7"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#how-can-i-change-the-default-request-headers">How can I change the default request headers?</a></li>
<li class="toctree-l7"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#xpath-expressions-for-custom-extraction">XPath expressions for custom extraction</a></li>
+ <li class="toctree-l7"><a class="reference internal" href="advertools.code_recipes.spider_strategies.html#user-agent-strings-for-use-in-crawling">User-agent strings for use in crawling</a></li>
</ul>
</li>
</ul>
2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js

Large diffs are not rendered by default.

0 comments on commit 247a56b
