Commit 0b2645a

Add more bots (mostly related to AI crawlers).
1 parent 78d4bb8 commit 0b2645a

4 files changed, +42 -1 lines changed

CHANGELOG.md

+4
@@ -1,5 +1,9 @@
 # Changelog
 
+## Unreleased
+
+- Add more bots (mostly related to AI crawlers)
+
 ## 6.0.0
 
 - Add `Browser::Base#chromium_based?`.

README.md

+1 -1
@@ -432,7 +432,7 @@ information.
 
 ## Maintainer
 
-- Nando Vieira - http://nandovieira.com
+- Nando Vieira - https://nandovieira.com
 
 ## Contributors
 

bots.yml

+21
@@ -291,7 +291,28 @@ zoombot: ZoomBot
 zoominfobot: ZoominfoBot
 zyborg: Zyborg
 
+# AI Crawlers
+# https://darkvisitors.com
+amazonbot: Amazon
+anthropic-ai: Anthropic-AI
+applebot: Apple
+bytespider: TikTok
+ccbot: Common Crawl
+chatgpt-user: ChatGPT
+claude-web: Anthropic-AI
+cohere-ai: Cohere
+diffbot: Diffbot
+facebookbot: Facebook
+google-extended: Google
+googleother: Google
+gptbot: ChatGPT
+omgili: Webz.io
+perplexitybot: Perplexity
+webz.io: Webz.io
+youbot: You.com
+
 # Generic lib user agents go here.
+httpie: HTTPie
 eventmachine httpclient: Ruby http library
 go 1.1 package http: Go 1.1 package http
 htmlparser: HTMLParser
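
To illustrate how entries like the ones added above could be consumed, here is a minimal Ruby sketch. It assumes each key is matched as a substring of the downcased User-Agent and that the value is the bot's display name; the gem's real matcher may differ. `bot_name` is a hypothetical helper, not part of the library, and the non-bot User-Agent is made up for the example.

```ruby
require "yaml"

# A few of the entries added to bots.yml in this commit (key => display name).
BOTS = YAML.safe_load(<<~YAML)
  gptbot: ChatGPT
  claude-web: Anthropic-AI
  ccbot: Common Crawl
  perplexitybot: Perplexity
YAML

# Hypothetical helper (assumption: substring match on the downcased UA).
# Returns the display name for the first matching key, or nil if none match.
def bot_name(user_agent, bots = BOTS)
  ua = user_agent.to_s.downcase
  match = bots.find { |key, _name| ua.include?(key) }
  match&.last
end

puts bot_name("GPTBot")                                   # fixture from test/ua_bots.yml
puts bot_name("CCBot/2.0 (http://commoncrawl.org/faq/)")  # existing COMMONCRAWL fixture
p bot_name("Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0") # illustrative non-bot UA
```

Because keys are lowercase, the sketch downcases the User-Agent before comparing, so `GPTBot` and `gptbot` are treated alike.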

test/ua_bots.yml

+16
@@ -1,9 +1,12 @@
 ---
 ADLXBOT: "Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)"
 ADS_TXT_CRAWLER: "AdsTxtCrawler/1.0"
+AMAZONBOT: Amazonbot
 ANDERSPINK: "Mozilla/5.0 (compatible; AndersPinkBot/1.0; +http://anderspink.com/bot.html)"
+ANTHROPIC_AI: anthropic-ai
 APIS_GOOGLE: "APIs-Google; (+https://developers.google.com/webmasters/APIs-Google.html)"
 APPLE_BOT: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1)"
+APPLEBOT: Applebot
 ARCHIVEBOT: "ArchiveTeam ArchiveBot/20190617.01 (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
 ASK: "Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)"
 AWS_ELB: ELB-HealthChecker/1.0
@@ -13,25 +16,34 @@ BINGBOT: "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm
 BINGPREVIEW: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
 BUBING: "BUbiNG (+http://law.di.unimi.it/BUbiNG.html)"
 BUZZBOT: "Buzzbot/1.0 (Buzzbot; http://www.buzzstream.com; [email protected])"
+BYTESPIDER: Bytespider
+CCBOT: CCBot
+CHATGPT_USER: ChatGPT-User
 CHROME_LIGHTHOUSE: "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3694.0 Mobile Safari/537.36 Chrome-Lighthouse"
 CIPACRAWLER: "CipaCrawler/3.0 ([email protected]; http://www.domaincrawler.com/www.example.com)"
+CLAUDE_WEB: Claude-Web
+CLAUDEBOT: ClaudeBot
 CLOUDFLARE: "Mozilla/5.0 (compatible; CloudFlare-AlwaysOnline/1.0; +http://www.cloudflare.com/always-online) AppleWebKit/534.34"
+COHERE-AI: cohere-ai
 COMMONCRAWL: "CCBot/2.0 (http://commoncrawl.org/faq/)"
 COMODO_SSL_CHECKER: "COMODO SSL Checker"
 COPYPANTS: "Mozilla/5.0 (compatible; BotPants/1.0; Linux; [email protected]) KHTML/3.5.5 (like Gecko)"
 DATAFEEDWATCH: "Datafeedwatch/2.1.x"
 DATANYZE: "Mozilla/5.0 (X11; Datanyze; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
 DAUMOA: "Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server;) Daumoa 4.0"
+DIFFBOT: Diffbot
 DOMAINAREANIMATOR: "Domain Re-Animator Bot (http://domainreanimator.com) - [email protected]"
 DOT_BOT: "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])"
 DUCKDUCKGO: "DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)"
 EZPUBLISH: "eZ Publish Link Validator"
 FACEBOOK_BOT: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
+FACEBOOKBOT: FacebookBot
 FYREBOT: "Fyrebot/1.0"
 GARLIK: "GarlikCrawler/1.2 (http://garlik.com/, [email protected])"
 GERMCRAWLER: "GermCrawler"
 GO_1.1_PACKAGE_HTTP: "Go 1.1 package http"
 GO_HTTP_CLIENT: "Go-http-client"
+GOOGLE-EXTENDED: Google-Extended
 GOOGLE_BOT: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
 GOOGLE_IMAGE_PROXY: "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)"
 GOOGLE_PAGE_SPEED_INSIGHTS: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.4 (KHTML, like Gecko; Google Page Speed Insights) Chrome/22.0.1229 Safari/537.4"
@@ -40,6 +52,7 @@ GOOGLE_SITE_VERIFICATION: Mozilla/5.0 (compatible; Google-Site-Verification/1.0)
 GOOGLE_STACKDRIVER_UPTIME_CHECKS: "GoogleStackdriverMonitoring-UptimeChecks"
 GOOGLE_STRUCTURED_DATA_TESTING_TOOL2: "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +http://developers.google.com/structured-data/testing-tool/)"
 GOOGLE_STRUCTURED_DATA_TESTING_TOOL: "Mozilla/5.0 (compatible; X11; Linux x86_64; Google-StructuredDataTestingTool; +http://www.google.com/webmasters/tools/richsnippets)"
+GPTBOT: GPTBot
 GRAPESHOT: "Mozilla/5.0 (compatible; GrapeshotCrawler/2.0; +http://www.grapeshot.co.uk/crawler.php)"
 HTTRACK: "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
 IMPLISENSEBOT: "ImplisenseBot 1.0"
@@ -61,7 +74,9 @@ MSNBOT_MEDIA: "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
 NETCRAFT2: Netcraft SSL Server Survey - contact [email protected]
 NETCRAFT: Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; [email protected])
 NEWRELICPINGER: NewRelicPinger/1.0 (12345)
+OMGILI: omgili
 PAESSLER: Mozilla/5.0 (compatible; PRTG Network Monitor (www.paessler.com); Windows)
+PERPLEXITYBOT: PerplexityBot
 PR-CY_RU: Mozilla/5.0 (compatible; PR-CY.RU; + https://a.pr-cy.ru)
 PRIVACYAWAREBOT: "Mozilla/5.0 (compatible; PrivacyAwareBot/1.1; +http://www.privacyaware.org)"
 PROXIMIC: "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"
@@ -98,6 +113,7 @@ YAHOO_SLURP: "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/
 YANDEX_DIRECT: "Mozilla/5.0 (compatible; YandexDirect/3.0; +http://yandex.com/bots)"
 YANDEX_METRIKA: "Mozilla/5.0 (compatible; YandexMetrika/3.0; +http://yandex.com/bots)"
 YANGA: "Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)"
+YOUBOT: YouBot
 ZABBIX: "Zabbix"
 ZOOMBOT: "ZoomBot (Linkbot 1.0 http://suite.seozoom.it/bot.html)"
 ZOOMINFOBOT: "ZoominfoBot (zoominfobot at zoominfo dot com)"
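
As a rough cross-check between the two YAML files touched by this commit, the following Ruby sketch compares the AI-crawler keys added to `bots.yml` with the fixtures added to `test/ua_bots.yml`. It assumes substring matching on the downcased User-Agent and looks only at the entries in this diff, not at the rest of either file; `coverage` is a hypothetical helper and not how the project's test suite is actually written.

```ruby
require "yaml"

# Keys added under "# AI Crawlers" in bots.yml by this commit.
AI_KEYS = YAML.safe_load(<<~YAML).keys
  amazonbot: Amazon
  anthropic-ai: Anthropic-AI
  applebot: Apple
  bytespider: TikTok
  ccbot: Common Crawl
  chatgpt-user: ChatGPT
  claude-web: Anthropic-AI
  cohere-ai: Cohere
  diffbot: Diffbot
  facebookbot: Facebook
  google-extended: Google
  googleother: Google
  gptbot: ChatGPT
  omgili: Webz.io
  perplexitybot: Perplexity
  webz.io: Webz.io
  youbot: You.com
YAML

# Fixtures added to test/ua_bots.yml by this commit.
AI_FIXTURES = YAML.safe_load(<<~YAML)
  AMAZONBOT: Amazonbot
  ANTHROPIC_AI: anthropic-ai
  APPLEBOT: Applebot
  BYTESPIDER: Bytespider
  CCBOT: CCBot
  CHATGPT_USER: ChatGPT-User
  CLAUDE_WEB: Claude-Web
  CLAUDEBOT: ClaudeBot
  COHERE-AI: cohere-ai
  DIFFBOT: Diffbot
  FACEBOOKBOT: FacebookBot
  GOOGLE-EXTENDED: Google-Extended
  GPTBOT: GPTBot
  OMGILI: omgili
  PERPLEXITYBOT: PerplexityBot
  YOUBOT: YouBot
YAML

# Hypothetical helper (assumption: a key "covers" a fixture when it is a
# substring of the downcased User-Agent string).
def coverage(keys, fixtures)
  hit = ->(key, ua) { ua.downcase.include?(key) }
  {
    unmatched_fixtures: fixtures.reject { |_, ua| keys.any? { |k| hit.(k, ua) } }.keys,
    unused_keys: keys.reject { |k| fixtures.values.any? { |ua| hit.(k, ua) } }
  }
end

report = coverage(AI_KEYS, AI_FIXTURES)
puts "fixtures without a matching key: #{report[:unmatched_fixtures].inspect}"
puts "keys without a fixture: #{report[:unused_keys].inspect}"
```

Restricted to the lines in this diff, the sketch reports `CLAUDEBOT` as the only fixture without an explicit key, and `googleother` and `webz.io` as keys without a fixture. Entries elsewhere in `bots.yml`, or a separate matching rule in the library, may well cover those cases.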
