Commit c40c223

meddulla and pedrosousa authored
[Radar] Update url scanner docs to use V2 (#18974)
* [Radar] Update url scanner docs to use V2
* [Radar] Address hyperlint issues
* [Radar] Point to the wappalyser fork we're using
* [Radar] Add andre as Radar code owner
* Apply suggestions from code review

Co-authored-by: Pedro Sousa <[email protected]>

---------

Co-authored-by: Sofia Cardita <[email protected]>
Co-authored-by: Pedro Sousa <[email protected]>
1 parent af6d3fe commit c40c223

2 files changed, +60 -46 lines changed

.github/CODEOWNERS

Lines changed: 2 additions & 2 deletions
@@ -180,8 +180,8 @@
180180

181181
# Radar
182182

183-
/src/content/docs/radar/ @meddulla @G4brym @tiagoad @cloudflare/pcx-technical-writing
184-
/src/content/changelogs/radar.yaml @meddulla @G4brym @tiagoad @cloudflare/pcx-technical-writing
183+
/src/content/docs/radar/ @meddulla @G4brym @tiagoad @andre-j3sus @cloudflare/pcx-technical-writing
184+
/src/content/changelogs/radar.yaml @meddulla @G4brym @tiagoad @andre-j3sus @cloudflare/pcx-technical-writing
185185

186186
# Reference architecture
187187

src/content/docs/radar/investigate/url-scanner.mdx

Lines changed: 58 additions & 44 deletions
@@ -21,7 +21,7 @@ Once you have the token, and you know your `account_id`, you are ready to make y
2121
To submit a URL to scan, the only required information is the URL to be scanned in the `POST` request body:
2222

2323
```bash
24-
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan" \
24+
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/scan" \
2525
--header "Authorization: Bearer <API_TOKEN>" \
2626
--header "Content-Type: application/json" \
2727
--data '{
@@ -35,21 +35,15 @@ A successful response will have a status code of `200` and be similar to the fol
3535

3636
```json
3737
{
38-
"errors": [],
39-
"messages": [{
40-
"message": "Submission successful"
41-
}],
42-
"result": {
43-
"time": "2022-09-15T00:00:00Z",
44-
"url": "https://www.example.com",
45-
"uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
46-
"visibility": "Public"
47-
},
48-
"success": true
38+
"uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
39+
"api": "https://api.cloudflare.com/client/v4/accounts/<accountId>/urlscanner/v2/result/095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
40+
"visibility": "public",
41+
"url": "https://www.example.com",
42+
"message": "Submission successful"
4943
}
5044
```
5145

52-
The `result.uuid` property in the response above identifies the scan and will be required when fetching the scan report.
46+
The `uuid` property in the response above identifies the scan and will be required when fetching the scan report.
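As a minimal sketch of the V2 submission flow described above, the following Python builds the `POST` request and reads the top-level `uuid` from the response. The helper names (`build_scan_request`, `scan_id_from_response`) are hypothetical, and the `url` body field is an assumption, since the request body is truncated in this diff.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_scan_request(account_id: str, api_token: str, url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a URL to scan."""
    # Assumption: the body carries the target under a "url" key.
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/accounts/{account_id}/urlscanner/v2/scan",
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scan_id_from_response(raw: bytes) -> str:
    """Return the scan ID; in V2 it is the top-level `uuid`, not `result.uuid`."""
    return json.loads(raw)["uuid"]
```

Passing the returned request to `urllib.request.urlopen` would send it; the ID extracted from the response is what the report endpoint expects as `scan_id`.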
5347

5448
#### Submit a custom URL Scan
5549

@@ -61,53 +55,58 @@ Here's an example request body with some custom configuration options:
6155
"screenshotsResolutions": [
6256
"desktop", "mobile", "tablet"
6357
],
58+
"customagent": "XXX-my-user-agent",
59+
"referer": "example",
6460
"customHeaders": {
65-
"user-agent": "My-custom-user-agent",
61+
"Authorization": "xxx-token"
6662
},
6763
"visibility": "Unlisted"
6864
}
6965
```
7066

7167
Above, the visibility level is set as `Unlisted`, which means that the scan report won't be included in the [recent scans](https://radar.cloudflare.com/scan#recent-scans) list nor in search results. In effect, only users with knowledge of the scan ID will be able to access it.
7268

73-
There will also be three screenshots taken of the webpage, one per target device type. The [`User-Agent`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) HTTP Header will be set as "My-custom-user-agent". Note that you can set any custom HTTP header, including [Authorization](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization).
69+
There will also be three screenshots taken of the webpage, one per target device type. The [`User-Agent`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) will be set as "XXX-my-user-agent". Note that you can set any custom HTTP header, including [Authorization](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization).
7470

7571
### Get scan report
7672

77-
Once the URL Scan submission is made, the current progress can be checked by calling `https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan/{scan_id}`. The `scan_id` will be the `result.uuid` value returned in the previous response.
73+
Once the URL Scan submission is made, the current progress can be checked by calling `https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/result/{scan_id}`. The `scan_id` will be the `uuid` value returned in the previous response.
7874

79-
While the scan is in progress, the HTTP status code will be `202`, once it's finished it will be `200`. Clients are advised to poll every 10-30 seconds.
75+
While the scan is in progress, the HTTP status code will be `404`; once it is finished, it will be `200`. Cloudflare recommends that you poll every 10-30 seconds.
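The polling behavior described above could be sketched as follows, assuming the documented status codes (`404` while the scan is in progress, `200` once it is finished). The function names, the 15-second default interval, and the attempt limit are illustrative choices, not part of the API.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple


def fetch_status(url: str, api_token: str) -> Tuple[int, bytes]:
    """Perform one GET and return (status_code, body), including for HTTP errors."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_token}"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; a 404 here just means "not ready yet".
        return err.code, err.read()


def poll_report(
    fetch: Callable[[], Tuple[int, bytes]],
    interval: float = 15.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call `fetch` until it returns 200; return the body, or None if attempts run out."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected HTTP status {status}")
        if attempt < max_attempts - 1:
            sleep(interval)
    return None
```

A caller would typically pass `lambda: fetch_status(report_url, token)` as `fetch`, where `report_url` is the `/urlscanner/v2/result/{scan_id}` URL. Injecting `fetch` and `sleep` keeps the loop testable without network access.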
8076

81-
The response will include, among others, the following top properties in `result.scan`:
77+
The response will include, among others, the following top properties:
8278

8379
* `task` - Information on the scan submission.
84-
* `page` - Information pertaining to the primary request (for example, response cookies) and the webpage itself (e.g. console messages).
85-
* `meta` - Meta processors output including detected technologies, categories, rank and others.
86-
* `ips` - IPs contacted.
87-
* `asns` - AS Numbers contacted.
88-
* `geo` - GeoIP information derived from contacted IPs.
89-
* `domains` - Hostnames contacted, including `dns` record information.
90-
* `links` - Outgoing links detected in the DOM.
91-
* `performance` - Timings as given by the [`PerformanceNavigationTiming`](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming) interface.
92-
* `certificates` - TLS certificates of HTTP responses.
80+
* `page` - Information pertaining to the primary response, for example IP address, ASN, server, and page redirect history.
81+
* `data.requests` - Request chains involved in the page load.
82+
* `data.cookies` - Cookies set by the page.
83+
* `data.globals` - Non-standard JavaScript global variables.
84+
* `data.console` - Console logs.
85+
* `data.performance` - Timings as given by the [`PerformanceNavigationTiming`](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming) interface.
86+
* `meta` - Meta processors output including detected technologies, domain and URL categories, rank, geolocation information, and others.
87+
* `lists.ips` - IPs contacted.
88+
* `lists.asns` - AS Numbers contacted.
89+
* `lists.domains` - Hostnames contacted, including `dns` record information.
90+
* `lists.hashes` - Hashes of response bodies, of the main page HTML structure, screenshots, and favicons.
91+
* `lists.certificates` - TLS certificates of HTTP responses.
9392
* `verdicts` - Verdicts on malicious content.
9493

9594
Some examples of more specific properties include:
9695

9796
* `task.uuid` - ID of the scan.
98-
* `task.effectiveUrl` - URL of the primary request, after all HTTP redirects.
97+
* `task.url` - Submitted URL of the scan. May differ from final URL (`page.url`) if there are HTTP redirects.
9998
* `task.success` - Whether scan was successful or not. Scans can fail for various reasons, including DNS errors.
10099
* `task.status` - Current scan status, for example, `Queued`, `InProgress`, or `Finished`.
101-
* `meta.processors.categories` - Cloudflare categories of the main hostname contacted.
102-
* `meta.processors.securityRiskCategories` - Cloudflare categories, representing a security risk, of the main hostname contacted.
100+
* `meta.processors.domainCategories` - Cloudflare categories of the main hostname contacted.
103101
* `meta.processors.phishing` - What kind of phishing, if any, was detected.
104-
* `meta.processors.rank` - [Cloudflare Radar Rank](http://blog.cloudflare.com/radar-domain-rankings/) of the main hostname contacted.
105-
* `meta.processors.tech` - What kind of technologies were detected as being in use by the website, with the help of [Wappalyzer](https://github.com/wappalyzer/wappalyzer).
102+
* `meta.processors.radarRank` - [Cloudflare Radar Rank](http://blog.cloudflare.com/radar-domain-rankings/) of the main hostname contacted.
103+
* `meta.processors.wappa` - The kind of technologies detected as being in use by the website, with the help of [Wappalyzer](https://github.com/Lissy93/wapalyzer).
104+
* `page.url` - URL of the primary request, after all HTTP redirects.
106105
* `page.country` - GeoIP country name of the main IP address contacted.
107-
* `page.cookies` - Cookies set by the page.
108-
* `page.console` - JavaScript console messages
109-
* `page.js.variables` - Non-standard JavaScript global variables.
110-
* `page.securityViolations` - <GlossaryTooltip term="content security policy (CSP)" link="https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP">CSP</GlossaryTooltip> or [SRI](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity) violations.
106+
* `page.history` - Main page history, including any HTTP redirects.
107+
* `page.screenshot` - Various hashes of the main screenshot. Can be used to search for sites with similar screenshots.
108+
* `page.domStructHash` - HTML structure hash. Use it to search for sites with similar structure.
109+
* `page.favicon.hash` - MD5 hash of the favicon.
111110
* `verdicts.overall.malicious` - Whether the website was considered malicious *at the time of the scan*. Please check the remaining properties for each subsystem(s) for specific threats detected.
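To illustrate how the dotted property names listed above map onto the JSON report, here is a small sketch that reads a few of them from an already-parsed response. The helpers `dig` and `summarize_report` are hypothetical, and the sample report in the usage note is made-up data shaped after the property names in this section, not real API output.

```python
from typing import Any, Dict


def dig(obj: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'verdicts.overall.malicious' through nested dicts."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def summarize_report(report: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the scan ID, submitted and final URLs, and the overall verdict."""
    submitted = dig(report, "task.url")
    final = dig(report, "page.url")
    return {
        "uuid": dig(report, "task.uuid"),
        "submitted_url": submitted,
        "final_url": final,
        # task.url and page.url differ when the page redirected.
        "redirected": submitted is not None and final is not None and submitted != final,
        "malicious": dig(report, "verdicts.overall.malicious"),
    }
```

For example, a report with `task.url` set to `http://example.com` and `page.url` set to `https://www.example.com/` would be summarized with `redirected` equal to `True`. Missing properties come back as `None` rather than raising.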
112111

113112
The [Get URL Scan](/api/resources/url_scanner/subresources/scans/methods/get/) API endpoint documentation contains the full response schema.
@@ -116,31 +115,46 @@ To fetch the scan's [screenshots](/api/resources/url_scanner/subresources/scans/
116115

117116
### Search scans
118117

119-
`Public` scans can also be searched for. To search for scans to the hostname `google.com`, use the query parameter `page_hostname=google.com`:
118+
Use a subset of Elasticsearch query syntax to filter scans. Search results will include `Public` scans and your own `Unlisted` scans.
119+
120+
To search for scans to the hostname `google.com`, use the query parameter `q=page.domain:"google.com"`:
120121

121122
```bash
122-
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan?page_hostname=google.com" \
123+
curl 'https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/search?q=page.domain:google.com' \
123124
--header "Authorization: Bearer <API_TOKEN>"
124125
```
125126

126-
Search results will also include your *own* `Unlisted` scans.
127127

128128
If, instead, you wanted to search for scans that made at least one request to the hostname `cdnjs.cloudflare.com`, for example sites that use a JavaScript library hosted at `cdnjs.cloudflare.com`, use the query parameter `q=domain:cdnjs.cloudflare.com`:
129129

130130
```bash
131-
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan?hostname=cdnjs.cloudflare.com" \
131+
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/search?q=domain:cdnjs.cloudflare.com" \
132132
--header "Authorization: Bearer <API_TOKEN>"
133133
```
134134

135-
You can also search for the hash in the URL Scanner API.
135+
Some other example queries:
136+
137+
- `task.url:"https://google.com" OR task.url:"https://www.google.com"`: Search for scans whose submitted URL was either `google.com` or `www.google.com`. URLs must be enclosed in quotes.
138+
- `page.url:"https://google.com" AND NOT task.url:"https://google.com"`: Search for scans to `google.com` whose submitted URL was not `google.com` (that is, sites that redirected to google.com).
139+
- `page.domain:microsoft AND verdicts.malicious:true AND NOT page.domain:microsoft.com`: Malicious scans whose hostname starts with `microsoft`. Would match domains like `microsoft.phish.com`.
140+
- `apikey:me AND date:[2024-01 TO 2024-10]`: Your scans from January 2024 to October 2024.
141+
- `page.domain:(blogspot OR www.blogspot)`: Searches for scans whose main domain starts with `blogspot` or with `www.blogspot`.
142+
- `date:>now-7d AND path:okta-sign-in.min.js`: Scans from the last seven days with any request path that ends with `okta-sign-in.min.js`.
143+
- `page.asn:AS24940 AND hash:-557369673`: Websites hosted in AS24940 where a resource with the given hash was retrieved.
144+
- `hash:8f662c2ce9472ba8d03bfeb8cdae112dbc0426f99da01c5d70c7eb4afd5893ca`: Using the hash at `page.domStructHash` search for other scans with the same HTML structure hash.
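Queries like the ones above contain quotes, colons, spaces, and brackets, so they generally need to be percent-encoded before being placed in the `q` parameter. A minimal sketch, assuming the `/urlscanner/v2/search` path shown in the curl examples (the helper name `build_search_url` is hypothetical):

```python
from urllib.parse import quote

API_BASE = "https://api.cloudflare.com/client/v4"


def build_search_url(account_id: str, query: str) -> str:
    """Return the search URL with the query percent-encoded into the `q` parameter."""
    # safe="" also encodes ':' and '/', which appear inside quoted URLs.
    encoded = quote(query, safe="")
    return f"{API_BASE}/accounts/{account_id}/urlscanner/v2/search?q={encoded}"
```

For instance, `build_search_url("abc", 'page.domain:"google.com"')` ends with `?q=page.domain%3A%22google.com%22`. The request still needs the same `Authorization: Bearer <API_TOKEN>` header as the curl examples.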
136145

137146
Go to [Search URL scans](/api/resources/url_scanner/subresources/scans/methods/list/) in the API documentation for the full list of available options.
138147

139-
Alternatively, you can search for the hash on the [Cloudflare dashboard](https://dash.cloudflare.com/) by selecting your account > **Security Center** > **Investigate** > Enter the hash > Select **Search**.
140148

141-
### Search filters
149+
### Security Center
150+
151+
Alternatively, you can search in the Security Center:
152+
153+
1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account.
154+
2. Go to **Security Center** > **Investigate**.
155+
3. Enter your query and select **Search**.
142156

143-
You can search through the URL Scanner [reports](/radar/investigate/url-scanner/#search-filters) and retrieve information filtered by:
157+
In the Security Center, you can retrieve information already pre-filtered by:
144158

145159
- Similar screenshot
146160
- Identical favicon
