[Radar] Update url scanner docs to use V2 #18974

Merged
merged 5 commits into from
Jan 3, 2025
Changes from 4 commits
4 changes: 2 additions & 2 deletions .github/CODEOWNERS
@@ -180,8 +180,8 @@

# Radar

/src/content/docs/radar/ @meddulla @G4brym @tiagoad @cloudflare/pcx-technical-writing
/src/content/changelogs/radar.yaml @meddulla @G4brym @tiagoad @cloudflare/pcx-technical-writing
/src/content/docs/radar/ @meddulla @G4brym @tiagoad @andre-j3sus @cloudflare/pcx-technical-writing
/src/content/changelogs/radar.yaml @meddulla @G4brym @tiagoad @andre-j3sus @cloudflare/pcx-technical-writing

# Reference architecture

98 changes: 54 additions & 44 deletions src/content/docs/radar/investigate/url-scanner.mdx
@@ -21,7 +21,7 @@ Once you have the token, and you know your `account_id`, you are ready to make y
To submit a URL to scan, the only required information is the URL to be scanned in the `POST` request body:

```bash
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan" \
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/scan" \
--header "Authorization: Bearer <API_TOKEN>" \
--header "Content-Type: application/json" \
--data '{
@@ -35,21 +35,15 @@ A successful response will have a status code of `200` and be similar to the fol

```json
{
"errors": [],
"messages": [{
"message": "Submission successful"
}],
"result": {
"time": "2022-09-15T00:00:00Z",
"url": "https://www.example.com",
"uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
"visibility": "Public"
},
"success": true
"uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
"api": "https://api.cloudflare.com/client/v4/accounts/<accountId>/urlscanner/v2/result/095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
"visibility": "public",
"url": "https://www.example.com",
"message": "Submission successful"
}
```

The `result.uuid` property in the response above identifies the scan and will be required when fetching the scan report.
The `uuid` property in the response above identifies the scan and will be required when fetching the scan report.

#### Submit a custom URL Scan

@@ -61,53 +55,58 @@ Here's an example request body with some custom configuration options:
"screenshotsResolutions": [
"desktop", "mobile", "tablet"
],
"customagent": "XXX-my-user-agent",
"referer": "example"
"customHeaders": {
"user-agent": "My-custom-user-agent",
"Authorization": "xxx-token"
},
"visibility": "Unlisted"
}
```

Above, the visibility level is set as `Unlisted`, which means that the scan report won't be included in the [recent scans](https://radar.cloudflare.com/scan#recent-scans) list nor in search results. In effect, only users with knowledge of the scan ID will be able to access it.

There will also be three screenshots taken of the webpage, one per target device type. The [`User-Agent`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) HTTP Header will be set as "My-custom-user-agent". Note that you can set any custom HTTP header, including [Authorization](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization).
There will also be three screenshots taken of the webpage, one per target device type. The [`User-Agent`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) will be set as "XXX-my-user-agent". Note that you can set any custom HTTP header, including [Authorization](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization).
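
The custom submission described above could be assembled in Python roughly as follows. This is a minimal sketch, not an official client: `build_scan_request` is a hypothetical helper, the account ID and token are placeholders, and the `url` body field name is an assumption because the diff elides the request body of the basic example. The request is only built here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_scan_request(account_id, api_token, url, **options):
    """Build (but do not send) a POST request for the v2 scan endpoint.

    `url` as the body field name is an assumption; extra keyword arguments
    are copied into the JSON body unchanged.
    """
    body = {"url": url, **options}
    return urllib.request.Request(
        f"{API_BASE}/accounts/{account_id}/urlscanner/v2/scan",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scan_request(
    "my-account-id",  # placeholder account ID
    "my-api-token",  # placeholder API token
    "https://www.example.com",
    screenshotsResolutions=["desktop", "mobile", "tablet"],
    customHeaders={"user-agent": "My-custom-user-agent"},
    visibility="Unlisted",
)
```

Passing `request` to `urllib.request.urlopen` would submit the scan; the `uuid` in the JSON reply is then the value needed to fetch the report.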

### Get scan report

Once the URL Scan submission is made, the current progress can be checked by calling `https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan/{scan_id}`. The `scan_id` will be the `result.uuid` value returned in the previous response.
Once the URL Scan submission is made, the current progress can be checked by calling `https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/result/{scan_id}`. The `scan_id` will be the `uuid` value returned in the previous response.

While the scan is in progress, the HTTP status code will be `202`, once it's finished it will be `200`. Clients are advised to poll every 10-30 seconds.
While the scan is in progress, the HTTP status code will be `404`; once it's finished, it will be `200`. Clients are advised to poll every 10-30 seconds.
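
A polling loop for the v2 result endpoint might look like the sketch below. It assumes the v2 behaviour stated in the updated line (`404` while the scan is in progress, `200` when finished) and treats any other status as an error, which is a design choice of this example rather than documented behaviour. The helper names, the 15-second interval, and the attempt limit are all illustrative.

```python
import time
import urllib.error
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_result_request(account_id, api_token, scan_id):
    """Build a GET request for the v2 result endpoint of one scan."""
    return urllib.request.Request(
        f"{API_BASE}/accounts/{account_id}/urlscanner/v2/result/{scan_id}",
        headers={"Authorization": f"Bearer {api_token}"},
    )


def fetch_status(request):
    """Send the request and return (status_code, body_bytes)."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, so the 404 "not ready yet" case lands here.
        return error.code, error.read()


def wait_for_report(request, fetch=fetch_status, interval=15,
                    max_attempts=20, sleep=time.sleep):
    """Poll until the report is ready (200); treat 404 as still in progress."""
    for _ in range(max_attempts):
        status, body = fetch(request)
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected status {status}")
        sleep(interval)
    raise TimeoutError("scan report not ready after polling")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access or real delays; in normal use, `wait_for_report(build_result_request(account_id, api_token, scan_id))` is enough.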

The response will include, among others, the following top properties in `result.scan`:
The response will include, among others, the following top properties:

* `task` - Information on the scan submission.
* `page` - Information pertaining to the primary request (for example, response cookies) and the webpage itself (e.g. console messages).
* `meta` - Meta processors output including detected technologies, categories, rank and others.
* `ips` - IPs contacted.
* `asns` - AS Numbers contacted.
* `geo` - GeoIP information derived from contacted IPs.
* `domains` - Hostnames contacted, including `dns` record information.
* `links` - Outgoing links detected in the DOM.
* `performance` - Timings as given by the [`PerformanceNavigationTiming`](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming) interface.
* `certificates` - TLS certificates of HTTP responses.
* `page` - Information pertaining to the primary response, for example IP address, ASN, server and page redirect history.
* `data.requests` - Request chains involved in the page load.
* `data.cookies` - Cookies set by the page.
* `data.globals` - Non-standard JavaScript global variables.
* `data.console` - Console logs.
* `data.performance` - Timings as given by the [`PerformanceNavigationTiming`](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming) interface.
* `meta` - Meta processors output including detected technologies, domain and URL categories, rank, geoip information and others.
* `lists.ips` - IPs contacted.
* `lists.asns` - AS Numbers contacted.
* `lists.domains` - Hostnames contacted, including `dns` record information.
* `lists.hashes` - Hashes of response bodies, of the main page HTML structure, screenshots and favicons.
* `lists.certificates` - TLS certificates of HTTP responses.
* `verdicts` - Verdicts on malicious content.

Some examples of more specific properties include:

* `task.uuid` - ID of the scan.
* `task.effectiveUrl` - URL of the primary request, after all HTTP redirects.
* `task.url` - Submitted URL of the scan. May differ from final URL (`page.url`) if there are HTTP redirects.
* `task.success` - Whether scan was successful or not. Scans can fail for various reasons, including DNS errors.
* `task.status` - Current scan status, for example, `Queued`, `InProgress`, or `Finished`.
* `meta.processors.categories` - Cloudflare categories of the main hostname contacted.
* `meta.processors.securityRiskCategories` - Cloudflare categories, representing a security risk, of the main hostname contacted.
* `meta.processors.domainCategories` - Cloudflare categories of the main hostname contacted.
* `meta.processors.phishing` - What kind of phishing, if any, was detected.
* `meta.processors.rank` - [Cloudflare Radar Rank](http://blog.cloudflare.com/radar-domain-rankings/) of the main hostname contacted.
* `meta.processors.tech` - What kind of technologies were detected as being in use by the website, with the help of [Wappalyzer](https://github.com/wappalyzer/wappalyzer).
* `meta.processors.radarRank` - [Cloudflare Radar Rank](http://blog.cloudflare.com/radar-domain-rankings/) of the main hostname contacted.
* `meta.processors.wappa` - What kind of technologies were detected as being in use by the website, with the help of [Wappalyzer](https://github.com/Lissy93/wapalyzer).
* `page.url` - URL of the primary request, after all HTTP redirects.
* `page.country` - GeoIP country name of the main IP address contacted.
* `page.cookies` - Cookies set by the page.
* `page.console` - JavaScript console messages
* `page.js.variables` - Non-standard JavaScript global variables.
* `page.securityViolations` - <GlossaryTooltip term="content security policy (CSP)" link="https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP">CSP</GlossaryTooltip> or [SRI](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity) violations.
* `page.history` - Main page history, including any HTTP redirects.
* `page.screenshot` - Various hashes of the main screenshot, can be used to search for sites with similar screenshots.
* `page.domStructHash` - HTML structure hash, use it to search for sites with similar structure.
* `page.favicon.hash` - MD5 hash of the favicon.
* `verdicts.overall.malicious` - Whether the website was considered malicious *at the time of the scan*. Please check the remaining properties for each subsystem(s) for specific threats detected.

The [Get URL Scan](/api/resources/url_scanner/subresources/scans/methods/get/) API endpoint documentation contains the full response schema.
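
As a rough illustration of how a client might read a few of the v2 property paths listed above, the sketch below walks a parsed report with a small lookup helper. The `summarize_report` function is hypothetical, the sample dictionary contains made-up values shaped after those paths, and the redirect check is a naive string comparison of `task.url` and `page.url`.

```python
def summarize_report(report):
    """Pull a few documented properties out of a parsed scan report."""

    def dig(*keys):
        # Walk nested dictionaries, returning None if any key is missing.
        node = report
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node

    submitted = dig("task", "url")
    final = dig("page", "url")
    return {
        "uuid": dig("task", "uuid"),
        "submitted_url": submitted,
        "final_url": final,
        # Naive comparison: differing URLs are taken to mean a redirect.
        "redirected": None not in (submitted, final) and submitted != final,
        "malicious": dig("verdicts", "overall", "malicious"),
    }


# Made-up sample data, not real API output.
sample = {
    "task": {
        "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
        "url": "http://example.com",
    },
    "page": {"url": "https://www.example.com/"},
    "verdicts": {"overall": {"malicious": False}},
}
summary = summarize_report(sample)
```

Missing properties come back as `None` rather than raising, which keeps the helper usable on partial or failed scans.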
@@ -116,31 +115,42 @@ To fetch the scan's [screenshots](/api/resources/url_scanner/subresources/scans/

### Search scans

`Public` scans can also be searched for. To search for scans to the hostname `google.com`, use the query parameter `page_hostname=google.com`:
Use a subset of Elasticsearch query syntax to filter scans. Search results will include `Public` scans and your *own* `Unlisted` scans.

To search for scans to the hostname `google.com`, use the query parameter `q=page.domain:"google.com"`:

```bash
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan?page_hostname=google.com" \
curl 'https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/search?q=page.domain:google.com' \
--header "Authorization: Bearer <API_TOKEN>"
```

Search results will also include your *own* `Unlisted` scans.

If, instead, you wanted to search for scans that made at least one request to the hostname `cdnjs.cloudflare.com`, for example sites that use a JavaScript library hosted at `cdnjs.cloudflare.com`, use the query parameter `q=domain:cdnjs.cloudflare.com`:

```bash
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/scan?hostname=cdnjs.cloudflare.com" \
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/urlscanner/v2/search?q=domain:cdnjs.cloudflare.com" \
--header "Authorization: Bearer <API_TOKEN>"
```

You can also search for the hash in the URL Scanner API.
Some other example queries:

- `task.url:"https://google.com" OR task.url:"https://www.google.com"` - Scans whose submitted URL was either `google.com` or `www.google.com`. URLs *must* be enclosed in quotes.
- `page.url:"https://google.com" AND NOT task.url:"https://google.com"` - Scans to `google.com` whose submitted URL was not `google.com` (that is, sites that redirected to `google.com`).
- `page.domain:microsoft AND verdicts.malicious:true AND NOT page.domain:microsoft.com` - Malicious scans whose hostname starts with `microsoft`. Would match domains like `microsoft.phish.com`.
- `apikey:me AND date:[2024-01 TO 2024-10]` - Your own scans from January 2024 to October 2024.
- `page.domain:(blogspot OR www.blogspot)` - Scans whose main domain starts with `blogspot` or with `www.blogspot`.
- `date:>now-7d AND path:okta-sign-in.min.js` - Scans from the last 7 days with any request path that ends with `okta-sign-in.min.js`.
- `page.asn:AS24940 AND hash:-557369673` - Websites hosted in AS24940 where a resource with the given hash was retrieved.
- `hash:8f662c2ce9472ba8d03bfeb8cdae112dbc0426f99da01c5d70c7eb4afd5893ca` - Using the hash at `page.domStructHash`, search for other scans with the same HTML structure hash.

Go to [Search URL scans](/api/resources/url_scanner/subresources/scans/methods/list/) in the API documentation for the full list of available options.
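
Queries containing quotes, spaces, or characters such as `>` need to be URL-encoded before they go into the `q` parameter. The sketch below shows one way to do that with Python's standard library; `any_of` and `build_search_url` are hypothetical helpers, and it assumes the endpoint accepts standard form encoding (spaces sent as `+`).

```python
from urllib.parse import urlencode

API_BASE = "https://api.cloudflare.com/client/v4"


def any_of(field, values):
    """Join quoted values for one field with OR."""
    return " OR ".join(f'{field}:"{value}"' for value in values)


def build_search_url(account_id, query):
    """Return the v2 search URL with the query percent-encoded into `q`."""
    params = urlencode({"q": query})
    return f"{API_BASE}/accounts/{account_id}/urlscanner/v2/search?{params}"


# Placeholder account ID; the query mirrors the first example in the list above.
query = any_of("task.url", ["https://google.com", "https://www.google.com"])
search_url = build_search_url("my-account-id", query)
```

The resulting URL can be requested with the same `Authorization: Bearer <API_TOKEN>` header used in the `curl` examples.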

Alternatively, you can search for the hash on the [Cloudflare dashboard](https://dash.cloudflare.com/) by selecting your account > **Security Center** > **Investigate** > Enter the hash > Select **Search**.

### Search filters
### Security Center

Alternatively, you can search in the **Security Center**, by going to the [Cloudflare dashboard](https://dash.cloudflare.com/), selecting your account, and then going to **Security Center** > **Investigate** > Enter query > Select **Search**.

You can search through the URL Scanner [reports](/radar/investigate/url-scanner/#search-filters) and retrieve information filtered by:
In the Security Center, you can easily retrieve information pre-filtered by:

- Similar screenshot
- Identical favicon