README.md (+5 -1)
@@ -1,6 +1,10 @@
 # HTTP Archive BigQuery pipeline with Dataform
 
-## Tables
+This repo handles the HTTP Archive data pipeline, which takes the results of the monthly HTTP Archive run and saves them to the `httparchive` dataset in BigQuery.
+
+## Pipelines
+
+The pipelines run in the Dataform service on Google Cloud Platform (GCP) and are kicked off automatically on crawl completion and other events. Each triggered pipeline run uses the code in the `main` branch.
+  date DATE NOT NULL OPTIONS(description="YYYY-MM-DD format of the HTTP Archive monthly crawl"),
+  client STRING NOT NULL OPTIONS(description="Test environment: desktop or mobile"),
+  page STRING NOT NULL OPTIONS(description="The URL of the page being tested"),
+  is_root_page BOOL OPTIONS(description="Whether the page is the root of the origin."),
+  root_page STRING NOT NULL OPTIONS(description="The URL of the root page being tested"),
+  rank INT64 OPTIONS(description="Site popularity rank, from CrUX"),
+  url STRING NOT NULL OPTIONS(description="The URL of the request"),
+  is_main_document BOOL NOT NULL OPTIONS(description="Whether this request corresponds with the main HTML document of the page, which is the first HTML request after redirects"),
+  type STRING OPTIONS(description="Simplified description of the type of resource (script, html, css, text, other, etc)"),
+  index INT64 OPTIONS(description="The sequential 0-based index of the request"),
+  payload JSON OPTIONS(description="JSON-encoded WebPageTest result data for this request"),
+  summary JSON OPTIONS(description="JSON-encoded summarization of request data"),
+  request_headers ARRAY<STRUCT<
+    name STRING OPTIONS(description="Request header name"),
+    value STRING OPTIONS(description="Request header value")
+  >> OPTIONS(description="Request headers"),
+  response_headers ARRAY<STRUCT<
+    name STRING OPTIONS(description="Response header name"),
+    value STRING OPTIONS(description="Response header value")
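As a rough illustration of how the columns in this schema might be queried, the sketch below filters on the `date` and `client` columns and unnests the `response_headers` array of name/value structs. The table path `httparchive.crawl.requests` and the crawl date are assumptions made for the example; the diff does not show the table name, so substitute the real one.

```sql
-- Hypothetical table path and crawl date; neither appears in the diff above.

-- Count requests per resource type for root pages in one mobile crawl.
SELECT
  type,
  COUNT(*) AS request_count
FROM `httparchive.crawl.requests`
WHERE date = DATE '2024-06-01'
  AND client = 'mobile'
  AND is_root_page
GROUP BY type
ORDER BY request_count DESC;

-- Read the Content-Type response header of each main document.
-- UNNEST expands the ARRAY<STRUCT<name, value>> into one row per header.
SELECT
  page,
  url,
  header.value AS content_type
FROM `httparchive.crawl.requests`,
  UNNEST(response_headers) AS header
WHERE date = DATE '2024-06-01'
  AND client = 'mobile'
  AND is_main_document
  AND LOWER(header.name) = 'content-type'
LIMIT 10;
```

Filtering on `date` and `client` first is likely to matter for cost on a table of this size, though whether the table is partitioned or clustered on those columns is not stated in the diff.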