This demo introduces the GUAC GraphQL API. It covers the basic node "noun" and
"verb" types and what they contain. Also, it explores the server-side path
query and demonstrates a client-side search program.
- go
- pip (optional)
- jq (optional)
If you haven't already, clone GUAC to a local directory:
git clone https://github.com/guacsec/guac.git
Also clone GUAC data, this is used as test data for this demo.
git clone https://github.com/guacsec/guac-data.git
The rest of the demo will assume you are in the GUAC directory
cd guac
Build the GUAC binaries using the make
command.
make
For this demo, we will use the guacgql --gql-debug
command, which sets up a
GraphQL endpoint and playground, and runs an in-memory backend to store the GUAC
graph. Run this command in a separate terminal (in the same path) and keep it
running throughout the demo.
bin/guacgql --gql-debug
Note: As the data is stored in-memory, whenever you restart the server, the graph will be empty.
To ingest the data, we will use the help of the guacone
command, which is an
all-in-one utility that can take a collection of files and ingest them into the
GUAC graph.
In your original window, run:
bin/guacone collect files ../guac-data/docs/
This can take a minute or two. This dataset consists of a set of document types:
- SLSA attestations for kubernetes
- Scorecard data for kubernetes repos
- SPDX SBOMs for kubernetes containers
- CycloneDX SBOMs for some popular DockerHub images
The queries for this demo are stored in the demo/queries.gql
file. Running the
demo queries can be done graphically by opening the GraphQL Playground in a web
browser, or using the command line.
The remainder of the demo will have cli commands. If you would like to use the GraphQL Playground instead, use these steps:
-
Open the GraphQL Playground by visiting
http://localhost:8080/
in your web browser. -
Open
demo/queries.gql
in a text editor and copy the full contents. -
Paste the contents in the left pane of the Playground
-
The "Play" button in the top center of the Playground can be used to select which query to run.
To use the command line to run queries, install the gql-cli
tool with pip
,
run:
pip install gql[all]
Note:
- if you are using a shell like
zsh
it will not be able to run the command above properly. Instead, use abash
orsh
shell. - in your system, if you are using pyhton3, you may need to use the
pip3
command instead.
The GUAC graph is queryable using GraphQL. GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data.
For some background reading, visit https://graphql.org/learn/ . Also, the full GUAC schema can be saved with this command:
gql-cli http://localhost:8080/query --print-schema > schema.graphql
A primary type of node in GUAC is a Package. Packages are stored in hierarchical nodes by Type -> Namespace -> Name -> Version. First we will run the below query:
{
packages(pkgSpec: {}) {
type
}
}
This query has an empty pkgSpec
, that means it will match and return all
packages. However we are only requesting the type
field, and not any nested
nodes. Therefore, we will only receive the top-level Type nodes.
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ1 | jq
If you are using the GraphQL playground for the rest of the demo, you would run the query with the name at the end of the command. In this case, it is "PkgQ1".
We receive:
{
"packages": [
{
"type": "alpine"
},
{
"type": "golang"
},
{
"type": "pypi"
},
{
"type": "deb"
},
{
"type": "maven"
},
{
"type": "guac"
}
]
}
These are the top level Types of all the packages we ingested from the
guac-data
repository.
Going one level deeper, let us query for all the Namespaces under the "oci" Type. The query looks like this:
{
packages(pkgSpec: { type: "deb" }) {
type
namespaces {
namespace
}
}
}
Run it with:
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ2 | jq
Output:
{
"data": {
"packages": [
{
"type": "deb",
"namespaces": [
{
"namespace": "ubuntu"
},
{
"namespace": "debian"
}
]
}
]
}
}
Run PkgQ3
to get full package data on any package with name: "libp11-kit0"
.
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ3 | jq
{
"packages": [
{
"id": "1785",
"type": "deb",
"namespaces": [
{
"id": "1786",
"namespace": "debian",
"names": [
{
"id": "1935",
"name": "libp11-kit0",
"versions": [
{
"id": "1936",
"version": "0.23.22-1",
"qualifiers": [
{
"key": "distro",
"value": "debian-11"
},
{
"key": "arch",
"value": "amd64"
},
{
"key": "upstream",
"value": "p11-kit"
}
],
"subpath": ""
},
{
"id": "12712",
"version": "0.23.22-1",
"qualifiers": [
{
"key": "arch",
"value": "arm64"
},
{
"key": "upstream",
"value": "p11-kit"
},
{
"key": "distro",
"value": "debian-11"
}
],
"subpath": ""
}
]
}
]
},
{
"id": "4859",
"namespace": "ubuntu",
"names": [
{
"id": "5124",
"name": "libp11-kit0",
"versions": [
{
"id": "5125",
"version": "0.23.20-1ubuntu0.1",
"qualifiers": [
{
"key": "arch",
"value": "amd64"
},
{
"key": "upstream",
"value": "p11-kit"
},
{
"key": "distro",
"value": "ubuntu-20.04"
}
],
"subpath": ""
},
{
"id": "5406",
"version": "0.24.0-6build1",
"qualifiers": [
{
"key": "arch",
"value": "amd64"
},
{
"key": "upstream",
"value": "p11-kit"
},
{
"key": "distro",
"value": "ubuntu-22.04"
}
],
"subpath": ""
}
]
}
]
}
]
}
]
}
Here you can see the package is a deb
type and found under both the debian
and ubuntu
namespaces. Of the two Name nodes found, there are two Version
nodes underneath.
If you take a look at the query in the demo/queries.gql
file, you will see
that it uses the allPkgTree
fragment. This fragment specifies all the possible
fields in a GUAC package. Fragments can be used to avoid duplicating long lists
of attributes.
We have explored Package nodes, which are called "nouns" in GUAC. Now let's
explore nodes that are called "verbs". IsDependency
is a node that links two
packages, signifying that one package depends on the other. First take look at
the Package node for the consul
container image:
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ4 | jq
{
"packages": [
{
"id": "4",
"type": "oci",
"namespaces": [
{
"id": "5",
"namespace": "docker.io/library",
"names": [
{
"id": "6",
"name": "consul",
"versions": [
{
"id": "7",
"version": "sha256:22ab19cf1326abbfaafec6a14eb68f96e899e88ffe9ce26fa36affcf8ffb582c",
"qualifiers": [
{
"key": "tag",
"value": "latest"
}
],
"subpath": ""
}
]
}
]
}
]
}
]
}
Now we will use a query on IsDependency
to find all the packages that the
consul
container image depends on. The query is looks like this:
{
IsDependency(isDependencySpec: { package: { type: "oci", namespace: "docker.io/library", name: "consul" }}) {
dependentPackage {
type
namespaces {
namespace
names {
name
}
}
}
}
}
cat demo/queries.gql | gql-cli http://localhost:8080/query -o IsDependencyQ1 | jq
{
"IsDependency": [
{
"dependentPackage": {
"type": "golang",
"namespaces": [
{
"namespace": "github.com/azure",
"names": [
{
"name": "azure-sdk-for-go"
}
]
}
]
}
},
{
"dependentPackage": {
"type": "alpine",
"namespaces": [
{
"namespace": "",
"names": [
{
"name": "libcurl"
}
]
}
]
}
},
{
"dependentPackage": {
"type": "golang",
"namespaces": [
{
"namespace": "github.com/sirupsen",
"names": [
{
"name": "logrus"
}
]
}
]
}
},
... many more
We see that the consul
image depends on the logrus
Go package. We can query
the full details of that link as so:
{
IsDependency(isDependencySpec: {
package: { type: "oci", namespace: "docker.io/library", name: "consul" }
dependentPackage: { type: "golang", namespace: "github.com/sirupsen", name: "logrus" }
}) {
...allIsDependencyTree
}
}
cat demo/queries.gql | gql-cli http://localhost:8080/query -o IsDependencyQ2 | jq
{
"IsDependency": [
{
"id": "903",
"justification": "top-level package GUAC heuristic connecting to each file/package",
"versionRange": "",
"package": {
"id": "4",
"type": "oci",
"namespaces": [
{
"id": "5",
"namespace": "docker.io/library",
"names": [
{
"id": "6",
"name": "consul",
"versions": [
{
"id": "7",
"version": "sha256:22ab19cf1326abbfaafec6a14eb68f96e899e88ffe9ce26fa36affcf8ffb582c",
"qualifiers": [
{
"key": "tag",
"value": "latest"
}
],
"subpath": ""
}
]
}
]
}
]
},
"dependentPackage": {
"id": "11",
"type": "golang",
"namespaces": [
{
"id": "900",
"namespace": "github.com/sirupsen",
"names": [
{
"id": "901",
"name": "logrus",
"versions": []
}
]
}
]
},
"origin": "file:///../guac-data/docs/cyclonedx/syft-cyclonedx-docker.io-library-consul:latest.json",
"collector": "FileCollector"
}
]
}
Here are the full details of the dependency link, including the SBOM that declared it.
GUAC has a path
query to find the shortest path between any two nodes. To use
this, we will need to pay attention to the id
field of the nodes we want to
query. For this example we will pick the etcd/client
and etcd/api
Go
packages. Run these two queries to find the id
of the packages.
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ5 | jq
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ6 | jq
Make a note the two id
s printed. The path query will look like this:
{
path(subject: 5809, target: 6721, maxPathLength: 10) {
__typename
... on Package{
...allPkgTree
}
... on IsDependency {
...allIsDependencyTree
}
}
}
However, the query saved in demo/queries.gql
is parameterized so you may pass
in the two ids you found, which may be different. Replace 5809
and 6721
with
the numbers you found.
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PathQ1 -V subject:5809 target:6721 | jq
Note: in the "Playground" there is a section at the bottom to specify "Variables".
Here we see a long output with a chain of nodes from the etcd/client
package
to etcd/api
. First is the client
package node. Then an IsDependency
node
which describes the dependency from the vault
container image to the client
package. Next is the Package
node for the vault
container image. Then
another IsDependency
node which describes the dependency from the vault
container image to the api
package. Finally is the Package
node for the
api
package.
What we have learned is that the vault
container image depends on both the
client
and api
package. This is maybe not the dependency relationship we
were hoping to find.
It is important to understand the limitations of the path GraphQL query in isolation and understand how to use client-side processing to get the desired results. An example of how to do this is in the section below.
The data we have ingested in GUAC is based on the SBOM files in the guac-data
repo, but does not contain any vulnerability information. GUAC has built-in
"certifiers" which can search the ingested data and attach vulnerability data to
them. The OSV certifier will search for OSV vulnerability information. To run
the OSV certifiers, run:
bin/guacone certifier osv --poll=false
The certifier will take a few minutes to run. A vulnerability "noun" node may be queried with a query like this:
{
osv(osvSpec: {osvId: "ghsa-jfh8-c2jp-5v3q"}) {
...allOSVTree
}
}
cat demo/queries.gql | gql-cli http://localhost:8080/query -o OSVQ1 | jq
{
"osv": [
{
"id": "131666",
"osvId": "ghsa-jfh8-c2jp-5v3q"
}
]
}
GUAC doesn't store a lot of information here, just the OSV ID which can be easily cross-referenced.
The "verb" node type that links packages to vulnerabilities is CertifyVuln
. We
can query to see all of these nodes that link packages to the above
vulnerability like so:
{
CertifyVuln(certifyVulnSpec: {vulnerability: {osv: {osvId: "ghsa-jfh8-c2jp-5v3q"}}}) {
...allCertifyVulnTree
}
}
cat demo/queries.gql | gql-cli http://localhost:8080/query -o CertifyVulnQ1 | jq
{
"CertifyVuln": [
{
"id": "131684",
"package": {
"id": "3197",
"type": "maven",
"namespaces": [
{
"id": "12486",
"namespace": "org.apache.logging.log4j",
"names": [
{
"id": "12487",
"name": "log4j-core",
"versions": [
{
"id": "12488",
"version": "2.8.1",
"qualifiers": [],
"subpath": ""
}
]
}
]
}
]
},
"vulnerability": {
"__typename": "OSV",
"id": "131666",
"osvId": "ghsa-jfh8-c2jp-5v3q"
},
"metadata": {
"dbUri": "",
"dbVersion": "",
"scannerUri": "osv.dev",
"scannerVersion": "0.0.14",
"timeScanned": "2023-04-03T16:28:44.835711634Z",
"origin": "guac",
"collector": "guac"
}
}
]
}
This node has the package and vulnerability nodes along with the metadata that records how this link was found.
All of the above examples use a single GraphQL query. However, the query results
are all easily parsed json
that can be interpreted to build powerful scripts.
GUAC has a neighbors
query that will return all the nodes with a relationship
to the specified node. This can be used to search through relationships, finding
the specific type of path you are looking for. The neighbor query also take in a
set of edge filters of which to traverse. However, we are not using that field
in this query.
The neighbors query looks like this:
{
neighbors(node: $nodeId, usingOnly: []) {
__typename
... on Package{
...allPkgTree
}
... on IsDependency {
...allIsDependencyTree
}
}
}
The demo/path.py
file contains a simple Python program to do a breadth-first
search of GUAC nodes, similar to the path
server-side query. However, in
path.py
we have defined a filter()
function that filters the types and
direction of links that are searched. This filter has been written in an attempt
to find "downward" dependency relationships.
# filter is used by bfs to decide weather to search a node or not. In this
# implementation we try to find dependency links between packages
def filter(fromID, fromNode, neighbor):
if neighbor['__typename'] == 'Package':
# From Package -> Package, only search downwards
if fromNode['__typename'] == 'Package':
return containsID(neighbor, fromID)
# From other node type -> Package is ok.
return True
# Only allow IsDependency where the fromID is in the subject package
if neighbor['__typename'] == 'IsDependency':
return containsID(neighbor['package'], fromID)
# Otherwise don't follow path
return False
For Package
->Package
links, they are only followed downward, "PackageName"
-> "PackageVersion". For Package
->IsDependency
links, they are only followed
if the Package is the "Subject" and not the "Object", ie: if the previous
Package node is the one that depends on the newly found Package in the link.
First, run with the previous two ids from the etcd/client
and etcd/api
packages found above.
./demo/path.py 5809 6721
[]
[]
Nothing is found. Before, we saw dependency links from the vault
image to both
of these packages, but now with the filter we have in place, we don't follow the
dependency link "up" to the vault
image. We would normally expect to find a
dependency link between an api and a client of that api, but we simply haven't
ingested an SBOM that contains that link. In this case, only the vault
SBOM
was ingested which referenced both packages.
Now let's explore two more packages, the python
container image, and the
libsqlite3-dev
debian package. Run these queries and node the id
s.
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ7 | jq
cat demo/queries.gql | gql-cli http://localhost:8080/query -o PkgQ8 | jq
Now find the path, your id
s may be different
./demo/path.py 3616 4422
['3616', '3617', '4424', '4422']
[
{
"__typename": "Package",
...
A path is found. The program prints the ids of the path, then the nodes. The
python
"PackageName" is linked to the "PackageVersion", which is a specific
tag of the image. Then, an IsDependency
link has the python
"PackageVersion"
as the "subject" of the link, and the "object" dependentPackage
is the
"PackageName" node of the libsqlite3-dev
package. The last node in the path is
the "PackageName" node of libsqlite3-dev
.
Now you have the knowledge and tools to import data, query nodes an relationships and find interesting discoveries. Share your novel queries and usecases with the community!