prequel-dev/cre

CRE - Common Reliability Enumerations

An open standard for naming, categorizing, and detecting reliability problems

Documentation | Slack | Playground | Mailing List

Overview

What are CREs?

Common Reliability Enumerations (CREs) are an open, structured standard for naming and categorizing reliability problems found in production systems. CREs represent the collective knowledge of The Open Problem Detection (and Resolution) Community, where hundreds of engineers and practitioners across startups, enterprises, and critical infrastructure providers discuss how to share, detect, and mitigate reliability problems.

CREs provide a consistent way to describe reliability problems (cause, impact, and mitigation). The CRE schema and taxonomy enable the sharing of reliability intelligence and give teams a vocabulary to discuss recurring problems without reinventing the wheel or diagnosing incidents in isolation.

Just as CVEs (Common Vulnerabilities and Exposures) provide a method to classify and share known threats, CREs offer an equivalent standard for reliability problems.

With CREs, you can:

  • Recognize known failure modes before they escalate
  • Correlate similar issues across services, teams, or companies
  • Drive better postmortems, triage, and tooling decisions
  • Contribute your own findings to an evolving, community-backed index

CREs give teams a common framework to identify, compare, and learn from reliability issues, making patterns visible that were previously siloed or overlooked.

Getting Started

Schema

The Common Reliability Enumeration Schema is located in cre-schema.json. Learn more about the CRE specification and rule syntax.

CRE Rules

  • CRE rules are located in the rules/ folder. Each CRE is placed in its own folder.
  • Tags and categories are defined in the rules/tags subfolder.

Rule Builder

A CRE builder tool, ruler, is provided to validate CREs and generate a final rules document for a problem detector to consume. The rule builder generates rule hashes derived from the content of the rules and adds them to the output. A rule's hash changes only if the content of the rule changes. The builder also validates tag and category references and ensures there are no duplicate IDs.
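
To make the three checks above concrete, here is a minimal Python sketch of content-derived hashing, tag validation, and duplicate-ID detection. It is not ruler's implementation: the function names `rule_hash` and `build`, the SHA-256 over canonical JSON, the `id`, `tags`, and `hash` field names, and the example ID are all assumptions made for illustration.

```python
import hashlib
import json


def rule_hash(rule: dict) -> str:
    """Hash a rule's content (assumed scheme: SHA-256 of canonical JSON)."""
    # Ignore any existing "hash" field so the hash depends only on content.
    content = {k: v for k, v in rule.items() if k != "hash"}
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build(rules: list, known_tags: set) -> list:
    """Validate rules, then return copies with a content hash attached."""
    seen_ids = set()
    built = []
    for rule in rules:
        rule_id = rule["id"]
        if rule_id in seen_ids:
            raise ValueError(f"duplicate rule ID: {rule_id}")
        seen_ids.add(rule_id)
        unknown = set(rule.get("tags", [])) - known_tags
        if unknown:
            raise ValueError(f"{rule_id}: unknown tags {sorted(unknown)}")
        built.append({**rule, "hash": rule_hash(rule)})
    return built


# Hypothetical rule; the ID and fields are made up for this example.
example = {"id": "CRE-0000-0001", "title": "Example rule", "tags": ["nginx"]}
built_rules = build([example], known_tags={"nginx"})
```

Because the hash is computed over a canonical serialization, reordering keys leaves it unchanged, while editing any field value produces a different hash.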

Check out CONTRIBUTING.md to learn how to build and test your first rule.

Playground

The fastest way to test a rule on data is with the CRE playground. The playground runs as WebAssembly (wasm) in the browser, so rules and data are never sent to an API. No data leaves your browser.

Problem Detector

preq is a free and open community-driven reliability problem detector that runs CREs on data. Use it to develop and test CREs on Linux, macOS, or Windows.

How to contribute

New contributors are encouraged to join the problem detection community and add new CREs. Learn how to contribute in CONTRIBUTING.md.

Rule Coverage

Tags

Technology Coverage

The table below lists the technologies targeted by the existing CRE rules and the number of rules that describe each technology.

| Technology | CRE Count | Documentation |
|---|---|---|
| nginx | 8 | https://nginx.org/en/docs/ |
| loki | 6 | https://grafana.com/docs/loki/latest/ |
| otel-collector | 4 | https://opentelemetry.io/docs/collector/ |
| kubernetes | 4 | https://kubernetes.io/docs/home/ |
| aws | 4 | https://aws.amazon.com/ |
| rabbitmq | 4 | https://www.rabbitmq.com/documentation.html |
| redis | 4 | https://redis.io/docs/ |
| grafana | 4 | https://grafana.com/docs/ |
| ovn | 3 | https://www.ovn.org/docs/ |
| datadog | 3 | https://docs.datadoghq.com/ |
| neutron | 2 | https://docs.openstack.org/neutron/latest/ |
| openstack | 2 | https://docs.openstack.org/ |
| keda | 2 | https://keda.sh/docs/ |
| opentelemetry | 2 | https://opentelemetry.io/docs/ |
| postgres | 2 | https://www.postgresql.org/docs/ |
| dns | 2 | https://en.wikipedia.org/wiki/Domain_Name_System |
| memcached | 2 | https://memcached.org/ |
| prometheus | 2 | https://prometheus.io/docs/ |
| karpenter | 2 | https://karpenter.sh/docs/ |
| cws | 1 | https://docs.datadoghq.com/cloud_workload_security/ |
| postgresql | 1 | https://www.postgresql.org/docs/ |
| nfs | 1 | https://wiki.linux-nfs.org/wiki/ |
| nvidia | 1 | https://docs.nvidia.com/ |
| helm | 1 | https://helm.sh/docs/ |
| temporal | 1 | https://docs.temporal.io/ |
| slurm | 1 | https://slurm.schedmd.com/documentation.html |
| slurmdbd | 1 | https://slurm.schedmd.com/slurmdbd.html |
| mysql | 1 | https://dev.mysql.com/doc/ |
| redis-cli | 1 | https://redis.io/docs/ui/cli/ |
| kubelet | 1 | https://kubernetes.io/docs/concepts/architecture/nodes/#kubelet |
| redis-py | 1 | https://redis-py.readthedocs.io/en/stable/ |
| spicedb | 1 | https://spicedb.dev/ |
| celery | 1 | https://docs.celeryq.dev/en/stable/ |
| kombu | 1 | https://docs.celeryq.dev/projects/kombu/en/stable/ |
| vpc-cni | 1 | https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html |
| csi | 1 | https://kubernetes-csi.github.io/docs/ |
| terraform | 1 | https://developer.hashicorp.com/terraform/docs |
| ovsdb | 1 | https://docs.openvswitch.org/en/latest/ref/ovsdb/ |
| eks | 1 | https://docs.aws.amazon.com/eks/ |
| gke | 1 | https://cloud.google.com/kubernetes-engine/docs/ |
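
A per-technology count like the one in the table can be derived by tallying technology tags across rules. The Python sketch below shows one way to do that. It assumes each rule is a dictionary with a `tags` list, which is an illustrative shape and not necessarily how the repository generates this table; `technology_counts` and the sample rules are hypothetical.

```python
from collections import Counter


def technology_counts(rules):
    """Count how many rules mention each technology tag."""
    counts = Counter()
    for rule in rules:
        # Use a set so a tag listed twice on one rule is counted once.
        counts.update(set(rule.get("tags", [])))
    return counts


# Hypothetical rules; IDs and tags are made up for this example.
sample_rules = [
    {"id": "example-1", "tags": ["nginx", "kubernetes"]},
    {"id": "example-2", "tags": ["nginx"]},
    {"id": "example-3", "tags": ["redis", "redis"]},
]
coverage = technology_counts(sample_rules)
```

Calling `coverage.most_common()` returns the technologies sorted by descending rule count, which matches the ordering used in the table.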

Join the community!
