An open standard for naming, categorizing, and detecting reliability problems
Documentation | Slack | Playground | Mailing List
Common Reliability Enumerations (CREs) are an open, structured standard for naming and categorizing reliability problems found in production systems. CREs represent the collective knowledge of The Open Problem Detection (and Resolution) Community, where hundreds of engineers and practitioners across startups, enterprises, and critical infrastructure providers discuss how to share, detect, and mitigate reliability problems.
CREs provide a consistent way to describe reliability problems (cause, impact, and mitigation). The CRE schema and taxonomy enable the sharing of reliability intelligence and give teams a vocabulary to discuss recurring problems without reinventing the wheel or diagnosing incidents in isolation.
Just as CVEs (Common Vulnerabilities and Exposures) provide a method to classify and share known threats, CREs offer an equivalent standard for reliability problems.
With CREs, you can:
- Recognize known failure modes before they escalate
- Correlate similar issues across services, teams, or companies
- Drive better postmortems, triage, and tooling decisions
- Contribute your own findings to an evolving, community-backed index
CREs give teams a common framework to identify, compare, and learn from reliability issues, making patterns visible that were previously siloed or overlooked.
The Common Reliability Enumeration Schema is located in cre-schema.json. Learn more about the CRE specification and rule syntax.
- CRE rules are located in the rules/ folder. Each CRE is placed in its own folder.
- Tags and categories are also located in this folder, in the rules/tags subfolder.
A CRE builder tool, ruler, is provided to validate CREs and generate a final rules document for a problem detector to consume. The rule builder generates rule hashes derived from the content of the rules and adds them to the output. A rule's hash changes only if the content of the rule changes. The builder also validates tag and category references and ensures there are no duplicate IDs.
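The idea behind content-derived hashes and the duplicate-ID and tag checks can be sketched in a few lines of Python. This is an illustrative sketch only, not how ruler is implemented: the field names `id`, `tags`, and `hash`, the example ID, the choice of SHA-256 over canonical JSON, and both function names are assumptions made for the example.

```python
import hashlib
import json


def rule_hash(rule: dict) -> str:
    """Return a hash that depends only on the rule's content.

    Assumption: a previously stored "hash" field is excluded, and keys are
    sorted so that field order does not affect the result.
    """
    content = {k: v for k, v in rule.items() if k != "hash"}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(rules: list[dict], known_tags: set[str]) -> list[str]:
    """Collect duplicate-ID and unknown-tag errors (hypothetical checks)."""
    errors = []
    seen_ids = set()
    for rule in rules:
        rule_id = rule.get("id")
        if rule_id in seen_ids:
            errors.append(f"duplicate id: {rule_id}")
        seen_ids.add(rule_id)
        for tag in rule.get("tags", []):
            if tag not in known_tags:
                errors.append(f"{rule_id}: unknown tag {tag!r}")
    return errors


if __name__ == "__main__":
    # Hypothetical rule used only for illustration.
    rule = {"id": "CRE-0001", "title": "example", "tags": ["kafka"]}
    print(rule_hash(rule))
    print(validate([rule, rule], known_tags={"kafka"}))
```

Because the hash is computed from a canonical form of the content, reordering fields leaves it unchanged, while editing any field produces a new value.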
Check out CONTRIBUTING.md to learn how to build and test your first rule.
The fastest way to test a rule on data is with the CRE playground. The playground runs as WebAssembly (wasm) in the browser. Data and rules are not sent to an API, so no data leaves your browser.
preq is a free and open community-driven reliability problem detector that runs CREs on data. Use it to develop and test CREs on Linux, macOS, or Windows.
New contributors are encouraged to join the problem detection community and add new CREs. Learn how to contribute in CONTRIBUTING.md.
The table below lists the technologies targeted by the existing CRE rules and the number of rules that describe each technology.