Documentation & Understanding

Is your service/feature's documentation linked from Cerebro?
Does your service/feature have an architecture diagram which is accurate and current?
Do you review capacity concerns for your service/feature at least every 3 months? N/A - The project is deployed as a sidecar, so capacity requirements vary with the consuming applications

Configuration Management

Arturos are not applicable to this project

Are the effects of Arturo features documented in the Arturo page (or linked from it)?
For long term Arturos, does your end-to-end and automated testing cover both pathways of the Arturo?
Have all unused/deprecated configuration options (Arturos, settings, etc.) been removed from your owned features/services?

Are your core features tested?
Are there automated acceptance tests in staging for your service? N/A - Acceptance tests run through the consuming applications
Are there automated smoke tests in production for your service? N/A - Smoke tests run through the consuming applications
Do you gate stage progression for production deployments based on smoke test outcomes? N/A - Depends on the consuming application's rollout
Do you have sufficient acceptance test coverage of your service's functionality? N/A - Covered by core test functionality

Monitoring is coupled with the consuming applications

Are all monitors for this service/feature documented?
Do all service/feature monitors link to a Runbook, or provide an escalation policy within them?
Do you have monitoring and alerting that follows the REDS pattern?
Have you reviewed your alert thresholds in the last 90 days?
Is your DEPLOY.md accurate?
Is a Datadog service deploy dashboard linked in Cerebro?
Can you roll back a change made in your service/feature in less than 15 minutes?
Does your service's Cerebro page provide a link to live logs/APM?
Do you have a good signal-to-noise ratio on your monitors and APM?
Do your error capture mechanisms have a good signal-to-noise ratio?
Do you have a documented console role that can be used for running backfills and debugging production issues?

Do you know how to quickly detect problems with your service/feature dependencies?
Are your upstream/downstream dependencies linked in Cerebro under the "Uses" field?
Do you know how to page your upstream/downstream dependency on-call groups?