feat(reports): Cache per-project weekly report metrics#116739
Conversation
|
🚨 Warning: This pull request contains Frontend and Backend changes! It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently. Have questions? Please ask in the |
There was a problem hiding this comment.
only frontend change -- auto-added by santry bot
|
just out of curiosity, what's the napkin math on how much cache space this will use? |
good question! there are around 140-150 bytes per entry. Not sure if the number of projects is public 👀 , but since TTL is 14 days, the amount of sets could be 2 * num_of_projects at any given time. so cache space would be 140 * 2 * num_of_projects. It's not trivial, so the alternative is storing in a DB, but that's more overhead/setup |
|
we currently have ~3M projects in US, so that works out to <1 gig of cache usage |
| logger.exception( | ||
| "weekly_reports.cache_project_metrics.failed", | ||
| extra={"organization_id": organization_id, "timestamp": timestamp}, | ||
| ) |
There was a problem hiding this comment.
we might not need this logger?
| @@ -218,15 +242,17 @@ def prepare_organization_report( | |||
| lifecycle.record_halt(WeeklyReportHaltReason.EMPTY_REPORT) | |||
| return | |||
|
|
|||
| # Cache per-project metrics for reuse across report batches | |||
| _cache_project_metrics(ctx, organization_id, timestamp) | |||
There was a problem hiding this comment.
up to you, but maybe this should just call a top-level function in src/sentry/tasks/summaries/weekly_report_cache.py directly
There was a problem hiding this comment.
agree -- moved this logic into src/sentry/tasks/summaries/weekly_report_cache.py
| logger.exception( | ||
| "weekly_reports.cache_project_metrics.failed", | ||
| extra={"organization_id": organization_id, "timestamp": timestamp}, | ||
| ) |
| from sentry.utils import json, metrics, redis | ||
| from sentry.utils.dates import floor_to_utc_day, to_datetime | ||
|
|
||
| CACHE_TTL = timedelta(days=14) |
There was a problem hiding this comment.
can we make this 10 days to prevent overlap of cache usage across 3 weeks?
There was a problem hiding this comment.
also, you can define this as an int in number of seconds, e.g. 10 * 24 * 60 * 60, so that you don't have to do that conversion below
|
|
||
|
|
||
| def _make_cache_key(org_id: int, project_id: int, timestamp: float) -> str: | ||
| return f"{KEY_PREFIX}:{org_id}:{project_id}:{timestamp}" |
There was a problem hiding this comment.
including a timestamp in the cache key doesn't seem necessary, conceptually we should just be caching the latest value we computed for the org ID/project ID
| @@ -218,15 +242,17 @@ def prepare_organization_report( | |||
| lifecycle.record_halt(WeeklyReportHaltReason.EMPTY_REPORT) | |||
| return | |||
|
|
|||
| # Cache per-project metrics for reuse across report batches | |||
There was a problem hiding this comment.
can we put this behind an org feature flag to begin with?
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 59edfc4. Configure here.
| project_metrics[project_id] = { | ||
| "e": project_ctx.accepted_error_count, | ||
| "t": project_ctx.accepted_transaction_count, | ||
| } |
There was a problem hiding this comment.
Stale metrics after empty week
Medium Severity
cache_project_metrics skips Redis writes when check_if_project_is_empty() is true, so a zero-activity week leaves the prior week’s totals in cache. Week-over-week reads then treat old counts as the previous period instead of zero, and a cache hit avoids the intended Snuba fallback on miss.
Reviewed by Cursor Bugbot for commit 59edfc4. Configure here.
There was a problem hiding this comment.
10 day TTL avoids this. Since the report runs weekly, any stale entry expires before the next read
|
|
||
|
|
||
| def cache_project_metrics(ctx: OrganizationReportContext, organization_id: int) -> None: | ||
| project_metrics: dict[int, dict[str, int]] = {} |
There was a problem hiding this comment.
TODO: add feature flag
There was a problem hiding this comment.
NVM -- not adding feature flag. We'll be adding feature flag to the PR that actually uses the cached metrics
…hat weekly_reports_cache is a pure cache layer
shashjar
left a comment
There was a problem hiding this comment.
two non-blocking comments, nice job
| raw = results[i] | ||
| if raw is None: | ||
| metrics.incr("weekly_report.cache.miss") | ||
| else: |
There was a problem hiding this comment.
should we record cache hits as well? another way you could do this would be to have a single metric called weekly_report.cache_read and then a tag on the metric is set to either "miss" or "hit" based on the result, and then DD allows you to break down those timeseries
There was a problem hiding this comment.
this can be part of your next PR btw, doesn't have to block this one
|
|
||
| from sentry.utils import json, metrics, redis | ||
|
|
||
| CACHE_TTL = timedelta(days=10) |
There was a problem hiding this comment.
can just set this as an int, e.g. 10 * 24 * 60 * 60
CACHE_TTL_SEC = 10 * 24 * 60 * 60 # 10 days
Adds cache hit metric to weekly report cache More details on caching found in this [merged PR](#116739)
Resolves [ID-1589](https://linear.app/getsentry/issue/ID-1589/cache-total-project-errors-and-total-project-transactions-for-weekly). Goal: Cache per-project total errors and total transactions during weekly report generation so the frontend can display week-over-week percentage change without re-querying Snuba. **Redis Cache Layer** - Caches total errors and total transactions by Org Id, Project Id (no timestamp) - 10 day TTL --------- Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
Adds cache hit metric to weekly report cache More details on caching found in this [merged PR](#116739)


Resolves ID-1589.
Goal: Cache per-project total errors and total transactions during weekly report generation so the frontend can display week-over-week percentage change without re-querying Snuba.
Redis Cache Layer