Fix incident auto-resolve workflows not triggering #5382
Conversation
…d flexible handling

- Add comprehensive DMARC report detection using multiple indicators (sender, subject, content-type)
- Add email type classification (DMARC, SPF, bounce, alert)
- Add configurable skip options for DMARC/SPF reports via UI
- Handle emails without body content gracefully with fallback messages
- Improve error handling and logging for better debugging
- Add email_type metadata to all alerts for better tracking

Fixes parsing errors for DMARC reports that have no body content.
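For illustration, a minimal sketch of what multi-indicator DMARC detection along these lines could look like. The helper name and the indicator lists below are assumptions for the sketch, not the Mailgun provider's actual code:

```python
# Hypothetical sketch of multi-indicator DMARC report detection;
# names and indicator lists are assumptions, not the actual provider code.
DMARC_SENDER_HINTS = ("dmarc", "noreply-dmarc", "dmarcreport")
DMARC_SUBJECT_HINTS = ("report domain:", "dmarc aggregate report")
DMARC_CONTENT_TYPES = ("application/zip", "application/gzip", "text/xml")


def _is_dmarc_report(sender: str, subject: str, content_type: str) -> bool:
    """Classify an inbound email as a DMARC report if any indicator matches."""
    sender = (sender or "").lower()
    subject = (subject or "").lower()
    content_type = (content_type or "").lower()
    return (
        any(hint in sender for hint in DMARC_SENDER_HINTS)
        or any(hint in subject for hint in DMARC_SUBJECT_HINTS)
        or any(ct in content_type for ct in DMARC_CONTENT_TYPES)
    )
```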
Backend:
- Add get_error_alerts_to_reprocess() helper function to db.py
- Add dismiss_error_alert_by_id() helper function to db.py
- Add POST /alerts/event/error/reprocess API endpoint
- Support reprocessing a single alert or all error alerts
- Auto-dismiss successfully reprocessed alerts
- Return detailed results with success/failure counts

Frontend:
- Add reprocessErrorAlerts() function to useAlerts hook
- Add reprocess buttons to AlertErrorEventModal UI
- Add handleReprocessSelected() and handleReprocessAll() handlers
- Add loading states and toast notifications
- Disable buttons during operations to prevent race conditions

This allows users to reprocess failed alert events after code fixes (e.g., DMARC detection improvements). Successfully reprocessed alerts are automatically dismissed from the error alerts list.
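A rough sketch, under stated assumptions, of the reprocess flow behind POST /alerts/event/error/reprocess. The helper names follow the commit message, but the data model, stubs, and wiring below are placeholders rather than Keep's actual db.py or API layer:

```python
# Sketch of the reprocess flow; helper names follow the commit message,
# signatures and stubs are placeholders, not the actual Keep codebase.
from dataclasses import dataclass, field


@dataclass
class ErrorAlert:
    id: str
    raw_event: dict = field(default_factory=dict)


def get_error_alerts_to_reprocess(alert_id: str | None = None) -> list[ErrorAlert]:
    """Placeholder for the db.py helper: one alert by id, or all error alerts."""
    return []


def dismiss_error_alert_by_id(alert_id: str) -> None:
    """Placeholder for the db.py helper that dismisses a reprocessed error alert."""


def handle_raw_event(raw_event: dict) -> None:
    """Placeholder for re-running normal alert ingestion on the stored payload."""


def reprocess_error_alerts(alert_id: str | None = None) -> dict:
    """Reprocess failed alert events; auto-dismiss the ones that succeed."""
    results = {"success": 0, "failed": 0}
    for alert in get_error_alerts_to_reprocess(alert_id):
        try:
            handle_raw_event(alert.raw_event)
            dismiss_error_alert_by_id(alert.id)
            results["success"] += 1
        except Exception:
            results["failed"] += 1
    return results
```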
Changed the skip_dmarc_reports and skip_spf_reports defaults from True to False. DMARC and SPF reports will now create alerts by default; users can still enable skipping via the UI configuration if desired. DMARC reports without a body will get the message "DMARC Report: {subject}" plus attachment info.
- Add _extract_severity_from_email() method for keyword-based severity detection
- Detect critical, high, warning, low, and info severity from subject/body
- Assign severity based on email type (DMARC=low, SPF/bounce=warning)
- Priority keyword matching: critical > high > warning > low > info

Examples:
- DMARC reports: low severity (informational)
- [SUCCESS] emails: low severity
- [ERROR]/[CRITICAL] emails: high/critical severity
- [WARNING] emails: warning severity

This provides better alert prioritization in the UI with appropriate visual indicators.
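A minimal sketch of the priority keyword matching this commit describes (note that the method was later removed again in a follow-up commit below). The keyword lists beyond the examples given are illustrative assumptions:

```python
# Illustrative sketch of keyword-based severity detection; keyword lists are
# assumptions beyond the examples in the commit message.
SEVERITY_KEYWORDS = [
    ("critical", ("critical", "[critical]", "emergency")),
    ("high", ("error", "[error]", "failure")),
    ("warning", ("warning", "[warning]", "degraded")),
    ("low", ("[success]", "dmarc", "recovered")),
]


def _extract_severity_from_email(subject: str, body: str) -> str:
    """Return the highest-priority severity whose keywords appear in the email."""
    text = f"{subject} {body}".lower()
    for severity, keywords in SEVERITY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return severity
    return "info"
```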
- Add _extract_status_from_email() method for keyword-based status detection
- Detect resolved, acknowledged, and firing status from subject/body
- Support status transitions via email (e.g., resolved notifications)

Status mapping:
- resolved: resolved, cleared, recovered, fixed, closed, ok now, back to normal
- acknowledged: acknowledged, ack, investigating, working on
- firing: default for new alerts

This allows email alerts to properly reflect their lifecycle status.
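A short sketch of the status mapping above (this method was also removed in a later commit below). The keyword lists come from the commit message; the implementation details are assumptions:

```python
# Illustrative sketch of the status mapping; implementation details are assumed.
RESOLVED_KEYWORDS = (
    "resolved", "cleared", "recovered", "fixed", "closed", "ok now", "back to normal",
)
ACKNOWLEDGED_KEYWORDS = ("acknowledged", "ack", "investigating", "working on")


def _extract_status_from_email(subject: str, body: str) -> str:
    """Map email wording to an alert lifecycle status, defaulting to firing."""
    text = f"{subject} {body}".lower()
    if any(keyword in text for keyword in RESOLVED_KEYWORDS):
        return "resolved"
    if any(keyword in text for keyword in ACKNOWLEDGED_KEYWORDS):
        return "acknowledged"
    return "firing"
```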
…sender

Changed the source field from the email sender address to the proper provider source format:
- Primary source: mailgun (for source facet filtering)
- Secondary source: email sender address (for detailed tracking)

This fixes the source counter in the alerts feed and allows proper filtering by source=mailgun.

Before: source = [[email protected]]
After: source = [mailgun, [email protected]]
Reverted the source field change. After review, the original behavior is correct:
- source = [email_sender] allows filtering by specific email senders
- This is the intended behavior for email-based providers
- Users can filter by source to see alerts from specific monitoring systems

The source counter showing individual email addresses is intentional. Users can use the email_type field to filter by provider (e.g., email_type=dmarc_report).
Add a database script to refresh severity and status for Mailgun alerts that were processed before the intelligent extraction logic was added.

Features:
- Updates severity using keyword-based detection
- Updates status using keyword-based detection
- Adds email_type classification if missing
- Dry-run mode by default (safe)
- Configurable time range (default: 30 days)
- Detailed reporting of changes
- Error handling for individual alerts

Usage:

# Dry run (see what would change)
python scripts/update_mailgun_alert_metadata.py --tenant-id keep

# Actually update
python scripts/update_mailgun_alert_metadata.py --tenant-id keep --apply

# Check last 7 days only
python scripts/update_mailgun_alert_metadata.py --tenant-id keep --days 7 --apply

This allows retroactive updates for alerts processed before the severity/status extraction improvements were added.
… values

Reverted back to the original hardcoded severity and status values:
- severity = info (hardcoded)
- status = firing (hardcoded)

Removed:
- _extract_severity_from_email() method
- _extract_status_from_email() method
- update_mailgun_alert_metadata.py script

This matches the original Mailgun provider behavior where all email alerts have the same severity/status regardless of content.
Update auto-generated documentation to include the new configuration fields:
- skip_dmarc_reports: Skip DMARC reports
- skip_spf_reports: Skip SPF reports
- handle_emails_without_body: Handle emails without body content

Generated using: python scripts/docs_render_provider_snippets.py
- Add workflow event trigger in resolve_incident_if_require()
- Set end_time when incident auto-resolves
- Update client via pusher on auto-resolve

Fixes issue where auto-resolved incidents didn't trigger workflows while manual resolution did.
Pull Request: Fix incident auto-resolve workflows not triggering
🐛 Problem
When incidents are automatically resolved (via the "resolve when all alerts resolve" setting), workflow triggers with type: incident and events: [updated] were not being executed. This only affected auto-resolution; manual resolution worked correctly.
🔍 Root Cause
The resolve_incident_if_require() method in keep/api/bl/incidents_bl.py (lines 432-472) was updating the incident status in the database but not calling send_workflow_event() to trigger workflows.
Compare with:
change_status() method (line 474+) - triggers workflows ✅
__postprocess_alerts_change() method (line 338) - triggers workflows ✅
resolve_incident_if_require() - missing workflow trigger ❌
✅ Solution
Added three missing operations to resolve_incident_if_require() after the incident status is updated:
Set end_time - Records when the incident was resolved (consistent with manual resolution)
Trigger workflows - Calls send_workflow_event(incident_dto, "updated") to execute incident workflows
Update clients - Calls update_client_on_incident_change() to notify UI via Pusher
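For context, a simplified sketch of the resulting flow, assuming the structure described above. The class, DTO handling, and helper signatures below are placeholders, not the actual contents of keep/api/bl/incidents_bl.py:

```python
# Simplified sketch of resolve_incident_if_require() after this PR;
# the class, data model, and helper signatures are placeholders.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Incident:
    id: str
    status: str = "firing"
    end_time: datetime | None = None


class IncidentBl:
    def resolve_incident_if_require(self, incident: Incident) -> None:
        # ... existing logic: verify all alerts are resolved and
        # persist the resolved status to the database ...
        incident.status = "resolved"

        # 1. Set end_time, consistent with manual resolution
        incident.end_time = datetime.now(tz=timezone.utc)

        # 2. Trigger workflows listening for incident "updated" events
        self.send_workflow_event(incident, "updated")

        # 3. Push the change to connected clients (UI) via Pusher
        self.update_client_on_incident_change(incident.id)

    def send_workflow_event(self, incident: Incident, action: str) -> None:
        """Placeholder for the workflow-manager call also made by change_status()."""

    def update_client_on_incident_change(self, incident_id: str) -> None:
        """Placeholder for the Pusher notification helper."""
```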
📝 Changes
File: keep/api/bl/incidents_bl.py
Lines modified: 460-474
Lines added: 13
🧪 Testing
Before fix:
Created incident from UptimeKuma alert
Enabled "Resolve incident when all alerts resolve"
Alert resolved → Incident auto-resolved → Workflows did not trigger ❌
After fix:
Same scenario → Incident auto-resolved → Workflows triggered successfully ✅
Verified logs show: "Incident auto-resolved, triggering workflows"
📊 Impact
Low risk: Only adds functionality, doesn't change existing behavior
Backward compatible: Existing workflows will now work as expected
Consistent: Makes auto-resolve behave identically to manual resolve
🔗 Related
This ensures parity between manual and automatic incident resolution, allowing users to reliably trigger notifications (Slack, PagerDuty, etc.) when incidents are resolved via automation.