
Fixed duplicate insert in posts table #746


Closed

Conversation

@rmgpinto (Collaborator) commented May 28, 2025

closes https://linear.app/ghost/issue/PROD-1905

When we receive a second webhook for the same post, we attempt to save another post with the same uuid, which throws a duplicate key error.

We should really be doing some kind of update on the post, but we don't currently support that, and it's out of scope for fixing this error.

The fix here moves the logic for handling incoming Ghost posts into the PostService, where we check whether the post has already been processed and return an error if it has.

This allows the HTTP handler to respond with a 400 instead of a 500.
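
Roughly, the new service method takes the shape sketched below. This is an illustration only: the repository method names, the result shape, and the Post factory are assumptions; the real code lives in src/post/post.service.ts.

// Hypothetical sketch of PostService.handleIncomingGhostPost — repository
// method names, the result shape and the Post factory are assumptions.
async handleIncomingGhostPost(
    account: Account,
    ghostPost: GhostPost,
): Promise<{ ok: true; post: Post } | { ok: false; error: CreatePostError }> {
    // A second webhook for the same post would insert the same uuid again,
    // so check for an existing row and return a typed error instead.
    const existing = await this.postRepository.getByGhostUuid(ghostPost.uuid);
    if (existing !== null) {
        return { ok: false, error: 'post-already-exists' };
    }

    const post = Post.createFromGhostPost(account, ghostPost);
    await this.postRepository.save(post);

    return { ok: true, post };
}

The HTTP handler can then translate the 'post-already-exists' error into a 400 without knowing anything about the database.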


coderabbitai bot commented May 28, 2025

Walkthrough

A new scenario was introduced in features/5xx-regressions.feature to test the system's response to receiving duplicate publish webhooks for the same post, ensuring that a 400 response is returned instead of a 5xx error. Corresponding step definitions were added in features/step_definitions/5xx_steps.js to send the initial and duplicate publish webhooks; these steps construct the webhook payloads with appropriate HMAC signatures and store the responses in the test context for validation.

The webhook handler for post.published events was refactored to use a PostService abstraction instead of direct repository access, with error handling extended to return a 400 status for duplicate posts. A new method, handleIncomingGhostPost, was added to PostService to encapsulate post creation logic and duplicate detection, and the GhostPost interface and CreatePostError type were exported for broader use. Integration tests for handleIncomingGhostPost were added, and the route handler in src/app.ts was updated to use the new service-based handler.
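
For the step definitions, sending a signed webhook amounts to computing an HMAC over the request body and posting it twice. The sketch below is TypeScript rather than the JavaScript used in 5xx_steps.js, and the endpoint path, header name, signature format, and payload shape are assumptions for illustration:

import { createHmac } from 'node:crypto';

// Build and send a post.published webhook with an HMAC signature.
// The endpoint path and X-Ghost-Signature format are assumptions.
async function sendPublishWebhook(baseUrl: string, secret: string, post: object) {
    const body = JSON.stringify({ post: { current: post } });
    const signature = createHmac('sha256', secret).update(body).digest('hex');

    return fetch(`${baseUrl}/.ghost/activitypub/webhooks/post/published`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'X-Ghost-Signature': `sha256=${signature}`,
        },
        body,
    });
}

// In the scenario, the first call is expected to succeed and the second,
// identical call is expected to come back with a 400.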

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 053eab1 and 77dded2.

📒 Files selected for processing (7)
  • features/5xx-regressions.feature (1 hunks)
  • features/step_definitions/5xx_steps.js (2 hunks)
  • src/app.ts (1 hunks)
  • src/http/api/webhook.ts (4 hunks)
  • src/post/post.entity.ts (2 hunks)
  • src/post/post.service.integration.test.ts (1 hunks)
  • src/post/post.service.ts (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/post/post.entity.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • features/5xx-regressions.feature
  • features/step_definitions/5xx_steps.js
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/app.ts (2)
src/instrumentation.ts (1)
  • spanWrapper (50-62)
src/http/api/webhook.ts (1)
  • createPostPublishedWebhookHandler (43-94)
🔇 Additional comments (4)
src/app.ts (1)

955-955: Good architectural improvement!

The refactoring to use PostService instead of direct repository access improves separation of concerns and encapsulates the business logic properly.
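
The wiring in src/app.ts presumably ends up along these lines; only spanWrapper and createPostPublishedWebhookHandler are named in the diff, so the route path and the service construction here are assumptions:

// Build the service once and hand it to the webhook handler factory,
// so the route no longer reaches into the repository directly.
const postService = new PostService(postRepository);

app.post(
    '/.ghost/activitypub/webhook/post/published',
    spanWrapper(createPostPublishedWebhookHandler(postService)),
);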

src/post/post.service.integration.test.ts (1)

109-172: Excellent test coverage for the duplicate post handling!

The tests comprehensively cover both successful post creation and the duplicate post detection scenario, which aligns perfectly with the PR objective of fixing duplicate inserts.
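
The duplicate case is the interesting one: calling the method twice with the same payload should succeed once and then return the typed error. A sketch of that shape, assuming a Vitest/Jest-style runner with fixtures (account, ghostPost, postService) set up elsewhere in the test file:

describe('handleIncomingGhostPost', () => {
    it('creates a post from a valid Ghost webhook payload', async () => {
        const result = await postService.handleIncomingGhostPost(account, ghostPost);

        expect(result.ok).toBe(true);
    });

    it('rejects a second webhook for the same post uuid', async () => {
        await postService.handleIncomingGhostPost(account, ghostPost);

        // The repeat call must not attempt another insert with the same uuid.
        const result = await postService.handleIncomingGhostPost(account, ghostPost);

        expect(result).toEqual({ ok: false, error: 'post-already-exists' });
    });
});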

src/post/post.service.ts (1)

391-415: ✅ Verification successful

Potential race condition in duplicate check

While the duplicate check logic is sound, there's a race condition between checking for an existing post (line 406) and saving the new post (line 412). Under high concurrency, two requests could both pass the existence check before either completes the save operation, potentially resulting in duplicate inserts.

Since the PR mentions handling duplicate entries on the posts.uuid field, ensure that a unique constraint exists at the database level to prevent this race condition. The application-level check is good for providing a better error message, but the database constraint is essential for data integrity.

Run the following script to verify if a unique constraint exists on the posts.uuid column:


🏁 Script executed:

#!/bin/bash
# Description: Check for unique constraints on posts.uuid column in the database schema

# Search for migration files that might contain the posts table schema
fd -e sql -e js -e ts migration | xargs rg -A 10 -B 5 'CREATE TABLE.*posts|ALTER TABLE.*posts.*uuid|UNIQUE.*uuid|INDEX.*uuid'

# Also check for Knex migration files
fd -e js -e ts . --exec rg -l 'table\.uuid|unique.*uuid|index.*uuid' {} \; | xargs rg -A 5 -B 5 'posts.*table|table.*posts'

Length of output: 7421


Verified unique constraint exists on posts.uuid

The migration 000007_add-posts-table.up.sql already defines uuid CHAR(36) NOT NULL UNIQUE, so the database will enforce uniqueness at insert time. No further changes needed.

src/http/api/webhook.ts (1)

43-94: Clean refactoring with proper error handling!

The refactoring to use PostService and the addition of the 'post-already-exists' error case with a clear user-facing message ("Webhook already processed for this post") properly addresses the duplicate webhook scenario mentioned in the PR objectives.
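
In the handler itself, the mapping is presumably a small branch on the service result; the response helpers and error accessors below are assumptions, not the actual code in src/http/api/webhook.ts:

const result = await postService.handleIncomingGhostPost(account, ghostPost);

if (!result.ok) {
    // A replayed webhook is a client-side problem, not a server fault,
    // so answer with a 400 rather than letting the insert fail as a 500.
    if (result.error === 'post-already-exists') {
        return new Response('Webhook already processed for this post', {
            status: 400,
        });
    }

    return new Response('Failed to process webhook', { status: 500 });
}

return new Response(null, { status: 200 });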

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f77e928 and d64d2a3.

📒 Files selected for processing (1)
  • src/post/post.repository.knex.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build, Test and Push
🔇 Additional comments (1)
src/post/post.repository.knex.ts (1)

688-715: Good approach to handling concurrency issues.

The addition of posts.uuid duplicate detection effectively addresses the concurrency problem described in the PR objectives. The overall error handling structure is well-designed, using transactions appropriately and providing clear feedback to callers through the isDuplicate flag.

Comment on lines 696 to 697
(err.message.includes('ap_id_hash') ||
err.message.includes('posts.uuid'))

⚠️ Potential issue

Fix logical inconsistency in duplicate error handling.

The error handling logic checks for duplicate entry errors on either ap_id_hash or posts.uuid, but the recovery logic (lines 699-704) only searches for the existing post by ap_id_hash. This creates an inconsistency where UUID-based duplicates might not be properly handled.

Consider this approach to handle both cases:

            if (
                err instanceof Error &&
                'code' in err &&
                err.code === 'ER_DUP_ENTRY' &&
                (err.message.includes('ap_id_hash') ||
                    err.message.includes('posts.uuid'))
            ) {
-                const row = await transaction('posts')
-                    .whereRaw('ap_id_hash = UNHEX(SHA2(?, 256))', [
-                        post.apId.href,
-                    ])
-                    .select('id')
-                    .first();
+                // Try to find by ap_id_hash first (most common case)
+                let row = await transaction('posts')
+                    .whereRaw('ap_id_hash = UNHEX(SHA2(?, 256))', [
+                        post.apId.href,
+                    ])
+                    .select('id')
+                    .first();
+                
+                // If not found by ap_id_hash, try by UUID
+                if (!row) {
+                    row = await transaction('posts')
+                        .where('uuid', post.uuid)
+                        .select('id')
+                        .first();
+                }

                if (row) {
                    return {
                        id: row.id,
                        isDuplicate: true,
                    };
                }
            }
🤖 Prompt for AI Agents
In src/post/post.repository.knex.ts around lines 696 to 697, the error handling
checks for duplicates on both ap_id_hash and posts.uuid, but the recovery logic
only handles ap_id_hash duplicates. To fix this, update the recovery logic to
distinguish which duplicate key caused the error and query the existing post
accordingly—use ap_id_hash if that caused the error, or use posts.uuid if that
was the duplicate key—ensuring both cases are properly handled.
