Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude posts in own subs from spam detection #1946

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

Scroogey-SN
Copy link
Contributor

@Scroogey-SN Scroogey-SN commented Mar 5, 2025

Description

Fix #787

Modify SQL function item_spam to ignore posts made in own subs:

diff --git a/prisma/migrations/20240416181215_item_spam_exclude_bios/migration.sql b/prisma/migrations/20240416181215_item_spam_exclude_bios/migration.sql
index 5ad205c7..1d4930f5 100644
--- a/prisma/migrations/20240416181215_item_spam_exclude_bios/migration.sql
+++ b/prisma/migrations/20240416181215_item_spam_exclude_bios/migration.sql
@@ -1,5 +1,5 @@
--- exclude bios from spam detection
-CREATE OR REPLACE FUNCTION item_spam(parent_id INTEGER, user_id INTEGER, within INTERVAL)
+-- exclude posts in own subs from spam detection
+CREATE OR REPLACE FUNCTION item_spam(parent_id INTEGER, user_id INTEGER, within INTERVAL, sub_name TEXT)
 RETURNS INTEGER
 LANGUAGE plpgsql
 AS $$
@@ -12,16 +12,22 @@ BEGIN
         RETURN 0;
     END IF;

+    IF sub_name IS NOT NULL AND user_id = (SELECT "Sub"."userId" FROM "Sub" WHERE "Sub"."name" = sub_name) THEN
+        RETURN 0;
+    END IF;
+
     SELECT count(*) INTO repeats
     FROM "Item"
+    LEFT JOIN "Sub" ON "Sub"."name" = "Item"."subName"
     WHERE (
         (parent_id IS NULL AND "parentId" IS NULL)
         OR
         ("parentId" = parent_id AND user_id <> (SELECT i."userId" FROM "Item" i WHERE i.id = "Item"."rootId"))
     )
-    AND "userId" = user_id
+    AND "Item"."userId" = user_id
     AND "bio" = 'f'
-    AND created_at > now_utc() - within;
+    AND ("Sub"."name" IS NULL OR "Sub"."userId" <> user_id)
+    AND "Item".created_at > now_utc() - within;

     IF parent_id IS NULL THEN
         RETURN repeats;

I tested by running the replaced function through ./sndev psql and

stackernews=# select item_spam(NULL, 15556, interval '1' hour);

And testing by posting as territory owner in the web interface.

Note that this will only ignore posts made by a territory owner in their own subs. Posts made in others' subs still count. As they should, IMO.

@ekzyis ekzyis self-requested a review March 6, 2025 16:42
@ekzyis ekzyis changed the title Fix issue #787: Exclude posts in own subs from spam detection Exclude posts in own subs from spam detection Mar 6, 2025
Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You found the right spot to fix #787!

I think it's almost there, but I found a bug regarding replies, see comment.

Will review more later, but I wanted to mention this before I go to lunch.

Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that since this filters out posts from own territories, it will still apply the spam fee to posts in your own territories if you posted before in other territories.

I think stackers will expect that they never have to pay the spam fee in their own territories. For this, we need to also pass subName to item_spam so we can check if we want to create a post in one of our own territories. If that is the case, we immediately return 0. That should also make the item_spam function easier to read and thus less prone to bugs.

Does this make sense?

@Scroogey-SN
Copy link
Contributor Author

Yes, I agree that's what the territory owner would like, ideally. It's also a more invasive change. The middle ground would be better than nothing (the territory owner is unlikely to spam others' territories).

@ekzyis
Copy link
Member

ekzyis commented Mar 7, 2025

It's also a more invasive change. The middle ground would be better than nothing (the territory owner is unlikely to spam others' territories).

It's a bigger change, yes, but as-is, this does not fix #787 because a territory founder can still pay 10x in their own territories. For that to happen, they only need to post in another territory once and then they will pay 10x in their own territory for the next 10 minutes. I don't consider this to be an edge case.

@Scroogey-SN
Copy link
Contributor Author

What I slightly dislike is how the additional parameter will change the semantics of the function.

Right now, the function tells you how badly a particular user has spammed in the interval.
Some callers will continue to use the function as such.
But by passing a particular sub, the caller asks instead "should the user be penalized for posting here?"

A territory owner who has spammed a foreign territory and subsequently wants to post in his own territory without penalty at the same time

  • has spammed
  • should not be penalized for a new post to in his own territory

So the function now returns two different kinds of information.

@Scroogey-SN
Copy link
Contributor Author

To play devil's advocate:

If territory owners should be able to post to their territories for base cost (no matter how much they spammed), shouldn't the same apply to any user posting their bio (the existing exclusion AND bio = 'f'), too?

Right now, posting your bio doesn't increase the spam counter, but when you have otherwise spammed before, your cost of posting bio multiplies.

Maybe the function should continue to answer only the question "how much has the user spammed in the interval" (where bio posting and territory owner posting in own territories are not spam), but the caller should not always do the POWER multiplication (posting should only do it if the user is not the owner, bio posting should never do it)?

@ekzyis
Copy link
Member

ekzyis commented Mar 13, 2025

Sorry for the late reply.

What I slightly dislike is how the additional parameter will change the semantics of the function. [...] So the function now returns two different kinds of information.

What we want for #787 is to never make founders pay spam fees in their territory. Everything else is secondary and can be updated accordingly.

Right now, posting your bio doesn't increase the spam counter, but when you have otherwise spammed before, your cost of posting bio multiplies.

This is a great point! Bios should not affect spam fees ( ✅ ) and not be affected by them ( ❌ ).

@Scroogey-SN
Copy link
Contributor Author

Now I'm passing sub into item_spam() to return 0 for the owner.

I tested the following sequence:

Login as unrelated user

  1. First post has fee 1
  2. Second post has fee 10
  3. Third post has fee 100
  4. First comment on unrelated post has fee 1
  5. Second comment on same post has fee 10
  6. Third comment on same post has fee 100
  7. Comment on own post has fee 1

Login as user who owns sub 'AGORA'

  1. First post to 'AMA' costs 1
  2. Second post to 'AMA' costs 10
  3. Third post to 'AMA' costs 100
  4. First post to 'AGORA' costs 1
  5. Second post to 'AGORA' costs 1
  6. Third post to 'AGORA' costs 1
  7. First comment to unrelated post in 'AMA' costs 1
  8. Second comment to unrelated post in 'AMA' costs 10
  9. Third comment to unrelated post in 'AMA' costs 100
  10. Comment to own post in 'AMA' costs 1

In short, the behaviour is now as before, except for two things:

  • posts of sub owners in own subs don't count as spam (subsequent posts in foreign subs are not punished)
  • posts and comments of sub owners in own subs always cost base (no matter how much they spammed in foreign subs before)

I believe that's what you asked for.

I wasn't exactly sure if any of the parameters should have been named subName instead of sub, there may be naming conventions to the layers that I mixed up, should be cosmetic, but let me know.

@Scroogey-SN Scroogey-SN requested a review from ekzyis March 14, 2025 18:33
Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review yet because I found some issues just from looking at the code. Will continue review when they are fixed.

@@ -71,7 +71,7 @@ export default forwardRef(function Reply ({
// no lag for itemRepetition
if (!item.mine && me) {
cache.updateQuery({
query: gql`{ itemRepetition(parentId: "${parentId}") }`
query: gql`{ itemRepetition(parentId: "${parentId}", sub: "${sub?.name}") }`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this as-is, assuming that was a hint how to solve the other comment.

Copy link
Member

@ekzyis ekzyis Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that you have the same issue with searching for updating the query with a sub named 'undefined' here, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Scroogey-SN Scroogey-SN requested a review from ekzyis March 18, 2025 10:45
Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works but the code can be improved, see my two comments.

Comment on lines +41 to +48
const query = parentId && sub
? gql`{ itemRepetition(parentId: "${parentId}", sub: "${sub}") }`
: (parentId
? gql`{ itemRepetition(parentId: "${parentId}") }`
: (sub
? gql`{ itemRepetition(sub: "${sub}") }`
: gql`{ itemRepetition }`
))
Copy link
Member

@ekzyis ekzyis Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the other comment, this should use GraphQL variables.

@@ -71,7 +71,7 @@ export default forwardRef(function Reply ({
// no lag for itemRepetition
if (!item.mine && me) {
cache.updateQuery({
query: gql`{ itemRepetition(parentId: "${parentId}") }`
query: gql`{ itemRepetition(parentId: "${parentId}", sub: "${sub?.name}") }`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Territory owners posts shouldn't have escalating costs
2 participants