
change NIP-01 to only allow a single filter per REQ #1645

Closed
fiatjaf wants to merge 2 commits into master from no-batching

Conversation

@fiatjaf
Member

fiatjaf commented Dec 13, 2024

In practice, allowing multiple filters in a single REQ has no benefit; it only hurts performance and increases the complexity of codebases.

Anything that can be done with a single REQ and multiple filters can much more easily be done with multiple REQs each with a single filter.

I know this is a breaking change, but not a very drastic one. We can easily transition to this by having clients stop being built with this "feature" in mind, and relays can follow later. If a client sends a REQ with more than one filter to a relay that only supports one that's also ok, the client will just get less results than it expected.
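The mechanical part of that transition can be sketched on the client side. This is a hypothetical helper (the derived subscription-id scheme is my own, not anything from NIP-01): it rewrites one multi-filter REQ into the equivalent single-filter REQs.

```python
import json

def split_req(req_msg):
    """Rewrite one ["REQ", sub_id, f1, f2, ...] message into a list of
    single-filter REQs, one per filter, with derived subscription ids."""
    kind, sub_id, *filters = req_msg
    if kind != "REQ":
        raise ValueError("not a REQ message")
    if len(filters) <= 1:
        return [req_msg]  # already compliant with single-filter relays
    return [["REQ", f"{sub_id}:{i}", f] for i, f in enumerate(filters)]

# A REQ carrying two independent filters...
req = ["REQ", "view", {"kinds": [1], "limit": 20}, {"kinds": [0], "authors": ["pk1"]}]
# ...becomes two REQs that a single-filter relay can reason about individually.
single_reqs = split_req(req)
for msg in single_reqs:
    print(json.dumps(msg))
```

A client that degrades gracefully, as described above, could always send through `split_req` and simply receive fewer results from relays that ignore extra filters today.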

@mikedilger
Contributor

I do use this multiple-filter thing, but not in many cases. I'm not sure the benefit of this change outweighs the cost of breaking things. And I'm a bit afraid relays will not like even more subscriptions per connection. Many relays have unwisely low caps on how many subscriptions you can have open at once.

@alexgleason
Member

I wish you did this 2 years ago. 😂 Then I would have supported it enthusiastically. Now I have mixed feelings.

@arthurfranca
Contributor

What's wrong with that feature is that the limit is individual for each filter. That's why it's not really an OR query but equivalent to multiple REQs, i.e. events end up not being sorted together.
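That point can be shown with a tiny sketch (hypothetical in-memory relay semantics, made-up events): because `limit` applies per filter, a two-filter REQ is the concatenation of two independently limited, independently sorted slices, not one globally sorted OR query.

```python
def run_filter(events, authors, limit):
    """Relay-side semantics: each filter gets its own newest-first slice."""
    matched = [e for e in events if e["pubkey"] in authors]
    matched.sort(key=lambda e: e["created_at"], reverse=True)
    return matched[:limit]

events = [
    {"id": "a1", "pubkey": "alice", "created_at": 300},
    {"id": "a2", "pubkey": "alice", "created_at": 250},
    {"id": "b1", "pubkey": "bob",   "created_at": 200},
]

# One filter with limit 2: a true OR query, globally sorted -> a1, a2.
single = run_filter(events, {"alice", "bob"}, 2)

# Two filters with limit 1 each: two independent slices -> a1, b1.
# Exactly what two separate REQs would have returned.
multi = run_filter(events, {"alice"}, 1) + run_filter(events, {"bob"}, 1)
```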

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

Many relays have unwisely low caps on how many subscriptions you can have open at once.

This is exactly the point. Relays have these stupid limits and the only reason people use multiple filters per REQ is to get around them. It's an unnecessary complication on both sides, and it's also unfair: today clients that take on this unnecessary complication gain access to higher limits on these relays. Meanwhile the entire thing makes no sense on the relay side. It hurts performance for relays to have to check duplicates and reorder the results of multiple queries together when 100% of the time what clients really want are separate queries.

I'm pretty sure relays will increase their default limits if we pass this. Arguably they would already have done so, limiting by filter instead of by REQ, but they haven't because that was too hard.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

I wish you did this 2 years ago. 😂 Then I would have supported it enthusiastically. Now I have mixed feelings.

Are those mixed feelings stemming from the fact that you have now already spent a lot of programming time implementing convoluted batch-then-segregate-filters logic on the client side?

@vitorpamplona
Collaborator

vitorpamplona commented Dec 13, 2024

I am not sure if this solves anything.

Keep in mind that having multiple filters within a single subscription does address some problems inherent to Nostr's data model. For instance, to correctly handle EOSE when monitoring multiple event IDs and their associated author information simultaneously (e.g., for author metadata, replies, reactions, etc.), such as in a thread view, each event ID or pubkey-based filter might require its own since and until properties. Since all of these filters relate to the same screen or UI component, it makes sense for them to be part of the same subscription.

If this approach is merged, each of these filters would become an individual subscription, which might consume more resources.

On Amethyst, we found that it is best to adhere to the lowest common limitations, which generally is:

  • The number of different kinds in a single filter must be ≤10.
  • The number of filters must be ≤10.
  • The number of subscriptions must be ≤12.
  • A single REQ cannot exceed 65,535 characters, which some follow lists or thread-loading operations can easily surpass.
  • Use only one connection per IP.
  • Rate limits of about 200ms, meaning filters can be rotated at most 5 times per second.

Because we frequently hit these limits—particularly the character limit for REQ—the app often splits and rotates filters very quickly. This rotation, in turn, hits rate limits.

When this happens and Tor is enabled (which is by default), I mitigate the issue by creating two connections to the same relay, one over the clearnet and another over Tor, and splitting the filters between the two. This effectively doubles all limits. It's a hack but it works.

If limits are the issue you are trying to solve, we can work on that. The current PR just shifts things around from filters to subs.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

each event ID or pubkey-based filter might require its own since and until properties

Because they are completely independent, and mashing them up together in the same REQ just leads to confusion.

each of these filters would become an individual subscription

They already are, in a way. Making that explicit is good. I can't see any performance downsides, only upsides.

If limits are the issue you are trying to solve

Not only. I am mostly worried about the perspective of relays handling custom queries in a way that isn't just the dumb "give the user everything he wants" and dealing with custom CLOSED messages. If each filter is its own subscription it becomes straightforward for both the relay and the client to agree on what is allowed and what isn't, what requires AUTH and what doesn't and so on.

The simplest example, today, is NIP-17 recommending that relays reject REQs for messages not addressed to the authenticated user. This is quite simple and straightforward -- but if the client decides to send two filters in the same REQ (one for messages and another for profile metadata, for example; or, who knows) then it becomes an unsolvable quagmire.
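With one filter per REQ, that NIP-17 policy becomes a single yes/no decision. A hedged sketch (hypothetical relay-side function, not any real relay's API; kind 1059 is NIP-17's gift-wrap kind, addressed to the recipient via a p tag):

```python
def check_dm_filter(filter_, authed_pubkey):
    """Decide one filter's fate: return None to allow it, or a CLOSED
    reason string the relay can send back unambiguously."""
    if 1059 not in filter_.get("kinds", []):
        return None  # not a DM query, this policy does not apply
    if authed_pubkey is None:
        return "auth-required: must authenticate to query DMs"
    if filter_.get("#p") != [authed_pubkey]:
        return "restricted: can only query DMs addressed to you"
    return None  # a DM query for the authenticated user's own messages

# One filter, one clear verdict. With two filters in the same REQ (one for
# DMs, one for profile metadata) the relay could neither fully CLOSE the
# subscription nor fully serve it.
```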

The same problem also showed up recently when @staab's implementation of NIP-29 conflicted with my relay assumptions: fiatjaf/relay29#13. For anyone who expects the Nostr relay ecosystem to grow in diversity, specifically relay diversity (as I do and hope), issues like this will keep showing up.

Maybe in these cases the solution could be just to recommend single filter REQs unless you are pretty sure multiple filters are ok, but then we hit the problem of libraries that implement filter batching by default, which feels wrong to me but it's something the current NIP-01 text encourages; and also the problem of the inconsistency and misguided harshness of filter limits as mentioned above -- so I think it's better to phase out multi-filter slowly if we can agree.

@vitorpamplona
Collaborator

I think this "relay diversity" issue goes beyond single filters.

NIP-17 use case

Multi-account clients with NIP-17 downloads require multiple connections to AUTH with a different user. Many relays don't accept multiple connections per IP, so the app has to either rotate (with wasteful since/until +/- 48 hours) or subscribe using Tor with multiple exit nodes, one for each account.

if the client decides to send two filters in the same REQ (one for messages and another for profile metadata, for example; or, who knows) then it becomes an unsolvable quagmire

I do not think this is unsolvable. It is hard, but matching incoming filters, no matter how complex they are, with the authorization the logged-in user has should be solvable. Modifying the filter is hard, but running the filter and then applying a second filter on top of the results to filter what the user can see from the resulting event set is a lot easier.

@nostr-wine had many of these issues last year when building inbox.nostr.wine. Maybe we can learn from those.

batching by default, etc.

I can't see a future where nostr libraries will not rearrange all these filters/subs by default. There is just no way App devs can deal with the complexity of building and splitting filters/subscriptions/connections. We will need to provide higher-level tools and find great ways to convert them back into what each relay operator wants their Clients to do.

If we want to give relays more power to diversify their responses while letting Clients know how to deal with those customizations, we need a new abstraction layer.

@vitorpamplona
Collaborator

vitorpamplona commented Dec 13, 2024

@fiatjaf do you think custom relays should still be called "relays"? As you said, optionality sucks. There should be a pitch somewhere to separate the relay network as we know today, with a simple, well-defined behavior, from new networks of custom relays. Clients could then declare compliance with the main network and/or other networks/schemes.

We shouldn't expect every client to be able to deal with ALL of these custom relays and their rules.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

Multi-account clients with NIP-17 downloads require multiple connections to AUTH with a different user. Many relays don't accept multiple connections per IP

This is a bug on the relay though. A relay that deals with AUTH should accept more than one connection per IP as long as these connections are authenticated. The best course of action here is to talk to the relay maintainer.

I do not think this is unsolvable. It is hard, but matching incoming filters, no matter how complex they are, with the authorization the logged-in user has should be solvable. Modifying the filter is hard, but running the filter and then applying a second filter on top of the results to filter what the user can see from the resulting event set is a lot easier.

This can be done, of course, but my point is that it becomes unnecessarily confusing. As the relay you can't just return a simple CLOSED notifying the client of your expectations; you have to do an inefficient filtering operation and return just a subset of results -- and for what? The user/client won't get what they expected. They will think there is a bug on their side or on the client side because events are not showing up, or they will not perform authentication as expected because the relay didn't return the expected CLOSED / auth-required message -- and so on.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

@fiatjaf do you think custom relays should still be called "relays"? As you said, optionality sucks. There should be a pitch somewhere to separate the relay network as we know today, with a simple, well-defined behavior, from new networks of custom relays. Clients could then declare compliance with the main network and/or other networks/schemes.

It's an interesting idea, although I can't picture it right now. I think the current idioms we have are enough to power a thousand "custom relays" already, so if we make these things separate entities we risk splitting the network too much into different API interfaces or whatever. But I'm not opposed to it, I want to see what you have in mind.

We shouldn't expect every client to be able to deal with ALL of these custom relays and their rules.

Well, I wasn't expecting all clients to do it. I was expecting clients to hardcode some behaviors specifically tailored to some relays (i.e. DM apps hardcode behavior for NIP-17 relays, group apps for NIP-29 relays, etc.) -- so yeah, maybe having more specific "relay APIs" (or whatever you want to call them) for each use case could be interesting, as long as they are standardized.

@vitorpamplona
Collaborator

vitorpamplona commented Dec 13, 2024

you have to do an inefficient filtering operation and return just a subset of results -- and for what? The user/client won't get what they expected

Don't assume users are not getting what they expect, getting mad, etc. The current thinking is the opposite: relays are always removing/hiding events, so users/clients almost never see exactly what they expect. We are already used to working with whatever we get from each relay. To me, relays SHOULD take any filter but only return what the user can see, with whatever rule they want to apply, and only send CLOSED when there is nothing to return/monitor.

@staab
Member

staab commented Dec 13, 2024

I agree that it would probably only result in performance improvements, since in every relay implementation I've seen the relay just iterates through filters and sends the results anyway, keeping the subscription open slightly longer than if each filter had its own. Also, with batching, subscriptions are held open until the last subscription finishes. Breaking into separate subscriptions would result in better concurrency at the very slight cost of more messages being sent back and forth.

Unfortunately, the core architecture of most sophisticated libraries (at least welshman and NDK, and it sounds like Alex has a similar thing going on) is based around splitting/merging, so a lot of work would have to go into taking that out. On the other hand, it would probably make things way simpler. Right now my request flow looks something like this:

  • Make a request
  • Wait 800ms (or so) for other requests to come in
  • Group subscriptions by timeout, auth timeout, and whether to close on eose
  • Union filters for each relay and send to each relay independently
  • Keep track of how many open subscriptions there are and wait for others to close before sending the next one
  • When an event comes in, match it against the caller subscription's filters and pass it along

I've spent an incredible amount of time developing (and troubleshooting) this logic. Sunk cost notwithstanding, it would probably be better for everyone if we moved away from multiple filters per request. It would simplify the process to:

  • Make a request
  • Send to the relay immediately
  • Keep track of how many open subscriptions there are and wait for others to close before sending the next one
  • When an event comes in, pass it along

This would remove the need for the batching timeout, and plenty of performance-intensive and error-prone filter matching/merging.
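That simplified flow can be sketched in a few lines. Everything here is hypothetical plumbing (the `send` callback, the `max_subs` cap, and the assumption that these are close-on-EOSE requests), not welshman's actual code:

```python
class SimpleClient:
    """One filter per REQ: no batching window, no filter matching on receipt."""

    def __init__(self, send, max_subs=20):
        self.send = send       # callback that writes one message to the relay
        self.max_subs = max_subs
        self.subs = {}         # sub_id -> per-request event callback
        self.pending = []      # requests waiting for a free subscription slot
        self.next_id = 0

    def request(self, filter_, on_event):
        if len(self.subs) >= self.max_subs:
            self.pending.append((filter_, on_event))  # wait for a sub to close
            return
        sub_id = f"sub{self.next_id}"
        self.next_id += 1
        self.subs[sub_id] = on_event
        self.send(["REQ", sub_id, filter_])           # send immediately

    def on_message(self, msg):
        if msg[0] == "EVENT" and msg[1] in self.subs:
            self.subs[msg[1]](msg[2])                 # pass it along, no matching
        elif msg[0] in ("EOSE", "CLOSED") and msg[1] in self.subs:
            del self.subs[msg[1]]                     # close-on-EOSE assumption
            if self.pending:
                self.request(*self.pending.pop(0))    # drain the queue

sent = []
client = SimpleClient(sent.append, max_subs=1)
received = []
client.request({"kinds": [1]}, received.append)
client.request({"kinds": [0]}, received.append)  # queued: only 1 slot
client.on_message(["EVENT", "sub0", {"id": "e1"}])
client.on_message(["EOSE", "sub0"])              # frees the slot, sends sub1
```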

I originally started batching at @pablof7z's suggestion, specifically to deal with relay.nostr.band's very low concurrent subscription limits (8 subs at a time as I recall). So yeah, batching is almost entirely a workaround for stingy relay limits, although I think there are a few cases in which batching makes sense, like when fetching reactions/replies/zaps for a parent note. Instead of injecting that logic into the data loading code, you can just put it in the leaf component, which simplifies things a lot. Of course, it would not be hard to write batching logic for that case which only results in a single filter, so maybe that's a moot point.

Another good point fiatjaf made in fiatjaf/relay29#13:

If we are to allow all filters even those that we disallow just because they might be grouped together with other filters that might be allowed that means the CLOSED becomes useless

CLOSED runs into the same problem that custom relay policies do, and it's baked into the protocol.

Another thing:

We shouldn't expect every client to be able to deal with ALL of these custom relays and their rules.

I've been saying for a long time that relays are defined by an interface, and differ by policy. Graceful degradation and progressive enhancement through feature support signaling are the way to handle this. NIP 29 (despite the rigidity of the relay29 implementation in that it rejects any non-h-tagged event) only breaks the relay interface in this one place (which is why I had to address the relay implementation rather than work around it), despite its policy being wildly different from most relays on the network. I think this is great.

TL;DR I support clients moving away from multiple filters, although I'd like to see how well that works in practice before really committing to the switch. If this really doesn't cause any problems for clients, relays should watch for multi-filter requests to stop coming in before dropping support.

@alexgleason
Member

@fiatjaf Are those mixed feelings stemming from the fact that you have now already spent a lot of programming time implementing convoluted batch-then-segregate-filters logic on the client side?

Yes. I am also depending on it now too. But after looking at my code, it's less than I thought. So, I think it wouldn't be so bad after all.

@staab every relay implementation I've seen the relay just iterates through filters and sends the results anyway

Why would we accept such shoddy implementations? Of course I generate one SQL query for combined filters. https://gitlab.com/soapbox-pub/nostrify/-/blob/main/packages/db/NPostgres.ts?ref_type=heads#L265
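The idea Alex is pointing at can be sketched roughly (this is not his actual code; the `events` table with `kind`, `pubkey`, and `created_at` columns is an assumed schema): each filter compiles to one parameterized WHERE fragment, and the fragments are OR'd into a single statement so the database sees the whole request at once and deduplicates rows for free.

```python
def filter_to_sql(f):
    """Compile one Nostr filter into a WHERE fragment plus parameters."""
    clauses, params = [], []
    if "kinds" in f:
        clauses.append(f"kind IN ({','.join('?' * len(f['kinds']))})")
        params += f["kinds"]
    if "authors" in f:
        clauses.append(f"pubkey IN ({','.join('?' * len(f['authors']))})")
        params += f["authors"]
    if "since" in f:
        clauses.append("created_at >= ?")
        params.append(f["since"])
    return "(" + " AND ".join(clauses or ["TRUE"]) + ")", params

def req_to_sql(filters):
    """OR the per-filter clauses into one statement with one global sort."""
    parts = [filter_to_sql(f) for f in filters]
    where = " OR ".join(w for w, _ in parts)
    params = [p for _, ps in parts for p in ps]
    return f"SELECT * FROM events WHERE {where} ORDER BY created_at DESC", params

sql, params = req_to_sql([{"kinds": [1], "since": 10}, {"authors": ["pk"]}])
```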

@mikedilger
Contributor

Unfortunately, the core architecture of most sophisticated libraries (at least welshman and NDK, and it sounds like Alex has a similar thing going on) is based around splitting/merging,

Gossip must be pretty dumb by comparison then. I just ask for what I want, when I want it, and if the relays complain I consider it a relay problem. I don't split or merge. I don't try to limit kinds, filters, or subscriptions to small numbers. I don't rate limit other than timing out for 15 seconds if the relay was rude. Then again, I don't think any of my REQs are unreasonable. By far the most common relay complaint is that I didn't AUTH and it insists that I AUTH just to do a REQ.

@fiatjaf
Member Author

fiatjaf commented Dec 14, 2024

I generate one SQL query for combined filters.

But the SQL database underneath probably just iterates through all the filters independently.

@fiatjaf
Member Author

fiatjaf commented Dec 14, 2024

Gossip must be pretty dumb by comparison then.

This is good to hear, because I have never done any of the batching people are doing, nor understood why people felt it was so necessary, but I was never sure because none of my clients are heavily used. Gossip works basically perfectly though, and the fact that it is so naïve reinforces my beliefs.

The weird thing is: if I started to hit relay limits so much that I felt the need to complicate my client code with hacks just to make basic things work, I wouldn't think that was normal and proceed to pile hacks on top of hacks. I would think the protocol was broken and needed a serious reboot (because if it doesn't work today, damn, imagine if we had more than 16 users).

@mikedilger
Contributor

FYI:

For most relays in most situations, gossip has only 4 subscriptions (FeedFuture, FeedChunk, Augments, and Metadata). Augments gets rewritten every time you scroll, but no faster than once every 5 seconds (that is probably too infrequent; I'm going to speed it up to 2 seconds). It might have multiple FeedChunks open if someone hits "load more" before the previous one has finished.

In some coincidental cases there may be more than 4 subscriptions: if that relay happens to also be a discovery relay, it may have Discover. If it happens to be your outbox, it will subscribe to your Config. And if it is your NIP-46 bunker it will subscribe to NIP46.

So in that worst case, that is only 7 subscriptions (with possibly more if you quickly spam the load more button). Every other filter set only gets subscribed if you navigate somewhere else (in which case some of the above ones close): InboxFeed, InboxFuture, DMChannel, Giftwraps, Search, RepliesToById, RepliesToByAddr and FollowersOf. These don't stack much at all, always less than the 7 worst case I already mentioned. This is all defined in gossip-lib/src/filter_set.rs (it used to be spread out but I cleaned it up recently).

Notably there are things gossip could be subscribing to that we choose not to. For example

  • In the feed views you do not see a number counting how many replies a note has. That would require another count subscription.

Member

@alexgleason alexgleason left a comment

I think it's worth it.

I always thought it was wrong that you can pass multiple filters in a single REQ. It makes the "limit" property meaningless, and harms your ability to do rate limiting and set timeouts on the relay side.

However, clients still need to request multiple filters sometimes. This responsibility is now shifted to the client. You could argue it always was. But still, clients now need to loop over the filters and possibly aggregate the results. This is actually more powerful for the client, so it probably should have always been a client responsibility.

Finally, it will simplify logs, and simplify queries. By removing this we actually get more predictable outcomes. This will make it easier to improve performance.

@bezysoftware
Contributor

Would this change also propagate to COUNT?

@nostrband
Collaborator

I like this.

But I don't see a great path to obsolete multi-filter-REQs. Asking for serious rewrites of critical hard-to-debug parts of big clients for the benefit of relays is a very one-sided deal. Maybe clients could ask for higher rate-limits first, and if relays agreed - clients wouldn't need the batching and the use of multi-filter-REQs would just fade away? Without higher rate limits this change won't fly anyway.

@fiatjaf
Member Author

fiatjaf commented Dec 26, 2024

@nostrband you're right, I guess we have to start first with the limits.

If I got it right @hoytech has agreed to change strfry limits to be based on number of filters instead of subscriptions, so 1 subscription with 2 filters will be equivalent to 2 subscriptions with 1 filter each. I don't know when he plans to implement that, but once he does I believe he will also increase the default limit.

After that I think we can nudge clients to start using REQs with a single filter, and after that we can deprecate multi-filter REQs softly.

@nostrband
Collaborator

I don't know when he plans to implement that, but once he does I believe he will also increase the default limit.

After that I think we can nudge clients to start using REQs with a single filter, and after that we can deprecate multi-filter REQs softly.

Ok this sounds promising!

@kehiy
Contributor

kehiy commented Dec 26, 2024

Would this change also propagate to COUNT?

COUNT seems to be deprecated; nobody uses it.

@kehiy
Contributor

kehiy commented Dec 26, 2024

@nostrband you're right, I guess we have to start first with the limits.

If I got it right @hoytech has agreed to change strfry limits to be based on number of filters instead of subscriptions, so 1 subscription with 2 filters will be equivalent to 2 subscriptions with 1 filter each. I don't know when he plans to implement that, but once he does I believe he will also increase the default limit.

After that I think we can nudge clients to start using REQs with a single filter, and after that we can deprecate multi-filter REQs softly.

@fiatjaf we can support it as well in our relay. I need to take a look at the code, but it probably makes the process more lightweight. Currently, we aggregate queries based on kind and send separate queries to MongoDB (we still haven't found the best practice for that, and there is even the possibility of moving to another database). This makes things more lightweight, I think, since we make fewer queries (most of the time) and each query searches a smaller collection rather than all events. But I believe this change should still make it better. Also, we would need to advertise this in the NIP-11 document and remove the current max-filters field.

@hoytech
Contributor

hoytech commented Dec 30, 2024

Yes this is on my TODO list, as soon as I find some spare time. I'm pretty sure we can increase the limits in strfry quite a bit, I just want to do a little bit of testing to make sure, and to choose reasonable default values even for modest hardware.

I think only allowing a single filter per REQ would've been a better/simpler design, but I think changing it at this stage is too big of a backwards incompatibility hit (?)

Contributor

@Semisol Semisol left a comment

Could we also extend EVENT to be able to signal multiple matching filters as an optimization?

@Semisol
Contributor

Semisol commented Jan 4, 2025

Could we also extend EVENT to be able to signal multiple matching filters as an optimization?

this can be negotiated separately actually, so it's out of scope here

@franzaps
Contributor

franzaps commented May 30, 2025

If you do the three in one sub, the relay is smart enough to not send duplicates. If I separate them, each relay I ping will have to send my user's own messages in the last 200 messages 3 times.

Just skimmed through this PR, does this mean you are in favor of keeping multiple filters? @vitorpamplona

In practice the existence of the possibility of multiple filters in a single REQ has no benefit and only serves to hurt performance and increase complexity of codebases.

Anything that can be done with a single REQ and multiple filters can much more easily be done with multiple REQs each with a single filter.

Having multiple filters in a single REQ allows for db query planning/optimization, as well as preventing duplicate sends – not the case for separate REQs. At first glance this PR seems backwards and would actually hurt performance. What am I missing?

Example:

{#d: id1, authors: pk1, limit: 1}, {#d: id2, authors: pk2, limit: 1}, {#d: id3, authors: pk3, limit: 1}

If all filters have a limit: 1 this is understood as "return the latest of each (id1, pk1), (id2, pk2), (id3, pk3)", which could be optimized at the query level (adding a since would be even better). Not to mention serialization, spawning isolates, etc. This kind of optimization is fairly common unless I have been living on another planet.
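That "latest of each pair" reading can be sketched (hypothetical in-memory store; the flattened `d`/`pubkey` event shape is an assumed schema, not NIP-01's wire format): one pass over the events answers all N limit-1 filters at once.

```python
def latest_per_pair(events, pairs):
    """Answer N '{#d, authors, limit: 1}' filters in one scan: keep only
    the newest event for each requested (d-tag, author) pair."""
    wanted = set(pairs)
    best = {}
    for e in events:
        key = (e["d"], e["pubkey"])
        if key in wanted:
            if key not in best or e["created_at"] > best[key]["created_at"]:
                best[key] = e
    return [best[k] for k in pairs if k in best]

events = [
    {"d": "app1", "pubkey": "pk1", "created_at": 1, "id": "old-release"},
    {"d": "app1", "pubkey": "pk1", "created_at": 5, "id": "new-release"},
    {"d": "app2", "pubkey": "pk2", "created_at": 3, "id": "r2"},
]
# 200 such pairs would still be a single pass, versus 200 separate REQs.
releases = latest_per_pair(events, [("app1", "pk1"), ("app2", "pk2")])
```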

In my case it's checking for latest releases. It could be for 10 apps, but more likely for 200.

Sending multiple separate REQs will cross the whole stack without giving the database a chance.

If I wanted to keep watching for updates, in the one REQ with multiple filters case I need to keep one subscription open - versus 200.

@fiatjaf
Member Author

fiatjaf commented May 30, 2025

How can such query be optimized? Each of these (id, pk) tuples will be an entirely independent index scan anyway.

By spawning isolates you mean each filter will be handled in its own lightweight thread? That's what Khatru does already and having independent REQs is natural to that.

What do you mean by "serialization" in this context?

@franzaps
Contributor

franzaps commented May 30, 2025

How can such query be optimized? Each of these (id, pk) tuples will be an entirely independent index scan anyway.

I don't know enough about database internals but conceptually it's better to have the complete picture when planning a query than not.

This was just one example. How do you account for Vitor's example where duplicate events may be sent?

By spawning isolates you mean each filter will be handled in its own lightweight thread? That's what Khatru does already and having independent REQs is natural to that.

It depends on the language and the architecture. Again, this is conceptual: it's better to fetch 200 things at once than to make 200 calls fetching one thing each. No matter how small, there is overhead.

If I wanted to keep watching for updates, in the one REQ with multiple filters case I need to keep one subscription open - versus 200.

Thoughts on that?

@vitorpamplona
Collaborator

vitorpamplona commented May 31, 2025

I am currently against this PR.

Multiple filters per REQ are useful when queries are expected to return lots of duplicated responses between filters, and to shrink the client's EOSE cache when a single EOSE date can cover multiple filters, which is quite common (for instance, listening for all references to a note, not only the e-tag references some NIPs use).

That being said:

  1. Most dumb relays do send duplications even within the same sub request.
  2. We have historically abused this feature mostly due to low limits from current relays. For instance, many relays still limit filters to a maximum of 10 kinds, which requires Amethyst to use 2-3 filters with just a different kind list in each. It would be annoying if we had to do that with full REQs and keep EOSEs for each group of kinds.

The move to a smaller number of filters in favor of full REQs is good. But limiting to just 1 filter is bad.

@vitorpamplona
Collaborator

I also agree that relays should be able to optimize queries that have multiple filters. Each filter is just a large OR clause and we know from SQL that those are some of the easiest to optimize since they tend to reuse the same indexes multiple times. But frankly I don't think we have a single dev that knows how to analyze and create different execution plans based on the incoming filters.

@fiatjaf
Member Author

fiatjaf commented May 31, 2025

This was just one example. How do you account for Vitor's example where duplicate events may be sent?

Of course that is an advantage, but it depends on the real world usage of that stuff. I have not made one big beautiful client full of stuff like Amethyst to know, but I've made a ton of microclients and integrations and I don't remember encountering a situation in which I would want deduplicated query responses from different filters.

Deduplicating on the relay side is costly: you have to run through all the results you got from the different internal queries and compare them. No matter how you do it, it's an extra cost. It's definitely worth it if you're throwing away a lot of stuff that you then don't have to send to the client, but if you have a ton of events and end up throwing away none or just a few then you wasted that work. (In SQL terms, that's why UNION ALL is faster than UNION.)
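The UNION vs UNION ALL trade-off sketched minimally (hypothetical helpers over in-memory result sets): deduplication costs one id-set membership check per event even when nothing overlaps, which is pure overhead for disjoint filters.

```python
def union_all(result_sets):
    """No dedup: just concatenate (cheap, but may repeat events)."""
    return [e for rs in result_sets for e in rs]

def union(result_sets):
    """Dedup by event id: one hash lookup per event, paid even when
    there turns out to be nothing to throw away."""
    seen, out = set(), []
    for rs in result_sets:
        for e in rs:
            if e["id"] not in seen:
                seen.add(e["id"])
                out.append(e)
    return out

# Results from two internal per-filter queries that overlap on "y".
a = [{"id": "x"}, {"id": "y"}]
b = [{"id": "y"}, {"id": "z"}]
```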

Strfry does all of this in pure C++. On Khatru I forgot to do it at first, and later I decided it wasn't worth it; but maybe I'm wrong.

@fiatjaf
Member Author

fiatjaf commented May 31, 2025

Each filter is just a large OR clause and we know from SQL that those are some of the easiest to optimize since they tend to reuse the same indexes multiple times.

What do you mean? Are they the easiest to optimize in comparison to what? I may be wrong but I'm pretty sure any database, when faced with a "ids":["a", "b", "c"] (or, equivalently, WHERE id = "a" OR id = "b" OR id = "c") will have to do 3 index lookups on an "id" index. This is what strfry, my eventstore implementations, pocket and probably nostrdb do and I don't think there is another way.

I don't think we have a single dev that knows how to analyze and create different execution plans based on the incoming filters.

I'm curious about what you mean here exactly, but I'll say what I know about "query planning": in all the dedicated Nostr event stores I looked the "planning" is hardcoded based on the existing indexes and the attributes in the filter. Something like:

if filter.ids: use "ids" index
if filter.kinds and filter.authors: use "pubkey-kind-created_at" index
if filter.tags["e"] or filter.tags["E"]: use "tags-created_at" index
if filter.kinds: use "kinds-created_at" index
if filter.authors: use "pubkey-created_at" index
if filter.tags: use "tags-created_at" index
else: use "created_at" index

But still, a single filter is broken into multiple lookups in each of these indexes if the filter has more than one value in the attribute being consulted.

As far as I understand, SQL databases do the same, but since they can't hardcode the schema they keep statistics about each index the user created and consult those at runtime before deciding which one to use for each query if the query specifies multiple conditions (or whether it's better to do a full table scan).

In the case of Nostr that is probably overkill and would slow things down instead of helping (I don't really know, but if you compare the speed of any of these custom Nostr event stores against Postgres or SQLite today, the difference is absurd).

@vitorpamplona
Copy link
Collaborator

Picture the max-10-kinds filter limit that I mentioned:

```
{#e: [id1, id2, id..n], kinds: [6,16,1984...], since: <date1>}
{#e: [id1, id2, id..n], kinds: [7,9734,..], since: <date1>}
```

A smart planner can see that #e and since are the same and merge both kind lists into one set. Evaluating this and rewriting the query before running it is faster than running it twice.

I think these types of easily improved queries are quite common these days, mostly because of the limits imposed on clients. Everybody just ends up duplicating everything, either in a new REQ or a new filter.

@fiatjaf
Copy link
Member Author

fiatjaf commented May 31, 2025

Yeah, I don't get the point of limiting the number of kinds in a filter like that.

If the intention of the relay is really to limit the number of queries they will have to do internally, they will have to do a better job of imposing that limit. It makes no sense to limit the number of kinds per filter but allow more filters with more kinds; they should instead do something like "only 10 kinds for your IP per hour, after that you can't make any more queries!" (which is stupid, better to not run a relay at that point, but it's an example).

In other words: instead of doing 3 filters each with 10 kinds everybody is better off if the client sent a single filter with 30 kinds and the relay accepted it.

@vitorpamplona
Copy link
Collaborator

Sure, but you missed the point. There are many opportunities to improve query speeds. And in order to do so, filters have to be part of the same REQ.

@fiatjaf
Copy link
Member Author

fiatjaf commented May 31, 2025

Sorry, I did miss that point, I don't see it anywhere at all.

Can you give me some examples of such opportunities?

@vitorpamplona
Copy link
Collaborator

vitorpamplona commented May 31, 2025

Another example is the query @franzaps has: loading the latest of 50-200 replaceable app events. You could run 50-200 lookups on the entire database if your implementation is dumb, or you can plan the query, realize the kind doesn't have many events, load all the software app events by kind first, and only then filter by matching authors and d-tags as a single set.

If I am looking for the last posts to any of the communities, chats, hashtags, or geohashes that my user follows, you are going to get one filter for each, with the same since/until params. If your implementation is dumb, you are going to run through them one by one. If there is a planner, it would know to merge them all into one OR query and go through the DB just once.

@franzaps
Copy link
Contributor

Having multiple filters allows opportunities to optimize querying and the result set. The examples given here should be enough proof of that.

Removing multiple filters removes those opportunities.

It is false that multiple filters hurt performance. 1 REQ with 50 filters will always cost less than or equal to the equivalent 50 REQs with 1 filter each. You can always write an adapter to convert 1 REQ with 50 filters into 50 REQs with 1 filter, but not the other way round.

(@fiatjaf because I read you mention this several times: If clients use multiple filters to get around limitations, that is a separate issue. As I said, NIP-11 should have a way of clearly expressing rate limiting for filters, which is the actual fix to the problem.)

As for code complexity, that is relative. It might be easier for you. For me, it adds complexity.

All this PR does is remove optionality, of which we barely have any, while likely also degrading query performance.

@fiatjaf
Copy link
Member Author

fiatjaf commented May 31, 2025

Another example is the query @franzaps has: loading the latest of 50-200 replaceable app events. You could run 50-200 lookups on the entire database if your implementation is dumb, or you can plan the query, realize the kind doesn't have many events, load all the software app events by kind first, and only then filter by matching authors and d-tags as a single set.

This is a fair point, but, as I said, such a thing would be so complicated to do that it would be more likely to hurt performance; the time spent on planning would probably not be compensated by the speed gained in the actual query (and would it actually be faster? under which parameters?).

And the complexity probably means no one will ever do it anyway, since it makes no sense. Unless we're dealing with massive workloads.

It is false that multiple filters hurt performance.

I never said that, I said it was equivalent.

As for code complexity, that is relative. It might be easier for you. For me, it adds complexity.

This is only because you're thinking too smartly about it and trying to batch things and whatnot. A less smart developer would just do a for loop and make a separate call in each iteration.

All this PR does is removing optionality of which we barely have, while likely also degrading query performance.

You have just ignored all the arguments in favor of this proposal and said: "it does nothing, only hurts".

It's ok to be against it, and I can see your arguments to be against it, but please be reasonable and not just dismiss everything else.

@vitorpamplona
Copy link
Collaborator

This is a fair point, but, as I said, such a thing would be so complicated to do that it would be more likely to hurt performance; the time spent on planning would probably not be compensated by the speed gained in the actual query (and would it actually be faster? under which parameters?).

Disagree. It's something super easy to implement and measure over time. The relays are being hit all day long by dumb filters; it's idiotic to not pay attention to that. And since operators can easily measure a filter's performance, the feedback loop for making things fast is very natural.

@franzaps
Copy link
Contributor

And the complexity probably means no one will ever do it anyway, since it makes no sense. Unless we're dealing with massive workloads.

Why? It's actually pretty simple to detect a pattern in multiple filters and construct a query from there. An open protocol should leave room for innovation at the edges like in this case.

It is false that multiple filters hurt performance.

I never said that, I said it was equivalent.

You said this man...

In practice the existence of the possibility of multiple filters in a single REQ has no benefit and only serves to hurt performance and increase complexity of codebases.

All this PR does is remove optionality, of which we barely have any, while likely also degrading query performance.

You have just ignored all the arguments in favor of this proposal and said: "it does nothing, only hurts".

The only argument for this proposal that I understood and recognize, developers "getting around limitations", is one I already said is a separate issue, and I proposed a solution for it.

In my particular case, I am not using multiple filters to get around limitations in that way, but for performance reasons. If you take this away, the experience I want to offer will suffer.

Sorry, but sending 50, 100, 200 separate REQs for just one query is not appealing to me, especially when I want to keep a subscription to updates. Am I supposed to keep 200 subscriptions open?

I am writing all this because I care about interoperability. If you think I'm not being reasonable this will be my last comment.

You know what the alternative is? Silently breaking interoperability. Proposals like this may push developers who never comment here, to find other creative solutions.

  • Use or abuse the NIP-50 search field
  • Custom DVMs
  • Use a "caching server"
  • Using nostr with a proprietary API

@jb55
Copy link
Contributor

jb55 commented May 31, 2025 via email

@fiatjaf
Copy link
Member Author

fiatjaf commented May 31, 2025

Disagree. It's something super easy to implement and measure over time. The relays are being hit all day long by dumb filters; it's idiotic to not pay attention to that. And since operators can easily measure a filter's performance, the feedback loop for making things fast is very natural.

You may be right, but I don't think this possibility of future optimization for niche queries is enough justification for keeping the multiple filters. Hard to say.

Sorry, but sending 50, 100, 200 separate REQs for just one query is not appealing to me, especially when I want to keep a subscription to updates. Am I supposed to keep 200 subscriptions open?

Sorry, I think I get most of where you're coming from, and I think you're reasonable, but maybe you're not getting my point. And we're going in circles at this point, so maybe we should stop discussing this for a while.

I'm convinced I don't have a slam dunk argument against the filters, it's mostly a question of degree: does having multiple filters help the protocol in general more than they hurt? I think no, and I tried to say that but this discussion is too confusing.

The only last point I want to hammer again is: will a relay accept a single REQ with 200 filters in it? And keep that subscription open? If yes, then why wouldn't that same relay accept 200 REQs and keep those open? Having relays limit the number of filters they can take rather than REQs doesn't change this.

You know what the alternative is? Silently breaking interoperability. Proposals like this may push developers who never comment here, to find other creative solutions.

You can say that about anything that simple relay filters cannot do, and there are many of such things already. In the past people have requested all sorts of things: different ordering mechanisms for queries, SQL-like JOINs, querying for absence of tags, querying for "less than" or "greater than" tag values, and other stuff I don't remember now.

In most of these cases there is a different, sometimes not obvious, way of doing things that is more in line with Nostr. In some cases it's impossible, and then we have to decide whether it's worth it (and even possible) to shove the new feature into the spec.

EDIT: I think I might be changing my mind on this, but that's contingent on strfry limiting by filters and not by REQ, let's see.

@kehiy
Copy link
Contributor

kehiy commented Jun 6, 2025

@fiatjaf NIP-45 MUST be updated accordingly in this PR. Otherwise, after the merge, the repo will be inconsistent.

Copy link
Contributor

@Semisol Semisol left a comment

Not sure about this anymore.

As everyone else said, this overall reduces bandwidth load on the client.

But also, this does allow query optimizations. This is currently NFDB-specific, but could be implemented by any relay.

NFDB currently uses normal indexes, plus composite indexes for (author, tag), (author, kind) and (tag, kind).

What this allows is many queries can be merged in the planning phase, such as {"#p": [user], "kinds": [4]} and {"authors": [user], "kinds": [4]}.

This gets merged into scanning 2 indexes in parallel: (author_kind, 4, user) and (kind_tag, 4, p, user)

Another thing is rate limits: Nostr.land rate-limits by filter complexity, i.e. by how much of the indexes you scan (even when entries don't match), plus a fixed limit on the number of subscribed indexes.

This is overall a better approach than REQ-count- or filter-count-based limits, and this NIP is pointless.
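A toy version of complexity-based limiting in Go; the scoring formula here is invented for illustration (Nostr.land's actual formula isn't shown in this thread), roughly counting the index range scans a filter forces:

```go
package main

import "fmt"

// Filter is a simplified NIP-01 filter.
type Filter struct {
	IDs     []string
	Authors []string
	Kinds   []int
	Tags    map[string][]string
}

// complexity is an invented scoring function in the spirit of
// complexity-based limiting: it roughly counts the index range scans a
// filter forces (authors times kinds for a composite index, plus one scan
// per id and per tag value). It is NOT Nostr.land's actual formula.
func complexity(f Filter) int {
	scans := len(f.IDs)
	a, k := len(f.Authors), len(f.Kinds)
	if a == 0 {
		a = 1
	}
	if k == 0 {
		k = 1
	}
	scans += a * k
	for _, vals := range f.Tags {
		scans += len(vals)
	}
	return scans
}

func main() {
	f := Filter{Authors: []string{"a", "b"}, Kinds: []int{1, 7, 30023}}
	fmt.Println(complexity(f)) // 6: two authors times three kinds
}
```

A relay could then budget clients per connection by summed score instead of counting REQs or filters, which is the approach being argued for.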

@arthurfranca
Copy link
Contributor

If we keep multiple filters, we should establish a rule for when there's a limit filter field. E.g. if limit is present on the first filter, the limit is applied to the combined query.

@staab
Copy link
Member

staab commented Jun 6, 2025

We should just leave this alone. I switched to one filter per request in welshman, but this isn't worth the effort of changing; multiple filters are fine.

@mikedilger
Copy link
Contributor

I'm ambivalent on this issue. It seems like a giant non-issue to me: either way is fine, and neither is more complicated than the other. If there remain multiple filters in a REQ, the relay can just run each one into the same sorted output set, getting the union of results. Yes, you could in theory micro-optimize first, but figuring that out sounds like a brain-breaker to me.

This week I've been digging into how to get almost the same performance but use far less space by getting rid of the composite indexes and just using simple indexes. This work is still in progress, but this is the idea:

With just four single indexes (id, pubkey, kind, and tag): if you specify multiple values for one of these fields, say pubkeys, I use a set of LMDB prefix-range iterators (one per pubkey), each scanning a separate range of the pubkey index, and I use the itertools crate's merge function to combine them into a single still-sorted iterator. Then I use the sorted-iter crate's intersection operator to combine across fields (pubkeys with kinds). Then I collect that into the output and apply the limit to it. Being lazy iterators the whole way, they don't actually scan anything unless it ends up being needed, but I don't have to code that tricky logic myself anymore; I just benefit from other open-source libraries.

I hope the performance will be the same, or very close, but the disk space used will be much less.

@fiatjaf
Copy link
Member Author

fiatjaf commented Jun 7, 2025

@mikedilger thank you.

By the way, do you have limits on number of REQs or filters or something on https://github.com/mikedilger/chorus?

@mikedilger
Copy link
Contributor

By the way, do you have limits on number of REQs or filters or something on https://github.com/mikedilger/chorus?

max_subscriptions defaults to 128 per connection. max_connections_per_ip defaults to 5. I throttle to 1 MB per second with 16 MB bursts, again these are defaults that can be changed.

@fiatjaf
Copy link
Member Author

fiatjaf commented Jun 14, 2025

For a while I thought this was going to work fine and most people were liking it, but now it's clear to me that the change won't be easy and the advantages aren't super clear either way. In any case, even if there are small advantages, the change process isn't worth it, especially since most of the advantages can be achieved by having smarter filter limits on relays rather than naïve counting of REQs.

Thank you, and sorry to the people who changed their code to forcibly use a single filter.
