change NIP-01 to only allow a single filter per REQ (#1645)
|
I do use this multiple-filter thing, but not in many cases. I'm not sure the benefit of this change outweighs the cost of breaking things. And I'm a bit afraid relays will not like even more subscriptions per connection. Many relays have unwisely low caps on how many subscriptions you can have open at once. |
|
I wish you did this 2 years ago. 😂 Then I would have supported it enthusiastically. Now I have mixed feelings. |
|
What's wrong with that feature is that the |
This is exactly the point. Relays have these stupid limits, and the only reason people use multiple filters per `REQ` is to work around them. I'm pretty sure relays will increase their default limits if we pass this. Arguably they would have done it already and limited by filter, not by subscription. |
Are those mixed feelings stemming from the fact that you have now already spent a lot of programming time implementing convoluted batch-then-segregate-filters logic on the client side? |
|
I am not sure this solves anything. Keep in mind that having multiple filters within a single subscription does address some problems inherent to Nostr's data model. For instance, to correctly handle EOSE when monitoring multiple event IDs and their associated author information simultaneously (e.g., for author metadata, replies, reactions, etc.), such as in a thread view, each event-ID or pubkey-based filter might require its own `since` and `until` properties. Since all of these filters relate to the same screen or UI component, it makes sense for them to be part of the same subscription. If this approach is merged, each of these filters would become an individual subscription, which might consume more resources. On Amethyst, we found that it is best to adhere to the lowest common limitations across relays.
Because we frequently hit these limits, particularly the character limit for `REQ` messages, the app often splits and rotates filters very quickly. This rotation, in turn, hits rate limits. When this happens and Tor is enabled (which it is by default), I mitigate the issue by creating two connections to the same relay, one over the clearnet and another over Tor, and splitting the filters between the two. This effectively doubles all limits. It's a hack, but it works. If limits are the issue you are trying to solve, we can work on that. The current PR just shifts things around from filters to subs. |
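The splitting step described above can be pictured as recursively halving an oversized filter until each piece serializes under the relay's character budget. This is only a sketch: the 4096-character budget is invented, and a real client would chunk `ids`, `kinds`, and tags too, not just `authors`.

```python
import json

def split_filter(filt, max_chars=4096):
    """Split a filter whose serialized form exceeds max_chars into
    smaller filters by halving its "authors" list. Sketch only: a
    real client would also chunk ids/kinds/tags, and the 4096-char
    budget is a made-up stand-in for a relay's message limit."""
    if len(json.dumps(filt)) <= max_chars or "authors" not in filt:
        return [filt]
    authors = filt["authors"]
    half = max(1, len(authors) // 2)
    left = {**filt, "authors": authors[:half]}
    right = {**filt, "authors": authors[half:]}
    return split_filter(left, max_chars) + split_filter(right, max_chars)

# Example: 300 hex pubkeys (64 chars each) blow far past a 4 KB budget,
# so the filter gets split into several smaller ones.
big = {"kinds": [0], "authors": ["ab" * 32 for _ in range(300)]}
parts = split_filter(big)
```

Each resulting part then becomes its own `REQ` (or filter), which is exactly the rotation behavior described above.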
Because they are completely independent, and mashing them up together in the same subscription makes no sense.
They already are, in a way. Making that explicit is good. I can't see any performance downsides, only upsides.
Not only. I am mostly worried about the perspective of relays handling custom queries in a way that isn't just the dumb "give the user everything he wants", and dealing with custom `REQ` semantics. The simplest example, today, is NIP-17 recommending that relays reject `REQ`s for gift wraps from clients that haven't authenticated as the recipient. The same problem also showed up recently on @staab's implementation of NIP-29 conflicting with my relay assumptions: fiatjaf/relay29#13. For anyone who expects the Nostr relay ecosystem to grow in diversity, specifically relay diversity (like I do and expect and hope), issues like this will continue to show up. Maybe in these cases the solution could be just to recommend single-filter `REQ`s. |
|
I think this "relay diversity" issue goes beyond single filters.
Multi-account clients with NIP-17 downloads require multiple connections in order to AUTH as different users. Many relays don't accept multiple connections per IP, so the app has to either rotate connections (with wasteful `since`/`until` windows of +/- 48 hours) or subscribe over Tor with multiple exit nodes, one for each account.
I do not think this is unsolvable. It is hard, but matching incoming filters, no matter how complex they are, with the authorization the logged-in user has should be solvable. Modifying the filter is hard, but running the filter and then applying a second filter on top of the results to filter what the user can see from the resulting event set is a lot easier. @nostr-wine had many of these issues last year when building inbox.nostr.wine. Maybe we can learn from those.
I can't see a future where nostr libraries will not rearrange all these filters/subs by default. There is just no way App devs can deal with the complexity of building and splitting filters/subscriptions/connections. We will need to provide higher-level tools and find great ways to convert them back into what each relay operator wants their Clients to do. If we want to give relays more power to diversify their responses while letting Clients know how to deal with those customizations, we need a new abstraction layer. |
|
@fiatjaf do you think custom relays should still be called "relays"? As you said, optionality sucks. There should be a pitch somewhere to separate the relay network as we know today, with a simple, well-defined behavior, from new networks of custom relays. Clients could then declare compliance with the main network and/or other networks/schemes. We shouldn't expect every client to be able to deal with ALL of these custom relays and their rules. |
This is a bug on the relay though. A relay that deals with AUTH should accept more than one connection per IP as long as these connections are authenticated. The best course of action here is to talk to the relay maintainer.
This can be done, of course, but my point is that it becomes unnecessarily confusing. As the relay you can't just return a simple `CLOSED` with an `auth-required:` reason when only some of the filters in the subscription require authentication. |
It's an interesting idea, although I can't imagine it right now. I think the current idioms we have are enough to power a thousand "custom relays" already, so if we make these things separate entities we risk splitting the network too much into different API interfaces or whatever. But I'm not opposed to it, I want to see what you have in mind.
Well, I wasn't expecting all clients to do it, I was expecting clients to hardcode some behaviors specifically tailored to some relays (e.g. DM apps hardcode behavior for NIP-17 relays, group apps for NIP-29 relays, etc.) -- so yeah, maybe having more specific "relay APIs" (or whatever you wanna call them) for each use case could be interesting, as long as they are standardized. |
Don't assume users get mad when they are not getting what they are expecting; the current reality is the opposite. Relays are always removing/hiding events, so users and clients almost never see exactly what they expect to see. We are already used to working with whatever we get from each relay. To me, relays SHOULD take any filter but only return what the user is allowed to see, with whatever rules they want to apply, and only CLOSE when there is nothing to return/monitor. |
|
I agree that it would probably only result in performance improvements, since in every relay implementation I've seen, the relay just iterates through filters and sends the results anyway, keeping the subscription open slightly longer than if each filter had its own. Also, with batching, subscriptions are held open until the last subscription finishes; breaking them into separate subscriptions would result in better concurrency at the very slight cost of more messages being sent back and forth. Unfortunately, the core architecture of most sophisticated libraries (at least welshman and NDK, and it sounds like Alex has a similar thing going on) is based around splitting/merging, so a lot of work would have to go into taking that out. On the other hand, it would probably make things way simpler, since right now my request flow involves several layers of batching, splitting, and merging.
I've spent an incredible amount of time developing (and troubleshooting) this logic. Sunk cost notwithstanding, it would probably be better for everyone if we moved away from multiple filters per request; it would simplify the process considerably.
This would remove the need for the batching timeout, and plenty of performance-intensive and error-prone filter matching/merging. I originally started batching at @pablof7z's suggestion, specifically to deal with relay.nostr.band's very low concurrent subscription limits (8 subs at a time, as I recall). So yeah, batching is almost entirely a workaround for stingy relay limits, although I think there are a few cases in which batching makes sense, like when fetching reactions/replies/zaps for a parent note: instead of injecting that logic into the data loading code, you can just put it in the leaf component, which simplifies things a lot. Of course, it would not be hard to write batching logic for that case which only results in a single filter, so maybe that's a moot point. Another good point fiatjaf made in fiatjaf/relay29#13 is that CLOSED runs into the same problem that custom relay policies do, and it's baked into the protocol. Another thing:
I've been saying for a long time that relays are defined by an interface, and differ by policy. Graceful degradation and progressive enhancement through feature support signaling are the way to handle this. NIP 29 (despite the rigidity of the relay29 implementation in that it rejects any non-h-tagged event) only breaks the relay interface in this one place (which is why I had to address the relay implementation rather than work around it), despite its policy being wildly different from most relays on the network. I think this is great. TL;DR I support clients moving away from multiple filters, although I'd like to see how well that works in practice before really committing to the switch. If this really doesn't cause any problems for clients, relays should watch for multi-filter requests to stop coming in before dropping support. |
Yes. I am also depending on it now too. But after looking at my code, it's less than I thought. So, I think it wouldn't be so bad after all.
Why would we accept such shoddy implementations? Of course I generate one SQL query for combined filters. https://gitlab.com/soapbox-pub/nostrify/-/blob/main/packages/db/NPostgres.ts?ref_type=heads#L265 |
Gossip must be pretty dumb by comparison then. I just ask for what I want, when I want it, and if the relays complain I consider it a relay problem. I don't split or merge. I don't try to limit kinds, filters, or subscriptions to small numbers. I don't rate limit other than timing out for 15 seconds if the relay was rude. Then again, I don't think any of my REQs are unreasonable. By far the most common relay complaint is that I didn't AUTH and it insists that I AUTH just to do a REQ. |
But the SQL database underneath probably just iterates through all the filters independently. |
This is good to hear, because I have never done any of the batching people are doing and never understood why people felt it was so necessary, but I was never sure because none of my clients are heavily used. Gossip works basically perfectly though, and the fact that it is so naïve reinforces my beliefs. If I started to hit relay limits so often that I felt the need to complicate my client code with hacks in order to make basic things work, I wouldn't think that was normal and proceed to pile hacks on top of hacks; I would think the protocol was broken and that it needed a serious reboot (because if it doesn't work today, damn, imagine if we had more than 16 users).
|
FYI: for most relays in most situations, gossip has only 4 subscriptions. In coincidences there may be more than 4: if that relay happens to also be a discovery relay, it may have a few extra. So in that worst case, that is only 7 subscriptions (with possibly more if you quickly spam the load-more button). Every other filter set only gets subscribed if you navigate somewhere else (in which case some of the above ones close). Notably, there are things gossip could be subscribing to that we choose not to.
|
alexgleason left a comment:
I think it's worth it.
I always thought it was wrong that you can pass multiple filters in a single REQ. It makes the `limit` property confusing, and it harms your ability to do rate limiting and set timeouts on the relay side.
However, clients still need to request multiple filters sometimes. This responsibility is now shifted to the client. You could argue it always was. But still, clients now need to loop over the filters and possibly aggregate the results. This is actually more powerful for the client, so it probably should have always been a client responsibility.
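The client-side loop this implies is mechanical: one `["REQ", <sub_id>, <filter>]` message per filter, each with its own subscription id. A rough sketch (the subscription-id scheme here is invented):

```python
import json

def single_filter_reqs(base_sub_id, filters):
    """Turn a list of filters into one ["REQ", <sub_id>, <filter>]
    message per filter, each with a distinct subscription id derived
    from a base id (naming scheme is illustrative, not specified)."""
    return [
        json.dumps(["REQ", f"{base_sub_id}-{i}", f])
        for i, f in enumerate(filters)
    ]

# A thread view that previously used one 3-filter REQ now sends 3 REQs:
msgs = single_filter_reqs("thread", [
    {"ids": ["e" * 64]},               # the root note
    {"kinds": [1], "#e": ["e" * 64]},  # replies
    {"kinds": [7], "#e": ["e" * 64]},  # reactions
])
```

The client then aggregates events arriving on the three subscriptions and dedupes them by event id, which is the "loop over the filters and possibly aggregate the results" responsibility described above.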
Finally, it will simplify logs, and simplify queries. By removing this we actually get more predictable outcomes. This will make it easier to improve performance.
|
Would this change also propagate to COUNT? |
|
I like this. But I don't see a great path to obsolete multi-filter REQs. Asking for serious rewrites of critical, hard-to-debug parts of big clients for the benefit of relays is a very one-sided deal. Maybe clients could ask for higher rate limits first, and if relays agreed, clients wouldn't need the batching and the use of multi-filter REQs would just fade away? Without higher rate limits this change won't fly anyway. |
|
@nostrband you're right, I guess we have to start first with the limits. If I got it right @hoytech has agreed to change strfry limits to be based on number of filters instead of subscriptions, so 1 subscription with 2 filters will be equivalent to 2 subscriptions with 1 filter each. I don't know when he plans to implement that, but once he does I believe he will also increase the default limit. After that I think we can nudge clients to start using REQs with a single filter, and after that we can deprecate multi-filter REQs softly. |
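Counting by filters instead of subscriptions, as proposed for strfry here, amounts to a small change in the limit accounting. A sketch of the idea (the `max_filters=64` budget and all names are invented, not strfry's actual configuration):

```python
def over_filter_limit(open_subs, incoming_filters, max_filters=64):
    """True if accepting a new REQ would push the connection past a
    per-connection *filter* budget. With this accounting, 1
    subscription with 2 filters costs exactly the same as 2
    subscriptions with 1 filter each. The default of 64 is invented."""
    in_use = sum(len(filters) for filters in open_subs.values())
    return in_use + len(incoming_filters) > max_filters

# Two open subscriptions holding 3 filters total:
subs = {"a": [{"kinds": [1]}, {"kinds": [7]}], "b": [{"kinds": [0]}]}
```

Under this scheme a client gains nothing by packing filters into one subscription, which is what makes single-filter REQs viable.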
Ok this sounds promising! |
`COUNT` seems to be deprecated; nobody uses it. |
@fiatjaf we can support it as well in our relay. I need to take a look at the code, but it probably makes the process more lightweight. Currently, we aggregate queries based on kind and send separate queries to MongoDB (we still haven't found the best practice for that, and there is the possibility of moving to another database). This makes things more lightweight, I think, since we make fewer queries (most of the time) and each query searches a smaller collection, not all events. But I believe this change would still make it better. Also, we would need to advertise this in the NIP-11 document and remove the current max-filters thing. |
|
Yes this is on my TODO list, as soon as I find some spare time. I'm pretty sure we can increase the limits in strfry quite a bit, I just want to do a little bit of testing to make sure, and to choose reasonable default values even for modest hardware. I think only allowing a single filter per REQ would've been a better/simpler design, but I think changing it at this stage is too big of a backwards incompatibility hit (?) |
Semisol left a comment:
Could we also extend EVENT to be able to signal multiple matching filters as an optimization?
this can be negotiated separately actually, so it's out of scope here |
Just skimmed through this PR, does this mean you are in favor of keeping multiple filters? @vitorpamplona
Having multiple filters in a single REQ allows for db query planning/optimization, as well as preventing sending duplicates, which is not the case for separate REQs. At first glance this PR seems backwards and would actually hurt performance. What am I missing? Example: in my case it's checking for the latest releases. It could be for 10 apps, but more likely 200. Sending multiple separate REQs will cross the whole stack without giving the database a chance. If I wanted to keep watching for updates, in the one-REQ-with-multiple-filters case I need to keep one subscription open, versus 200. |
|
How can such a query be optimized? Each of these filters still requires its own index lookups. By spawning isolates, do you mean each filter will be handled in its own lightweight thread? That's what Khatru does already, and having independent REQs is natural to that. What do you mean by "serialization" in this context? |
I don't know enough about database internals but conceptually it's better to have the complete picture when planning a query than not. This was just one example. How do you account for Vitor's example where duplicate events may be sent?
It depends on the language and the architecture. Again, this is conceptual: it's better to fetch 200 things at once than to make 200 calls to fetch one thing each. No matter how small, there is overhead.
Thoughts on that? |
|
I am currently against this PR. Multiple filters per REQ are useful when queries are expected to return lots of duplicated responses between filters, and to reduce the EOSE cache in the client when a single EOSE date can be used for multiple filters, which is quite common (listening for all references to a note, for instance, not only those from NIPs that use e-tag references). That being said:
The move to a smaller number of filters in favor of full REQs is good. But limiting to just 1 filter is bad. |
|
I also agree that relays should be able to optimize queries that have multiple filters. Each filter is just a large OR clause and we know from SQL that those are some of the easiest to optimize since they tend to reuse the same indexes multiple times. But frankly I don't think we have a single dev that knows how to analyze and create different execution plans based on the incoming filters. |
Of course that is an advantage, but it depends on the real-world usage of that stuff. I have not made one big beautiful client full of stuff like Amethyst, but I've made a ton of microclients and integrations and I don't remember encountering a situation in which I would want deduplicated query responses from different filters. Deduplicating on the relay side is costly: you have to run through all the results you got from all the different internal queries and compare them. No matter how you do it, it's an extra cost. It's definitely worth it if you're throwing away a lot of stuff that then you don't have to send to the client, but if you have a ton of events and end up not throwing away any, or just a few, then you wasted work there. (In SQL terms, that's why `UNION ALL` is cheaper than plain `UNION`.) This is all being done in pure C++ in strfry's codebase. On Khatru I forgot to do it at first, but later I decided it wasn't worth it; maybe I'm wrong. |
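The deduplication cost described here can be pictured as a seen-set the relay must maintain across the per-filter result sets. A toy sketch (not strfry's or Khatru's actual code; `query` stands in for the relay's per-filter index lookup):

```python
def answer_multi_filter_req(filters, query):
    """Run each filter's internal query and dedupe across them.
    Every event pays a set lookup even when nothing overlaps,
    which is the wasted work when filters barely intersect."""
    seen, out = set(), []
    for f in filters:
        for ev in query(f):
            if ev["id"] not in seen:  # the extra cost: one check per event
                seen.add(ev["id"])
                out.append(ev)
    return out

# Toy store: two filters over the same note's activity, one overlap ("a").
events = [
    {"id": "a", "kind": 1},
    {"id": "b", "kind": 7},
    {"id": "c", "kind": 1},
]
query = lambda f: [e for e in events if e["kind"] in f["kinds"]]
result = answer_multi_filter_req([{"kinds": [1]}, {"kinds": [1, 7]}], query)
```

With single-filter REQs this loop, and its per-event check, simply disappears from the relay; any duplicates travel over the wire instead.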
What do you mean? Are they the easiest to optimize in comparison to what? I may be wrong, but I'm pretty sure any database, when faced with a `"ids":["a", "b", "c"]` (or, equivalently, `WHERE id = "a" OR id = "b" OR id = "c"`) will have to do 3 index lookups on an "id" index. This is what strfry, my eventstore implementations, pocket and probably nostrdb do, and I don't think there is another way.
I'm curious about what you mean here exactly, but I'll say what I know about "query planning": in all the dedicated Nostr event stores I looked at, the "planning" is hardcoded based on the existing indexes and the attributes in the filter: if the filter has `ids`, use the ids index; if it has `kinds` and `authors`, use a pubkey-kind-created_at index; and so on, falling back to a plain created_at scan. But still, a single filter is broken into multiple lookups in each of these indexes if the filter has more than one value in the attribute that will be consulted. As far as I understand, SQL databases do the same, but since they can't hardcode the schema they keep statistics about each index the user created and consult these at runtime before deciding which one to use for each query, if the query specifies multiple conditions (or whether it's better to do a full table scan). In the case of Nostr that is probably overkill that would slow things down instead of helping (I don't really know, but if you compare the speed of any of these custom Nostr event stores against Postgres or SQLite today, the difference is absurd). |
|
Picture the kind filter with max 10 kinds that I mentioned: a smart planner can see that the split filters are really one query and combine them. I think these types of easily improved queries are quite common these days, mostly because of the limits imposed on clients. Everybody just ends up duplicating everything, either in a new REQ or a new filter. |
|
Yeah, I don't get the point of limiting the number of kinds in a filter like that. If the intention of the relay is to really limit the number of queries they will have to do internally they will have to do a better job at imposing the limit. It makes no sense to limit the number of kinds but allow more queries with more kinds, they should instead do something like: "only 10 kinds for your IP per hour, after that you can't make any more queries!" (which is stupid, better to not run a relay at this point, but it's an example). In other words: instead of doing 3 filters each with 10 kinds everybody is better off if the client sent a single filter with 30 kinds and the relay accepted it. |
|
Sure, but you missed the point. There are many opportunities to improve query speeds. And in order to do so, filters have to be part of the same REQ. |
|
Sorry, I did miss that point, I don't see it anywhere at all. Can you give me some examples of such opportunities? |
|
Another example is the query @franzaps has: loading the latest of 50-200 replaceable app events. You could run 50-200 lookups on the entire database if your implementation is dumb, or you can plan the query, realize the kind doesn't have many events, load all the software app events by kind first, and only then filter by matching authors and d-tags as a single set. If I am looking for the last posts to any of the communities, chats, hashtags, or geohashes that my user follows, you are going to get one filter for each, all with the same since/until params. If your implementation is dumb, you are going to run through them one by one. If there is a planner, it would know to merge them all into one OR query and go through the DB just once. |
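A planner step like the one described could detect filters that are identical except for one list-valued attribute and union that attribute before touching the database. A hypothetical sketch, not any real relay's code:

```python
def merge_same_window(filters):
    """Merge filters that are identical except for exactly one
    list-valued attribute (e.g. a "#t" tag) by unioning that
    attribute. Hypothetical planner step for illustration only;
    identical duplicate filters are left alone."""
    merged = []
    for f in filters:
        for m in merged:
            diff = {k for k in set(f) | set(m) if f.get(k) != m.get(k)}
            if len(diff) == 1:
                (key,) = diff
                m[key] = sorted(set(m.get(key, [])) | set(f.get(key, [])))
                break
        else:
            merged.append(dict(f))
    return merged

# Three hashtag filters sharing the same kind and time window:
window = {"kinds": [1], "since": 1700000000, "until": 1700086400}
filters = [
    {**window, "#t": ["nostr"]},
    {**window, "#t": ["bitcoin"]},
    {**window, "#t": ["coffee"]},
]
plan = merge_same_window(filters)
```

The three filters collapse into one OR query over the `#t` values, so the store is walked once instead of three times.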
|
Having multiple filters allows opportunities to optimize querying and the result set. The examples given here should be enough proof of that. Removing multiple filters removes those opportunities. It is false that multiple filters hurt performance: 1 REQ with 50 filters will always cost less than or equal to the equivalent 50 REQs with 1 filter each. You can always write an adapter to convert 1 REQ with 50 filters into 50 REQs with 1 filter, but not the other way around. (@fiatjaf, because I read you mention this several times: if clients use multiple filters to get around limitations, that is a separate issue. As I said, NIP-11 should have a way of clearly expressing rate limiting for filters, which is the actual fix to the problem.) As for code complexity, that is relative. It might be easier for you; for me, it adds complexity. All this PR does is remove optionality, of which we barely have any, while likely also degrading query performance. |
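The one-directional adapter mentioned above is indeed mechanical in the multi-to-single direction, since each filter can just get its own derived subscription id (a sketch; the id scheme is invented):

```python
import json

def fan_out(req_msg):
    """Convert ["REQ", sub, f1, f2, ...] into N single-filter REQs.
    The reverse direction is impossible in general: N independent
    subscriptions carry no hint that they belong together, and any
    cross-filter deduplication the relay did is lost."""
    msg = json.loads(req_msg)
    if msg[0] != "REQ":
        raise ValueError("not a REQ message")
    sub, filters = msg[1], msg[2:]
    return [json.dumps(["REQ", f"{sub}/{i}", f]) for i, f in enumerate(filters)]

out = fan_out(json.dumps(["REQ", "x", {"kinds": [1]}, {"kinds": [7]}]))
```

This is the asymmetry being argued: the multi-filter form can always be lowered to the single-filter form, but not recovered from it.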
This is a fair point, but, as I said, such a thing would be so complicated to do that it would be more likely to hurt performance: the time spent on planning would probably not be compensated by the speed gained in the actual query (and would it actually be faster? under which parameters?). And the complexity probably means no one will ever do it anyway, since it makes no sense, unless we're dealing with massive workloads.
I never said that, I said it was equivalent.
This is only because you're thinking too smartly about it and trying to batch things and whatnot. A less smart developer would just do a for loop and make a separate call in each iteration.
You have just ignored all the arguments in favor of this proposal and said: "it does nothing, only hurts". It's ok to be against it, and I can see your arguments to be against it, but please be reasonable and not just dismiss everything else. |
Disagree. It's something super easy to implement and measure over time. The relays are being hit all day long by dumb filters; it's idiotic not to pay attention to that. And since operators can easily measure a filter's performance, the feedback loop to making things fast is very natural. |
Why? It's actually pretty simple to detect a pattern in multiple filters and construct a query from there. An open protocol should leave room for innovation at the edges like in this case.
You said this man...
The only argument for this proposal I understood and recognize, which is developers "getting around limitations" I already suggested is a different issue and proposed a solution for. In my particular case, I am not using multiple filters to get around limitations in that way, but for performance reasons. If you take this away, the experience I want to offer will suffer. Sorry, but sending 50, 100, 200 separate REQs for just one query is not appealing to me, especially when I want to keep a subscription to updates. Am I supposed to keep 200 subscriptions open? I am writing all this because I care about interoperability. If you think I'm not being reasonable this will be my last comment. You know what the alternative is? Silently breaking interoperability. Proposals like this may push developers who never comment here, to find other creative solutions.
|
On Fri, May 30, 2025 at 06:07:44PM -0700, fiatjaf_ wrote:
fiatjaf left a comment (nostr-protocol/nips#1645)
> Each filter is just a large OR clause and we know from SQL that those
> are some of the easiest to optimize since they tend to reuse the same
> indexes multiple times.
What do you mean? Are they the easiest to optimize in comparison to
what? I may be wrong but I'm pretty sure any database, when faced with
a `"ids":["a", "b", "c"]` (or, equivalently, `WHERE id = "a" OR id =
"b" OR id = "c"`) will have to do 3 index lookups on an "id" index.
This is what strfry, my eventstore implementations,
[pocket](https://github.com/mikedilger/pocket) and probably nostrdb do
and I don't think there is another way.
nostrdb is pretty bad at handling lots of different kinds and tags
mashed into one filter, in fact it's probably broken. one reason I like
this PR is that it allows for simpler queries, which allows for faster
and more correct results (at least on the nostrdb side since my code
sucks);
> I don't think we have a single dev that knows how to analyze and
> create different execution plans based on the incoming filters.
yeah
I'm curious about what you mean. The query planning in all the
dedicated Nostr event stores I looked is hardcoded based on the
existing indexes and the attributes in the filter. Something like:
```
if filter.ids: use "ids" index
if filter.kinds and filter.authors: use "pubkey-kind-created_at" index
if filter.tags["e"] or filter.tags["E"]: use "tags-created_at" index
if filter.kinds: use "kinds-created_at" index
if filter.authors: use "pubkey-created_at" index
if filter.tags: use "tags-created_at" index
else: use "created_at" index
```
But still a single filter is broken in multiple lookups in each of
these indexes if the filter has more than one value in the attribute
that will be consulted.
here's ours:
```
// this is roughly similar to the heuristic in strfry's dbscan
if (search) {
    return NDB_PLAN_SEARCH;
} else if (ids) {
    return NDB_PLAN_IDS;
} else if (relays && kinds && !authors) {
    return NDB_PLAN_RELAY_KINDS;
} else if (kinds && authors && authors->count <= 10) {
    return NDB_PLAN_AUTHOR_KINDS;
} else if (authors && authors->count <= 10) {
    return NDB_PLAN_AUTHORS;
} else if (tags && tags->count <= 10) {
    return NDB_PLAN_TAGS;
} else if (kinds) {
    return NDB_PLAN_KINDS;
}
```
It works best when the filter is simple.
Cheers,
jb55
|
You may be right, but I don't think this possibility of future optimization for niche queries is enough justification for keeping the multiple filters. Hard to say.
Sorry, I think I get most of where you're coming from, and I think you're reasonable, but maybe you're not getting my point. And we're going in circles at this point, so maybe we should stop discussing this for a while. I'm convinced I don't have a slam dunk argument against the filters, it's mostly a question of degree: does having multiple filters help the protocol in general more than they hurt? I think no, and I tried to say that but this discussion is too confusing. The only last point I want to hammer again is: will a relay accept a single REQ with 200 filters in it? And keep that subscription open? If yes, then why wouldn't that same relay accept 200 REQs and keep those open? Having relays limit the number of filters they can take rather than REQs doesn't change this.
You can say that about anything that simple relay filters cannot do, and there are many such things already. In the past people have requested all sorts of things: different ordering mechanisms for queries, SQL-like JOINs, querying for the absence of tags, querying for "less than" or "greater than" tag values, and other stuff I don't remember now. In most of these cases there is a different, sometimes not obvious, way of doing things that is more in line with Nostr. In some cases it's impossible, and then we have to decide if it's worth it (and if it's even possible) to shove the new feature into the spec. EDIT: I think I might be changing my mind on this, but that's contingent on strfry limiting by filters and not by REQ; let's see. |
|
@fiatjaf NIP-45 MUST get updated accordingly in this PR. Otherwise, after the merge, the repo will be inconsistent.
Semisol left a comment:
Not sure about this anymore.
As everyone else said, this overall reduces bandwidth load on the client.
But also, this does allow query optimizations. This is currently NFDB-specific, but could be implemented by any relay.
NFDB currently uses normal indexes, plus composite indexes for (author, tag), (author, kind) and (tag, kind).
What this allows is many queries can be merged in the planning phase, such as {"#p": [user], "kinds": [4]} and {"authors": [user], "kinds": [4]}.
This gets merged into scanning 2 indexes in parallel: (author_kind, 4, user) and (kind_tag, 4, p, user).
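The parallel scan-and-merge described here can be sketched as merging two index cursors, each yielding `(created_at, event_id)` in descending order, into one deduped stream. This is illustrative only; the index names and layout are stand-ins, not NFDB's actual schema:

```python
import heapq

def merge_index_scans(*scans):
    """Merge several index cursors, each already sorted by
    created_at descending, into one descending stream, dropping
    events matched by more than one index."""
    seen = set()
    for created_at, event_id in heapq.merge(*scans, reverse=True):
        if event_id not in seen:
            seen.add(event_id)
            yield (created_at, event_id)

# (author, kind) scan: kind-4 events *from* the user;
# (kind, tag) scan: kind-4 events p-tagging the user.
# Event "e2" appears in both (a self-addressed DM, say).
sent     = [(170, "e3"), (150, "e2"), (100, "e1")]
received = [(160, "e5"), (150, "e2"), (90, "e4")]
inbox = list(merge_index_scans(sent, received))
```

Both scans advance in lockstep and the client sees a single time-ordered stream, which is the optimization a single-filter-only protocol would forgo.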
Another thing is rate limits: Nostr.land rate-limits by filter complexity, i.e. by how much of the indexes you scan (even for events that do not match), plus a fixed limit on the number of subscribed indexes.
This is overall a better approach than REQ- or filter-count-based limits, and it makes this NIP pointless.
|
If keeping multiple filters, we should establish a rule for how `limit` behaves when there's a `limit` in more than one filter. |
|
We should just leave this alone. I switched to one filter per request in welshman, but this isn't worth the effort of changing; multiple filters are fine. |
|
I'm ambivalent on this issue. It seems like a giant non-issue to me. Either way is fine; neither is more complicated than the other. If there remain multiple filters in a REQ, then the relay can just run each into the same sorted output set, getting the union of results. Yes, you could in theory micro-optimize first, but figuring that out sounds like a brain-breaker to me. This week I've been digging into how to get almost the same performance while using far less space by getting rid of the composite indexes and just using simple indexes. The work is still in progress, but the idea is to keep just 4 simple single-attribute indexes and intersect their results per query. I hope the performance will be the same, or very close, but the disk space used will be much less. |
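The single-index idea can be pictured as keeping one posting set per attribute value and intersecting them per filter, then walking ids newest-first. A toy sketch under invented names, not chorus's implementation:

```python
def query_with_single_indexes(filt, by_kind, by_author, ids_sorted):
    """Intersect per-attribute id sets, then walk ids newest-first.
    by_kind / by_author map a value to a set of event ids;
    ids_sorted is every id ordered by created_at descending.
    All names and shapes here are illustrative."""
    candidates = None
    if "kinds" in filt:
        s = set().union(*(by_kind.get(k, set()) for k in filt["kinds"]))
        candidates = s if candidates is None else candidates & s
    if "authors" in filt:
        s = set().union(*(by_author.get(a, set()) for a in filt["authors"]))
        candidates = s if candidates is None else candidates & s
    return [i for i in ids_sorted if candidates is None or i in candidates]

by_kind = {1: {"a", "b"}, 7: {"c"}}
by_author = {"alice": {"a", "c"}, "bob": {"b"}}
newest_first = ["c", "b", "a"]
hits = query_with_single_indexes({"kinds": [1], "authors": ["alice"]},
                                 by_kind, by_author, newest_first)
```

No composite index is consulted: the filter's constraints are ANDed by set intersection, trading a bit of query-time work for much smaller index storage.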
|
@mikedilger thank you. By the way, do you have limits on number of REQs or filters or something on https://github.com/mikedilger/chorus? |
|
For a while I thought this was going to work fine and most people were liking it, but now it's clear to me that the change won't be easy and the advantages aren't super clear either way. In any case, even if there are small advantages, the change process isn't worth it, especially since most of the advantages can be achieved by having smarter filter limits on relays rather than naïve counting of REQs. Thank you, and sorry, to the people who changed their code to forcibly use a single filter.
In practice the existence of the possibility of multiple filters in a single `REQ` has no benefit and only serves to hurt performance and increase the complexity of codebases.

Anything that can be done with a single `REQ` and multiple filters can much more easily be done with multiple `REQ`s, each with a single filter.

I know this is a breaking change, but not a very drastic one. We can easily transition to this by having clients stop being built with this "feature" in mind, and relays can follow later. If a client sends a `REQ` with more than one filter to a relay that only supports one, that's also OK: the client will just get fewer results than it expected.