-
Hey. It's a question, probably, not a bug. For some (really huge, I've already told about that) project, on current master Thanks. |
Beta Was this translation helpful? Give feedback.
Replies: 9 comments
-
Managed to profile it down to the fact that |
Beta Was this translation helpful? Give feedback.
-
OK, I think I've tracked the problematic line, but still do not understand the cause. Bignum.send :include, OurNamespace::BugnumPatch Then, Rubocop asked us to do it "normal" way: Bignum.include(OurNamespace::BugnumPatch) So, I assume, YARD now "see" the inclusion and tries to resolve it After that, if I'll trust my gathered stats, the next things happened:
What I should look for, now, to resolve the issue?.. |
Beta Was this translation helpful? Give feedback.
-
I assume you're already using Without seeing the code or a reproduction case it's hard to know if what's happening is unexpected behavior or not. Can you provide code or reproduction? |
Beta Was this translation helpful? Give feedback.
-
OK, it took me only two days to invent the solution. What do you think about replacing this: https://github.com/lsegal/yard/blob/master/lib/yard/registry.rb#L303 With something like this (it is utterly naive, just to show an idea): def resolve(namespace, name, inheritance = false, proxy_fallback = false, type = nil)
@@resolved ||= Hash.new { |h, (ns, n, i, pf, t)|
h[[ns, n, i, pf, t]] =
thread_local_resolver.lookup_by_path n,
:namespace => ns, :inheritance => i,
:proxy_fallback => pf, :type => t
}
@@resolved[[namespace, name, inheritance, proxy_fallback, type]]
end It is just a simple memoization trick to make less resolves for the same object over and over again. Results for me:
Am I missing something about it? Is it a good thing to do? |
Beta Was this translation helpful? Give feedback.
-
@lsegal Haven't seen your comment while writing mine, sorry. Simple example: this file: module M
end
class A
end
class B < A
include M
end
Array.include(M) Being the only file in "project", leads to this list of
Considering that in large registries |
Beta Was this translation helpful? Give feedback.
-
That's not necessarily an accurate description of lookup's performance behavior. Your output groups all calls together and omits data about where they are called. It also is completely lacking in timing information (to actually show how expensive it is). Some of those lookup calls might be grouped together, but I imagine most of those are distinct calls. If so, it's not that lookup is expensive as much that it's a heavily used call. Can you add output to show where these are getting called? My guess is we will always need that many lookup calls, though we could optimize performance through caching. |
Beta Was this translation helpful? Give feedback.
-
Yes, I've investigated it much deeper, here just wanted to show the point (of constant re-lookups for the same things).
On this toy example it is hard to show. But that's what RubyProf says of our actual codebase: (It is NOT an entire codebase, just a If you are not familiar with ruby-prof, here is the meaning of rows:
Here is the meaning of columns:
So, what am I missing here? As far as I can see, YARD never tries to memoize anything, but to the time of docs generation we know everything we have, and do not need to constantly re-lookup the things. |
Beta Was this translation helpful? Give feedback.
-
That's correct. YARD doesn't do caching in the resolver lookup. Caching is hard. Caching requires sound (formally provable) invalidation stories. Unfortunately there's no sound invalidation story here.
This part isn't true. There is no distinction between lookups occurring during parsing and during HTML generation. Lookups can (and do) occur during parsing. Every More importantly, though, modifcation can still happen after parsing, so although YARD could make an assumption that the tree is locked after parsing, that's not currently a requirement. That would be a breaking change. Currently, plugins can modify the registry tree after YARD parses, and possibly even during HTML generation if they wanted to. It's probably bad form to do this, but it's currently allowed by the API. The hard part about this in either case (lookups during, or modifications after) is dealing with invalidation. The invalidation story for newly registered or removed objects is fairly straightforward (every create / delete invalidates cache), but this is not the only way to modify the object tree, especially when it comes to inherited lookups. Here's an invalidation story we could catch: include YARD; YARD::CodeObjects
b_class = ClassObject.new(:root, "B")
a_class = ClassObject.new(:root, "A")
obj = Registry.resolve(a_class, "B") # this is b_class
ab_class = ClassObject.new(a_class, "B") # we can invalidate here
obj = Registry.resolve(a_class, "B") # this is ab_class But here's a very simple example of an un-invalidateable case:
Calling These aren't all impossible to solve, but it describes the complexity involved with cache invalidation. Caching is very likely to cause weird edge case caching issues that will be hard to debug. Furthermore, "invalidating the cache" itself is a hard problem. Even if we know to invalidate the cache-- how to effectively invalidate the cache is an equally hard problem. The naive solution would be to just smash the entire cache every time we invalidate. Trying to be "smart" about invalidation is likely to yield just as many edge case bugs. For example, if I add a method to a module, I have to invalidate every lookup that used the module in its lookup chain, so I have to look at all cache objects that contain module or a module/class that includes module at any level of indirection. That's an extremely complex resolution. This level of code complexity is not worth the ~20 second speed bump on the "average" codebase size (I would argue that the average gem is significantly smaller than infoboxer). I agree that 30 min to 4 min is a huge improvement, but "extremely large codebase" is an edge case. It's also an edge case of an edge case, since it's only come up due to a specific formatting you happen to be using (it may even only be specific to projects with heavy module inclusion across lots of files). If This may end up being a YARD plugin idea until there's a better longterm solution for this. |
Beta Was this translation helpful? Give feedback.
-
OK, I see your points. Thanks for clarifications. |
Beta Was this translation helpful? Give feedback.
That's correct. YARD doesn't do caching in the resolver lookup. Caching is hard. Caching requires sound (formally provable) invalidation stories. Unfortunately there's no sound invalidation story here.
This part isn't true. There is no distinction between lookups occurring during parsing and during HTML generation. Lookups can (and do) occur during parsing. Every
P()
call (or Proxy object creation) involves a lookup if the proxy is resolved for a specific reason (e.g. registering a method on a proxy namespace-- th…