Add peeks for idents #367
base: master
Conversation
Thanks for this rather substantial feature! I'll be able to give feedback and discuss it next week. You might have seen that we faced some annoying performance issues recently (#365), so we'll have to hold off for a while on adding features that worsen performance. |
Hello, it seems that we have fixed most of the performance issues and Elixir has more resources now.
The problem is, the HTTP frontend cache won't really help us here. Most requests stay there only for a very short time (especially now that Elixir is being DDoSed by bots that scrape for AI datasets). We would probably need to introduce a caching layer just for this, or merge the feature as-is. Do you maybe have any benchmarks that compare the load and request times of mainline Elixir and your version? |
Hello! I made 200 requests per endpoint:
And I got the following statistics, for vanilla elixir:
and for my PR:
And this is the distribution I got:
|
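For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how per-endpoint timings could be collected and summarized. It is not the script used for the numbers above. The names `time_requests` and `summarize` are hypothetical, and `fetch` stands for whatever callable performs one request against the instance under test.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_requests(fetch: Callable[[str], object],
                  urls: Iterable[str],
                  repeats: int = 200) -> Dict[str, List[float]]:
    """Call fetch(url) `repeats` times per URL and record wall-clock seconds."""
    timings: Dict[str, List[float]] = {}
    for url in urls:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(url)  # hypothetical: performs one HTTP request
            samples.append(time.perf_counter() - start)
        timings[url] = samples
    return timings


def summarize(samples: List[float]) -> Dict[str, float]:
    """Reduce one endpoint's samples to a few summary statistics."""
    ordered = sorted(samples)
    # Nearest-rank style 95th percentile, clamped to the last element.
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "min": ordered[0],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Running the same URL set against both instances and comparing `max` and `p95`, not only `mean`, makes worst-case regressions easier to spot.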
It seems that the worst case can be measured in seconds, and that could be a problem. I tried to think about what can be done about it. @Daniil159x If you have any ideas, please post them. In my tests, it decreased max request time by ~2-4 times (on the set of URLs you posted) and the mean by ~2 times. This depends on whether the repository was recently gc'd or not. Anecdotally, get-file takes around 10 ms on my machine, whereas accessing the file with dulwich usually takes around 1-3 ms. script.sh:
dulwich:
Calling git from subprocess:
I used this script (but with 50 as the range argument). Results vary depending on the size of the repo and whether it was recently gc'd or not, but the worst-case scenario with dulwich for many files was always better by at least ~250 ms compared with just calling git (without script.sh). But to be honest, the first access after creating the repo can take ~100 ms, during which the library is in read_packed_refs_with_peeled. I suspect that this can maybe be fixed, since git does not seem to have that problem. We can switch between implementations depending on how many idents have to be scanned.
@tleb I know you are against using alternative git implementations, and I get why, but maybe we could make an exception here. I can understand that calling git is much preferred in update, for example, but for just querying tags and blobs it may make sense. Git repo format is documented here, packfiles are here. The first link suggests that git actually tries to keep backward compatibility with old implementations, see "Git Repository Format Versions". |
elixir/api.py
Outdated
@@ -46,14 +46,15 @@ def on_get(self, req, resp, project, ident):
 if version == 'latest':
 version = query.query('latest')

-symbol_definitions, symbol_references, symbol_doccomments = query.query('ident', version, ident, family)
+symbol_definitions, symbol_references, symbol_doccomments, peaks = query.query('ident', version, ident, family)
Typo: peaks
elixir/query.py
Outdated
@@ -402,10 +405,37 @@ def get_idents_defs(self, version, ident, family):
 symbol_doccomments.append(SymbolInstance(path, docline))

 return symbol_definitions, symbol_references, symbol_doccomments

Nit: Redundant whitespace (not the line, but trailing spaces)
@@ -66,7 +66,14 @@ function generateSymbolDefinitionsHTML(symbolDefinitions, project, version) {
 result += `<li><a href="/${project}/${version}/source/${sd.path}#L${ln[0]}"><strong>${sd.path}</strong> <em>(as a ${sd.type})</em></a>`;
 result += '<ul>';
 for(let n of ln) {
-result += `<li><a href="/${project}/${version}/source/${sd.path}#L${n}">line ${n}</a></li>`;
+result += `<li><a href="/${project}/${version}/source/${sd.path}#L${n}"><span>line ${n}</span>`;
+let srcLine = peeks?.[sd.path]?.[n];
Optional chaining is supported by 95.5% of browsers. What do you think, @tleb? Our current support cutoff is closer to 97%, I think (I still don't have a good way to evaluate it).
@fstachura I have an idea. I've done some experiments:
batch is 10 times faster |
Yeah, batch mode sounds like a pretty good idea too, I didn't know it existed. I suspect a lot of the overhead comes from forking so many times during the request. If the endpoint forks only a single time, maybe performance won't suffer that much. |
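To make the single-fork idea concrete, here is a minimal sketch of reading several blobs with one `git cat-file --batch` process. It is not the code from this PR. The helper names (`build_batch_input`, `parse_batch_output`, `read_blobs`) are hypothetical, and the sketch assumes that paths contain no newlines and that the whole output fits in memory.

```python
import subprocess
from typing import Dict, List, Optional


def build_batch_input(version: str, paths: List[str]) -> bytes:
    """One '<rev>:<path>' object name per line, as --batch reads from stdin."""
    return "".join(f"{version}:{path}\n" for path in paths).encode()


def parse_batch_output(data: bytes, count: int) -> List[Optional[bytes]]:
    """Parse `count` records of `git cat-file --batch` output.

    Each found object is '<oid> <type> <size>\\n<contents>\\n';
    each missing object is '<name> missing\\n' and is returned as None.
    """
    results: List[Optional[bytes]] = []
    pos = 0
    for _ in range(count):
        end = data.index(b"\n", pos)
        header = data[pos:end]
        pos = end + 1
        if header.endswith(b" missing"):
            results.append(None)
            continue
        size = int(header.rsplit(b" ", 1)[1])
        results.append(data[pos:pos + size])
        pos += size + 1  # skip the LF that follows the contents
    return results


def read_blobs(repo_dir: str, version: str,
               paths: List[str]) -> Dict[str, Optional[bytes]]:
    """Fetch all requested files at `version` with a single git process."""
    proc = subprocess.run(
        ["git", "cat-file", "--batch"],
        cwd=repo_dir,
        input=build_batch_input(version, paths),
        stdout=subprocess.PIPE,
        check=True,
    )
    return dict(zip(paths, parse_batch_output(proc.stdout, len(paths))))
```

Because the parser relies on the size in each header instead of splitting on newlines, file contents that themselves contain newlines are handled correctly.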
a270c3c to 1c05d62
Hi! I rewrote the code to use the batch mode of git cat-file. |
Hi!
I found the project very useful. My usual workflow looks like this: I click on some ident and open N new tabs for its references/definitions. But in other code viewers I can "peek" at the source (e.g. https://code.visualstudio.com/docs/editor/editingevolved#_peek, or GitHub).
I have added this feature in a single page and in a dynamic popup:
The changes have a performance impact, but I hope caching handles it.
You can try it on my instance: https://elixir.ravin.gs.