-
Notifications
You must be signed in to change notification settings - Fork 20
Improve PDB dump performance by switching from DIA to raw_pdb #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Update Offsets.h
Added new command `dumpclass`
Add new preprocessor definitions * NOMINMAX * _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR * _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS Include raw_pdb for PDBReader updates
|
Adding a new dependency just to change the way it's dumped it's unnecessary. Not only that, it's a embedded dependency which adds more bloat to the project since MS DIA can dump the pdb completely fine if done right and it's an official tool. Given our working implementation on the In conclusion, everything this PR aims to merge, is already being taken care of :) |
DIA being "fine if done right" ignores its core limitation: it is not thread-safe, and that is inherent to the COM design. Replacing a non-thread-safe SDK which previously dumped ASE in few seconds but ASA in 15-20 minutes with a lightweight parser is not "bloat" it is reliability and a way to scalability :) Calling dumpclass dirty is not accurate - with the cloud dumper offline/broken, a local tool is the only functional option that should be available to everyone while you are trying to lock as much code behind web as possible :) Even after improvements, DIA still ranges 20 seconds to 2 minutes depending on hardware, while raw_pdb consistently dumps in ~5 seconds on pretty much any hardware. Importantly, the raw_pdb implementation in this PR fully mimics DIA’s behavior, producing almost identical output :) |
|
It does not need to be thread safe if it dumps in under 20 seconds tops with just 1 thread. Multi threading when starting up 10 servers can also destroy performance, it's not black or white. Besides, keeping the cache server running means no one needs to dump, and if they do, it would be in no time. You are free to create your own dumper web/desktop though, not that I am asking you not to. And after all, we've already implemented a working implementation without an additional dependency, and we are doing a redump of the core classes. |
|
Single threaded on R9 9950X3D Log::GetLog()->info("Processing structures...");
//auto typesTask = std::async(std::launch::async, [this, &tpiStream, &typeTable]() {
ProcessTypes(tpiStream, typeTable);
//});
Log::GetLog()->info("Processing functions...");
//auto functionsTask = std::async(std::launch::async, [this, &rawPdbFile, &dbiStream, &imageSectionStream]() {
ProcessFunctions(rawPdbFile, dbiStream, imageSectionStream);
//});
Log::GetLog()->info("Processing global variables...");
//auto globalsTask = std::async(std::launch::async, [this, &rawPdbFile, &dbiStream, &imageSectionStream, &typeTable]() {
ProcessGlobalVariables(rawPdbFile, dbiStream, imageSectionStream, typeTable);
//});
// Wait for all tasks to complete
//typesTask.wait();
//functionsTask.wait();
//globalsTask.wait(); |
|
7 vs 34 is a good difference, that will never be noticed given the cache server has most of the uptime on updates. Any other reviewer can shed some light here, but I am in favour of not merging this PR. |
|
I agree and think we stick to the officially support Microsoft tool. With the changes @Pelayori has already put in the fasterpdb branch the time difference is not enough to warrant completely switching systems. I think leaving the cache system in place is still worthy as it still allows for instant starts up the majority of the time. The dumping feature doesn't really belong in the user facing DLL because 99.9% of the people using AsaApi will never use that feature. I'm in favor of using the web dumper or a standalone tool for any dumping needs. |
|
I just requested review of a few more people. |
|
While this is a good commit, I also think that switching libraries is not necessary. |
|
Personally, I prefer Currently we have to include The size difference between having this embedded dependency (a few KiB of text when compiled) compared to the full I also like that it is thread-safe, minimal allocations, and operates on streams, which are perfect for the parser as it more-or-less uses the visitor-pattern to build the dictionaries. Additionally, As-for the concerns of using many threads when spooling up servers, I actually think this can help a lot with that as the library itself uses memory-mapped files, which should allow you to effectively share the same physical memory page across process boundaries. (If the backing data structure is updated to also be stored in mapped memory, we could share the same PDB parse across server instances "for free," but that's more of a hypothetical than reality.) Just my 2¢ |
|
One more thought, because of course there's always one once you hit the "Comment" button. I am also not against abstracting the PDB parser into an interface, and allowing the user to decide which they prefer -- I think if that was provided in the PR there would be significantly less resistance as it would be opt-in and optional. |
|
Having another point of view is always good. I don't oppose the idea of merging it however my main resistance towards the change is the fact that all used offsets (in headers at least) should be the same. Having testing that a merge can be considered. However the dump command is something I consider not correctly added there, and I am willing to open source the cloud dumper because in fed up with being accused of trying to lock code behind closed source web apps. |


dumpclass className