Skip to content

Conversation

@salmonrider
Copy link
Contributor

  • Updated to latest MSVC Toolset (v145)
  • Switched from DIA to raw_pdb dumper implementation (20 minutes -> 5 seconds)
  • Added new command to dump C++ classes and structs dumpclass className
  • Removed Cache Server Implementation

Update Offsets.h
Added new command `dumpclass`
Add new preprocessor definitions
* NOMINMAX
* _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR
* _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS
Include raw_pdb for PDBReader updates
@Pelayori
Copy link
Member

Pelayori commented Dec 4, 2025

Adding a new dependency just to change the way it's dumped it's unnecessary. Not only that, it's a embedded dependency which adds more bloat to the project since MS DIA can dump the pdb completely fine if done right and it's an official tool. Given our working implementation on the fasterpdb branch, supporting this reason (10 secs on good cpus, 2 min on bad ones), I don't think this is any better given all the involved components.
Moreover, embedding developer tools like the dumpclass command it's a dirty way of doing such tool, as opposed as the cloud-like dumper there was (which is getting updated soon and getting up).
On the other hand, while cache may not serve the same purpose now with slower pdb dump times, it's still a way for busy machines or old servers to catch a break and startup inmediately, with the benefit now that they should not wait 40 minutes to dump in the event of an outage.

In conclusion, everything this PR aims to merge, is already being taken care of :)

@salmonrider
Copy link
Contributor Author

salmonrider commented Dec 4, 2025

Adding a new dependency just to change the way it's dumped it's unnecessary. Not only that, it's a embedded dependency which adds more bloat to the project since MS DIA can dump the pdb completely fine if done right and it's an official tool. Given our working implementation on the fasterpdb branch, supporting this reason (10 secs on good cpus, 2 min on bad ones), I don't think this is any better given all the involved components. Moreover, embedding developer tools like the dumpclass command it's a dirty way of doing such tool, as opposed as the cloud-like dumper there was (which is getting updated soon and getting up). On the other hand, while cache may not serve the same purpose now with slower pdb dump times, it's still a way for busy machines or old servers to catch a break and startup inmediately, with the benefit now that they should not wait 40 minutes to dump in the event of an outage.

In conclusion, everything this PR aims to merge, is already being taken care of :)

DIA being "fine if done right" ignores its core limitation: it is not thread-safe, and that is inherent to the COM design. Replacing a non-thread-safe SDK which previously dumped ASE in few seconds but ASA in 15-20 minutes with a lightweight parser is not "bloat" it is reliability and a way to scalability :)

Calling dumpclass dirty is not accurate - with the cloud dumper offline/broken, a local tool is the only functional option that should be available to everyone while you are trying to lock as much code behind web as possible :)

Even after improvements, DIA still ranges 20 seconds to 2 minutes depending on hardware, while raw_pdb consistently dumps in ~5 seconds on pretty much any hardware. Importantly, the raw_pdb implementation in this PR fully mimics DIA’s behavior, producing almost identical output :)

@Pelayori
Copy link
Member

Pelayori commented Dec 4, 2025

It does not need to be thread safe if it dumps in under 20 seconds tops with just 1 thread. Multi threading when starting up 10 servers can also destroy performance, it's not black or white. Besides, keeping the cache server running means no one needs to dump, and if they do, it would be in no time.

You are free to create your own dumper web/desktop though, not that I am asking you not to.

And after all, we've already implemented a working implementation without an additional dependency, and we are doing a redump of the core classes.

@salmonrider
Copy link
Contributor Author

salmonrider commented Dec 4, 2025

FasterPDB branch
image

My raw_pdb PR
image

Single threaded on R9 9950X3D

Log::GetLog()->info("Processing structures...");
//auto typesTask = std::async(std::launch::async, [this, &tpiStream, &typeTable]() {
	ProcessTypes(tpiStream, typeTable);
//});

Log::GetLog()->info("Processing functions...");
//auto functionsTask = std::async(std::launch::async, [this, &rawPdbFile, &dbiStream, &imageSectionStream]() {
	ProcessFunctions(rawPdbFile, dbiStream, imageSectionStream);
//});

Log::GetLog()->info("Processing global variables...");
//auto globalsTask = std::async(std::launch::async, [this, &rawPdbFile, &dbiStream, &imageSectionStream, &typeTable]() {
	ProcessGlobalVariables(rawPdbFile, dbiStream, imageSectionStream, typeTable);
//});

// Wait for all tasks to complete
//typesTask.wait();
//functionsTask.wait();
//globalsTask.wait();

@Pelayori
Copy link
Member

Pelayori commented Dec 4, 2025

7 vs 34 is a good difference, that will never be noticed given the cache server has most of the uptime on updates.

Any other reviewer can shed some light here, but I am in favour of not merging this PR.

@Lethalinjectionx
Copy link
Contributor

I agree and think we stick to the officially support Microsoft tool. With the changes @Pelayori has already put in the fasterpdb branch the time difference is not enough to warrant completely switching systems. I think leaving the cache system in place is still worthy as it still allows for instant starts up the majority of the time. The dumping feature doesn't really belong in the user facing DLL because 99.9% of the people using AsaApi will never use that feature. I'm in favor of using the web dumper or a standalone tool for any dumping needs.

@Pelayori
Copy link
Member

Pelayori commented Dec 4, 2025

I just requested review of a few more people.

@AvtandilJaparidze
Copy link
Contributor

While this is a good commit, I also think that switching libraries is not necessary.

@SubstituteR
Copy link
Contributor

SubstituteR commented Dec 8, 2025

Personally, I prefer raw_pdb over the Microsoft DIA SDK.

Currently we have to include msdia140.dll to ensure that users have access to the libraries needed, which actually could be afoul of the Microsoft software licensing terms.

The size difference between having this embedded dependency (a few KiB of text when compiled) compared to the full msdia140.dll (1.85 MiB) makes the argument about bloat seem silly to me.

I also like that it is thread-safe, minimal allocations, and operates on streams, which are perfect for the parser as it more-or-less uses the visitor-pattern to build the dictionaries.

Additionally, raw_pdb is actively developed, has great documentation, and is actively used by it's developers in their own debugging suite. It being open source (unlike DIA SDK) is another win to me, and being source-compiled means that there is never a chance of being locked out of future improvements due to ABI and closed source issues.

As-for the concerns of using many threads when spooling up servers, I actually think this can help a lot with that as the library itself uses memory-mapped files, which should allow you to effectively share the same physical memory page across process boundaries. (If the backing data structure is updated to also be stored in mapped memory, we could share the same PDB parse across server instances "for free," but that's more of a hypothetical than reality.)

Just my 2¢

@SubstituteR
Copy link
Contributor

One more thought, because of course there's always one once you hit the "Comment" button.

I am also not against abstracting the PDB parser into an interface, and allowing the user to decide which they prefer -- I think if that was provided in the PR there would be significantly less resistance as it would be opt-in and optional.

@Pelayori
Copy link
Member

Pelayori commented Dec 8, 2025

Having another point of view is always good. I don't oppose the idea of merging it however my main resistance towards the change is the fact that all used offsets (in headers at least) should be the same. Having testing that a merge can be considered.

However the dump command is something I consider not correctly added there, and I am willing to open source the cloud dumper because in fed up with being accused of trying to lock code behind closed source web apps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants