Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for incorrect thread count reporting on zen based processors #3

Closed
wants to merge 2 commits into from

Conversation

Edward-Bakker
Copy link

@Edward-Bakker Edward-Bakker commented Dec 12, 2020

This fixes issue #2

What this does:

  • Removes the code that changes the thread count to the core count when not based on the Bulldozer architecture

  • I have tested my changes on the Ryzen 5 3600

@pagefault
Copy link

@Edward-Bakker
Copy link
Author

Some reference material from AMD on this matter:

https://gpuopen.com/learn/cpu-core-count-detection-windows/

https://gpuopen.com/wp-content/uploads/2018/05/gdc_2018_sponsored_optimizing_for_ryzen.pdf (see page 25)

Not sure if this was directed at me, but a function that's called getDefaultThreadCount() should not return the value of the number of cores in my opinion. If a developer notices they get worse performance with SMT enabled they should adapt the code to fit their product.

But generally speaking SMT improves the performance of a product, thus should not be disabled by default.

@upiter
Copy link

upiter commented Dec 13, 2020

@samklop you just removed code that checks result of getCpuidFamily.
Your code always returns count = logical which is totally not correct for many legacy CPU Families.

@kasper93
Copy link

kasper93 commented Dec 13, 2020

Your code always returns count = logical which is totally not correct for many legacy CPU Families.

And how possibly it may be "not correct"? Also this sample is not ready to be used "as is", see below.

Some reference material from AMD on this matter:

vast majority of multithreaded games and applications work and scale really well when managing an active thread pool up to the number of logical cores that the processor supports.

a small number of games is that driving a hardware thread pool with more than the number of physical cores can reduce performance

I fully understand why this guidance and sample was created. But as stated "Remember to profile!", this sample is just an example code that should never be used "as is" in production.

It doesn't help that this sample was poorly written, in 2017 with Zen(1) in mind. But the check is only for "Bulldozer" family, and everything else is implicitly treated like Zen(1) with the guidance linked above. Now if you run this code 10 years from now on brand new architecture it will still apply guidance from Zen1 era (and won't use 4-way SMT that we may have). Or even now, does this still holds in Zen3, one may ask?

One cannot expect anything from the future architectures and such optimizations like affinity/thread count need to be done per application and per specific cpu family in mind. Generally using logical core count is universal way and any fine-tuning need to be profiled and not blindly applied.

That said this PR is obviously wrong, because main point of this sample code is to show how to select different thread count based on CPU family and you removed it. It is developer job to apply the code to his product properly to maximize performance.

@tannisroot
Copy link

I think GPUOpen overestimates the mental capabilities of game developers if they truly expect all of them to profile this stuff, since even a AAA game developer like CDPR didn't bother to do it.
Plus it honestly makes no sense to default this to the behavior that will affect a very small number of applications (which even this article states https://gpuopen.com/learn/cpu-core-count-detection-windows/).

@baryluk
Copy link

baryluk commented Dec 13, 2020

Please just remove this repository. It is utterly bad.

  1. It is windows specific.
  2. It is slow. It should instead use atomics to initialize value once. (using cpuid + strcmp in potentially hot code is really bad).
  3. There is no way to override it, i.e. using environment variables. (overriding using environment variable would be beneficial for quick profiling using same binary without recompiling, as well give end-users ability to bypass mistakes or wrong assumptions of the game dev).
  4. There is no documentation.
  5. API semantics is undefined.
  6. It is unclear if the function in question is supposed to return number of threads on CPU, or number of threads that application should use by default. These are two different questions.
  7. Library can't make decisions on number of threads to use, as this is purely dependent on workload. I.e. highly vectorized code that does a lot of memory streaming, will easily saturate execution units and memory bandwidth, and nicely utilize caches, and using SMT would actually be harmful due to data and instruction cache, TLB, prefetcher and branch predictor trashing. While highly irregular scalar code with a lot of random loads would benefit from SMT, to hide memory latencies and other stalls.
  8. It disables game developer thinking.
  9. It is short sighted and not updated in years, thus unmaintained.
  10. It is not a library, just a toy program.

Thank you.

@creker
Copy link

creker commented Dec 13, 2020

@kasper93

That said this PR is obviously wrong, because main point of this sample code is to show how to select different thread count based on CPU family and you removed it. It is developer job to apply the code to his product properly to maximize performance.

The main point of this code, according to readme and its name, is to detect logical and physical core counts. It does just that. The problem is that it then makes an arbitrary decision to treat some AMD CPUs differently. What this sample code should do is just return the number of cores and that's it. Or at the very least always return logical core count as default thread count because that's the only safe default. Everything else is harmful and misleading. It's naive to think that developers would test code before copying it. Especially when it requires complex testing on different hardware. CDPR is just one recent example of that.

The PR is definitely correct in that it removes the arbitrary logic and just makes the code do what it supposed to be doing - return core counts.

@baryluk
Copy link

baryluk commented Dec 13, 2020

@kasper93

That said this PR is obviously wrong, because main point of this sample code is to show how to select different thread count based on CPU family and you removed it. It is developer job to apply the code to his product properly to maximize performance.

The main point of this code, according to readme and its name, is to detect logical and physical core counts. It does just that. The problem is that it then makes an arbitrary decision to treat some AMD CPUs differently. What this sample code should do is just return the number of cores and that's it. Or at the very least always return logical core count as default thread count because that's the only safe default. Everything else is harmful and misleading. It's naive to think that developers would test code before copying it. Especially when it requires complex testing on different hardware. CDPR is just one recent example of that.

The PR is definitely correct in that it removes the arbitrary logic and just makes the code do what it supposed to be doing - return core counts.

While I understand the sentiment, what you are saying is not exactly correct.

The code has separate value for logical ("threads"), physical ("cores"), and default threads count (which it does one or another depending what it thinks). the "default thread count" is not number of SMT "threads" available on CPU (or CPUs), but rather a number of operating system threads to use on this CPU by default, i.e. for tasks pool queues, etc. There definitively is a place for such functions, but as I mentioned it before it is highly dependent on CPU, workloads, scalability / locking issues, and many more, plus it must have a way of overriding by user from outside of the program.

The sample does already return just cores and just threads. The function in question does a 3rd thing, which is something else.

This repo should be just removed. It doesn't serve any useful purpose to anybody.

A way better approach is to make a blog post, with some examples and a discussion.

@SekiBetu
Copy link

can't imagine how many applications and games be limited on AMD's cpu only because of this mistake👀

@bscout9956
Copy link

can't imagine how many applications and games be limited on AMD's cpu only because of this mistake👀

Simple, fire up any application that makes use of GPUOpen libraries and check whether it's not accounting for "SMT threads" aka the second thread for every core.

@kasper93
Copy link

kasper93 commented Dec 16, 2020

This repo should be just removed. It doesn't serve any useful purpose to anybody.

I totally agree. In my response I was kind of trying to understand (and explain) why this repo even exists, but it should never be used in production as copy-paste snippet.

As a appendix to blog post it is fine, shouldn't be a github repo, but an attachment to said blog post at best.

But guys, it all doesn't matter. There are loads of bad code over the net. Just don't copy it to your application. Think before. It is unfortunate that it's branded under GPUOpen libraries, but what can you do...

@SekiBetu
Copy link

SekiBetu commented Feb 8, 2021

@amdmarco @rys hey, sorry to bother you guys, but it's been 3 months, can someone come here to solve this problem? it's just a simple change to the code.

@rys
Copy link
Contributor

rys commented Feb 8, 2021

The issue with Cyberpunk caused us to revisit what this sample code recommends to developers and we have been addressing it, both with work with developers who have already used the code in production, and in a future update to this code where we want to make it clearer how to work with it and understand what advice it gives.

So we've been working on it, and I'll chase up where we are with a public update. That means we won't merge in this pull request, but we are working on it.

@rys rys closed this Feb 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants