Improve schedlat to report thread/task sched latency #14
Comments
The file states:

    # Purpose: display % of time a process spent in CPU runqueue
    #          (scheduling latency)

Reading this makes me think it's talking about the entire process (all threads). If I'm reading the kernel implementation right:

    seq_printf(m, "%llu %llu %lu\n",
               (unsigned long long)task->se.sum_exec_runtime,
               (unsigned long long)task->sched_info.run_delay,
               task->sched_info.pcount);

These numbers apply to a single kernel task (thread in userspace parlance). In my quest to figure out how to interpret/use these values, I stumbled upon some other interesting conversations:
It's likely I'm completely wrong though, and I'd be glad to be proven so.
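For reference, the three values printed by that `seq_printf` call can be read from userspace roughly as follows. This is only a sketch: the names `SchedStat`, `parse_schedstat` and `read_thread_schedstat` are made up for illustration and are not part of schedlat. The field meanings follow the kernel's scheduler statistics documentation (time on CPU in ns, time waiting on a runqueue in ns, number of timeslices run).

```python
from collections import namedtuple

# Hypothetical container for the three schedstat fields of one kernel task.
SchedStat = namedtuple("SchedStat", "exec_ns run_delay_ns pcount")


def parse_schedstat(line):
    """Parse one schedstat line: '<exec_ns> <run_delay_ns> <pcount>'."""
    exec_ns, run_delay_ns, pcount = (int(field) for field in line.split())
    return SchedStat(exec_ns, run_delay_ns, pcount)


def read_thread_schedstat(pid, tid):
    """Read the counters of a single thread (Linux only).

    /proc/PID/schedstat covers only the thread group leader, so the
    per-thread file under /proc/PID/task/TID/ is used here instead.
    """
    with open(f"/proc/{pid}/task/{tid}/schedstat") as f:
        return parse_schedstat(f.read())
```

Calling `read_thread_schedstat(pid, pid)` should return the same counters as `/proc/PID/schedstat`, since the thread group leader's TID equals the PID.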
Yep, I currently use I wrote a blog entry about reading the output here:
This tool's main differentiator is that it doesn't require any software installation or root access. For more complete scheduling latency analysis, you'd want to know individual wait durations or some histogram, etc, which eBPF-based tools like |
Delay accounting is cool (and I've tried |
I had no idea delay accounting existed. Thanks for the reference. It's pretty cool, but as you said it requires root, which is a no-go for me in production. I believe
I had seen the article, and the comments in the schedlat python script. Neither explicitly mentions that this tracks only the main thread and none of its child threads. This worked for your case because you were investigating mainly single-threaded programs. I think it'd be important to mention this. Thanks!
Yep, my plan was to quickly implement the thread-level summary option, but then I ended up overengineering the solution a bit. I was thinking about what would be a good way to lay out/visualize the latency of 100+ threads under a single process. Can't have a separate column for each, the output would be too wide... would need to perhaps display only the top N threads with the worst scheduling latency (but then they might jump around on the screen, etc). Anyway, then I moved on to other things... I think I'll just switch to always using /proc/PID/task/TID, and it's up to the user to specify the right thread ID (including the thread group leader one) that they want to measure. Typically when I troubleshoot, I want to measure a specific single PID/thread's latency anyway, and visualizing many threads "just in case" is less useful for ad-hoc interactive troubleshooting. Perhaps I'll get to it over the holidays! Thanks.
Simple fix for issue tanelpoder#14
Currently schedlat shows only the process (thread group leader, PID) latency.
Use /proc/PID/task/TID/schedstat instead, to allow reporting a single thread's latency.
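
To illustrate the per-thread approach, here is a minimal sketch of sampling one thread's run_delay counter twice and turning the difference into a percentage of wall-clock time. It is not the actual schedlat code; the helper names (`list_tids`, `read_run_delay_ns`, `latency_percent`, `sample_thread`) are hypothetical, and it assumes a Linux /proc filesystem.

```python
import os
import time


def list_tids(pid):
    """Return all thread IDs of a process, thread group leader included."""
    return sorted(int(tid) for tid in os.listdir(f"/proc/{pid}/task"))


def read_run_delay_ns(pid, tid):
    """Second schedstat field: ns this thread has waited on a runqueue."""
    with open(f"/proc/{pid}/task/{tid}/schedstat") as f:
        return int(f.read().split()[1])


def latency_percent(delay_before_ns, delay_after_ns, interval_s):
    """Share of the sampling interval spent waiting for a CPU, in percent."""
    return 100.0 * (delay_after_ns - delay_before_ns) / (interval_s * 1e9)


def sample_thread(pid, tid, interval_s=1.0):
    """Measure one thread's scheduling latency over a single interval."""
    t0 = time.monotonic()
    before = read_run_delay_ns(pid, tid)
    time.sleep(interval_s)
    after = read_run_delay_ns(pid, tid)
    elapsed = time.monotonic() - t0  # use the measured, not requested, interval
    return latency_percent(before, after, elapsed)


if __name__ == "__main__":
    pid = os.getpid()
    for tid in list_tids(pid):
        print(f"pid={pid} tid={tid} sched_latency={sample_thread(pid, tid):.1f}%")
```

Passing the PID itself as the TID measures the thread group leader, which matches what reading /proc/PID/schedstat reports.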