Improve schedlat to report thread/task sched latency #14

Open
tanelpoder opened this issue Jan 23, 2021 · 5 comments

@tanelpoder
Owner

Currently schedlat shows only the process (thread group leader, PID) latency.

Use /proc/PID/task/TID/schedstat instead, to allow reporting a single thread's latency

@aktau

aktau commented Nov 22, 2024

The file states:

# Purpose: display % of time a process spent in CPU runqueue 
#          (scheduling latency)

Reading this makes me think it's talking about the entire process (all threads). If I'm reading the kernel implementation right:

		seq_printf(m, "%llu %llu %lu\n",
		   (unsigned long long)task->se.sum_exec_runtime,
		   (unsigned long long)task->sched_info.run_delay,
		   task->sched_info.pcount);

These numbers apply to a single kernel task (thread in userspace parlance). In my quest to figure out how to interpret/use these values, I stumbled upon some other interesting conversations:

It's likely I'm completely wrong though, and I'd be glad to be proven so.
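To make the three fields from the `seq_printf` call above concrete, here is a minimal Python sketch that parses one task's schedstat line. The names `SchedStat`, `parse_schedstat`, `read_task_schedstat` and the `proc_root` parameter are hypothetical and not part of schedlat; the nanosecond units are an assumption based on the kernel's sched-stats documentation for current kernels.

```python
from typing import NamedTuple


class SchedStat(NamedTuple):
    run_ns: int      # task->se.sum_exec_runtime: time spent on CPU (assumed ns)
    delay_ns: int    # task->sched_info.run_delay: time waiting in a runqueue (assumed ns)
    timeslices: int  # task->sched_info.pcount: number of timeslices run


def parse_schedstat(text: str) -> SchedStat:
    """Parse the single line '<run> <delay> <pcount>' from a schedstat file."""
    run_ns, delay_ns, timeslices = (int(field) for field in text.split())
    return SchedStat(run_ns, delay_ns, timeslices)


def read_task_schedstat(pid, tid, proc_root="/proc") -> SchedStat:
    """Read the counters of one kernel task (one thread) of a process."""
    with open(f"{proc_root}/{pid}/task/{tid}/schedstat") as f:
        return parse_schedstat(f.read())
```

Passing the process's own PID as `tid` reads the thread group leader, which should match what `/proc/PID/schedstat` reports.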

@tanelpoder
Owner Author

tanelpoder commented Nov 23, 2024

Yep, I currently use /proc/PID/schedstat, so it reports only the thread group leader (PID) task's numbers. For a complete picture, you'd actually snapshot and report each thread individually from /proc/PID/task/TID/schedstat. It would be easy to do that in my script; I just never got to it, as the current PID approach worked well enough for my use cases (Oracle and Postgres both use a bunch of single-threaded processes).

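I wrote a blog entry about reading the output here: https://tanelpoder.com/posts/schedlat-low-tech-script-for-measuring-cpu-scheduling-latency-on-linux/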

This tool's main differentiator is that it doesn't require any software installation or root access. For more complete scheduling latency analysis, you'd want to know individual wait durations or some histogram, etc, which eBPF-based tools like runqlat give you (if you have root access). My 0x.tools xcapture-bpf also shows "RQ" as a thread state if it's in the runqueue and in the final version I'll add some wait duration sampling/histogram too...

@tanelpoder
Owner Author

tanelpoder commented Nov 23, 2024

Delay accounting is cool (and I've tried htop with it too), but it's quite a narrow-scoped feature "hardcoded" into the kernel, and accessing it requires root access. These days, an eBPF-based approach is the better one, in my opinion.

@aktau

aktau commented Nov 23, 2024

> Delay accounting is cool (and I've tried htop with it too), but it's quite a narrow-scoped feature "hardcoded" into the kernel, and accessing it requires root access. These days, an eBPF-based approach is the better one, in my opinion.

I had no idea delay accounting existed. Thanks for the reference. It's pretty cool, but as you said it requires root, which is a no-go for me in production.

I believe /proc/PID/schedstat provides all I need though: runqueue latency for a given thread or process. In the process case, I need to sum the values from all threads (/proc/PID/task/TID/schedstat) myself, which is a bit annoying and increases overhead. But it can be done.
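A rough sketch of that per-process summation in Python, assuming the three-field schedstat format quoted earlier in this thread. `process_schedstat` and its `proc_root` parameter are hypothetical names, not something schedlat provides.

```python
import os


def process_schedstat(pid, proc_root="/proc"):
    """Sum (run, delay, timeslices) over all live threads of a process."""
    totals = [0, 0, 0]
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "schedstat")) as f:
                fields = f.read().split()
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()/read()
        if len(fields) < 3:
            continue  # unexpected or empty content; skip rather than guess
        for i in range(3):
            totals[i] += int(fields[i])
    return tuple(totals)
```

One caveat with this approach: threads that have already exited no longer appear under `task/`, so their counters drop out of the sum and the total can decrease between two snapshots.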

> I wrote a blog entry about reading the output here:
>
> https://tanelpoder.com/posts/schedlat-low-tech-script-for-measuring-cpu-scheduling-latency-on-linux/

I had seen the article and the comments in the schedlat Python script. Neither explicitly mentions that this tracks only the main thread and none of the process's other threads. This worked for your case because you were investigating mainly single-threaded programs. I think it'd be important to mention this. Thanks!

@tanelpoder
Owner Author

Yep, my plan was to quickly implement the thread-level summary option, but then I ended up overengineering the solution a bit: I was thinking about what would be a good way to lay out/visualize the latency of 100+ threads under a single process. Can't have a separate column for each, as the output would be too wide... I would perhaps need to display only the top N threads with the worst scheduling latency (but then they might jump around on the screen, etc).
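One possible shape for the "top N threads" idea, as a sketch only. It assumes two snapshots of per-thread counters taken `interval_s` seconds apart, each a dict mapping TID to a `(run_ns, delay_ns, timeslices)` tuple, with delays in nanoseconds. The function name `top_delayed_threads` and the snapshot layout are hypothetical, not something schedlat implements.

```python
def top_delayed_threads(before, after, interval_s, n=5):
    """Return up to n (tid, pct_of_interval_in_runqueue) pairs, worst first.

    before/after: {tid: (run_ns, delay_ns, timeslices)} snapshots.
    """
    rows = []
    for tid, (_, delay_after, _) in after.items():
        if tid not in before:
            continue  # thread started mid-interval; no baseline to diff against
        delay_delta_ns = delay_after - before[tid][1]
        pct = 100.0 * delay_delta_ns / (interval_s * 1e9)
        rows.append((tid, pct))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]
```

Sorting by latency is what makes rows jump around between refreshes; sorting the selected top N by TID before printing would be one way to keep the display steadier.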

Anyway, then I moved on to other things... I think I'll just switch to always using /proc/PID/task/TID, and it's up to the user to specify the right thread ID (including the thread group leader's) that they want to measure. Typically when I troubleshoot, I want to measure a specific single PID/thread's latency anyway, and visualizing many threads "just in case" is less useful for ad-hoc interactive troubleshooting.

Perhaps I'll get to it over the holidays! Thanks.

@tanelpoder tanelpoder self-assigned this Nov 29, 2024
BunningsWarehouseOfficial added a commit to BunningsWarehouseOfficial/0xtools that referenced this issue Dec 6, 2024
BunningsWarehouseOfficial added a commit to BunningsWarehouseOfficial/0xtools that referenced this issue Dec 6, 2024