Improve schedlat to report thread/task sched latency #14

tanelpoder · 2021-01-23T19:13:04Z

Currently schedlat shows only the process (thread group leader, PID) latency.

Use /proc/PID/task/TID/schedstat instead, to allow reporting a single thread's latency.

aktau · 2024-11-22T23:06:38Z

The file states:

# Purpose: display % of time a process spent in CPU runqueue 
#          (scheduling latency)

Reading this makes me think it's talking about the entire process (all threads). If I'm reading the kernel implementation right:

		seq_printf(m, "%llu %llu %lu\n",
		   (unsigned long long)task->se.sum_exec_runtime,
		   (unsigned long long)task->sched_info.run_delay,
		   task->sched_info.pcount);

These numbers apply to a single kernel task (thread in userspace parlance). In my quest to figure out how to interpret/use these values, I stumbled upon some other interesting conversations:

It's likely I'm completely wrong though, and I'd be glad to be proven so.
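For reference, here is a minimal sketch of reading the three values printed by the seq_printf call quoted above from userspace. The names SchedStat and parse_schedstat are made up for illustration and are not part of schedlat. The units in the comments (nanoseconds for the first two fields) follow the kernel's scheduler statistics documentation.

```python
from typing import NamedTuple


class SchedStat(NamedTuple):
    """One line of a schedstat file (hypothetical helper type)."""
    run_ns: int      # task->se.sum_exec_runtime: time spent on CPU, in ns
    delay_ns: int    # task->sched_info.run_delay: time waiting on a runqueue, in ns
    timeslices: int  # task->sched_info.pcount: number of timeslices run


def parse_schedstat(line: str) -> SchedStat:
    """Parse the 'run delay pcount' line into named integer fields."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    return SchedStat(*(int(f) for f in fields))
```

Feeding it the contents of /proc/PID/task/TID/schedstat should give that one task's counters. The middle field is the one relevant to scheduling latency.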

tanelpoder · 2024-11-23T03:18:38Z

Yep, I currently use /proc/PID/schedstat so it will report only the thread group leader (PID) task's numbers. For a complete picture, you'd actually snapshot and report each thread individually from /proc/PID/task/TID/schedstat. It would be easy to do that in my script, I just never got to it as the current PID approach worked well enough for my use cases (Oracle and Postgres both use a bunch of single-threaded processes).
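A rough sketch of what such a per-thread snapshot could look like, assuming the usual /proc layout. The function name snapshot_threads and the proc_root parameter are hypothetical and not taken from the script. Threads can exit while the directory is being walked, so the sketch skips any that disappear.

```python
import os


def snapshot_threads(pid, proc_root="/proc"):
    """Return {tid: (run_ns, delay_ns, timeslices)} for every thread of pid.

    proc_root is an assumption added so the function can be pointed at a
    test directory; on a real system it stays at "/proc".
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    stats = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "schedstat")) as f:
                run_ns, delay_ns, timeslices = map(int, f.read().split())
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()/read()
        stats[int(tid)] = (run_ns, delay_ns, timeslices)
    return stats
```

The thread group leader shows up in the result under its own TID, which is the same number as the PID.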

I wrote a blog entry about reading the output here:

https://tanelpoder.com/posts/schedlat-low-tech-script-for-measuring-cpu-scheduling-latency-on-linux/

This tool's main differentiator is that it doesn't require any software installation or root access. For more complete scheduling latency analysis, you'd want to know individual wait durations or some histogram, etc, which eBPF-based tools like runqlat give you (if you have root access). My 0x.tools xcapture-bpf also shows "RQ" as a thread state if it's in the runqueue and in the final version I'll add some wait duration sampling/histogram too...

tanelpoder · 2024-11-23T03:20:10Z

Delay accounting is cool (and I've tried htop with it too), but it's quite a narrow-scoped feature "hardcoded" into the kernel, and accessing it requires root access. These days an eBPF-based approach is the better choice, in my opinion.

aktau · 2024-11-23T12:43:26Z

> Delay accounting is cool (and I've tried htop with it too), but it's quite a narrow-scoped feature "hardcoded" into the kernel, and accessing it requires root access. These days an eBPF-based approach is the better choice, in my opinion.

I had no idea delay accounting existed. Thanks for the reference. It's pretty cool, but as you said it requires root, which is a no-go for me in production.

I believe /proc/PID/schedstat provides all I need though: runqueue latency for a given thread or process. In the process case, I need to sum the measures myself from all threads, which is a bit annoying and increases overhead. But it can be done.
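The summing step could be sketched like this, assuming the per-thread values have already been read into a mapping of TID to (run_ns, delay_ns, timeslices) triples. The function name sum_process and that input shape are assumptions for illustration only.

```python
def sum_process(per_thread):
    """Add up per-thread schedstat triples into one process-level triple.

    per_thread: {tid: (run_ns, delay_ns, timeslices)}, a hypothetical
    structure holding one entry per /proc/PID/task/TID/schedstat file.
    """
    run_ns = delay_ns = timeslices = 0
    for t_run, t_delay, t_slices in per_thread.values():
        run_ns += t_run
        delay_ns += t_delay
        timeslices += t_slices
    return run_ns, delay_ns, timeslices
```

One caveat with this approach: the counters are per task, so when a thread exits its numbers drop out of the sum. A process-level total computed this way is not guaranteed to increase between two snapshots.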

> I wrote a blog entry about reading the output here:
>
> https://tanelpoder.com/posts/schedlat-low-tech-script-for-measuring-cpu-scheduling-latency-on-linux/

I had seen the article, and the comments in the schedlat python script. Neither explicitly mentions that this tracks only the main thread and none of the other threads. This worked for your case because you were investigating mainly single-threaded programs. I think it'd be important to mention this. Thanks!

tanelpoder · 2024-11-23T19:49:43Z

Yep, my plan was to quickly implement the thread level summary option but then I ended up overengineering the solution a bit - I was thinking about what would be a good way to lay out/visualize the latency of 100+ threads under a single process. Can't have a separate column for each, too wide output... would need to perhaps display only the top N threads with the worst scheduling latency (but then they might jump around on the screen, etc).
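To make the top N idea concrete, here is a small sketch that ranks threads by how much their runqueue delay grew between two snapshots. It assumes each snapshot is a mapping of TID to (run_ns, delay_ns, timeslices); the name top_n_by_delay is hypothetical. It does not solve the "jumping around" display problem, it only picks the rows.

```python
def top_n_by_delay(before, after, n=5):
    """Return up to n (tid, delay_delta_ns) pairs, worst delay first.

    before/after: {tid: (run_ns, delay_ns, timeslices)} snapshots taken
    at two points in time (an assumed structure, not schedlat's own).
    """
    deltas = []
    for tid, (_run, delay_after, _slices) in after.items():
        if tid not in before:
            continue  # thread started mid-interval, no baseline to diff against
        deltas.append((tid, delay_after - before[tid][1]))
    # Largest delay first; ties broken by TID so the order is deterministic.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return deltas[:n]
```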

Anyway, then moved on to other things... I think I'll just switch to always using /proc/PID/task/TID and it's up to the user to specify the right thread ID (including the thread group leader one) that they want to measure. Typically when I troubleshoot, I want to measure a specific single PID/thread's latency anyway and visualizing many threads "just in case" is less useful for ad-hoc interactive troubleshooting.
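As an illustration of the single-thread approach (not the script's actual code), the sketch below samples one task's run_delay twice and turns the difference into a percentage of wall-clock time spent waiting in the runqueue. The function names and the proc_root parameter are made up for this example; run_delay is taken to be in nanoseconds, as in the kernel's scheduler statistics documentation.

```python
import time


def read_delay_ns(pid, tid, proc_root="/proc"):
    """Read the run_delay field (second column) for one task."""
    with open(f"{proc_root}/{pid}/task/{tid}/schedstat") as f:
        return int(f.read().split()[1])


def latency_pct(delay_before_ns, delay_after_ns, interval_s):
    """Percentage of the interval the task spent waiting for a CPU."""
    return 100.0 * (delay_after_ns - delay_before_ns) / (interval_s * 1e9)


def measure(pid, tid, interval_s=1.0):
    """Sample one thread twice, interval_s apart, and return its latency %."""
    t0 = time.monotonic()
    d0 = read_delay_ns(pid, tid)
    time.sleep(interval_s)
    d1 = read_delay_ns(pid, tid)
    # Use the measured elapsed time, since sleep() can overshoot.
    return latency_pct(d0, d1, time.monotonic() - t0)
```

For the thread group leader, the TID passed in is simply the PID itself, so measure(pid, pid) would cover the same task that /proc/PID/schedstat reports today.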

Perhaps I'll get to it over the holidays! Thanks.

aktau mentioned this issue Nov 25, 2024

Adding /proc/<pid>/schedstat google/cadvisor#1872

Merged

tanelpoder self-assigned this Nov 29, 2024

BunningsWarehouseOfficial added a commit to BunningsWarehouseOfficial/0xtools that referenced this issue Dec 6, 2024

schedlat comment addressing issue tanelpoder#14

6106f21

tanelpoder#14 (comment)

BunningsWarehouseOfficial added a commit to BunningsWarehouseOfficial/0xtools that referenced this issue Dec 6, 2024

use /proc/PID/task/TID instead of /proc/PID

4893e5a

Simple fix for issue tanelpoder#14

BunningsWarehouseOfficial mentioned this issue Dec 6, 2024

schedlatsys: system-wide schedlat #52

Open
