
[WIP] Raingutter memory metrics #11

Draft: wants to merge 13 commits into master


KJTsanaktsidis

Description

I want to make Raingutter measure how much memory is shared between the unicorn master and its workers. This can be done by reading through /proc/<unicorn_pid>/pagemap, extracting the kernel page frame number (PFN) for each resident page mapped into a process, and checking how many times that page is mapped with /proc/kpagecount. Pages mapped into only one unicorn process are "unshared" memory, whilst pages shared between several unicorn processes are "shared" memory. Seeing how much copy-on-write (CoW) page sharing is actually happening between the prefork workers should give real insight into unicorn memory usage.
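A minimal Python sketch of the pagemap/kpagecount lookup described above (not raingutter's code). It assumes the layout from the kernel's pagemap documentation: each /proc/<pid>/pagemap entry is a native-endian 64-bit word with bit 63 set when the page is present and the PFN in bits 0-54, and /proc/kpagecount holds one 64-bit map count per PFN. The helper names are hypothetical, and the caller is assumed to take the [start, end) ranges from /proc/<pid>/maps. On recent kernels the PFN field reads as zero without CAP_SYS_ADMIN, which is why this needs elevated privileges.

```python
import struct

# Assumed layout (kernel pagemap docs): one native-endian u64 per page.
U64 = struct.Struct("=Q")
PAGE_PRESENT = 1 << 63        # bit 63: page is resident in RAM
PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number (when present)


def decode_pagemap_entry(entry):
    """Return the PFN from one pagemap entry, or None if the page is not resident."""
    if not entry & PAGE_PRESENT:
        return None
    return entry & PFN_MASK


def resident_pfns(pagemap, start, end, page_size=4096):
    """Yield PFNs of resident pages in the virtual range [start, end).

    `pagemap` is a binary file object for /proc/<pid>/pagemap; the entry for
    virtual address `va` lives at byte offset (va // page_size) * 8.
    """
    pagemap.seek((start // page_size) * U64.size)
    n_pages = (end - start) // page_size
    data = pagemap.read(n_pages * U64.size)
    for (entry,) in U64.iter_unpack(data):
        pfn = decode_pagemap_entry(entry)
        if pfn is not None:
            yield pfn


def kpagecount(kpagecount_file, pfn):
    """Return how many times page frame `pfn` is mapped, per /proc/kpagecount."""
    kpagecount_file.seek(pfn * U64.size)
    (count,) = U64.unpack(kpagecount_file.read(U64.size))
    return count
```

Note that kpagecount reports how many times a page is mapped, which is not quite the same as how many distinct processes map it.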

Here's a terrible POC of how to do that scraping: https://gist.github.com/KJTsanaktsidis/61f491efba9baa06d0898ab7f2bd2711

I figured raingutter would be a good place to do this, because:
a) raingutter is already noodling around in /proc, and
b) you need to be uid 0 to access kernel page frame numbers through /proc, and giving those permissions to raingutter is less scary than giving them to the app itself.

The tricky part will be finding the pids of the unicorn processes in Kubernetes. My approach for this is going to be:

  • Mount the host's /proc into the raingutter container at /host/proc.
  • Collect the listener socket inode number from netlink.
  • Cross-reference it in /host/proc to find processes that have this socket inode open in /host/proc/<pid>/fd/ and are in our network namespace (/host/proc/<pid>/ns/net).
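The pid-finding steps above could look something like this minimal Python sketch. It assumes the host's /proc is mounted at the path passed as `host_proc`, that open sockets appear as fd symlinks of the form `socket:[<inode>]`, and that comparing the `ns/net` link target (e.g. `net:[4026531992]`) is a good enough test for "same network namespace". The function name `pids_with_socket` is hypothetical.

```python
import os
import re

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def pids_with_socket(host_proc, inode, net_ns):
    """Return sorted pids under `host_proc` that hold socket `inode` open and
    whose ns/net link target equals `net_ns`."""
    matches = []
    for name in os.listdir(host_proc):
        if not name.isdigit():
            continue  # skip non-pid entries such as "self" or "meminfo"
        pid_dir = os.path.join(host_proc, name)
        try:
            if os.readlink(os.path.join(pid_dir, "ns", "net")) != net_ns:
                continue  # different network namespace
            fds = os.listdir(os.path.join(pid_dir, "fd"))
        except OSError:
            continue  # process exited mid-scan, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(pid_dir, "fd", fd))
            except OSError:
                continue  # fd closed between listdir and readlink
            m = SOCKET_LINK.match(target)
            if m and int(m.group(1)) == inode:
                matches.append(int(name))
                break  # one matching fd is enough for this pid
    return sorted(matches)
```

Inside the raingutter container, our own namespace could be read with `os.readlink("/proc/self/ns/net")` and passed in as `net_ns`.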

So far, I have three patches I wanted to run past you:

  • Change the metrics to be uints rather than float64s. All of the things being measured are actually integers, and I couldn't quite figure out why they were being stored and parsed as floats (other than that the Datadog/Prometheus APIs at the end ultimately need these numbers as floats). This is needed for my next trick:
  • Collect stats via netlink in preference to /proc/net. This was marked as a TODO in the code, and it seemed like a good thing to do while I was in the area. This gets us uint types straight out of the kernel, which is why I needed to twiddle all the metric types in the previous patch.
  • Collect the listener socket inode number. This PR doesn't do anything with it yet, but I'll need it for my future plans.
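For the netlink patch, decoding one inet_diag response could look like the Python sketch below. It assumes the `struct inet_diag_msg` layout from `<linux/inet_diag.h>` (four u8 fields, a 48-byte `inet_diag_sockid` whose ports and addresses are big-endian, then five host-order u32s: expires, rqueue, wqueue, uid, inode), and that for a listening socket rqueue is the current accept-queue length and wqueue the maximum backlog. Sending the `SOCK_DIAG_BY_FAMILY` request and stripping the `nlmsghdr` are not shown; `ListenerStats`, `parse_inet_diag_msg` and the gauge names are hypothetical. The counters stay ints until `as_gauges`, which is where the float conversion for the metrics client would happen.

```python
import struct
from dataclasses import dataclass

# Assumed struct inet_diag_msg layout from <linux/inet_diag.h> (72 bytes).
HEADER = struct.Struct("=4B")   # idiag_family, idiag_state, idiag_timer, idiag_retrans
SPORT = struct.Struct("!H")     # id.idiag_sport, network byte order
SOCKID_LEN = 48                 # sizeof(struct inet_diag_sockid)
TAIL = struct.Struct("=5I")     # idiag_expires, rqueue, wqueue, uid, inode (host order)
MSG_LEN = HEADER.size + SOCKID_LEN + TAIL.size


@dataclass
class ListenerStats:
    port: int
    queued: int   # idiag_rqueue: connections waiting to be accept()ed
    backlog: int  # idiag_wqueue: max accept backlog of the listener
    inode: int    # idiag_inode: only the low 32 bits of the socket inode

    def as_gauges(self):
        # Hypothetical metric names; convert to float only at the edge.
        return {"queued": float(self.queued), "backlog": float(self.backlog)}


def parse_inet_diag_msg(buf):
    """Decode the fixed-size part of one inet_diag_msg (trailing attributes ignored)."""
    if len(buf) < MSG_LEN:
        raise ValueError(f"short inet_diag_msg: {len(buf)} < {MSG_LEN} bytes")
    (port,) = SPORT.unpack_from(buf, HEADER.size)
    _expires, rqueue, wqueue, _uid, inode = TAIL.unpack_from(buf, HEADER.size + SOCKID_LEN)
    return ListenerStats(port=port, queued=rqueue, backlog=wqueue, inode=inode)
```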

CC @zendesk/guide-ops

KJ Tsanaktsidis added 6 commits November 17, 2021 17:00
A whole bunch of metrics that should be integers are being represented
as floats. I get that we need to emit floats to the DD/prometheus APIs,
but the conversion should be done at the last possible moment.

I'm going to need the listener socket inode soon, so that I can use it to find
the unicorn process automatically. It's available from /proc/net/tcp, so we
should collect it from there.
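Pulling the listener's inode out of /proc/net/tcp could look like the Python sketch below. It assumes the usual column order of that file (sl, local_address, rem_address, st, tx_queue:rx_queue, tr:tm->when, retrnsmt, uid, timeout, inode), with ports, state and queue sizes in hex, the inode in decimal, and `st` 0A meaning LISTEN. `TcpEntry`, `parse_proc_net_tcp_line` and `find_listener` are hypothetical names. Because Python integers are unbounded, the full 64-bit inode survives parsing.

```python
from typing import NamedTuple, Optional

TCP_LISTEN = 0x0A  # value of the "st" column for LISTEN sockets


class TcpEntry(NamedTuple):
    local_port: int
    state: int
    rx_queue: int  # for a LISTEN socket: current accept-queue length
    inode: int


def parse_proc_net_tcp_line(line) -> Optional[TcpEntry]:
    """Parse one /proc/net/tcp row; return None for the header or blank lines."""
    fields = line.split()
    if len(fields) < 10 or not fields[0].endswith(":"):
        return None
    _addr, port_hex = fields[1].rsplit(":", 1)
    _tx, rx = fields[4].split(":")
    return TcpEntry(int(port_hex, 16), int(fields[3], 16), int(rx, 16), int(fields[9]))


def find_listener(text, port) -> Optional[TcpEntry]:
    """Return the first LISTEN entry on `port` in the contents of /proc/net/tcp."""
    for line in text.splitlines():
        entry = parse_proc_net_tcp_line(line)
        if entry and entry.state == TCP_LISTEN and entry.local_port == port:
            return entry
    return None
```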

Collecting socket stats via netlink is marked as a TODO in the existing
parsing code, so it seems like a good deed to pay forward. It will also save
a lot of string parsing.

Inside the kernel, the socket inode is stored as a 64-bit value. It is
also printed out as %lu in /proc/net/tcp, which means that on 64-bit
systems, the full 64-bit inode number will be printed out in that file
too.

UNFORTUNATELY, the netlink inet_diag API only exposes the listener
socket inode as a 32-bit number! It just silently wraps around if
you have a lot of inodes.

Because of this:
- Internally store the inode in raingutter as uint64, since we can get
  that from /proc
- Accept the wrapped-around value from netlink, since we have no choice
  there
- Mark /proc as the preferred and most correct option for raingutter
  socket stats.
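The three bullet points above boil down to a comparison like this Python sketch: keep the full inode from /proc, and when a netlink value comes in, compare only the low 32 bits. `netlink_inode_matches` and `resolve_inode` are hypothetical helpers; the assumption is that `idiag_inode` is simply the kernel's inode truncated to 32 bits, so two /proc inodes that differ only above bit 31 cannot be told apart from netlink alone.

```python
LOW_32_BITS = 0xFFFFFFFF


def netlink_inode_matches(proc_inode, netlink_inode):
    """True if a full inode from /proc agrees with netlink's 32-bit idiag_inode."""
    return (proc_inode & LOW_32_BITS) == netlink_inode


def resolve_inode(netlink_inode, proc_inodes):
    """Return the /proc inodes consistent with a truncated netlink inode.

    More than one result means the netlink value alone is ambiguous.
    """
    return sorted(i for i in proc_inodes if netlink_inode_matches(i, netlink_inode))
```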

KJTsanaktsidis force-pushed the ktsanaktsidis/CCORESF-534/raingutter_mem_metrics branch from 3f88f89 to 43f8847 on November 19, 2021 02:39
KJ Tsanaktsidis added 7 commits November 19, 2021 18:00
We can combine the worker metrics and the socket metrics into a single
mainloop, and run the expensive host-/proc-trawling pid-finding code less
often than the high-frequency socket stats.
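One way to share a single mainloop between the fast socket stats and the slower pid-finding is to run the slow step only every N ticks, as in this Python sketch. `run_mainloop` and `ticker` are hypothetical; the slow step runs on the first tick so pids are available straight away, and the fast step runs on every tick.

```python
import time


def ticker(interval_s):
    """Yield a timestamp roughly every `interval_s` seconds, forever."""
    while True:
        yield time.monotonic()
        time.sleep(interval_s)


def run_mainloop(ticks, slow_every, fast, slow):
    """Call `fast` on every tick, and `slow` on the first tick and then
    every `slow_every` ticks after that."""
    if slow_every < 1:
        raise ValueError("slow_every must be >= 1")
    for i, _tick in enumerate(ticks):
        if i % slow_every == 0:
            slow()  # expensive: e.g. re-scan /host/proc for unicorn pids
        fast()      # cheap: e.g. collect socket stats
```

For example, `run_mainloop(ticker(1.0), 30, collect_socket_stats, refresh_unicorn_pids)` (both callbacks hypothetical) would collect socket stats about once a second and re-scan /host/proc about every 30 seconds; the sleep-based ticker drifts slightly because callback time is not subtracted.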

The socket-inode method of finding the unicorn pids should work everywhere
the pgrep method works, work better there, and also work in many places
where pgrep won't. Just delete the pgrep method.

* Scan each process in parallel (up to 8 at a time)
* Just compare mapped kpfns within this set of processes; don't bother
  looking at /proc/kpagecount.
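The two bullets above could be sketched in Python as follows: read each process's resident PFNs with at most 8 running at once, then count a page as "shared" if its PFN appears in more than one process of the set, without consulting /proc/kpagecount. `collect_pfns`, `shared_unshared` and the `read_pfns` callback (for example, something built on /proc/<pid>/pagemap) are hypothetical, and a thread pool stands in for however the PR actually parallelises the work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 8  # "up to 8" processes scanned at once


def collect_pfns(pids, read_pfns, max_workers=MAX_PARALLEL):
    """Call read_pfns(pid) for every pid, at most `max_workers` concurrently.

    Returns {pid: set_of_pfns}.
    """
    pids = list(pids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(read_pfns, pids)  # results come back in input order
        return {pid: set(pfns) for pid, pfns in zip(pids, results)}


def shared_unshared(pfns_by_pid):
    """Return {pid: (shared_pages, unshared_pages)} within this set of processes.

    A page counts as shared if its PFN is mapped by at least one other pid here.
    """
    seen_in = Counter()
    for pfns in pfns_by_pid.values():
        seen_in.update(set(pfns))  # count each PFN once per process
    result = {}
    for pid, pfns in pfns_by_pid.items():
        unique = set(pfns)
        shared = sum(1 for pfn in unique if seen_in[pfn] > 1)
        result[pid] = (shared, len(unique) - shared)
    return result
```

Each process's PFNs are deduplicated first, so a page mapped twice within one process still counts once for that process.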