Cireo commented Jun 26, 2025

This was a change that I've been patching locally for a couple of years.

Some details are omitted, but when packets are parsed by Logstash using the netflow codec, there are two issues.

storage of templates will slowly leak memory
   The cache TTL is never checked, so the cache grows without bound.
   There is a workaround: specifying a cache file enables cache cleanup.

netflow will stop processing packets in burst
   This is probably not resolved, since it existed before this change.
   However, the same issue (unbounded recv-q) persists when we use a cache file.
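
For illustration, a TTL check on an in-memory cache could look roughly like the sketch below. This is not the codec's actual code: `TtlTemplateCache`, its method names, and the injectable `clock` are all hypothetical, and it assumes each template is stored together with the time it was last seen.

```ruby
# Hypothetical sketch: names are illustrative, not the netflow codec's API.
class TtlTemplateCache
  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {} # key -> [template, last_seen]
  end

  # Store (or refresh) a template and record when it was seen.
  def store(key, template)
    @entries[key] = [template, @clock.call]
  end

  # Return the template, or nil if it is missing or older than the TTL.
  # Expired entries are dropped on access.
  def fetch(key)
    entry = @entries[key]
    return nil if entry.nil?

    template, seen = entry
    if @clock.call - seen > @ttl
      @entries.delete(key)
      return nil
    end
    template
  end

  # Drop every expired entry; meant to be called periodically so that
  # templates which are never looked up again still get evicted.
  def sweep
    now = @clock.call
    @entries.delete_if { |_key, (_template, seen)| now - seen > @ttl }
    self
  end

  def size
    @entries.size
  end
end
```

The point is only that expiry has to be checked somewhere (on lookup, in a periodic sweep, or both); without either, entries are never removed.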

The issue with the cache file is multi-part:

  • data is keyed by template but doesn't get passed metadata host/port
  • this means that sources clobber each other's templates
  • this is also all for the best, or we would store 3GB of data instead of 2MB
  • aside: this could even be resolved by having a multi-part lookup
    template.cache.definitions :: {hash -> template} // 4 * 600 bytes, 2KB
    template.cache.keys :: {key -> hash} // 3k * 30 bytes, 10KB
    and then we could even have the different flow exporters not collide
  • the cache is rewritten on every new template
  • these come constantly (every X minutes times 2k sources)
  • when the 2MB cache is rewritten it takes a mutex lock
  • the file just flickers in and out of existence as fast as possible

This grinds all processing to a halt. Even changing the cache to be rewritten only when a new key is present didn't resolve the core issues.
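
The multi-part lookup from the aside above could be sketched as follows. This is an assumption-laden illustration, not part of the change: `TwoLevelTemplateCache` and its methods are hypothetical names, the key is assumed to be `[host, port, template_id]`, and templates are assumed to serialize deterministically with `Marshal.dump` (true for plain arrays of symbols and integers, as used here).

```ruby
require "digest"

# Hypothetical sketch of the two-level lookup; not the codec's API.
class TwoLevelTemplateCache
  def initialize
    @definitions = {} # hash -> template  (few, large entries)
    @keys = {}        # key  -> hash      (many, small entries)
  end

  # Record that (host, port, template_id) uses this template.
  # Returns true only if the template body had not been stored before.
  def register(host, port, template_id, template)
    digest = Digest::SHA256.hexdigest(Marshal.dump(template))
    new_definition = !@definitions.key?(digest)
    @definitions[digest] = template if new_definition
    @keys[[host, port, template_id]] = digest
    new_definition
  end

  # Resolve a per-exporter key to its shared template, or nil if unknown.
  def lookup(host, port, template_id)
    digest = @keys[[host, port, template_id]]
    digest && @definitions[digest]
  end

  def definition_count
    @definitions.size
  end

  def key_count
    @keys.size
  end
end
```

Exporters that send identical templates share one stored definition, while each exporter keeps its own key, so different flow exporters no longer collide. This only addresses the keying and size concerns; it does not by itself change how often the cache file is rewritten or the locking around it.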

cla-checker-service bot commented Jun 26, 2025

💚 CLA has been signed

Cireo commented Jun 26, 2025

> Please, read and sign the above mentioned agreement if you want to contribute to this project

I did sign this while creating the commit, perhaps a timing issue? Please re-trigger.
