Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow expressions as file_desc and message_type args #900

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nb-mouse
Copy link

This might be handy to dynamically load a protobuf description files and specifying message type based on payload being transformed.

This implementation allows to pass expressions instead of literal-only (constant) arguments.

Use case:

[sources.protobuf]
type = "stdin"
decoding.codec = "protobuf"
decoding.protobuf.desc_file = "event.desc"
decoding.protobuf.message_type = "EventLog"

[transforms.transform]
type = "remap"
inputs = ["protobuf"]
source = """
message_type = replace(.event.proto_data.type_url, "type.googleapis.com/", "") ?? ""
file_desc = sha1(message_type) + ".desc"
. = {
  "id": .event.id,
  "type": message_type,
  "name": parse_proto(.event.proto_data.value, file_desc, message_type)
} ?? {}
"""

PS. I have a follow-up for recursive handling Any messages when processing source configuration.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution @nb-mouse . The reason these parameters are currently required to be static is so that they can just be loaded once at start-up and reused. This change will require the files to be reloaded on each and every event flowing through VRL and are Vector is likely to incur a large performance overhead because of that. If we implemented #137 that could enable something like this while maintaining performance. https://github.com/vectordotdev/vrl/blob/main/DESIGN.md discusses this and other VRL design decisions.

Could you share a bit your use case for using a dynamic proto definition? Would it be possible to have multiple remap transforms where you split the input by type beforehand? Or is the complete list of types not known apriori?

Alternatively, and just speculating, maybe we could relax the static constraints on functions with some sort of opt-in flag for users know are willing to make the trade-off. This would need some more discussion before implementation though.

@nb-mouse
Copy link
Author

@jszwedko, this was a first attempt to handle events with Any messages. Eventually, we have some protos that defines more sub-messages of the Any type and current PR cannot handle that. There are hundreds of different messages and most of them is not possible to collect as single desc file. I was thinking to implement a LRU cache for loaded descriptors shared with several message types. The tricky part how to organize desc file catalog to load a proper desc file for a given message type.

Here is very early draft demonstrating an approach that works with several levels of Any messages: db26ba7

@nb-mouse
Copy link
Author

It comes out that we can combine several desc files into one. So, we can use one desc file defined in source config and recursively attempt to load a desc from it when parsing Any message.

For this particular PR we indeed might introduce an option either to load descriptor at compile time or dynamically. But I don't know how to implement that. Any other functions implements something similar?

@nb-mouse
Copy link
Author

Perhaps, as additional feature we can add an option for the source config for protobuf to specify a directory with desc files and merge them on the Vector startup.

@nb-mouse
Copy link
Author

Instead of extra parameters we might introduce separate proto_parse function. To simplify the logic and explicitly distinct functionality.

@nb-mouse
Copy link
Author

Hopefully, this should fit better:

#901
vectordotdev/vector#20708

The changes above affects only source decoder. Perhaps, we can pass an extra config option to enable/disable this lookup feature.

@jszwedko
Copy link
Member

Thanks for the other PR! We'll take a look at that one.

@pront pront marked this pull request as draft October 18, 2024 14:07
@pront
Copy link
Member

pront commented Nov 19, 2024

Instead of extra parameters we might introduce separate proto_parse function. To simplify the logic and explicitly distinct functionality.

Hello, I came across this while looking at open VRL PRs. I think this idea is good. Also, it would benefit greatly from #137.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants