Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access to Raw Serialized Data (Raw Data) in Latest gRPC Version Without Marshaling - Previous Workaround No Longer Available #7794

Closed
mohdjishin opened this issue Oct 29, 2024 · 12 comments
Assignees
Labels
Area: RPC Features Includes Compression, Encoding, Attributes/Metadata, Interceptors. stale Status: Requires Reporter Clarification Type: Question

Comments

@mohdjishin
Copy link
Contributor

In prior gRPC versions, the stats handler provided direct access to raw serialized data through payload.Data in *stats.InPayload. This functionality enabled certain applications to retrieve unmodified byte data directly. While a workaround was previously available — leveraging TagRPC to add values to the context and HandleRPC to access raw bytes for *stats.InPayload — this approach is no longer possible in the latest version.

Issue: With this removal, it is challenging to obtain raw request bytes directly without additional marshaling. The previous workaround, as discussed in @dfawley comment on May 17, 2023 Workaround, no longer works in the latest gRPC version.

The removal of payload.Data occurred in this commit
Screenshot from 2024-10-30 02-26-02

@dfawley
Copy link
Member

dfawley commented Oct 30, 2024

Apologies for the breakage! This field needed to be removed for performance reasons as we no longer create a full contiguous copy of the serialized data (except within the default proto codec, unfortunately, but if they ever improve the proto library to take advantage of split input buffers, we would start using it ASAP).

It's possible we could add the data back as a mem.BufferSlice instead of a flat buffer. However, another reason we thought it was a good idea to remove it was because it passes potentially sensitive user data to something whose purpose was supposed to be basic telemetry, and that can make it easier to create privacy problems.

Let me talk with the other gRPC leads (for Java/C++) and see what kind of options they might offer for something like this. It could be the case that it's just not possible without doing things more manually -- i.e. not using the generated code to register your method handlers, and have a custom codec that just can pass the bytes through instead.

@dfawley dfawley assigned dfawley and unassigned eshitachandwani Oct 30, 2024
@mohdjishin
Copy link
Contributor Author

Thank you, @dfawley , for the insights and explanation regarding the changes. The removal of payload.Data does impact applications like digital signature verification is essential. In our scenario, we sign the serialized raw data, and upon receiving it, we verify the signature against the exact bytes to ensure the data’s authenticity and integrity before processing it.

With the recent changes, we no longer have a straightforward way to access this raw data through *stats.InPayload, making it challenging to perform our verification step effectively

@dfawley
Copy link
Member

dfawley commented Oct 30, 2024

I spoke with the leads from Java & C++. They don't have any easy way to do exactly what you want, either. I'd probably recommend using the generic interface, which is what you'd probably want to do in the other languages. Or if you want something that takes a bit of setting up and a lot of magical looking stuff, keep reading...

Before that, can you explain your use case a little more? What are the reasons you are signing data and checking signatures at the application level, instead of using something like mTLS so that both sides know they are talking to a trusted party? It seems like that's really what you want to be using instead, or else you could validate specific fields within the payloads instead.

Regardless, the stats handler feels like the wrong place to do what you're doing. It was never intended for this use case at all. And the gRPC-Go interceptors have a very strange design that unfortunately doesn't lend itself to this kind of use case.
#1805, if it's ever prioritized, should definitely consider a new type of interceptor instead that passes payload bytes to the interceptor instead of the serialized message interface.


The other potential approach: registering the service by taking the generated <ServiceName>_ServiceDesc in the service's generated package (in the pb.go file), and modifying the Handler field(s) to point to a different implementation of the handler that has a custom type for the request messages for which you're trying to access the raw bytes.

E.g. using our route guide example:

var RouteGuide_ServiceDesc = grpc.ServiceDesc{

Step 1: define a custom type to hold the message and the bytes:

type specialProto struct {
	msg proto.Message
	rawBytes []byte
}

Step 2: implement a custom version of the "proto" codec to handle this type:

func init() {
	encoding.RegisterCodecV2(myProtoCodec{CodecV2: encoding.GetCodecV2("proto")})
}

type myProtoCodec {
	encoding.CodecV2 // embed the real proto codec for simplicity
}

// Override Unmarshal to handle specialProto.
// Note: this could be optimized, but gets more complicated.
func (m *myProtoCodec) Unmarshal(data mem.BufferSlice, v any) error {
	sp, ok := v.(*specialProto)
	if !ok {
		// Not a special proto; fall back.
		return m.CodecV2.Unmarshal(data, v)
	}

	// Special proto: call the original codec on the message field, then set rawBytes.
	if err := m.CodecV2.Unmarshal(data, sp.msg); err != nil {
		return err
	}
	sp.rawBytes = data.Materialize()
	return nil
}

Step 3: create something that can alter the way RPCs are handled to pass this specialProto type to the codec instead of directly passing the proto message:

// interceptingHandler creates a new method handler that intercepts the codec to provide a different type to it.
func interceptingHandler(origMethodHandler grpcMethodHandler) grpcMethodHandler {

	return func(srv any, ctx context.Context, dec func(any) error, interceptor UnaryServerInterceptor) (any, error) {
		// Substitute the decoding function.  When the real handler calls it, it will pass the proper proto
		// message type in "in".  We wrap that and pass it to the real decode function (the codec).
		dec := func(in any) error {
			spProto := &specialProto{msg: in}
			if err := dec(spProto); err != nil { // call the real codec on the special type.
				return nil, err
			}
			return spProto, nil
		}

		// Now call the original handler -- in our example that's "routeguide._RouteGuide_GetFeature_Handler"
		// https://github.com/grpc/grpc-go/blob/d66fc3a1efa1dfb33dfedf9760528f1ac2b923b6/examples/route_guide/routeguide/route_guide_grpc.pb.go#L210C6-L210C36
		return origMethodHandler(srv, ctx, dec, interceptor)
	}

}

// grpc should likely be exporting this; please file a bug if you'd like.
type grpcMethodHandler func(srv any, ctx context.Context, dec func(any) error, interceptor UnaryServerInterceptor) (any, error)

Step 4: modify the service descriptor and register and implement the service:

func main() {
	// Modify ServiceDesc to intercept whatever methods you want (here we do all of them)
	sd := routeguide.RouteGuide_ServiceDesc
	sd.HandlerType = (*any)(nil) // unfortunate, but necessary to appease some internal checks we have.
	for i := range sd.Methods {
		sd.Methods[i].Handler = interceptingHandler(sd.Methods[i].Handler)  // could be limited to only specific methods as needed.
	}
	// Intercepting streams is similar, but different.

	s := grpc.NewServer(...)
	s.RegisterService(sd, myServiceImpl{})
}

type myServiceImpl struct {}

func (myServiceImpl) GetFeature(ctx context.Context, in *specialProto) (*routeguide.Feature, error) {
	req, ok := in.msg.(*routeguide.Point)
	if !ok {
		return nil, status.Errorf(codes.Internal, "error decoding request message") // should never happen, but probably better than a panic
	}
	// in.rawBytes contains the raw bytes
}

That's a lot of hoops to jump through, though, so I'd just recommend directly creating your own ServiceDesc and implementing your own handler instead of making something that intercepts other handlers.

@mohdjishin
Copy link
Contributor Author

Thank you for the detailed guidance and potential solutions. Given the current limitations with the interceptor design, would it be possible to provide access to mem.BufferSlice directly within interceptors instead of proto.Message? This access would be invaluable for workflows that require raw serialized data, particularly in complex designs where exact byte-level accuracy is critical and additional marshaling may not be feasible. @dfawley

arjan-bal pushed a commit that referenced this issue Nov 11, 2024
@purnesh42H purnesh42H added the Area: RPC Features Includes Compression, Encoding, Attributes/Metadata, Interceptors. label Nov 11, 2024
@dfawley
Copy link
Member

dfawley commented Nov 19, 2024

@mohdjishin do you have a proposal for how we could change things to do that?

@eshitachandwani
Copy link
Member

@mohdjishin Are you still actively looking into this?

Copy link

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Dec 15, 2024
@purnesh42H
Copy link
Contributor

@mohdjishin are you still working on it?

@github-actions github-actions bot removed the stale label Dec 17, 2024
@mohdjishin
Copy link
Contributor Author

I'm not exactly sure how we can achieve this. The previous approach was working perfectly. Alternatively, we could consider using some grpc.DialOption function to register a custom type. Based on this, we could internally map the raw data in the context.

@purnesh42H purnesh42H assigned mohdjishin and unassigned mohdjishin Dec 23, 2024
@purnesh42H
Copy link
Contributor

@mohdjishin are you un-blocked on your actual use case? or still need more clarifications?

Copy link

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@purnesh42H
Copy link
Contributor

Closing this issue as we don't plan to pursue this. There are other ways to approach the problem that work just as well like using the generic interface to access the raw data. Feel free to ask more questions for your specific use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: RPC Features Includes Compression, Encoding, Attributes/Metadata, Interceptors. stale Status: Requires Reporter Clarification Type: Question
Projects
None yet
Development

No branches or pull requests

4 participants