|
| 1 | +# V2 implementation proposal |
| 2 | + |
| 3 | +## Why this proposal |
| 4 | + |
| 5 | +The current implementation uses different `CloudEvent` classes for the implementation |
| 6 | +of different protocol bindings. This adds complexity the integration of new technologies |
| 7 | +like `pydantic`. |
| 8 | + |
| 9 | +The separation of responsibilities in the package is not clear. We have multiple JSON |
| 10 | +implementations, while the format specification is the same for all bindings in the |
| 11 | +[`CloudEvent` spec](https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/formats/json-format.md): |
| 12 | + |
| 13 | +* Pydantic implementation is calling the HTTP implementation for this, adding unnecessary overhead |
| 14 | +* the Kafka binding uses its own implementation, creating code duplication and unnecessary maintenance efforts |
| 15 | + |
| 16 | +We should have a single `CloudEvent` class used for all bindings/formats since the event |
| 17 | +is the Python representation of the event data, which is the same no matter what binding |
| 18 | +is used. (we potentially could receive a HTTP event and forward it as it is using Kafka). |
| 19 | + |
| 20 | +The specific bindings and formats implementations should be separate from the `CloudEvent` |
| 21 | +class. |
| 22 | + |
| 23 | +This proposal contains some base rules for the package architecture and how we could implement |
| 24 | +the `CloudEvent` spec in this package. |
| 25 | + |
| 26 | +## Requirements |
| 27 | + |
| 28 | +In order to be able to transform a python CloudEvent to/from a specific |
| 29 | +protocol binding, this SDK has the following responsibilities: |
| 30 | + |
| 31 | +* Identifying what attributes have to be marshalled/unmarshalled (identifying the cloudevents extensions) |
| 32 | +* Validate required fields and generation of defaults |
| 33 | +* Marshalling/unmarshalling the cloudevents core and extensions fields |
| 34 | +* Marshalling/unmarshalling the `data` field |
| 35 | +* Encode/decode the marshalled fields to/from the specific binding format |
| 36 | + |
| 37 | +We want to separate the responsibilities that belong to the event and its implementation |
| 38 | +in the user system (data, extensions) from the binding and format specification for |
| 39 | +core fields. |
| 40 | + |
| 41 | +## Modules |
| 42 | + |
| 43 | +We should separate the implementation of the requirements so that they can be tested independently and |
| 44 | +each component be used for composition. |
| 45 | + |
| 46 | +Ideally we should have these modules (naming TBC): |
| 47 | + |
| 48 | +* `cloudevents.event` - This would contain the `CloudEvent` base class. This class is the same for all bindings/formats. |
| 49 | +* `cloudevents.formats` - This would contain the structured formats implementations (JSON, AVRO, Protobuf) |
| 50 | +* `cloudevents.bindings` - This would contain the specific binding implementations (kafka, http) |
| 51 | + |
| 52 | +This matches the naming and concepts we have in the `CloudEvent` spec |
| 53 | + |
| 54 | +## Matching requirements and modules |
| 55 | + |
| 56 | +### The `cloudevents.event` module and `CloudEvent` class |
| 57 | + |
| 58 | +The `CloudEvent` class would satisfy the following requirements: |
| 59 | + |
| 60 | +* Validate required fields and generation of default values |
| 61 | +* Marshalling/unmarshalling the `data` field |
| 62 | +* Identifying what attributes have to be marshalled/unmarshalled |
| 63 | + |
| 64 | +The `data` field depends on the specific event and the `contentdatatype` attribute and |
| 65 | +its marshalling/unmarshalling logic should be the same for any binding/format. It seems |
| 66 | +a `CloudEvent` class responsibility to take care of this. (We should implement this logic |
| 67 | +in overridable methods) |
| 68 | + |
| 69 | +Ideally the end user will be able to extend the `CloudEvent` class for their necessity. Ie: |
| 70 | + |
| 71 | +* adding new extensions fields |
| 72 | +* adding fields/methods not meant to be marshalled (internal fields) |
| 73 | +* implementing new defaults (ie. creating a `MyEvent` class with `type="myevent"` default) |
| 74 | + |
| 75 | +We also want a DTO that can be accepted by the most common libraries (ie. JSON) to keep the |
| 76 | +`CloudEvent` class decoupled from the formats/bindings. Using a dictionary seems a good idea. |
| 77 | + |
| 78 | +We should implement a `to_dict`/`from_dict` functionalities in the `CloudEvent` class. |
| 79 | + |
| 80 | +QUESTION: Can we define a data type for the `data` field after it's been marshalled? |
| 81 | +(ie. can we say it will be either `str` or binary data?) |
| 82 | + |
| 83 | +### Formats |
| 84 | + |
| 85 | +Formats do solve a single responsibility: |
| 86 | + |
| 87 | +* Marshalling/unmarshalling the cloudevents core and extensions fields |
| 88 | + |
| 89 | +Their implementation has to be used when other bindings implement their |
| 90 | +structured input/output and make sure their implementation is compatible. |
| 91 | + |
| 92 | +Each format accepts the dictionary from the `CloudEvent` as param |
| 93 | +and produces an output based on the format itself. Ie. |
| 94 | + |
| 95 | +```python |
| 96 | +def to_json(event: dict) -> str: ... |
| 97 | +def from_json(event: str) -> dict: ... |
| 98 | +``` |
| 99 | + |
| 100 | +TODO: Review typing as JSON produces a string but other formats might be different. |
| 101 | + |
| 102 | +### Bindings |
| 103 | + |
| 104 | +Bindings do solve these responsibilities: |
| 105 | + |
| 106 | +* Marshalling/unmarshalling the cloudevents core and extensions fields **(if binary format)** |
| 107 | +* Encode/decode the marshalled fields to/from the specific binding format |
| 108 | + |
| 109 | +They also have the responsibility of coordinating the other modules. |
| 110 | + |
| 111 | +This is a **concept** example for `cloudevents.bindings.http` module |
| 112 | + |
| 113 | +```python |
| 114 | +from cloudevents.formats.json import to_json, from_json |
| 115 | +from cloudevents.event import CloudEvent |
| 116 | + |
| 117 | + |
| 118 | +# Shared dataclass to help with typing, might be a dictionary or a named tuple (not important) |
| 119 | +@dataclass |
| 120 | +class HTTPCloudEvent: |
| 121 | + headers: list |
| 122 | + body: Any # unsure about typing here |
| 123 | + |
| 124 | + |
| 125 | +def to_binary(event: CloudEvent) -> HTTPCloudEvent: |
| 126 | + e = event.to_dict() |
| 127 | + # logic to marshall into http binary output |
| 128 | + return HTTPCloudEvent(headers=[], body=[]) |
| 129 | + |
| 130 | + |
| 131 | +def to_json(event: CloudEvent) -> HTTPCloudEvent: |
| 132 | + e = to_json(event.to_dict()) |
| 133 | + # logic to compose the JSON response |
| 134 | + return HTTPCloudEvent(headers=[], body=[]) |
| 135 | + |
| 136 | + |
| 137 | +def from_binary(headers: list, body: dict) -> CloudEvent: |
| 138 | + e = {} |
| 139 | + # specific logic to unmarshall raw headers into a dictionary |
| 140 | + return CloudEvent.from_dict(e) |
| 141 | + |
| 142 | + |
| 143 | +def from_json(body: str) -> CloudEvent: |
| 144 | + e = from_json(body) |
| 145 | + return CloudEvent.from_dict(e) |
| 146 | + |
| 147 | + |
| 148 | +def from_http(headers: list, body: dict) -> CloudEvent: |
| 149 | + # identify if structured or binary based on headers / body |
| 150 | + # and use the proper `from_` function |
| 151 | + # We should also handle here batches |
| 152 | + ... |
| 153 | +``` |
| 154 | + |
| 155 | +Other bindings would behave similarly, where the end user would need only |
| 156 | +to use the relevant module in `cloudevents.bindings` namespace. |
| 157 | + |
| 158 | +## CloudEvent subclasses |
| 159 | + |
| 160 | +We'll want SDK users to be able to extend the `CloudEvent` class, so they will |
| 161 | +be able to manage marshalling/unmarshalling of the `data` field for different |
| 162 | +event types and their python typing. |
| 163 | + |
| 164 | +This could be easily done by introducing a mapping between the `type` field and |
| 165 | +`CloudEvent` subclasses as a parameter in the `from_` functions. Ie. |
| 166 | + |
| 167 | +```python |
| 168 | +def from_binary(headers: list, body: dict, event_types: Optional[dict[str, type]] = None) -> CloudEvent: |
| 169 | + event_as_dict = {} |
| 170 | + # specific logic to unmarshall raw headers into a dictionary |
| 171 | + if event_types: |
| 172 | + # Use |
| 173 | + return event_types.get(event_as_dict['type'], CloudEvent).from_dict(e) |
| 174 | + else: |
| 175 | + return CloudEvent.from_dict(e) |
| 176 | +``` |
| 177 | + |
| 178 | +## Pydantic support |
| 179 | + |
| 180 | +The current Pydantic implementation is implemented as a possible substitute of |
| 181 | +HTTP Event class, by looking at its conversion module and tests, but it's used |
| 182 | +by calling the functions in conversion module. |
| 183 | + |
| 184 | +Pydantic offers good support and performance for data validation, defaults and |
| 185 | +JSON serialization (some of the requirements we have identified) |
| 186 | + |
| 187 | +We need to make a decision: |
| 188 | + |
| 189 | +* We use pydantic as a base for `CloudEvent` class: |
| 190 | + * PROs |
| 191 | + * No need to implement data validation |
| 192 | + * Performance (pydantic V2 is written in Rust) |
| 193 | + * We can lay down some base logic on `data` field serialization (ie. nested pydantic models) |
| 194 | + * Enable users to use pydantic for automatic documentation functionalities |
| 195 | + * CONs |
| 196 | + * We introduce a new package dependency, and depend on its implementation |
| 197 | + * _minor_ If we use pydantic JSON serialization it is implemented in the model class (slightly different from what proposed but we can use wrapper functions in the JSON format module) |
| 198 | +* We remove pydantic support: |
| 199 | + * PROs |
| 200 | + * We have more control over validation and defaults implementation |
| 201 | + * We don't depend on other packages |
| 202 | + * CONs |
| 203 | + * Performance on validation and defaults (we can still use libraries like `orjson` for JSON performance) |
| 204 | + * Additional effort in writing and maintaining data validation and required/default fields |
| 205 | +... |
0 commit comments