Commit 9b8df1e: V2 roadmap proposal
Signed-off-by: Federico Busetti <[email protected]>
1 parent 21572af

1 file changed: v2-roadmap.md (+205 lines)

# V2 implementation proposal

## Why this proposal

The current implementation uses different `CloudEvent` classes for the implementation
of different protocol bindings. This adds complexity to the integration of new technologies
like `pydantic`.

The separation of responsibilities in the package is not clear. We have multiple JSON
implementations, while the format specification is the same for all bindings in the
[`CloudEvent` spec](https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/formats/json-format.md):

* the Pydantic implementation calls the HTTP implementation for this, adding unnecessary overhead
* the Kafka binding uses its own implementation, creating code duplication and unnecessary maintenance effort

We should have a single `CloudEvent` class used for all bindings/formats, since the event
is the Python representation of the event data, which is the same no matter what binding
is used (we could, for example, receive an HTTP event and forward it as-is over Kafka).

The specific binding and format implementations should be separate from the `CloudEvent`
class.

This proposal contains some base rules for the package architecture and how we could implement
the `CloudEvent` spec in this package.

## Requirements

In order to be able to transform a Python `CloudEvent` to/from a specific
protocol binding, this SDK has the following responsibilities:

* identifying which attributes have to be marshalled/unmarshalled (identifying the CloudEvents extensions)
* validating required fields and generating defaults
* marshalling/unmarshalling the CloudEvents core and extension fields
* marshalling/unmarshalling the `data` field
* encoding/decoding the marshalled fields to/from the specific binding format

We want to separate the responsibilities that belong to the event and its implementation
in the user system (data, extensions) from the binding and format specification for
core fields.

## Modules

We should separate the implementation of the requirements so that they can be tested independently and
each component can be used for composition.

Ideally we should have these modules (naming TBC):

* `cloudevents.event` - This would contain the `CloudEvent` base class. This class is the same for all bindings/formats.
* `cloudevents.formats` - This would contain the structured format implementations (JSON, Avro, Protobuf).
* `cloudevents.bindings` - This would contain the specific binding implementations (Kafka, HTTP).

This matches the naming and concepts we have in the `CloudEvent` spec.

## Matching requirements and modules

### The `cloudevents.event` module and `CloudEvent` class

The `CloudEvent` class would satisfy the following requirements:

* validating required fields and generating default values
* marshalling/unmarshalling the `data` field
* identifying which attributes have to be marshalled/unmarshalled

The `data` field depends on the specific event and the `datacontenttype` attribute, and
its marshalling/unmarshalling logic should be the same for any binding/format. It seems
a `CloudEvent` class responsibility to take care of this. (We should implement this logic
in overridable methods.)
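The overridable-methods idea could be sketched roughly as below. This is illustrative only: the method names `marshal_data`/`unmarshal_data` and the `application/json` default are assumptions, not a final API.

```python
import json
from typing import Any


class CloudEvent:
    """Fragment focusing only on the overridable `data` (un)marshalling hooks."""

    def __init__(self, datacontenttype: str = "application/json", data: Any = None):
        self.datacontenttype = datacontenttype
        self.data = data

    def marshal_data(self) -> Any:
        # Default behaviour keyed on `datacontenttype`; subclasses may override
        if self.datacontenttype == "application/json":
            return json.dumps(self.data)
        return self.data

    @classmethod
    def unmarshal_data(cls, raw: Any, datacontenttype: str) -> Any:
        # Inverse of marshal_data; again overridable per event type
        if datacontenttype == "application/json" and isinstance(raw, str):
            return json.loads(raw)
        return raw
```

A subclass for a specific event type could then override only these two hooks while inheriting everything else.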

Ideally the end user will be able to extend the `CloudEvent` class for their needs, e.g.:

* adding new extension fields
* adding fields/methods not meant to be marshalled (internal fields)
* implementing new defaults (e.g. creating a `MyEvent` class with a `type="myevent"` default)

We also want a DTO that can be accepted by the most common libraries (e.g. JSON) to keep the
`CloudEvent` class decoupled from the formats/bindings. Using a dictionary seems a good idea.

We should implement `to_dict`/`from_dict` functionality in the `CloudEvent` class.

QUESTION: Can we define a data type for the `data` field after it's been marshalled?
(i.e. can we say it will be either `str` or binary data?)
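A minimal sketch of the `to_dict`/`from_dict` idea, assuming dictionary DTOs and default generation for `id`, `specversion` and `time` (all names and defaults here are illustrative, not a committed API):

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Optional


class CloudEvent:
    """Minimal sketch of a binding-agnostic event class."""

    # Core attributes required by the CloudEvents spec
    REQUIRED = ("id", "source", "specversion", "type")

    def __init__(self, attributes: dict, data: Optional[Any] = None) -> None:
        # Generate defaults for missing attributes that allow them
        attributes.setdefault("id", str(uuid.uuid4()))
        attributes.setdefault("specversion", "1.0")
        attributes.setdefault("time", datetime.now(timezone.utc).isoformat())
        missing = [a for a in self.REQUIRED if a not in attributes]
        if missing:
            raise ValueError(f"Missing required attributes: {missing}")
        self._attributes = attributes
        self.data = data

    def to_dict(self) -> dict:
        # DTO used by formats/bindings; subclasses can override `data` handling
        return {**self._attributes, "data": self.data}

    @classmethod
    def from_dict(cls, event: dict) -> "CloudEvent":
        data = event.pop("data", None)
        return cls(event, data)
```

This keeps formats and bindings working only against plain dictionaries, as proposed above.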

### Formats

Formats solve a single responsibility:

* marshalling/unmarshalling the CloudEvents core and extension fields

Their implementation has to be used when other bindings implement their
structured input/output, to make sure their implementations are compatible.

Each format accepts the dictionary from the `CloudEvent` as a parameter
and produces an output based on the format itself. E.g.:

```python
def to_json(event: dict) -> str: ...
def from_json(event: str) -> dict: ...
```

TODO: Review typing, as JSON produces a string but other formats might differ.
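As a sketch, `cloudevents.formats.json` could be implemented with only the standard library. The `data_base64` handling follows the JSON event format spec (binary `data` is carried base64-encoded under that member); everything else here is an assumption about this proposal's shape, not existing SDK code.

```python
import base64
import json


def to_json(event: dict) -> str:
    # Binary payloads move to the `data_base64` member, per the JSON format spec
    out = dict(event)
    if isinstance(out.get("data"), bytes):
        out["data_base64"] = base64.b64encode(out.pop("data")).decode("ascii")
    return json.dumps(out)


def from_json(event: str) -> dict:
    # Inverse: restore binary payloads from `data_base64`
    out = json.loads(event)
    if "data_base64" in out:
        out["data"] = base64.b64decode(out.pop("data_base64"))
    return out
```

Other formats (Avro, Protobuf) would expose the same dict-in/dict-out pair, differing only in the wire type they produce.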

### Bindings

Bindings solve these responsibilities:

* marshalling/unmarshalling the CloudEvents core and extension fields **(if binary format)**
* encoding/decoding the marshalled fields to/from the specific binding format

They also have the responsibility of coordinating the other modules.

This is a **concept** example for the `cloudevents.bindings.http` module:

```python
from dataclasses import dataclass
from typing import Any

# Aliased imports avoid shadowing this module's own `to_json`/`from_json`
from cloudevents.formats.json import to_json as json_format_to_json
from cloudevents.formats.json import from_json as json_format_from_json
from cloudevents.event import CloudEvent


# Shared dataclass to help with typing, might be a dictionary or a named tuple (not important)
@dataclass
class HTTPCloudEvent:
    headers: list
    body: Any  # unsure about typing here


def to_binary(event: CloudEvent) -> HTTPCloudEvent:
    e = event.to_dict()
    # logic to marshall into http binary output
    return HTTPCloudEvent(headers=[], body=[])


def to_json(event: CloudEvent) -> HTTPCloudEvent:
    e = json_format_to_json(event.to_dict())
    # logic to compose the JSON response
    return HTTPCloudEvent(headers=[], body=[])


def from_binary(headers: list, body: dict) -> CloudEvent:
    e = {}
    # specific logic to unmarshall raw headers into a dictionary
    return CloudEvent.from_dict(e)


def from_json(body: str) -> CloudEvent:
    e = json_format_from_json(body)
    return CloudEvent.from_dict(e)


def from_http(headers: list, body: dict) -> CloudEvent:
    # identify if structured or binary based on headers / body
    # and use the proper `from_` function
    # We should also handle batches here
    ...
```

Other bindings would behave similarly; the end user would only need
to use the relevant module in the `cloudevents.bindings` namespace.
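The structured-vs-binary detection inside `from_http` could be sketched as below. This assumes header names arrive lower-cased in a dict; the media types come from the CloudEvents HTTP protocol binding and JSON format specs.

```python
def detect_content_mode(headers: dict) -> str:
    """Classify an incoming HTTP request as batch, structured or binary.

    Assumes header names have already been lower-cased.
    """
    content_type = headers.get("content-type", "")
    if content_type.startswith("application/cloudevents-batch+json"):
        return "batch"
    if content_type.startswith("application/cloudevents+json"):
        return "structured"
    # Binary mode carries attributes in `ce-` headers and event data in the body
    return "binary"
```

`from_http` would then dispatch to `from_json`, `from_binary`, or a batch handler based on the returned mode.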

## CloudEvent subclasses

We'll want SDK users to be able to extend the `CloudEvent` class, so they will
be able to manage marshalling/unmarshalling of the `data` field for different
event types and their Python typing.

This could easily be done by introducing a mapping between the `type` field and
`CloudEvent` subclasses as a parameter in the `from_` functions. E.g.:

```python
from typing import Optional


def from_binary(headers: list, body: dict, event_types: Optional[dict[str, type]] = None) -> CloudEvent:
    event_as_dict = {}
    # specific logic to unmarshall raw headers into a dictionary
    if event_types:
        # Use the subclass mapped to this event's `type`, falling back to the base class
        return event_types.get(event_as_dict['type'], CloudEvent).from_dict(event_as_dict)
    else:
        return CloudEvent.from_dict(event_as_dict)
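A self-contained usage sketch of that dispatch (the `OrderCreated` class, its `type` value, and the stub base class are invented for illustration):

```python
class CloudEvent:  # stand-in stub for the proposed base class
    @classmethod
    def from_dict(cls, event: dict) -> "CloudEvent":
        instance = cls()
        instance.attributes = event
        return instance


class OrderCreated(CloudEvent):
    """Hypothetical subclass that could add typed `data` handling."""


# Mapping from the CloudEvents `type` attribute to the subclass to instantiate
event_types = {"com.example.order.created": OrderCreated}

event_as_dict = {"type": "com.example.order.created", "data": {"order_id": 1}}
# Pick the registered subclass, falling back to the base class
event = event_types.get(event_as_dict["type"], CloudEvent).from_dict(event_as_dict)
assert isinstance(event, OrderCreated)
```

Unknown types still produce a plain `CloudEvent`, so the mapping stays optional.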

## Pydantic support

The current Pydantic implementation appears, judging by its conversion module and tests,
to be intended as a substitute for the HTTP event class, but it is used
by calling the functions in the conversion module.

Pydantic offers good support and performance for data validation, defaults and
JSON serialization (some of the requirements we have identified).

We need to make a decision:

* We use pydantic as a base for the `CloudEvent` class:
  * PROs
    * No need to implement data validation
    * Performance (pydantic V2 is written in Rust)
    * We can lay down some base logic for `data` field serialization (e.g. nested pydantic models)
    * Enables users to use pydantic for automatic documentation functionalities
  * CONs
    * We introduce a new package dependency, and depend on its implementation
    * _minor_ If we use pydantic JSON serialization, it is implemented in the model class (slightly different from what is proposed, but we can use wrapper functions in the JSON format module)
* We remove pydantic support:
  * PROs
    * We have more control over the validation and defaults implementation
    * We don't depend on other packages
  * CONs
    * Performance on validation and defaults (we can still use libraries like `orjson` for JSON performance)
    * Additional effort in writing and maintaining data validation and required/default fields

...
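If the pydantic route were taken, the base class could look roughly like this. The field set and defaults are illustrative assumptions; pydantic would supply the validation and default generation listed as PROs above.

```python
from typing import Any, Optional
from uuid import uuid4

from pydantic import BaseModel, Field


class CloudEvent(BaseModel):
    # Core attributes; pydantic validates required fields and fills defaults
    id: str = Field(default_factory=lambda: str(uuid4()))
    specversion: str = "1.0"
    source: str
    type: str
    datacontenttype: Optional[str] = None
    data: Optional[Any] = None
```

Constructing `CloudEvent(source="/my/service", type="myevent")` would validate required fields and generate `id`/`specversion`, while omitting `source` or `type` raises a validation error, covering two of the requirements without hand-written code.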
