idle is a simple stateful stream processing framework.

idle aims to support the following integrations:
- kafka to {kafka, s3, ?}
- ? to {kafka, s3, ?}

the first of these (kafka as the source) is the priority.

known limitations:
- we rely on jq and json for schema processing. this can be slow.
- sql queries are baked into an 'evaluation tree/frame', but every step is still evaluated in memory for each event
- the opinionated parallelism structure leaves limited opportunity for fan-out or re-processing of events
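
to make the 'evaluation tree' limitation concrete, here is a minimal sketch in python of a query that is built once and then evaluated in memory for every event. this is not idle's actual implementation: the node classes (`Column`, `Literal`, `BinaryOp`), the `process` function and the example query are all hypothetical, and the tree is written by hand rather than produced by a sql parser.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Event = Dict[str, Any]


@dataclass(frozen=True)
class Column:
    """Hypothetical leaf node: reads one field from the event."""
    name: str

    def eval(self, event: Event) -> Any:
        return event[self.name]


@dataclass(frozen=True)
class Literal:
    """Hypothetical leaf node: a constant from the query text."""
    value: Any

    def eval(self, event: Event) -> Any:
        return self.value


@dataclass(frozen=True)
class BinaryOp:
    """Hypothetical inner node: applies op to both evaluated children."""
    op: Callable[[Any, Any], Any]
    left: Any
    right: Any

    def eval(self, event: Event) -> Any:
        return self.op(self.left.eval(event), self.right.eval(event))


# Built once at startup. Hand-written stand-in for:
#   SELECT amount * 2 AS doubled FROM events WHERE amount > 10
PROJECTION = {"doubled": BinaryOp(operator.mul, Column("amount"), Literal(2))}
PREDICATE = BinaryOp(operator.gt, Column("amount"), Literal(10))


def process(event: Event) -> Optional[Event]:
    """Walk the baked tree for a single event; None means filtered out."""
    if not PREDICATE.eval(event):
        return None
    return {alias: node.eval(event) for alias, node in PROJECTION.items()}
```

parsing happens only once, but `process` still walks every node of the tree for every event, which is the per-step in-memory cost described above.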

this is a rough outline of what is next, in no particular order:
- sql queries are parsed on start and baked in as an Eval Tree
- schema automation:
  - column aliases are handled and inserted into the output schema
  - column type casts are handled and inserted into the output schema
- schema automation:
  - protobuf is supported out of the box
  - how do we specify schemas?
- error handling with side-outputs
- streams mode to load multiple pipelines in one process
- SPIKE: SQL handling to join on another stream in 'streams mode'
- graceful termination of processes
- SQL compat (this is unlikely to exceed 80% coverage of the postgres sql spec)
- sinks
  - kafka producer
  - s3 writer
- sources
  - kafka consumer
- watermarking and orderliness
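
as an illustration of the 'error handling with side-outputs' item, here is a minimal python sketch of the general pattern: events that fail to parse or transform are routed to a separate error output instead of stopping the pipeline. the function name `run_step`, the shape of the error records and the in-memory lists standing in for sinks are assumptions made for this example, not idle's design.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


def run_step(
    raw_events: Iterable[str],
    transform: Callable[[Event], Event],
) -> Tuple[List[Event], List[Dict[str, str]]]:
    """Apply transform to each JSON event.

    Returns (main_output, side_output). Failures go to the side output
    together with the raw payload, so they can be inspected or replayed.
    """
    main: List[Event] = []
    errors: List[Dict[str, str]] = []
    for raw in raw_events:
        try:
            main.append(transform(json.loads(raw)))
        except (ValueError, KeyError, TypeError) as exc:
            # json.JSONDecodeError is a subclass of ValueError.
            errors.append({"raw": raw, "error": f"{type(exc).__name__}: {exc}"})
    return main, errors


if __name__ == "__main__":
    good, bad = run_step(
        ['{"amount": 2}', "not json", '{"other": 1}'],
        lambda event: {"doubled": event["amount"] * 2},
    )
    print(good)
    print(bad)
```

in a real pipeline the two lists would presumably be replaced by sinks, for example a kafka producer for the main output and a separate topic or s3 prefix for the errors.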