This repository was archived by the owner on Aug 13, 2024. It is now read-only.

Add Spark Streaming with Kinesis support #142

Open
alexanderdean opened this issue Mar 17, 2016 · 0 comments

@alexanderdean
Contributor

See #43, #139

The basic idea is that you point a new webhook at a Snowplow Scala Stream Collector. Behind the collector, a Spark Streaming job runs Schema Guru. We would shard by schema so that only one worker operates on any given schema. For each schema, the worker fetches the most recent version from Iglu, merges in the schema derived from the latest five minutes' worth of events, and then pushes the updated schema back to Iglu as a patch.
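
To make the fetch/merge/push loop concrete, here is a minimal sketch in plain Python with no Spark or Kinesis involved. Everything in it is an assumption made for illustration: `registry` is a hypothetical in-memory stand-in for Iglu, `derive` is a toy replacement for Schema Guru's derivation that only tracks top-level field types, and `process_batch` plays the role of one five-minute micro-batch. Grouping events by schema key stands in for the sharding that keeps a single worker on each schema.

```python
from collections import defaultdict

# Hypothetical stand-in for Iglu: maps a schema key to its latest JSON Schema.
registry = {}

def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"

def to_schema(types_by_field):
    """Build a schema dict from a mapping of field name to set of type names."""
    return {"type": "object",
            "properties": {f: {"type": sorted(t)} for f, t in types_by_field.items()}}

def derive(events):
    """Toy derivation (not Schema Guru's): types seen for each top-level field."""
    seen = defaultdict(set)
    for event in events:
        for field, value in event.items():
            seen[field].add(json_type(value))
    return to_schema(seen)

def merge(old, new):
    """Union the per-field types of two schemas produced by derive()."""
    if old is None:
        return new
    merged = {f: set(s["type"]) for f, s in old["properties"].items()}
    for f, s in new["properties"].items():
        merged.setdefault(f, set()).update(s["type"])
    return to_schema(merged)

def process_batch(batch):
    """Handle one micro-batch of (schema_key, event) pairs."""
    by_schema = defaultdict(list)  # "shard" the batch by schema key
    for schema_key, event in batch:
        by_schema[schema_key].append(event)
    for schema_key, events in by_schema.items():
        latest = registry.get(schema_key)                        # fetch
        registry[schema_key] = merge(latest, derive(events))     # merge and push
```

In a real job the grouping step would presumably be a keyed partitioning of the stream, and the last two lines of `process_batch` would be calls to Iglu rather than dictionary reads and writes.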

/cc @chuwy, @fblundun, @a1nayak
