Rust Arroyo adapter #98

Open. 15 commits to be merged into base: main

Collaborator

@fpacifici fpacifici commented Apr 10, 2025

A basic Rust Arroyo adapter to run streaming pipelines.

The basic idea is to have the runtime built in Rust and able to run either Rust or Python
application logic on top of it. Mixing Rust and Python steps in the same pipeline,
on the same consumer, should eventually be possible.

The adapter is built with pyo3 and maturin, and it is packaged inside the sentry_streams
Python package as a binary.
The Python runner is still the entrypoint. All the pipeline translation logic stays the same
and remains in Python, so only the basic primitives have to be written in Rust.
At this stage only source, sink and map are provided. As a follow-up there will be a
basic Rust Arroyo strategy that runs Python code, so we can make Python primitives
immediately runnable in Rust and port them to native Rust when we want. Either way, the
primitives will be implemented only once.
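
To illustrate the division of labor described above, here is a minimal pure-Python sketch. It is not the API exposed by this PR: `StubRuntime`, `add_map` and `translate` are hypothetical names, and the stub stands in for the Rust-built runtime. It only shows the shape of the idea: the translation walks the pipeline in Python, while the runtime needs to know just three primitives (source, map, sink) and calls back into Python callables.

```python
from typing import Any, Callable, Iterable, List


class StubRuntime:
    """Hypothetical stand-in for the Rust-built runtime (not the real API)."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._source = source
        self._maps: List[Callable[[Any], Any]] = []
        self.sink: List[Any] = []

    def add_map(self, fn: Callable[[Any], Any]) -> None:
        # The runtime stores the Python callable and invokes it per message.
        self._maps.append(fn)

    def run(self) -> None:
        # Source -> map steps in registration order -> sink.
        for message in self._source:
            for fn in self._maps:
                message = fn(message)
            self.sink.append(message)


def translate(pipeline: List[Callable[[Any], Any]], runtime: StubRuntime) -> None:
    """Python-side translation: walk the pipeline and register each step."""
    for step in pipeline:
        runtime.add_map(step)


runtime = StubRuntime(source=["a", "b"])
translate([str.upper, lambda s: s + "!"], runtime)
runtime.run()
print(runtime.sink)
```

In this sketch, porting a primitive to native Rust would only change what the runtime does inside `run`; the pipeline definition and the translation step would stay as they are.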

On top of this work we can define how to fully integrate Rust code in the pipeline.

Work to be done after this PR:

  • Provide a standard Rust/Python mixed strategy so we can easily make the existing primitives runnable from Rust even if they are built in Python
  • Support full configuration for sources and sinks
  • Provide a better way to test the Rust consumer side from the Python side. Right now we only expose the run method, which is untestable. Also support the local broker/consumer/producer everywhere
  • Provide a way to run Rust native logic in a pipeline defined with the Python DSL
  • Better typing
  • Some load testing
  • Support the missing primitives. This PR only supports map

Run it this way:

python -m sentry_streams.runner \
    -n SimpleMap \
    --adapter rust_arroyo \
    --config ~/code/streams/sentry_streams/sentry_streams/deployment_config/simple_map.yaml \
    ~/code/streams/sentry_streams/sentry_streams/examples/simple_map.py

Architecture:

@fpacifici fpacifici force-pushed the fpacifici/rust_adapter branch from 9f63abb to 10e66f7 Compare April 13, 2025 18:51
@fpacifici fpacifici force-pushed the fpacifici/rust_adapter branch from bd59e6f to 95dfcec Compare April 14, 2025 20:46
@fpacifici fpacifici marked this pull request as ready for review April 14, 2025 23:49
},
}

/*
Member

Is this supposed to be removed?

Collaborator Author

Yeah. It was a previous implementation. I forgot to remove it.

@fpacifici fpacifici changed the title from "Rust Arroyo adapter POC" to "Rust Arroyo adapter" Apr 24, 2025
@ayirr7
Member

ayirr7 commented Apr 24, 2025

Since the PR is fairly large, I think I'm having a bit of a hard time understanding how all the files in src/ fit together, as well as how they interact with rust_arroyo.py. Would you be able to explain that at a high level in this PR (or however you prefer)?

)


class RustArroyoAdapter(StreamAdapter[Route, Route]):
Member

I don't know if I really understand why we have this Adapter? At a high level, does this accomplish something different than ArroyoAdapter in streams/?

Collaborator Author

Conceptually it does not do anything very different. I created a separate adapter because the data structures involved are quite different:

  • ArroyoStep does not exist; it is replaced by a Rust enum.
  • ArroyoConsumer is replaced by a Rust one, whose interface is quite different.
  • Arroyo consumers and producers have to be created by the Rust code because they are the Rust implementations, so I cannot reuse the logic that creates the Python ones. They are Arroyo structs and cannot be exposed to Python.
  • I cannot expose the StrategyFactory to Python (that is an Arroyo interface), so everything has to be hidden behind the run method of the ArroyoConsumer Rust structure.

It would be possible to merge this logic into the Python Arroyo adapter, but the result would be very brittle because the interfaces to deal with on the Python and Rust sides are so different.
The logic is clearer this way.
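
As a rough illustration of the shape described above, here is a pure-Python sketch. All names in it (`MapStep`, `SinkStep`, `ConsumerFacade`, `add_step`) are hypothetical and are not taken from this PR. The `Step` union plays the role of the Rust enum, and the per-message processing chain is never handed back to the caller: the only entry points are `add_step` and `run`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Union


@dataclass(frozen=True)
class MapStep:
    function: Callable[[Any], Any]


@dataclass(frozen=True)
class SinkStep:
    topic: str


# Plays the role of the Rust enum: a closed set of step variants.
Step = Union[MapStep, SinkStep]


class ConsumerFacade:
    """Hypothetical facade: callers add steps and call run, nothing else."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._source = source
        self._steps: List[Step] = []
        # In-memory stand-in for a producer, keyed by topic name.
        self.produced: Dict[str, List[Any]] = {}

    def add_step(self, step: Step) -> None:
        self._steps.append(step)

    def run(self) -> None:
        # The processing chain lives only inside run; it is never exposed.
        for message in self._source:
            for step in self._steps:
                if isinstance(step, MapStep):
                    message = step.function(message)
                elif isinstance(step, SinkStep):
                    self.produced.setdefault(step.topic, []).append(message)


consumer = ConsumerFacade(source=[1, 2, 3])
consumer.add_step(MapStep(lambda x: x * 10))
consumer.add_step(SinkStep(topic="out"))
consumer.run()
print(consumer.produced)
```

The sketch also hints at why testing is hard when only `run` is exposed: the results are observable here only because the stand-in producer keeps them in memory.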

@fpacifici
Collaborator Author

> Since the PR is fairly large, I think I'm having a bit of a hard time understanding how all the files in src/ fit together, as well as how they interact with rust_arroyo.py. Would you be able to explain that at a high level in this PR (or however you prefer)?

Done. See the PR description

3 participants