Add support for arrow stream #265
Comments
@auxten, how do you get the schema when using this?
I understand that what you’re trying to do is retrieve the output schema and then stream the data into Delta Lake.
I added chDB to my ETL benchmarks. Feel free to have a look, in case I am doing something terribly wrong.
@auxten If I understand correctly, clickhouse-local can load and process the data in a streaming style, but chDB collects the data from clickhouse-local in a batch style?
Yes, you are partially right. On the input side, chDB does exactly the same as clickhouse-local: it reads data from files, HTTP, or S3 in streaming mode and also in random-access mode. On the output side, however, the data is written in batch style. This is what we need to improve.
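To illustrate the difference between batch-style and stream-style output described above, here is a minimal sketch that uses only the Python standard library. It is not chDB's API: `iter_batches` and `copy_in_batches` are hypothetical names, and the plain list used as a sink stands in for a real writer such as a Delta Lake table. The point is only that the consumer sees one small batch at a time instead of the whole result.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def iter_batches(rows: Iterable[List[str]], batch_size: int) -> Iterator[List[List[str]]]:
    """Yield lists of at most `batch_size` rows; only one batch is held at a time."""
    rows = iter(rows)  # make sure islice advances a single shared iterator
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def copy_in_batches(source, sink: List[Tuple[List[str], List[List[str]]]], batch_size: int = 2) -> int:
    """Read CSV text from `source` and append (header, batch) pairs to `sink`.

    `sink` is a hypothetical stand-in for a real writer; returns the row count.
    """
    reader = csv.reader(source)
    header = next(reader)  # the first row carries the column names
    total = 0
    for batch in iter_batches(reader, batch_size):
        sink.append((header, batch))  # a real writer would flush the batch here
        total += len(batch)
    return total


if __name__ == "__main__":
    data = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
    written = []
    print(copy_in_batches(data, written, batch_size=2))
```

With a streaming output, peak memory is bounded by the batch size rather than by the size of the full result, which is what the request for Arrow record batches is after.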
First, congratulations on the progress you have made; chDB is substantially better than it was just 6 months ago. I am trying to read a folder of CSV files and export it to Delta. Currently I am using df = sess.sql(sql,"ArrowTable") to transfer the data to deltalake Python. The problem is that I am getting OOM errors. It would be nice if you could add support for Arrow record batches so that the transfer is done in smaller batches.
Thanks