Add StarRocks Connector #17329

Open
chenjian2664 opened this issue May 3, 2023 · 18 comments
Labels: enhancement (New feature or request), jdbc (Relates to Trino JDBC driver)

Comments

@chenjian2664
Contributor

StarRocks https://www.starrocks.io/

chenjian2664 added the enhancement and roadmap labels May 3, 2023
chenjian2664 self-assigned this May 3, 2023
ebyhr removed the roadmap label May 3, 2023
@chenjian2664
Contributor Author

Re-open it if we think we need it.

@ukclivecox

Is there a more comprehensive description of why this issue was closed?

@hashhar
Member

hashhar commented Sep 6, 2023

Probably because @chenjian2664 no longer needs it?

@chenjian2664
Contributor Author

@hashhar Thanks for the attention, let me reopen it. Hopefully it can get reviewed.

chenjian2664 reopened this Sep 7, 2023
@jakemongaya

@chenjian2664 have you tried using the MySQL connector? StarRocks is MySQL-compatible.
I'm unable to test this as I don't have a StarRocks setup.
https://docs.starrocks.io/en-us/main/integrations/IDE_integrations/DataGrip

@chenjian2664
Contributor Author

@jakemongaya I tried. StarRocks is an OLAP database system, and it is common to have multiple ways to interact with a database, so it is worth adding a connector for it.

Starrocks is mysql compatible

I am not sure how far that compatibility goes. Even if StarRocks is fully compatible, going through the MySQL connector would likely lose many opportunities for StarRocks-specific optimization, and some best practices for MySQL may not suit StarRocks.

@brunomarram

Upvoting, this would be nice. The MySQL connector connects but does not show tables, only schemas.

@Wahno

Wahno commented Dec 7, 2023

Looking forward to this. Is there any progress?

@YuriyGavrilov

+1

@hackeryang
Member

hackeryang commented Dec 26, 2023

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Hi @brunomarram, for now you can set the parameters below on StarRocks from a MySQL client:

set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';
set global sql_dialect = 'trino';

Then you will see all tables stored in the default catalog of StarRocks or Apache Doris. I recommend configuring mysql-starrocks.properties like this:

connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false

However, we ran TPC-H and found this much slower than querying StarRocks directly, partly because aggregation pushdown to StarRocks is weak in the MySQL connector.
The better approach is this PR: #17330
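
For anyone scripting this setup, here is a minimal Python sketch of the two steps in the comment above: running the set global statements over a MySQL-protocol connection, and rendering the catalog file. It is not part of Trino or StarRocks. The names apply_globals and render_catalog_properties are hypothetical, and conn is assumed to be any open DB-API 2.0 connection to the StarRocks FE (for example, one created by a MySQL client library). The statements and property values are copied from the comment.

```python
# Statements copied from the comment above (trailing semicolons dropped).
STARROCKS_GLOBALS = [
    "set global enable_groupby_use_output_alias=true",
    "set global enable_profile = true",
    "set global big_query_profile_threshold = '120s'",
    "set global runtime_profile_report_interval = 60",
    "set global sql_mode='SORT_NULLS_LAST'",
    "set global sql_dialect = 'trino'",
]

# Catalog properties copied from the comment; only the URL is filled in per host.
CATALOG_PROPERTIES = [
    ("connector.name", "mysql"),
    ("connection-url", "jdbc:mysql://{host}:{port}"),
    ("connection-user", "root"),
    ("connection-password", ""),
    ("insert.non-transactional-insert.enabled", "true"),
    ("join-pushdown.enabled", "true"),
    ("metadata.cache-ttl", "10m"),
    ("metadata.cache-missing", "true"),
    ("query.comment-format", "Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN"),
    ("statistics.enabled", "false"),
    ("mysql.jdbc.use-information-schema", "false"),
]


def apply_globals(conn, statements=STARROCKS_GLOBALS):
    """Execute each statement on an open DB-API 2.0 connection; return how many ran."""
    cursor = conn.cursor()
    try:
        for statement in statements:
            cursor.execute(statement)
    finally:
        cursor.close()
    return len(statements)


def render_catalog_properties(host, port=9030):
    """Return the text of mysql-starrocks.properties for the given FE host."""
    lines = []
    for key, value in CATALOG_PROPERTIES:
        if key == "connection-url":
            # Only the URL template contains placeholders.
            value = value.format(host=host, port=port)
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Writing the result of render_catalog_properties("starrocks_fe_hostname") to etc/catalog/mysql-starrocks.properties would reproduce the file quoted above; the etc/catalog location assumes a default file-based Trino catalog layout.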

@A-little-bit-of-data

Hello, thank you very much for providing the configuration for connecting Trino to StarRocks. I configured it as described above, but I still can't see the tables, only the databases. I am using Trino 430 and StarRocks 3.1.8. Which versions worked in your testing?
I have another question. There are two catalogs in StarRocks: default_catalog and hive_catalog. I have granted permissions on some databases and tables in both catalogs for the Trino connection (mysql-starrocks.properties), but I can only see the databases under default_catalog, not the rest. Have you ever encountered this?

@hackeryang
Member

hackeryang commented May 7, 2024

Hi @A-little-bit-of-data, we use Trino 423 and StarRocks 3.2 (without the storage-compute separation mode).

The key parameter for seeing tables in StarRocks is set global enable_groupby_use_output_alias=true. With it, Trino can see only the tables in the default catalog stored inside StarRocks, and cannot see tables in the hive catalog outside StarRocks for now. I think the concept of a catalog itself means a storage engine: Trino queries other storage systems rather than other compute engines, so we cannot have a query chain like Trino->StarRocks->Hive.

I understand why you want such a query chain: StarRocks 3.x (with the storage-compute separation mode) can query Hive 3X faster than Trino for now (thanks to C++ x86 AVX2 SIMD vectorization). But please be patient, Project Hummingbird is working on this: #14237
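
To check what Trino actually exposes through this catalog, a small sketch like the following builds the relevant SHOW statements. It assumes the catalog is named mysql-starrocks after the properties file mentioned earlier (the hyphen is why the identifier has to be double-quoted); some_db and the helper names are hypothetical.

```python
def quote_identifier(name):
    """Double-quote a Trino identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def visibility_checks(catalog, schema):
    """Return Trino statements listing the schemas and tables visible in a catalog."""
    quoted_catalog = quote_identifier(catalog)
    return [
        f"SHOW SCHEMAS FROM {quoted_catalog}",
        f"SHOW TABLES FROM {quoted_catalog}.{quote_identifier(schema)}",
    ]


# Catalog name assumed from the file name mysql-starrocks.properties.
for statement in visibility_checks("mysql-starrocks", "some_db"):
    print(statement)
```

If the first statement lists databases but the second returns nothing, that matches the symptom reported earlier in the thread (schemas visible, tables missing).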

@dishkakrauch

Any news?

@Max-Cheng

We have finished a direct-source connector for StarRocks/Doris internal storage. It includes dynamic filtering, pushdown, and some other features, and is about 30%-50% faster than the JDBC connector. If the community needs it, please let me know.

@dishkakrauch

We have finished a direct-source connector for StarRocks/Doris internal storage. It includes dynamic filtering, pushdown, and some other features, and is about 30%-50% faster than the JDBC connector. If the community needs it, please let me know.

The community needs it)

@nqvuong1998

Hi @Max-Cheng, it is essential to support a StarRocks connector in Trino.

@Max-Cheng

Max-Cheng commented Dec 30, 2024

@nqvuong1998 @dishkakrauch
We developed it on the 435 branch. Formatting the code and refactoring the interfaces for the master branch will take some time. Also, the current code style cannot be merged by the community as is. I'll finish the code-style linting and some unit tests, then contribute it to the community.
All of the work will be completed by the end of February at the earliest. It would be even better if you could join the development.

@Max-Cheng

Created a draft PR for the StarRocks connector.
