From 8044a2f45e2db67bcd176b5d7b71f9072b1d7b0a Mon Sep 17 00:00:00 2001 From: xudong963 Date: Thu, 29 Dec 2022 18:26:02 +0800 Subject: [PATCH 1/3] docs: blog for new sqllogictest framework --- .../blog/2022-12-29-sqllogictest-framework.md | 44 +++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 website/blog/2022-12-29-sqllogictest-framework.md diff --git a/website/blog/2022-12-29-sqllogictest-framework.md b/website/blog/2022-12-29-sqllogictest-framework.md new file mode 100644 index 0000000000000..e1a921ebc2e5c --- /dev/null +++ b/website/blog/2022-12-29-sqllogictest-framework.md @@ -0,0 +1,44 @@ +--- +title: Rewrite sqllogictest framework in rust +description: sqllogictest, rust +date: 2022-12-29 +tags: [databend, sqllogictest, rust] +authors: +- name: Xudong + url: https://github.com/xudong963 + image_url: https://github.com/xudong963.png +--- + +Sqllogictest is a program designed to verify that an SQL database engine computes correct results by comparing the results to identical queries from other SQL database engines. Sqllogictest was originally designed to test [SQLite](http://www.sqlite.org/), but it is database engine neutral and can just as easily be used to test other database products. + +In the rust ecosystem, [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) is a very good implementation of the sqllogictest framework, thanks to which databend can easily and quickly switch the sqllogictest framework from python to rust. + +### Background + +- For some historical reasons, databend's original sqllogictest framework was the python version, as seen in [rfc for sqllogictest](https://databend.rs/doc/contributing/rfcs/new_sql_logic_test_framework) +- We decided to rewrite it in rust for the following reasons: + - Unified codebase, all with rust, long-term can improve the framework development and iteration speed, after all, the whole team is rustacean. 
+ - The old framework did not have a strict parser front end and some errors could not be caught. + - [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) crate is maturing and building databend's new sqllogictest framework based on it can save a lot of labor. + - Switching from python to rust, there is a potential performance gain. The current python version of sqllogictest has a suboptimal runtime, resulting in a slower CI, and a more desirable end result, **with about 10x improvement**. + +### Implementation + +- The first version of sqllogictest does not strictly follow the [sqllogictest wiki](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki) implementation, so the format of the test file needs to be adjusted (purely physical work), for example: + - query and `----` has extra blank lines in between. + - The comment format is different. + - Some queries with empty results, such as `select ' ', 1`, are displayed directly with ` `, which can easily cause confusion throughout the test file and needs to be displayed with `(empty)` instead. + - ... +- Databend supports three client handlers: mysql, http, and clickhouse as described in [handlers](https://databend.rs/doc/reference/api), each of which returns a different format of content. Mysql is more normal, but http returns json and clickhouse returns tsv. In http and clickhouse, the following substitutions need to be made: + - `inf` -> `Infinity` + - `nan` -> `NaN` + - `\\N` -> `NULL` +- Isolation between test files. In order to increase the parallelism as much as possible (multiple test files can run in parallel), we need to isolate different files to prevent misuse of database or table, avoid database or table being dropped by mistake, we introduced `sandbox tenant`, each test file a separate sandbox environment, so that the files can run in parallel, greatly reducing the test time. +- ... + +### Unsolved Issues +What is the best way to effectively test a query that has dynamic results? 
+- For example, `SHOW TABLE STATUS` results will include the creation time of the table, which is dynamic, and it is worth discussing how to test such sql. + +### Conclusion +Overall, the benefits of switching from the python version to the rust version to solve the problems mentioned in the background are very good, and [RIIR](https://github.com/ansuz/RIIR) has its justification! For more information on using the databend sqllogictest framework, see [README](https://github.com/datafuselabs/databend/blob/main/tests/sqllogictests/README.md). Some future todos: [sqllogictest tracking](https://github.com/datafuselabs/databend/issues/9174). And finally, thanks to [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) for the support! \ No newline at end of file From fc76bd57efd575ebbe76415f2154a06b50f17c93 Mon Sep 17 00:00:00 2001 From: xudong963 Date: Thu, 29 Dec 2022 20:41:39 +0800 Subject: [PATCH 2/3] update --- website/blog/2022-12-29-sqllogictest-framework.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/blog/2022-12-29-sqllogictest-framework.md b/website/blog/2022-12-29-sqllogictest-framework.md index e1a921ebc2e5c..a0f40705510ea 100644 --- a/website/blog/2022-12-29-sqllogictest-framework.md +++ b/website/blog/2022-12-29-sqllogictest-framework.md @@ -9,7 +9,7 @@ authors: image_url: https://github.com/xudong963.png --- -Sqllogictest is a program designed to verify that an SQL database engine computes correct results by comparing the results to identical queries from other SQL database engines. Sqllogictest was originally designed to test [SQLite](http://www.sqlite.org/), but it is database engine neutral and can just as easily be used to test other database products. +[Sqllogictest](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki) is a program designed to verify that an SQL database engine computes correct results by comparing the results to identical queries from other SQL database engines. 
Sqllogictest was originally designed to test [SQLite](http://www.sqlite.org/), but it is database engine neutral and can just as easily be used to test other database products. In the rust ecosystem, [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) is a very good implementation of the sqllogictest framework, thanks to which databend can easily and quickly switch the sqllogictest framework from python to rust. @@ -17,8 +17,8 @@ In the rust ecosystem, [sqllogictest-rs](https://github.com/risinglightdb/sqllog - For some historical reasons, databend's original sqllogictest framework was the python version, as seen in [rfc for sqllogictest](https://databend.rs/doc/contributing/rfcs/new_sql_logic_test_framework) - We decided to rewrite it in rust for the following reasons: - - Unified codebase, all with rust, long-term can improve the framework development and iteration speed, after all, the whole team is rustacean. - - The old framework did not have a strict parser front end and some errors could not be caught. + - By having a unified codebase written in Rust, we can improve the speed of development and iteration on our framework in the long term. This is because the entire team is proficient in Rust. + - The previous framework lacked a strict parser at the front end, which resulted in some errors going undetected. - [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) crate is maturing and building databend's new sqllogictest framework based on it can save a lot of labor. - Switching from python to rust, there is a potential performance gain. The current python version of sqllogictest has a suboptimal runtime, resulting in a slower CI, and a more desirable end result, **with about 10x improvement**. 
From 54b47f4bbe8978099bd690d52395498605c00cc2 Mon Sep 17 00:00:00 2001 From: xudong963 Date: Fri, 30 Dec 2022 12:28:17 +0800 Subject: [PATCH 3/3] update --- .../blog/2022-12-29-sqllogictest-framework.md | 72 ++++++++++--------- 1 file changed, 39 insertions(+), 33 deletions(-) diff --git a/website/blog/2022-12-29-sqllogictest-framework.md b/website/blog/2022-12-29-sqllogictest-framework.md index a0f40705510ea..df5ba66635913 100644 --- a/website/blog/2022-12-29-sqllogictest-framework.md +++ b/website/blog/2022-12-29-sqllogictest-framework.md @@ -9,36 +9,42 @@ authors: image_url: https://github.com/xudong963.png --- -[Sqllogictest](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki) is a program designed to verify that an SQL database engine computes correct results by comparing the results to identical queries from other SQL database engines. Sqllogictest was originally designed to test [SQLite](http://www.sqlite.org/), but it is database engine neutral and can just as easily be used to test other database products. - -In the rust ecosystem, [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) is a very good implementation of the sqllogictest framework, thanks to which databend can easily and quickly switch the sqllogictest framework from python to rust. - -### Background - -- For some historical reasons, databend's original sqllogictest framework was the python version, as seen in [rfc for sqllogictest](https://databend.rs/doc/contributing/rfcs/new_sql_logic_test_framework) -- We decided to rewrite it in rust for the following reasons: - - By having a unified codebase written in Rust, we can improve the speed of development and iteration on our framework in the long term. This is because the entire team is proficient in Rust. - - The previous framework lacked a strict parser at the front end, which resulted in some errors going undetected. 
- - [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) crate is maturing and building databend's new sqllogictest framework based on it can save a lot of labor. - - Switching from python to rust, there is a potential performance gain. The current python version of sqllogictest has a suboptimal runtime, resulting in a slower CI, and a more desirable end result, **with about 10x improvement**. - -### Implementation - -- The first version of sqllogictest does not strictly follow the [sqllogictest wiki](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki) implementation, so the format of the test file needs to be adjusted (purely physical work), for example: - - query and `----` has extra blank lines in between. - - The comment format is different. - - Some queries with empty results, such as `select ' ', 1`, are displayed directly with ` `, which can easily cause confusion throughout the test file and needs to be displayed with `(empty)` instead. - - ... -- Databend supports three client handlers: mysql, http, and clickhouse as described in [handlers](https://databend.rs/doc/reference/api), each of which returns a different format of content. Mysql is more normal, but http returns json and clickhouse returns tsv. In http and clickhouse, the following substitutions need to be made: - - `inf` -> `Infinity` - - `nan` -> `NaN` - - `\\N` -> `NULL` -- Isolation between test files. In order to increase the parallelism as much as possible (multiple test files can run in parallel), we need to isolate different files to prevent misuse of database or table, avoid database or table being dropped by mistake, we introduced `sandbox tenant`, each test file a separate sandbox environment, so that the files can run in parallel, greatly reducing the test time. -- ... - -### Unsolved Issues -What is the best way to effectively test a query that has dynamic results? 
-- For example, `SHOW TABLE STATUS` results will include the creation time of the table, which is dynamic, and it is worth discussing how to test such sql. - -### Conclusion -Overall, the benefits of switching from the python version to the rust version to solve the problems mentioned in the background are very good, and [RIIR](https://github.com/ansuz/RIIR) has its justification! For more information on using the databend sqllogictest framework, see [README](https://github.com/datafuselabs/databend/blob/main/tests/sqllogictests/README.md). Some future todos: [sqllogictest tracking](https://github.com/datafuselabs/databend/issues/9174). And finally, thanks to [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) for the support! \ No newline at end of file +**Rewriting sqllogictest Framework with Rust** + +This post is about a big move we've made for Databend. We successfully switched the sqllogictest framework from Python to Rust using sqllogictest-rs, a robust implementation of the sqllogictest framework for the Rust ecosystem. Sqllogictest was designed with SQLite in mind. Benefiting from its neutrality towards database engines, we can use Sqllogictest to verify the accuracy of a SQL database engine as well. This is done by comparing query results from multiple SQL engines running the same query. + +**Why sqllogictest-rs**? + +The original sqllogictest framework ([RFC for sqllogictest](https://databend.rs/doc/contributing/rfcs/new_sql_logic_test_framework)) was written in Python. We planned a switch to [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) for the following reasons: + +- The entire Databend team is proficient in Rust. Working with a unified codebase written in Rust would boost our productivity over the long term. +- The previous framework lacked a strict parser at the front end and resulted in errors going undetected. 
+- As the sqllogictest-rs crate is maturing, building a new sqllogictest framework based on it would save us a lot of effort in the long run. +- We expected a 10x performance boost from the switch to Rust. The Python version of sqllogictest had a suboptimal runtime, which slowed down CI. + +**How We Nailed It** + +Our first version of sqllogictest didn't strictly follow the [sqllogictest wiki](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki). This was a little frustrating because we had to manually adjust the format of the test files in cases like these: + +- Extra blank lines between the query and `----`. +- Non-identical comment formats. +- Confusing empty strings: results from queries like `select ' '` were displayed as ` ` rather than `(empty)`. + +Databend supports three types of client handlers: MySQL, HTTP, and ClickHouse. Each of them returns content in a different format. The HTTP handler returns content in JSON format and the ClickHouse handler returns it in TSV, both of which require the following substitutions: + +- inf -> Infinity +- nan -> NaN +- \\N -> NULL + +We introduced `sandbox tenant` to increase parallelism. Each test file now runs in its own sandbox environment, isolated from the others, so the files can run in parallel. This prevents a database or table from being dropped by mistake and significantly reduces test time. + +**Unsolved Issues** + +We're still figuring out the most effective way to test a query that returns dynamic results. For example, the result of `SHOW TABLE STATUS` includes `Create_time`, which is different on every run. + +**After the Switch** + +We're glad to see an efficiency improvement after going with sqllogictest-rs and this will benefit the entire Databend community. Our special thanks go to [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) for the great support, and everyone who has been involved. 
If you're also a fan of sqllogictest, stay tuned for more exciting news by visiting the following links: + +- [README](https://github.com/datafuselabs/databend/blob/main/tests/sqllogictests/README.md) +- [sqllogictest tracking](https://github.com/datafuselabs/databend/issues/9174) \ No newline at end of file
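Note on the substitutions listed in the final version of the post (`inf`, `nan`, `\\N`): the patch describes them but does not show how they are applied. A minimal sketch in Rust, assuming the handler output has already been split into string cells. The names `normalize_cell` and `normalize_row` are hypothetical and are not taken from Databend or sqllogictest-rs, and the null marker is assumed to reach the runner as a backslash followed by `N`.

```rust
/// Hypothetical helper (not Databend's actual code): rewrite one cell
/// returned by the HTTP or ClickHouse handler into the spelling the
/// post says the expected results use.
fn normalize_cell(cell: &str) -> String {
    match cell {
        "inf" => "Infinity".to_string(),
        "nan" => "NaN".to_string(),
        // Assumption: the null marker arrives as a backslash followed by `N`.
        "\\N" => "NULL".to_string(),
        // Every other value is passed through unchanged.
        other => other.to_string(),
    }
}

/// Hypothetical helper: apply the cell-level substitutions to a whole row.
fn normalize_row(row: &[&str]) -> Vec<String> {
    row.iter().map(|cell| normalize_cell(cell)).collect()
}
```

Matching on the whole cell, rather than replacing substrings, keeps values such as `'info'` from being rewritten by accident.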