Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate datasource tests to insta #15258

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

shruti2522
Copy link
Contributor

@shruti2522 shruti2522 commented Mar 16, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 16, 2025
@alamb
Copy link
Contributor

alamb commented Mar 16, 2025

Thanks @shruti2522 !

It looks like there are some CI failures on this one so marking as draft for now

@alamb alamb marked this pull request as draft March 16, 2025 10:59
@@ -126,6 +126,7 @@ datafusion-physical-plan = { workspace = true }
datafusion-sql = { workspace = true }
flate2 = { version = "1.1.0", optional = true }
futures = { workspace = true }
insta = "1.42.2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be use to use the workspace version

@shruti2522 shruti2522 force-pushed the insta_datasource branch 5 times, most recently from 1615551 to 394d444 Compare March 20, 2025 13:52
@github-actions github-actions bot added substrait Changes to the substrait crate datasource Changes to the datasource crate and removed substrait Changes to the substrait crate datasource Changes to the datasource crate labels Mar 20, 2025
@github-actions github-actions bot added substrait Changes to the substrait crate datasource Changes to the datasource crate and removed substrait Changes to the substrait crate datasource Changes to the datasource crate labels Mar 20, 2025
@github-actions github-actions bot added substrait Changes to the substrait crate datasource Changes to the datasource crate and removed substrait Changes to the substrait crate datasource Changes to the datasource crate labels Mar 20, 2025
@shruti2522
Copy link
Contributor Author

shruti2522 commented Mar 22, 2025

I have migrated all the tests for datasource to insta, except these two, I’m still working on these and will open a separate PR for it.

  1. physical_plan::json::tests::test_json_with_repartitioning::case_1_uncompressed : for this one, on using batches_to_string, only one case is failing and on using batches_to_sort_string all of them are failing.
---- datasource::physical_plan::json::tests::test_json_with_repartitioning::case_1_uncompressed stdout ----
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: json_with_repartitioning
Source: datafusion/core/src/datasource/physical_plan/json.rs:582
────────────────────────────────────────────────────────────────────────────────
Expression: batches_to_string(&res)
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
    0     0 │ +-----+------------------+---------------+------+
    1     1 │ | a   | b                | c             | d    |
    2     2 │ +-----+------------------+---------------+------+
          3 │+| 2   | [2.0, , -6.1]    | [false, ]     | text |
          4 │+|     |                  |               |      |
    3     5 │ | 1   | [2.0, 1.3, -6.1] | [false, true] | 4    |
    4     6 │ | -10 | [2.0, 1.3, -6.1] | [true, true]  | 4    |
    5       │-| 2   | [2.0, , -6.1]    | [false, ]     | text |
    6       │-|     |                  |               |      |
    7     7 │ +-----+------------------+---------------+------+
────────────┴───────────────────────────────────────────────────────────────────
  1. this one is failing because of LF (\n) and CR (\r) diff
---- datasource::physical_plan::csv::tests::test_create_external_table_with_terminator_with_newlines_in_values stdout ----
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: create_external_table_with_terminator_with_newlines_in_values
Source: datafusion/core/src/datasource/physical_plan/csv.rs:591
────────────────────────────────────────────────────────────────────────────────
Expression: batches_to_string(&df)
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
    0     0 │ +-------+-----------------------------+␊
    1     1 │ | col1  | col2                        |␊
    2     2 │ +-------+-----------------------------+␊
    3       │-| 1     | hello␊
          3 |+| 1     | hello␍
    4     4 │ world                 |␊
    5       │-| 2     | something␊
          5 │+| 2     | something␍
    6     6 │ else              |␊
    7       │-| 3     | ␊
    8       │-many␊
    9       │-lines␊
   10       │-make␊
   11       │-good test␊
          7 │+| 3     | ␍
          8 │+many␍
          9 │+lines␍
         10 │+make␍
         11 │+good test␍
   12    12 │  |␊
   13    13 │ | 4     | unquoted                    |␊
   14    14 │ | value | end                         |␊
   15    15 │ +-------+-----------------------------+
────────────┴───────────────────────────────────────────────────────────────────

cc @blaginin @alamb

@shruti2522 shruti2522 marked this pull request as ready for review March 22, 2025 06:54
"| 781 |",
"+--------------+"];
assert_batches_eq!(expected, &query_result);
println!("csv compressed:{}", batches_to_string(&query_result));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @shruti2522 @blaginin and @xudong963

I took the liberty of pushing a few of @blaginin 's suggestions and merging this branch up from main to get the tests passing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants