Skip to content

Migrate datasource tests to insta #15258

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 27, 2025
Merged

Conversation

shruti2522
Copy link
Contributor

@shruti2522 shruti2522 commented Mar 16, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 16, 2025
@alamb
Copy link
Contributor

alamb commented Mar 16, 2025

Thanks @shruti2522 !

It looks like there are some CI failures on this one so marking as draft for now

@alamb alamb marked this pull request as draft March 16, 2025 10:59
@@ -126,6 +126,7 @@ datafusion-physical-plan = { workspace = true }
datafusion-sql = { workspace = true }
flate2 = { version = "1.1.0", optional = true }
futures = { workspace = true }
insta = "1.42.2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be use to use the workspace version

@shruti2522 shruti2522 force-pushed the insta_datasource branch 5 times, most recently from 1615551 to 394d444 Compare March 20, 2025 13:52
@github-actions github-actions bot added substrait Changes to the substrait crate datasource Changes to the datasource crate and removed substrait Changes to the substrait crate datasource Changes to the datasource crate labels Mar 20, 2025
@github-actions github-actions bot added substrait Changes to the substrait crate datasource Changes to the datasource crate and removed substrait Changes to the substrait crate datasource Changes to the datasource crate labels Mar 20, 2025
@github-actions github-actions bot added substrait Changes to the substrait crate datasource Changes to the datasource crate and removed substrait Changes to the substrait crate datasource Changes to the datasource crate labels Mar 20, 2025
@shruti2522
Copy link
Contributor Author

shruti2522 commented Mar 22, 2025

I have migrated all the tests for datasource to insta, except these two, I’m still working on these and will open a separate PR for it.

  1. physical_plan::json::tests::test_json_with_repartitioning::case_1_uncompressed : for this one, on using batches_to_string, only one case is failing and on using batches_to_sort_string all of them are failing.
---- datasource::physical_plan::json::tests::test_json_with_repartitioning::case_1_uncompressed stdout ----
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: json_with_repartitioning
Source: datafusion/core/src/datasource/physical_plan/json.rs:582
────────────────────────────────────────────────────────────────────────────────
Expression: batches_to_string(&res)
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
    0     0 │ +-----+------------------+---------------+------+
    1     1 │ | a   | b                | c             | d    |
    2     2 │ +-----+------------------+---------------+------+
          3 │+| 2   | [2.0, , -6.1]    | [false, ]     | text |
          4 │+|     |                  |               |      |
    3     5 │ | 1   | [2.0, 1.3, -6.1] | [false, true] | 4    |
    4     6 │ | -10 | [2.0, 1.3, -6.1] | [true, true]  | 4    |
    5       │-| 2   | [2.0, , -6.1]    | [false, ]     | text |
    6       │-|     |                  |               |      |
    7     7 │ +-----+------------------+---------------+------+
────────────┴───────────────────────────────────────────────────────────────────
  1. this one is failing because of LF (\n) and CR (\r) diff
---- datasource::physical_plan::csv::tests::test_create_external_table_with_terminator_with_newlines_in_values stdout ----
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: create_external_table_with_terminator_with_newlines_in_values
Source: datafusion/core/src/datasource/physical_plan/csv.rs:591
────────────────────────────────────────────────────────────────────────────────
Expression: batches_to_string(&df)
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
    0     0 │ +-------+-----------------------------+␊
    1     1 │ | col1  | col2                        |␊
    2     2 │ +-------+-----------------------------+␊
    3       │-| 1     | hello␊
          3 |+| 1     | hello␍
    4     4 │ world                 |␊
    5       │-| 2     | something␊
          5 │+| 2     | something␍
    6     6 │ else              |␊
    7       │-| 3     | ␊
    8       │-many␊
    9       │-lines␊
   10       │-make␊
   11       │-good test␊
          7 │+| 3     | ␍
          8 │+many␍
          9 │+lines␍
         10 │+make␍
         11 │+good test␍
   12    12 │  |␊
   13    13 │ | 4     | unquoted                    |␊
   14    14 │ | value | end                         |␊
   15    15 │ +-------+-----------------------------+
────────────┴───────────────────────────────────────────────────────────────────

cc @blaginin @alamb

@shruti2522 shruti2522 marked this pull request as ready for review March 22, 2025 06:54
"| 781 |",
"+--------------+"];
assert_batches_eq!(expected, &query_result);
println!("csv compressed:{}", batches_to_string(&query_result));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @shruti2522 @blaginin and @xudong963

I took the liberty of pushing a few of @blaginin 's suggestions and merging this branch up from main to get the tests passing

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all, let's go

@xudong963 xudong963 merged commit 3af31ca into apache:main Mar 27, 2025
27 checks passed
qstommyshu pushed a commit to qstommyshu/datafusion that referenced this pull request Mar 28, 2025
* insta migrate

* Update datafusion/core/src/datasource/physical_plan/parquet.rs

Co-authored-by: Dmitrii Blaginin <[email protected]>

* More suggestions from code review

* Update snapshots

* fmt

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Dmitrii Blaginin <[email protected]>
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* insta migrate

* Update datafusion/core/src/datasource/physical_plan/parquet.rs

Co-authored-by: Dmitrii Blaginin <[email protected]>

* More suggestions from code review

* Update snapshots

* fmt

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Dmitrii Blaginin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants