[SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder #49304

Closed
wants to merge 41 commits into from

Contributor

@anishshri-db anishshri-db commented Dec 27, 2024

What changes were proposed in this pull request?

Move the virtual column family mapping into the DB layer instead of the encoder.

Why are the changes needed?

Keep the abstraction around ownership clear, and expose internal and non-internal key metrics correctly.
With this change:

  • the encoder is responsible only for encoding keys based on the encoding type (noPrefix, prefix, range, etc.)
  • the responsibility for maintaining virtual column families now lies with the underlying DB layer
  • the DB layer can now also expose metrics for internal as well as non-internal column families
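
The split of responsibilities described above can be illustrated with a minimal sketch. It is not the code from this PR: `KeyEncoder`, `NoPrefixEncoder`, `VirtualColFamilyDb`, the two-byte id prefix, and the in-memory map standing in for the store are all hypothetical names and choices made only for illustration.

```scala
import scala.collection.mutable

// Hypothetical encoder: it only turns a key into bytes for its encoding
// type and has no knowledge of column families.
trait KeyEncoder {
  def encode(key: String): Array[Byte]
}

object NoPrefixEncoder extends KeyEncoder {
  def encode(key: String): Array[Byte] = key.getBytes("UTF-8")
}

// Hypothetical DB layer: it owns the column family name -> id mapping and
// prepends the id to every key it stores (assumed two-byte prefix).
class VirtualColFamilyDb {
  private case class ColFamilyInfo(id: Short, isInternal: Boolean)

  private val colFamilies = mutable.Map.empty[String, ColFamilyInfo]
  // In-memory stand-in for the underlying key-value store.
  private val store = mutable.Map.empty[Seq[Byte], Array[Byte]]
  private var nextId: Short = 0

  def createColFamily(name: String, isInternal: Boolean): Unit = {
    if (!colFamilies.contains(name)) {
      colFamilies(name) = ColFamilyInfo(nextId, isInternal)
      nextId = (nextId + 1).toShort
    }
  }

  // Prefix the already-encoded key with the column family id.
  private def withPrefix(cf: String, key: Array[Byte]): Seq[Byte] = {
    val id = colFamilies(cf).id
    (Array((id >> 8).toByte, id.toByte) ++ key).toSeq
  }

  def put(cf: String, key: Array[Byte], value: Array[Byte]): Unit =
    store(withPrefix(cf, key)) = value

  def get(cf: String, key: Array[Byte]): Option[Array[Byte]] =
    store.get(withPrefix(cf, key))

  // Count stored keys separately for internal and non-internal families.
  def numKeys(internal: Boolean): Long = {
    val ids = colFamilies.values.filter(_.isInternal == internal).map(_.id).toSet
    store.keys.count { k =>
      val id = (((k(0) & 0xff) << 8) | (k(1) & 0xff)).toShort
      ids.contains(id)
    }.toLong
  }
}
```

In this sketch the same encoded key can be written to two column families without colliding, because only the DB layer adds the family prefix, and `numKeys` can report internal and non-internal counts because the DB layer also records which families are internal.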

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests and newly added unit tests.

[info] Run completed in 8 minutes, 48 seconds.
[info] Total number of tests run: 305
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 305, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db anishshri-db marked this pull request as ready for review December 28, 2024 06:54
@anishshri-db
Contributor Author

cc - @ericm-db @jingz-db - PTAL, thx !

@anishshri-db anishshri-db changed the title from "[SPARK-50655][SS] Move virt col family related mapping into db layer instead of encoder" to "[SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder" on Jan 2, 2025
@anishshri-db
Contributor Author

> Would you mind fixing conflicts? Thanks!

@HeartSaVioR - done ! PTAL

Contributor

@HeartSaVioR HeartSaVioR left a comment

I will continue reviewing the tests, but I see something we need to be really careful about.

Contributor

@HeartSaVioR HeartSaVioR left a comment

Done another round of review.

Contributor

@HeartSaVioR HeartSaVioR left a comment

+1 pending CI

@HeartSaVioR
Contributor

Thanks! Merging to master/4.0.

HeartSaVioR pushed a commit that referenced this pull request Feb 19, 2025
…er instead of encoder

### What changes were proposed in this pull request?
Move virt col family related mapping into db layer instead of encoder

### Why are the changes needed?
Keep abstraction clear around ownership and also expose internal/non-internal key metrics correctly.
With this change, we have the following:
- encoder is only responsible for managing encoding based on type such as noPrefix, prefix, range etc
- the onus of maintaining virtual col families is now with the underlying DB layer
- this layer can now also expose metrics for internal as well as non-internal column families

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing Unit tests and added unit tests

```
[info] Run completed in 8 minutes, 48 seconds.
[info] Total number of tests run: 305
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 305, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49304 from anishshri-db/task/SPARK-50655.

Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
(cherry picked from commit 5759882)
Signed-off-by: Jungtaek Lim <[email protected]>
Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025
kazemaksOG pushed a commit to kazemaksOG/spark-custom-scheduler that referenced this pull request Mar 27, 2025
5 participants