Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Impl rest catalog + table updates & requirements #146

Merged
merged 29 commits into from
Jan 7, 2025

Conversation

jwtryg
Copy link
Contributor

@jwtryg jwtryg commented Sep 9, 2024

Hi @zeroshade

I think it's really cool that you are working on a golang-iceberg implementation, and I would like to contribute if I can.
I have tried to finish the rest catalog implementation, and then I have added table updates and requirements, using a new MetadataBuilder struct that can simplify updating metadata. Like the existing implementation, I have tried to stay close to the pyIceberg implementation.

I thought this could be a good starting point for also implementing transactions and table updates. I would love to get your input and change/adjust the implementation accordingly.

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I like the idea in general. I've done a first pass to list a bunch of requested changes.

catalog/catalog.go Outdated Show resolved Hide resolved
catalog/catalog.go Outdated Show resolved Hide resolved
partitions.go Outdated Show resolved Hide resolved
table/metadata.go Outdated Show resolved Hide resolved
table/metadata.go Show resolved Hide resolved
type MetadataV1 struct {
Schema iceberg.Schema `json:"schema"`
type metadataV1 struct {
Schema *iceberg.Schema `json:"schema"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this change mean we'll have to perform nil checks everywhere we try to use the schema now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the metadataV1 is now unexported, so this shoulod be relatively few places. But yes, it does. However, if we assign schemas directly, vet will throw an error as we are copiyng a lock.

table/requirements.go Outdated Show resolved Hide resolved
table/requirements.go Outdated Show resolved Hide resolved
table/updates.go Show resolved Hide resolved
table/updates.go Outdated Show resolved Hide resolved
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next round of comments

catalog/rest.go Outdated
Comment on lines 87 to 90
type Identifier struct {
Namespace []string `json:"namespace"`
Name string `json:"name"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're going to export this type, we should probably name it RestIdentifier or something equivalent to separate it from other catalog identifier types.

catalog/rest.go Outdated
Comment on lines 549 to 643
Identifiers []struct {
Namespace []string `json:"namespace"`
Name string `json:"name"`
} `json:"identifiers"`
Identifiers []Identifier `json:"identifiers"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the identifier type used anywhere other than here? The reason I had done it inline here was because it was only used in this one spot and i didn't want it to get confused with table.Identifier

catalog/rest.go Outdated
Comment on lines 688 to 690
for k, v := range ret.Config {
config[k] = v
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why loop instead of just doing maps.Copy (which does the loop internally)

catalog/rest.go Outdated
Comment on lines 714 to 716
for k, v := range ret.Config {
config[k] = v
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, why not maps.Copy(config, ret.Config)?

partitions.go Outdated
Comment on lines 120 to 124
// Fields returns a clone of the partition fields in this spec.
func (ps *PartitionSpec) Fields() []PartitionField {
return slices.Clone(ps.fields)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're okay with bumping to go1.23 we could use iter here and do slices.Values(ps.Fields) this way we don't have clone the entire slice but also maintain that we disallow users from modifying the slice.

Thus this function would be:

func (ps *PartitionSpec) Fields() iter.Seq[PartitionField] {
    return slices.Values(ps.fields)
}

and a user would be able to iterate over the fields:

for f := range spec.Fields() {
    // do something
}

Alternately, you could use slices.All if you want to preserve the index, value nature of the range

Comment on lines 596 to 603
func containsBy[S []E, E any](elems S, found func(e E) bool) bool {
for _, e := range elems {
if found(e) {
return true
}
}
return false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with slices.ContainsFunc


// maxBy returns the maximum value of extract(e) for all e in elems.
// If elems is empty, returns 0.
func maxBy[S []E, E any](elems S, extract func(e E) int) int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ~[]E for better coverage of generics and future proofing

}

func (c *commonMetadata) Ref() SnapshotRef { return c.SnapshotRefs[MainBranch] }
func (c *commonMetadata) Refs() map[string]SnapshotRef { return maps.Clone(c.SnapshotRefs) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets use maps.All like i mentioned before for slices.All/slices.Values so that we can return an iterator without having to clone the whole map.

Comment on lines 689 to 690
func (c *commonMetadata) SnapshotLogs() []SnapshotLogEntry { return slices.Clone(c.SnapshotLog) }
func (c *commonMetadata) PreviousFiles() []MetadataLogEntry { return slices.Clone(c.MetadataLog) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as before about using iterators and slices.Values or slices.All

Comment on lines 692 to 694
if c == nil || other == nil {
return c == other
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in what scenario is c nil?

@jwtryg
Copy link
Contributor Author

jwtryg commented Oct 7, 2024

Hi @zeroshade

I sincerely apologise for not getting back to you sooner, I suddenly found myself with very little time on my hands.
Once more, I thank you for your feedback - I will make sure to address all of your comments above this week!

@zeroshade
Copy link
Member

@jwtryg any updates?

@jwtryg
Copy link
Contributor Author

jwtryg commented Oct 27, 2024

@zeroshade thank you for your patience :)

I have implemented your feedback and made some other changes - specifically, being more mindful of what should be exported, and then undoing the manic nil-pointer checking I somehow got myself into.

@zeroshade
Copy link
Member

My apologies for the long delay here, could you resolve the conflicts? I should be able to give this a new pass of review in the next day or so.

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a final few questions / nitpicks for after the conflicts are resolved.

table/metadata.go Outdated Show resolved Hide resolved
table/metadata.go Outdated Show resolved Hide resolved
table/metadata.go Outdated Show resolved Hide resolved
table/metadata.go Outdated Show resolved Hide resolved
@zeroshade
Copy link
Member

@jwtryg Is there anything else outstanding on this or is this ready for review again?

@jwtryg
Copy link
Contributor Author

jwtryg commented Jan 6, 2025

@jwtryg Is there anything else outstanding on this or is this ready for review again?

@zeroshade I will look through all of your previous comments today to make sure it is ready :)

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwtryg in general this looks good to me as-is and has been waiting for a while so I would like to get it in. The only thing that I think is missing would be unit tests for the requirements/updates, but I'd be okay leaving those for a future PR in the interests of getting this in, particularly for the rest catalog impls.

What do you think?

@zeroshade zeroshade merged commit 9c5f3e4 into apache:main Jan 7, 2025
10 checks passed
@chil-pavn
Copy link
Contributor

Part of #63 . @jwtryg let me know if you are already working on the unit tests so that we both don't end up doing the same.

@jwtryg
Copy link
Contributor Author

jwtryg commented Jan 8, 2025

@zeroshade that's super :)

@chil-pavn I have only just started - so feel free to go ahead. Will you only be working on the unit tests for the rest catalog operations, however?

@zeroshade
Copy link
Member

Btw, since both of you are being active on here, I'd love it either of you would be able and willing to give a look at the PRs I filed recently (#244 and #245) for more catalog functionality.

I'm currently trying to work towards full write implementations, piece by piece. :)

@chil-pavn
Copy link
Contributor

@jwtryg yes, i am intending to finish off rest catalog operations first as i already have a draft for the same. Once it is done, i could extend to others too.

@zeroshade sure, i would love to go through it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants