-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build Blockstore stats from config #8814
base: master
Are you sure you want to change the base?
Conversation
@@ -112,24 +112,39 @@ type Controller struct { | |||
|
|||
var usageCounter = stats.NewUsageCounter() | |||
|
|||
func NewController(cfg config.Config, catalog *catalog.Catalog, authenticator auth.Authenticator, authService auth.Service, authenticationService authentication.Service, blockAdapter block.Adapter, metadataManager auth.MetadataManager, migrator Migrator, collector stats.Collector, cloudMetadataProvider cloud.MetadataProvider, actions actionsHandler, auditChecker AuditChecker, logger logging.Logger, sessionStore sessions.Store, pathProvider upload.PathProvider, usageReporter stats.UsageReporterOperations) *Controller { | |||
func NewController( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was too damn long for a single line.
Removed cloudMetadataProvider
.
MetadataManager: metadataManager, | ||
Migrator: migrator, | ||
Collector: collector, | ||
CloudMetadataProvider: cloudMetadataProvider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only removed this one. The rest is whitespaces.
@@ -33,7 +32,24 @@ const ( | |||
extensionValidationExcludeBody = "x-validation-exclude-body" | |||
) | |||
|
|||
func Serve(cfg config.Config, catalog *catalog.Catalog, middlewareAuthenticator auth.Authenticator, authService auth.Service, authenticationService authentication.Service, blockAdapter block.Adapter, metadataManager auth.MetadataManager, migrator Migrator, collector stats.Collector, cloudMetadataProvider cloud.MetadataProvider, actions actionsHandler, auditChecker AuditChecker, logger logging.Logger, gatewayDomains []string, snippets []params.CodeSnippet, pathProvider upload.PathProvider, usageReporter stats.UsageReporterOperations) http.Handler { | |||
func Serve( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was too damn long for a single line.
Removed cloudMetadataProvider
, and renamed middlewareAuthenticator
to authenticator
(not related, but simplifies).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please provide the style template you are using, if we change to something like this I need to break it down manually on every refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this to the list of topics for the Code-Style effort -
So this will be done systematically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, my main concern is inserting all the storage adapter logic into the stats library
@@ -27,7 +27,7 @@ type MetadataProvider interface { | |||
GetMetadata(context.Context) (map[string]string, error) | |||
} | |||
|
|||
func NewMetadata(ctx context.Context, logger logging.Logger, blockstoreType string, metadataProvider MetadataProvider, cloudMetadataProvider cloud.MetadataProvider) *Metadata { | |||
func NewMetadata(ctx context.Context, logger logging.Logger, metadataProvider MetadataProvider, storageConfig config.StorageConfig) *Metadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internals of the provider should stay externally, I would expect all this code to be in a storageAdapterMetadata provider. Maybe even provided by the block adapter factory.
If we're already here I would suggest replacing the current signature of this function.
func NewMetadata(ctx context.Context, logger logging.Logger, blockstoreType string, metadataProvider []MetadataProvider) *Metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is -
blockstoreType
should also be an array.
So it'll be a bunch of repeated code outside of this function.
Note that BuildMetadataProvider
in this file was also "aware" of blockstore logic -
It wasn't part of the NewMetadata()
function, but each NewMetadata()
caller was calling this function to pas its result as a param.
So I don't think that using the same trick gives much better encapsulation, only more repeated code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following, why should blockstore type be an array?
I know, it was done in order to not have circular dependency and not to build a factory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guy-har I tried what you suggested, which is basically moving buildMetadataProvider
to the blockfactory
, so it can be implemented for Enterprise as well.
However, as you suspected, this creates a circular dependency -
Since blockfactory
is using the stats
package when building the block.Adapter
.
This is probably why buildMetadataProvider
is in this file in first place.
We can extract the metadata builder into a separate module (in modules
) that will be implement by Enterprise, and I think this can solve the circular dependencies (though it requires validation) - but I think that's a huge overkill for this.
Any other suggestions?
func getErrorMetadata() map[string]string { | ||
return map[string]string{ | ||
cloud.IDKey: "err", | ||
cloud.IDTypeKey: "err", | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's ErrorMetadata? Is it stats we want to collect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases we know the blockstore type, but failing to get the metadata from the cloud-provider.
In this case, we'll report err
to the BI (instead of nil
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this requested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as a product requirement, but from a functional pov -
You can check the tests here, and see that once it's nil
and not err
, the BI consumer will think it's not configured, instead of this being configured but failing.
I can return nil
here, but this also means we won't be able to unit-test these params.
@@ -33,7 +32,24 @@ const ( | |||
extensionValidationExcludeBody = "x-validation-exclude-body" | |||
) | |||
|
|||
func Serve(cfg config.Config, catalog *catalog.Catalog, middlewareAuthenticator auth.Authenticator, authService auth.Service, authenticationService authentication.Service, blockAdapter block.Adapter, metadataManager auth.MetadataManager, migrator Migrator, collector stats.Collector, cloudMetadataProvider cloud.MetadataProvider, actions actionsHandler, auditChecker AuditChecker, logger logging.Logger, gatewayDomains []string, snippets []params.CodeSnippet, pathProvider upload.PathProvider, usageReporter stats.UsageReporterOperations) http.Handler { | |||
func Serve( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please provide the style template you are using, if we change to something like this I need to break it down manually on every refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guy-har thanks for your review.
Addressed your comments 🙏
@@ -27,7 +27,7 @@ type MetadataProvider interface { | |||
GetMetadata(context.Context) (map[string]string, error) | |||
} | |||
|
|||
func NewMetadata(ctx context.Context, logger logging.Logger, blockstoreType string, metadataProvider MetadataProvider, cloudMetadataProvider cloud.MetadataProvider) *Metadata { | |||
func NewMetadata(ctx context.Context, logger logging.Logger, metadataProvider MetadataProvider, storageConfig config.StorageConfig) *Metadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is -
blockstoreType
should also be an array.
So it'll be a bunch of repeated code outside of this function.
Note that BuildMetadataProvider
in this file was also "aware" of blockstore logic -
It wasn't part of the NewMetadata()
function, but each NewMetadata()
caller was calling this function to pas its result as a param.
So I don't think that using the same trick gives much better encapsulation, only more repeated code...
func getErrorMetadata() map[string]string { | ||
return map[string]string{ | ||
cloud.IDKey: "err", | ||
cloud.IDTypeKey: "err", | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases we know the blockstore type, but failing to get the metadata from the cloud-provider.
In this case, we'll report err
to the BI (instead of nil
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, but I wasn't convinced
We can take this offline if you prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we embed the cloud metadata into stats. I understand that the factory was already there (which I think we can move it out) but it looks like the author use an interface wanted to make the cloud metadata provided the same way stats metadata are provided.
The stats package should provide a abstract way to construct and invoke the metadata providers - for the setup process or the process startup while trying to keep the cloud metadata outside of the stats package.
@nopcoder can you please elaborate? There's already an abstract way for metadata providers, which is Which solution do you have in mind? |
Sorry, I didn't go over the comments first. Explain the real issue found in the metadata provider implementation, based on the current code (except aws which I'm not 100% sure it is the case) the code try to identify information about the cloud environment lakeFS is running on :) Seems the assumption that the default blockstore adapter will match the cloud provider and the cloud metadata will provide more information. |
@nopcoder If I get it right, you're blocking this because you think that the design of the existing code is problematic (which I agree), and that this PR adds to the problem instead of solving it (which I also agree). However, refactoring the stats code will take a decent amount time (since there is no suggested solution yet), and will introduce risks of messing up the important BI data (considering the fact that this area is not covered by tests at all). So if that's the case, I think we should take a different approach. Since MSB data isn't being reported at the moment, I suggest to first fix the Issue itself (using the non-optimal solution in this PR), and prioritize the refactoring using our normal process (according to the usual needs + risk + availability etc.). All this, of course, will be void if someone can come up with a nice solution for this with as low risk as possible for the stats that are currently being sent to the BI. |
I want to add that I have a suggestion, in a untested pr format, related to the cloud bi information that will check all cloud providers. also related to the block adapter itself I think it is a different case, not related to the cloud that we can discuss with bi. |
Currently, some blockstore-related stats () are constructed in a way which doesn't support more than a single blockstore type.
This extends it to allow multiple types.
This also simplifies NewMetadata params, and the code of its callers.