Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add baseline multiseries regressor #4246
Add baseline multiseries regressor #4246
Changes from 14 commits
ee5b9e6
004a1dd
3f45eb2
67cbd75
308c8bc
0f35eec
a00d25a
3f597b4
c179eeb
fc932d5
7b20f7f
c66b572
42f2a26
bab201f
6489510
e08f1c5
38eb434
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is correct, but just to double check we're returning the predictions with the predicted values stacked right (i.e. in series form and not as a dataframe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol, this is a copypasta fail. We're returning the unstacked dataframe here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably another nit, but if you're calling out all Baseline Estimators...is it worth putting together a story to add a BaselineEstimator class to the inheritance chain and have them all inherit this prop def?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #4255
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for the multiseries case, do we not try to find the significant lags? Does this just use all or none of the lags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to find the significant lags because we're not actually doing feature engineering here, just getting the properly lagged column that our baseline regressor relies on. By setting the lags that we calculate to be just
self.start_delay
, we only compute the one we know we need.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to add an explicit comment here for the case we're splitting on...e.g. if y is a dataframe, we expect it to be multiseries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not going to be doing any external matching on this name format, right? If so, I think we might want to establish a pattern of making this string format like a module level thing or accessible via the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - adjusting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: should we just run
self._encode_y_while_preserving_index(y)
even though we won't expect categorical columns just yet?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 interesting point, will we ever expect categorical columns? We're only supporting regression problems for multiseries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially sometime in the distant future! Just thought it would make it one step easier for whoever implements that 😄 it'll be a no-op anyways right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea in theory, but I'm worried about increasing runtime with checking if
y
contains any categorical columns. We'd have to do so in all cases, which feels wasteful when we know we won't be dealing with it.