Heteroskedastic likelihoods and log-noise models #337
Conversation
This is a proof of concept of how heteroskedastic likelihoods may work.
This now also supports multi-task likelihoods. I subclassed the […]; see test_MultitaskHeteroskedasticLikelihood.ipynb.txt. @gpleiss, @jacobrgardner, let me know what you think; if this looks generally OK, I'll start working on fixing the test failures and updating the examples. Having these expressive heteroskedastic noise models is of high practical relevance in cases where (joint) observations of the error covariance are available.
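For concreteness, here is a minimal sketch of the intended usage pattern, assuming the `HeteroskedasticNoise` module lands roughly as in this PR (the names, data, and kernel choices are illustrative, not final): a GP is fit on observed log-variances, and its posterior then serves as an out-of-sample noise model inside the Gaussian likelihood.

```python
import torch
import gpytorch
from gpytorch.likelihoods.gaussian_likelihood import _GaussianLikelihoodBase
from gpytorch.likelihoods.noise_models import HeteroskedasticNoise

# A standard exact GP fit on the *observed log-variances*; its posterior
# provides (log-)noise estimates at arbitrary inputs.
class NoiseGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_log_var, likelihood):
        super().__init__(train_x, train_log_var, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.linspace(0, 1, 50)
observed_log_var = torch.log(0.01 + 0.2 * train_x)  # measured noise levels

noise_gp = NoiseGP(train_x, observed_log_var, gpytorch.likelihoods.GaussianLikelihood())
# Stack the noise model into the likelihood of the primary GP:
likelihood = _GaussianLikelihoodBase(noise_covar=HeteroskedasticNoise(noise_gp))
```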
```python
    full_covar = AddedDiagLazyTensor(covar, log_noise_covar.exp())
else:
    # TODO: Properly deal with non-diagonal noise covariance models
    full_covar = covar + log_noise_covar.exp()
```
We may want to do `full_covar = full_covar.add_jitter()` after this call so that we can use preconditioning.
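Concretely, the suggestion amounts to something like this (a sketch against the quoted block; `covar` and `log_noise_covar` come from the surrounding method):

```python
from gpytorch.lazy import AddedDiagLazyTensor

full_covar = AddedDiagLazyTensor(covar, log_noise_covar.exp())
# Add a small diagonal jitter so downstream solves can use preconditioning.
full_covar = full_covar.add_jitter()
```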
This allows doing things like using an MTGP over Y and log_var_Y to get a noise model where the noise level is correlated with the function values, and then subsetting to the noise estimate to use it in the likelihood of the primary GP.
So I made some good progress on this. The following nb shows how this can be used to build a joint model of both the actual tasks as well as the outcomes and the correlations between them, using a Hadamard MTGP model (roughly the model type sketched below), with a separate model of the same type stacked into the heteroskedastic likelihood module and trained on the observed log variances, providing heteroskedastic OOS measurement error estimates. Since we typically have measurement errors for all observations within each task, it would be nice if there were a way to exploit the Kronecker structure within each task while allowing different covariate values across the tasks. test_HadamardMultitaskMultiOutputHeteroskedasticLikelihood_universal.ipynb.txt

Making this work in batch mode for MTGPs will require changes to LazyTensors (#361) if we want to avoid a lot of hacking/special-casing. Some of the handling of the input params to support the different modes is not very elegant (see https://github.com/cornellius-gp/gpytorch/pull/337/files#diff-52a9ec2599d1eaabfccb825b25c6b1eaR20); ideas for how to make this cleaner are welcome.
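For context, the Hadamard MTGP used there follows roughly the standard GPyTorch pattern of modeling (input, task-index) pairs, so that tasks may have different covariate values. A minimal sketch (class and argument names are illustrative):

```python
import gpytorch

class HadamardMTGP(gpytorch.models.ExactGP):
    """Multitask GP over (input, task-index) pairs: the Hadamard setting,
    where different tasks can be observed at different inputs."""

    def __init__(self, train_inputs, train_targets, likelihood, num_tasks):
        super().__init__(train_inputs, train_targets, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()
        # Low-rank inter-task covariance; captures correlations between tasks.
        self.task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=num_tasks, rank=1)

    def forward(self, x, task_idx):
        mean_x = self.mean_module(x)
        # Elementwise (Hadamard) product of the data and task covariances.
        covar = self.covar_module(x).mul(self.task_covar_module(task_idx))
        return gpytorch.distributions.MultivariateNormal(mean_x, covar)
```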
So this is in relatively good shape except for the case of batched multi-task GPs. Here I'm running into some serious problems making this work b/c the lazies are not able to handle multi-dimensional batches properly. See […]. The proper way to fix this would be to support general batch shapes throughout all LazyTensors (as well as support broadcasting). This seems like it would be the right thing to do if we want the LazyTensor framework to be more generally useful.
@Balandat We're all definitely in agreement that […]. We have plans to go back and do a larger refactor of the lazy tensors to make them support arbitrary batch dimensions, so the question is what to do in the meantime.
@jacobrgardner No, unfortunately not. It turns out I actually need to do something smarter than just a Kronecker product to account for the different observation noise levels at the different data points. This requires […]
Is there a timeline for this? I'd be happy to contribute if it helps get it off the ground.
@gpleiss, @jacobrgardner I had to make some changes to the log_ deprecation to support the heteroskedastic likelihood setup; in the process I streamlined it, getting rid of copy-pasted code. This works except for the proper handling of batches in all multitask scenarios (train batch / eval single, train single / eval batch, etc.), which I hope to address right after Thanksgiving. I may even just have those edge cases error out for now with an informative warning to use the previous implementation, which I still retained in the code.
@Balandat Sounds good! The deprecation code does look much cleaner to me, thanks. I would personally be fine with merging this in a state that works for all models except batch multitask, if that's where we are. The LT batch refactor is a larger thing that needs to happen anyway, and I think this already covers the most common use cases (e.g. replacing WhiteNoiseKernel).
Ok great, so I'll update the docstrings to indicate that […]. Except for the multi-batch MTGP, these are all non-breaking changes. I have some cool example notebooks for the heteroskedastic noise models and will put those up in a separate PR early next week.
Just took a close pass through, and this all looks good to me! It might be good to eventually make an example notebook showing off the various new likelihood options.
Oh I see, haha. I was doing my code review while you posted that. If you already have notebooks you are planning on putting up, then I'm okay with this getting merged now for sure.
Any update on the notebook for this?
I haven't gotten to it (clearly). Will try to put something up in the next couple of weeks.
@Balandat Is there something I can help with to get an example for this?
@t-vi, sorry this fell off the radar. Here is a simple nb that uses BoTorch's […]. Here's how the model compares (visually) against the standard homoskedastic one: […]

This model requires known observation noise levels. There is also work on implementing what's called the "most likely heteroskedastic GP", which does not require them, but that needs to be cleaned up a little bit (I have some local changes that hopefully I can push out some time in the near future): pytorch/botorch#250
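A minimal sketch of what such a notebook plausibly does, assuming the model referenced above is BoTorch's `HeteroskedasticSingleTaskGP` (the data, fitting routine, and evaluation here are illustrative):

```python
import torch
from botorch.fit import fit_gpytorch_model
from botorch.models import HeteroskedasticSingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(20, 1)
train_Y = torch.sin(6 * train_X) + 0.1 * torch.randn_like(train_X)
# Known, input-dependent observation noise levels (variances).
train_Yvar = 0.01 + 0.2 * train_X

# Internally fits a GP to the log-variances and uses it as the noise model.
model = HeteroskedasticSingleTaskGP(train_X, train_Y, train_Yvar)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_model(mll)

# The posterior at test points now reflects heteroskedastic observation noise.
test_X = torch.linspace(0, 1, 100).unsqueeze(-1)
posterior = model.posterior(test_X, observation_noise=True)
```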
Thank you for the quick reply! After I removed a few of the context managers (which I don't feel bad about) and the […]
Great. I haven't looked at that one in a while. One thing to note about […]: I don't think this is properly done in the earlier nb, though I'd have to go back to it and check in more detail. Depending on the setting, this might not be a big deal in practice if things work empirically, but I'm sure you could come up with examples where this would fail.
Thank you @Balandat for the nbs.
@salrm8 Hmm, something sounds off there, […]
@Balandat, here is the code: […]
@salrm8 your issue is the following block in the single task constructor: […]. The first positional arg of […]; with that change, things work.
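For anyone hitting something similar: a hedged sketch of wiring the noise model into the likelihood, under the assumption that the positional argument in question is the noise model expected by `HeteroskedasticNoise` (`noise_gp` here stands for a GP already fit on observed log-variances, as sketched earlier in the thread):

```python
from gpytorch.likelihoods.gaussian_likelihood import _GaussianLikelihoodBase
from gpytorch.likelihoods.noise_models import HeteroskedasticNoise

# The noise model goes in as the first positional argument of
# HeteroskedasticNoise; the resulting noise module is then handed to the
# likelihood base class (assumption based on the discussion above).
likelihood = _GaussianLikelihoodBase(noise_covar=HeteroskedasticNoise(noise_gp))
```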
@Balandat Thanks a lot! It works now.
Allows specifying generic (log-)noise models that are used to obtain out-of-sample noise estimates. This makes it possible, e.g., to stick a GP fit on the (log-)measured standard errors of the observed data into the GaussianLikelihood, and then fit it jointly with the GP that models the data.
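As a rough sketch of the joint fitting this enables, assuming a primary ExactGP `model` constructed with the heteroskedastic `likelihood` from the earlier sketch and a standard GPyTorch training loop (passing the training inputs through the MLL so the noise model can be evaluated at them is my reading of the API in this PR):

```python
import torch
import gpytorch

model.train()
likelihood.train()
# The likelihood (and hence the stacked noise GP) is a submodule of the
# ExactGP model, so a single optimizer fits both jointly.
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for _ in range(200):
    optimizer.zero_grad()
    output = model(train_x)
    # Extra *params (here train_x) are forwarded to the likelihood so the
    # heteroskedastic noise model can be evaluated at the training inputs.
    loss = -mll(output, train_y, train_x)
    loss.backward()
    optimizer.step()
```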