
Heteroskedastic likelihoods and log-noise models #337

Merged: 17 commits merged into master on Nov 22, 2018
Conversation

@Balandat (Collaborator)

Allows specifying generic (log-)noise models that are used to obtain out-of-sample noise estimates. For example, this makes it possible to plug a GP fit on the (log-)measured standard errors of the observed data into the GaussianLikelihood, and then fit it jointly with the GP fit on the data itself.

This is a proof of concept of how heteroskedastic likelihoods may work.
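The core mechanics can be sketched in a few lines of plain NumPy (illustrative only; `rbf` and `log_noise_variance` are hypothetical stand-ins, not this PR's API): a (log-)noise model predicts per-point log variances, and the likelihood exponentiates them onto the diagonal of the GP covariance.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    # Squared-exponential kernel on 1-d inputs
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

# Hypothetical log-noise model: a stand-in for a second GP fit to the
# log measured standard errors of the observations.
def log_noise_variance(x):
    return -3.0 + 2.0 * x  # noise level grows with x (heteroskedastic)

X = np.linspace(0.0, 1.0, 5)
K = rbf(X, X)
# The likelihood adds the exponentiated (out-of-sample) log-noise
# predictions as a diagonal to the GP covariance.
full_covar = K + np.diag(np.exp(log_noise_variance(X)))
```

Working in log space keeps the noise variances strictly positive without constrained optimization.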
@Balandat (Collaborator, Author)

This now also supports multi-task likelihoods. I subclassed MultitaskGaussianLikelihood from _MultitaskGaussianLikelihoodBase, as with the non-MT one, using a trivial MultitaskHomoskedasticNoise model. Interestingly, this both simplifies the code and reduces wall time compared to the Kronecker implementation (see the attached notebook [remove the .txt file extension]).

test_MultitaskHeteroskedasticLikelihood.ipynb.txt

@gpleiss, @jacobrgardner, let me know what you think - if this looks generally OK, I'll start working on fixing the test failures and updating the examples.

Having these expressive heteroskedastic noise models is of high practical relevance in cases where (joint) observations of the error covariance are available.

gpytorch/likelihoods/gaussian_likelihood.py (outdated; review comments resolved)
full_covar = AddedDiagLazyTensor(covar, log_noise_covar.exp())
else:
    # TODO: Properly deal with non-diagonal noise covariance models
full_covar = covar + log_noise_covar.exp()
Member

We may want to do full_covar = full_covar.add_jitter() after this call so that we can use preconditioning.
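In plain tensor terms (a NumPy sketch; `add_jitter` here is a hypothetical stand-in for the LazyTensor method, not its implementation), jitter just nudges the diagonal so the resulting matrix is strictly positive definite and better conditioned for iterative solvers:

```python
import numpy as np

def add_jitter(covar, jitter=1e-3):
    # Add a small constant to the diagonal to improve conditioning,
    # mirroring what a LazyTensor add_jitter() call would do.
    return covar + jitter * np.eye(covar.shape[-1])

A = np.ones((3, 3))   # rank-1, singular: unusable for Cholesky/CG as-is
A_j = add_jitter(A)   # strictly positive definite after jittering
```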

gpytorch/likelihoods/homoskedastic_noise.py (outdated; resolved)
gpytorch/models/exact_gp.py (outdated; resolved)
gpytorch/likelihoods/__init__.py (outdated; resolved)
gpytorch/lazy/lazy_tensor.py (outdated; resolved)
gpytorch/functions/__init__.py (resolved)
This allows doing things like using an MTGP over Y and log_var_Y to get a
noise model where the noise level is correlated with the function
values, and then subsetting to the noise estimate to use it in the
likelihood of the primary GP.
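A minimal NumPy sketch of that subsetting step (all names hypothetical; the joint posterior mean stands in for a fitted two-task MTGP over (Y, log_var_Y)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Hypothetical joint posterior mean of a two-task model over (Y, log_var_Y)
joint_mean = rng.normal(size=(n, 2))
y_mean = joint_mean[:, 0]    # task 0: the function values
log_var = joint_mean[:, 1]   # task 1: the log noise variances
# Subset task 1 and exponentiate to get the per-point noise level
# consumed by the primary GP's likelihood.
noise = np.exp(log_var)
```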
@Balandat (Collaborator, Author)

Balandat commented Nov 9, 2018

So I made some good progress on this.

The following notebook shows how this can be used to build a joint model of the actual tasks as well as the outcomes and the correlations between them, using a Hadamard MTGP model. A separate model of the same type is stacked into the heteroskedastic likelihood module and trained on the observed log variances, providing heteroskedastic out-of-sample measurement error estimates. Since we typically have measurement errors for all observations within each task, it would be nice if there were a way to exploit the Kronecker structure within each task while allowing different covariate values across the tasks.

test_HadamardMultitaskMultiOutputHeteroskedasticLikelihood_universal.ipynb.txt

Making this work in batch mode for MTGPs will require changes to LazyTensors (#361) if we want to avoid a lot of hacking/special casing.

Some of the handling of the input params in order to support the different modes is not very elegant (see https://github.com/cornellius-gp/gpytorch/pull/337/files#diff-52a9ec2599d1eaabfccb825b25c6b1eaR20). Welcoming ideas for how to make this cleaner.

@Balandat Balandat self-assigned this Nov 9, 2018
@Balandat Balandat added the WIP label Nov 9, 2018
@Balandat (Collaborator, Author)

So this is in relatively good shape except for the case of batched multi-task GPs. Here I'm running into some serious problems making this work because the LazyTensors are not able to handle multi-dimensional batches properly. See _MultitaskGaussianLikelihoodBase for how I try to hack around this by doing highly questionable things like calling .evaluate() unnecessarily. But even then, shape inconsistencies pop up at runtime deep inside the LazyTensors.

The proper way to fix this would be to support general batch shapes (as well as broadcasting) throughout all LazyTensors. This seems like the right thing to do if we want the LazyTensor framework to be more generally useful.

@jacobrgardner (Member)

@Balandat We're all definitely in agreement that LazyTensor should handle arbitrary batch shapes. Do you think the two PRs you have for DiagLazyTensor (#362) and KroneckerProductLazyTensor (#369) will be sufficient for this setting?

We have plans to go back and do a larger refactor of the lazy tensors to make them support arbitrary batch dimensions, so the question is what to do in the meantime.

@Balandat (Collaborator, Author)

Do you think the two PRs you have for DiagLazyTensor (#362) and KroneckerProductLazyTensor (#369) will be sufficient for this setting?

@jacobrgardner No, unfortunately not. It turns out I actually need to do something smarter than just a Kronecker product to account for the different observation noise levels at the different data points. This requires BlockDiagLazyTensor and MatmulLazyTensor to work properly. I'm not entirely sure what's going on in the batch MT tests, maybe you can take a look to see if there is a quick and dirty short-term fix?

We have plans to go back and do a larger refactor of the lazy tensors to make them support arbitrary batch dimensions.

Is there a timeline for this? I'd be happy to contribute if it helps get it off the ground.

@Balandat (Collaborator, Author)

@gpleiss, @jacobrgardner I had to make some changes to the log_ deprecation to support the heteroskedastic likelihood setup; in the process I streamlined it, getting rid of copy-pasted code.

This works except for proper handling of batches in all multitask scenarios (train batch / eval single, train single / eval batch, etc.). I hope to address this right after Thanksgiving. I may even just have those edge cases error out for now with an informative warning to use the previous implementation, which I have retained in the code.

@jacobrgardner (Member)

@Balandat Sounds good! The deprecation code does look much cleaner to me, thanks.

I would personally be fine with merging this in a state that works for all models except batch multitask, if that's where we are. The LT batch refactor is a larger thing that needs to happen anyway, and I think this already covers the most common use cases (e.g. replacing WhiteNoiseKernel).
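The WhiteNoiseKernel replacement amounts to moving the noise diagonal from the kernel into the likelihood; a NumPy sketch (hypothetical helper names, not gpytorch code) shows the two styles yield the same marginal covariance:

```python
import numpy as np

def rbf(X1, X2):
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2)

X = np.linspace(0, 1, 4)
noise = np.array([0.10, 0.20, 0.15, 0.05])  # known observation variances

# WhiteNoiseKernel style: the noise is baked into the kernel matrix.
def kernel_plus_white_noise(X):
    return rbf(X, X) + np.diag(noise)

# Likelihood style: the prior kernel stays noise-free; the (fixed-noise)
# likelihood supplies the same diagonal when forming the marginal.
def marginal_with_fixed_noise_likelihood(X):
    prior_covar = rbf(X, X)
    return prior_covar + np.diag(noise)
```

The payoff of the likelihood style is a clean separation: the prior covariance no longer depends on which points happen to be observed.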

@Balandat (Collaborator, Author)

Ok great, so I updated the docstrings to indicate that MultitaskGaussianLikelihoodKronecker is scheduled for deprecation and should only be used for batched MTGPs for the time being. Everything else can use the standard likelihoods as is; these are now implemented on top of the generalized likelihood setup using a trivial noise model.

Except for the multi-batch MTGP case, these are all non-breaking changes.

I have some cool example notebooks for the heteroskedastic noise models and will put those up in a separate PR early next week.

@jacobrgardner (Member) left a comment

Just took a close pass through, and this all looks good to me! It might be good to eventually make an example notebook showing off the various new likelihood options.

@jacobrgardner (Member)

Oh I see, haha. I was doing my code review while you posted that. If you already have notebooks you are planning on putting up, then I’m okay with this getting merged now for sure.

@jacobrgardner jacobrgardner merged commit 2ba63d6 into master Nov 22, 2018
@jacobrgardner jacobrgardner deleted the heteroskedastic branch November 22, 2018 16:55
@admercs

admercs commented Apr 8, 2019

Any update on the notebook for this?

@Balandat (Collaborator, Author)

Balandat commented Apr 9, 2019

I haven't gotten to it (clearly). Will try to put something up in the next couple of weeks.

@t-vi

t-vi commented Jan 30, 2020

@Balandat Is there something I can help with to get an example for this?

@Balandat (Collaborator, Author)

@t-vi , sorry this fell off the radar.

Here is a simple nb that uses BoTorch's HeteroskedasticSingleTaskGP for this:
demo_HeteroskedasticSingleTaskGP.ipynb.txt

Here's how the model compares (visually) against the standard homoskedastic one:
[image: heteroskedastic_example]

This model requires known observation noise levels. There is also work on implementing what's called the "most likely heteroskedastic GP", which does not, but that needs to be cleaned up a little (I have some local changes that I hope to push out in the near future): pytorch/botorch#250

@t-vi

t-vi commented Jan 30, 2020

Thank you for the quick reply! After I removed a few of the context managers (which I don't feel bad about) and the log_scale keyword argument (which is more scary), the heteroskedastic multitask notebook you posted here earlier seemed to work reasonably well, too.

@Balandat (Collaborator, Author)

Great. I haven't looked at that one in a while. One thing to note about HeteroskedasticSingleTaskGP in the nb I just posted is that it uses the AddedLossTerm abstraction to make sure the likelihood term of the noise model is properly accounted for when computing the MLL of the full model: https://github.com/pytorch/botorch/blob/master/botorch/models/gp_regression.py#L329-L332

I don't think this is properly done in the earlier nb, though I'd have to go back to it and check in more detail. Depending on the setting, this might not be a big deal in practice if things work empirically, but I'm sure you could come up with examples where this would fail.
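A NumPy sketch of the added-loss-term idea (hypothetical names and toy covariances, not the BoTorch implementation): the training objective is the main GP's marginal log likelihood plus the noise model's marginal log likelihood on the observed log variances, so the noise model's fit is accounted for jointly.

```python
import numpy as np

def gaussian_mll(y, mean, covar):
    # Log marginal likelihood of y under N(mean, covar)
    n = y.shape[0]
    diff = y - mean
    _, logdet = np.linalg.slogdet(covar)
    quad = diff @ np.linalg.solve(covar, diff)
    return -0.5 * (quad + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
n = 5
y = rng.normal(size=n)
log_var_y = rng.normal(size=n)  # observed log noise variances

K_main = np.eye(n) + 0.5        # toy marginal covariance of the main GP
K_noise = np.eye(n) * 2.0       # toy marginal covariance of the noise GP

# Added loss term: the joint objective sums both models' mlls, so the
# noise model is trained alongside the main GP rather than separately.
total_mll = gaussian_mll(y, np.zeros(n), K_main) \
          + gaussian_mll(log_var_y, np.zeros(n), K_noise)
```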

@salrm8

salrm8 commented Feb 4, 2020

Thank you @Balandat for the notebooks.
Before seeing your new HeteroskedasticSingleTaskGP notebook, I tried to rewrite your previous notebook for a single-task case. However, when trying to optimize the model hyperparameters, this message shows up: AttributeError: 'ConstantMean' object has no attribute 'log_prob'. Is it possible to resolve the issue? Thanks a lot!

@Balandat (Collaborator, Author)

Balandat commented Feb 4, 2020

@salrm8 Hmm something sounds off there, log_prob should never be called on ConstantMean. Do you have a full repro (i.e. notebook) for this that you can share?

@salrm8

salrm8 commented Feb 4, 2020

@Balandat , here is the code:
https://github.com/salrm8/gprTests/blob/master/gpr_hetero_SingleMulti.py
At the bottom of the code, we call both multi- and single-task GPs. The latter does not work.
Here is the error message:
https://github.com/salrm8/gprTests/blob/master/errMsg.txt
Thank you for your help.

@Balandat (Collaborator, Author)

Balandat commented Feb 9, 2020

@salrm8 your issue is the following block in the single task constructor:

self.mean_module = gpytorch.means.ConstantMean(
    gpytorch.means.ConstantMean()
)

The first positional arg of ConstantMean is prior, so you register a mean module as a prior here, which messes things up in the MLL computation. If you remove this and just do

self.mean_module = gpytorch.means.ConstantMean()

things work.

@salrm8

salrm8 commented Feb 14, 2020

@Balandat Thanks a lot! It works now.
