Ship PairwiseGP from Botorch into BoFire#768
Conversation
…ogate now correctly influence GP fitting
jduerholt
left a comment
There was a problem hiding this comment.
Hi Jim,
looks already really nice!
I let a few comments. I try to get the other PR on the Priors merged in today, and then you can merge main in again and also go over the comments.
Best,
Johannes
| ) | ||
| return self | ||
|
|
||
| @model_validator(mode="after") |
There was a problem hiding this comment.
Why do we need this? We do not have it anywhere else, or?
There was a problem hiding this comment.
I think, this is not needed and should also not be enforced here.
There was a problem hiding this comment.
yes you are right, will remove it.
| self.kernel, | ||
| batch_shape=torch.Size(), | ||
| active_dims=list(range(n_dim)), | ||
| features_to_idx_mapper=None, |
There was a problem hiding this comment.
Here we need to the stuff to the features_to_idx mapper to have feature dependent kernels. Have a look at the singletaskgp
There was a problem hiding this comment.
ok, I adjusted this. We now bring this in BoTorchSurrogate, and have done some refactoring as a result of it, you will see it later. Could remove some duplicated code!
| ) | ||
| fit_gpytorch_mll(mll, options=self.training_specs, max_attempts=50) | ||
|
|
||
| def _predict(self, transformed_X: pd.DataFrame): |
There was a problem hiding this comment.
Can we not get this from somewhere via inheritnce?
There was a problem hiding this comment.
yes you are right. will do this.
| "PairwiseGPSurrogate requires unique labcodes." | ||
| ) | ||
|
|
||
| if not isinstance(preferences, pd.DataFrame): |
There was a problem hiding this comment.
I would remove this, we are having this from the typing
There was a problem hiding this comment.
removed in next push
Resolved prior-constraint conflicts: adopted hotfix/noiseprior's generalized initial_value support across Positive/GreaterThan/LessThan, kept the concrete Interval type and PairwiseGP-related specs. Co-Authored-By: Claude Opus 4.7 <[email protected]>
|
@Jimbo994: the stuff regarding the priors is merged. |
- #4: remove validate_scaler_features from PairwiseGPSurrogate -- it duplicated the validator already on TrainableBotorchSurrogate. - #6: add a `likelihood` field to PairwiseGPSurrogate ("probit" default, or "logit"). PairwiseGP has no noise prior to expose, but the probit/logit likelihood choice is now configurable. - #7: drop the PairwiseGPSurrogate._predict override and inherit it from BotorchSurrogate -- PairwiseGP.posterior ignores observation_noise, so the base implementation is equivalent. - #9: remove the runtime isinstance(preferences, pd.DataFrame) check; the fit() type hint already documents the contract. Comments #5 (features_to_idx_mapper) and #8 (labcode vs index) are parked for further discussion with the reviewer. Co-Authored-By: Claude Opus 4.7 <[email protected]>
|
I did some more changes yesterday evening, you find it in |
Add `get_feature_indices` to the PairwiseTrainableSurrogate mixin and pass it as `features_to_idx_mapper` in `_fit_pairwise`, so feature-specific kernels (a collection of sub-kernels each acting on a subset of inputs) resolve their feature keys to datapoint-tensor columns. PairwiseGPSurrogate does not derive from TrainableBotorchSurrogate, so the method is defined on the mixin rather than inherited; it is structurally identical to TrainableBotorchSurrogate.get_feature_indices (noted in its docstring). Adds a mixed continuous/categorical feature-specific-kernel test. Co-Authored-By: Claude Opus 4.7 <[email protected]>
# Conflicts: # bofire/data_models/priors/interval.py # tests/bofire/data_models/specs/prior_constraints.py
…ces (#5) Per reviewer discussion, avoid duplicating `get_feature_indices`. It needs `inputs`, `categorical_encodings` and `engineered_features`; the first two are already on `BotorchSurrogate`, so `engineered_features` is hoisted to join them: - `engineered_features` field + `validate_aggregations` move from `TrainableSurrogate` to the `BotorchSurrogate` data model; the duplicate field declaration on `LinearDeterministicSurrogate` is removed. - functional `BotorchSurrogate.__init__` sets `engineered_features`; redundant assignments drop from `TrainableBotorchSurrogate`, `LinearDeterministicSurrogate` and `PairwiseGPSurrogate`. - `get_feature_indices` moves to functional `BotorchSurrogate`; the copies on `TrainableBotorchSurrogate` and the `PairwiseTrainableSurrogate` mixin are removed. `SingleTaskGP` and `PairwiseGP` now inherit one shared copy. This supersedes the mixin-local copy from commit bb4d1fd. The `CategoricalDeterministicSurrogate` spec gains the inherited `engineered_features` field. Co-Authored-By: Claude Opus 4.7 <[email protected]>
|
@jduerholt this is ready for another round of review after the tests complete. |
jduerholt
left a comment
There was a problem hiding this comment.
Looks almost done, only thing missing besides my comments is to do proper registration in data_models/priors/_register.py
| return v | ||
|
|
||
| @model_validator(mode="after") | ||
| def validate_aggregations(self): |
There was a problem hiding this comment.
Can you rename this to validate_engineered_features? validate_aggregations is legacy.
There was a problem hiding this comment.
had to rename in linear_deterministic_surrogate too.
| inputs: Inputs | ||
| outputs: Outputs | ||
| predict: Callable[..., pd.DataFrame] | ||
| input_preprocessing_specs: InputTransformSpecs |
There was a problem hiding this comment.
Are the input_preprocessing_specs also provided from there? I do not think so. Have a look.
There was a problem hiding this comment.
Also check for categorical_encodings and the rest here.
There was a problem hiding this comment.
input_preprocessing_specs, and categorical_encodings, engineered_features are inherited from BotorchSurrogate and Surrogate. The scalar is now set in the PairwiseGPSurrogate.
So we place priors and constraints on the lengthscale and outputscale, which are related to the RBFKernel and ScaleKernel respectively, they should already be handled in the data_models/priors/_register.py. We don't place any constraints or priors on the likelihood so we need no further changes here? Or am I missing something? |
# Conflicts: # CHANGELOG.md
Motivation
This PR adds PairwiseGPSurrogate, a Gaussian process surrogate that infers a latent utility function from pairwise comparison data, wrapping BoTorch's PairwiseGP. It introduces a PairwiseTrainableSurrogate mixin with a fit(experiments, preferences) signature: experiments holds the candidate designs (tagged with a labcode), and preferences holds winner/loser pairs.
Supporting data-model additions were needed to mirror BoTorch's PairwiseGP defaults: a new SmoothedBoxPrior, a concrete instantiable Interval prior constraint, and an optional initial_value on the GreaterThan prior constraint (without it the kernel lengthscale initializes to 0 → non-PSD covariance).
Have you read the Contributing Guidelines on pull requests?
yes
Have you updated
CHANGELOG.md?yes
Test Plan