
Ship PairwiseGP from Botorch into BoFire #768

Merged
jduerholt merged 24 commits into main from feature/pairwise
May 26, 2026

Conversation

@Jimbo994
Collaborator

Motivation

This PR adds PairwiseGPSurrogate, a Gaussian process surrogate that infers a latent utility function from pairwise comparison data, wrapping BoTorch's PairwiseGP. It introduces a PairwiseTrainableSurrogate mixin with a fit(experiments, preferences) signature: experiments holds the candidate designs (tagged with a labcode), and preferences holds winner/loser pairs.

Supporting data-model additions were needed to mirror BoTorch's PairwiseGP defaults: a new SmoothedBoxPrior, a concrete instantiable Interval prior constraint, and an optional initial_value on the GreaterThan prior constraint (without it the kernel lengthscale initializes to 0 → non-PSD covariance).
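To make the `fit(experiments, preferences)` contract concrete, here is a minimal sketch of how winner/loser labcode pairs could be turned into the row-index pairs that BoTorch's `PairwiseGP` takes as its `comparisons` argument (each pair `(i, j)` meaning "row i is preferred over row j"). The helper name `preferences_to_comparisons` and the plain-list inputs are assumptions for illustration only; the PR itself works on DataFrames and may organize this step differently.

```python
from typing import Dict, List, Sequence, Tuple


def preferences_to_comparisons(
    labcodes: Sequence[str],
    preferences: Sequence[Tuple[str, str]],
) -> List[Tuple[int, int]]:
    """Map (winner, loser) labcode pairs to (winner_row, loser_row) index pairs.

    Hypothetical helper, not part of BoFire: `labcodes` lists the labcode of
    each experiment row in order, `preferences` lists (winner, loser) labcodes.
    """
    # Labcodes identify rows, so they must be unique.
    if len(set(labcodes)) != len(labcodes):
        raise ValueError("labcodes must be unique")
    index_of: Dict[str, int] = {code: i for i, code in enumerate(labcodes)}

    comparisons: List[Tuple[int, int]] = []
    for winner, loser in preferences:
        # Every labcode used in a preference must exist in the experiments.
        unknown = [code for code in (winner, loser) if code not in index_of]
        if unknown:
            raise ValueError(f"unknown labcodes in preferences: {unknown}")
        comparisons.append((index_of[winner], index_of[loser]))
    return comparisons


labcodes = ["exp-1", "exp-2", "exp-3"]
preferences = [("exp-2", "exp-1"), ("exp-3", "exp-2")]
comparisons = preferences_to_comparisons(labcodes, preferences)
print(comparisons)  # [(1, 0), (2, 1)]
```

Keying the preferences by labcode rather than by positional index keeps them valid if the experiment rows are reordered or filtered before fitting.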

Have you read the Contributing Guidelines on pull requests?

yes

Have you updated CHANGELOG.md?

yes

Test Plan

  • New tests/bofire/surrogates/test_pairwise_gp.py (10 tests): fit + predict, latent-utility ranking recovery (Kendall-Tau), sign-convention symmetry (swapping A/B slots + flipping sign yields the identical fit), dumps/loads round-trip, and all validation errors (missing/duplicate labcode, missing preference columns, unknown labcodes, dropped ties).
  • Data-model specs added for PairwiseGPSurrogate (valid + 2 invalid), SmoothedBoxPrior, Interval, the updated GreaterThan, and a ScaleKernel with a SmoothedBoxPrior outputscale — all exercised by the existing serialization / deserialization round-trip suite.
  • pytest tests/bofire/data_models + surrogate suite: 1373 passed, ruff clean.
  • The new tutorial docs/tutorials/advanced_examples/pairwise_gp.qmd renders successfully with quarto render.
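The ranking-recovery check in the test plan compares the ordering of predicted latent utilities with the ordering of the true utilities. A pairwise GP only identifies utilities up to scale and shift, so a rank statistic is the natural metric. The sketch below is a plain-Python Kendall tau-a written for illustration; it is an assumption that the actual test uses a library routine such as `scipy.stats.kendalltau` instead.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1  # both sequences order this pair the same way
        elif product < 0:
            score -= 1  # the sequences disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


true_utility = [0.1, 0.4, 0.35, 0.8]
predicted = [-1.2, 0.3, 0.1, 2.0]  # different scale, same ordering
tau_same = kendall_tau(true_utility, predicted)
tau_flipped = kendall_tau(true_utility, [-p for p in predicted])
print(tau_same, tau_flipped)  # 1.0 -1.0
```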

@Jimbo994 Jimbo994 requested a review from jduerholt May 20, 2026 21:22
Contributor

@jduerholt jduerholt left a comment

Hi Jim,

This already looks really nice!

I left a few comments. I will try to get the other PR on the priors merged today; then you can merge main in again and go over the comments.

Best,

Johannes

Comment thread bofire/data_models/priors/interval.py
Comment thread bofire/data_models/surrogates/pairwise_gp.py
Comment thread bofire/data_models/surrogates/pairwise_gp.py
)
return self

@model_validator(mode="after")
Contributor

Why do we need this? We do not have it anywhere else, do we?

Contributor

I think this is not needed and should not be enforced here.

Collaborator Author

Yes, you are right. I will remove it.

Comment thread bofire/surrogates/pairwise_gp.py Outdated
self.kernel,
batch_shape=torch.Size(),
active_dims=list(range(n_dim)),
features_to_idx_mapper=None,
Contributor

Here we need to pass a proper mapping as features_to_idx_mapper so that feature-dependent kernels work. Have a look at how the SingleTaskGP surrogate does it.

Collaborator Author

OK, I adjusted this. The mapping now lives in BotorchSurrogate, which led to some refactoring that you will see later. It also let us remove some duplicated code!
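For readers unfamiliar with the mapper discussed in this thread: a feature-specific kernel is declared in terms of feature keys, and something has to translate those keys into column indices of the datapoint tensor. The sketch below illustrates the idea in plain Python. The factory name `make_features_to_idx_mapper` and the fixed column layout are assumptions for illustration; in BoFire the actual layout depends on the inputs, the categorical encodings, and the engineered features.

```python
from typing import Callable, Dict, List, Sequence


def make_features_to_idx_mapper(
    columns_per_feature: Dict[str, int],
) -> Callable[[Sequence[str]], List[int]]:
    """Build a mapper from feature keys to tensor column indices.

    Hypothetical helper: `columns_per_feature` gives, in column order, how
    many tensor columns each feature occupies (for example, a one-hot
    encoded categorical with three categories occupies three columns).
    """
    layout: Dict[str, List[int]] = {}
    start = 0
    for key, width in columns_per_feature.items():
        layout[key] = list(range(start, start + width))
        start += width

    def mapper(feature_keys: Sequence[str]) -> List[int]:
        indices: List[int] = []
        for key in feature_keys:
            if key not in layout:
                raise KeyError(f"unknown feature key: {key}")
            indices.extend(layout[key])
        return indices

    return mapper


mapper = make_features_to_idx_mapper(
    {"temperature": 1, "pressure": 1, "solvent": 3}
)
continuous_dims = mapper(["temperature", "pressure"])
categorical_dims = mapper(["solvent"])
print(continuous_dims, categorical_dims)  # [0, 1] [2, 3, 4]
```

With such a mapper, one sub-kernel can act on the continuous columns and another on the one-hot columns, instead of every kernel seeing `list(range(n_dim))`.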

Comment thread bofire/surrogates/pairwise_gp.py
Comment thread bofire/surrogates/pairwise_gp.py Outdated
)
fit_gpytorch_mll(mll, options=self.training_specs, max_attempts=50)

def _predict(self, transformed_X: pd.DataFrame):
Contributor

Can we not get this from somewhere via inheritance?

Contributor

It should be possible, shouldn't it?

Collaborator Author

Yes, you are right. I will do this.

Comment thread bofire/surrogates/pairwise_trainable.py
Comment thread bofire/surrogates/pairwise_trainable.py Outdated
"PairwiseGPSurrogate requires unique labcodes."
)

if not isinstance(preferences, pd.DataFrame):
Contributor

I would remove this; the type hints already cover it.

Collaborator Author

removed in next push

jduerholt and others added 6 commits May 21, 2026 13:06
Resolved prior-constraint conflicts: adopted hotfix/noiseprior's generalized
initial_value support across Positive/GreaterThan/LessThan, kept the concrete
Interval type and PairwiseGP-related specs.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@jduerholt
Contributor

@Jimbo994: the stuff regarding the priors is merged.

- #4: remove validate_scaler_features from PairwiseGPSurrogate -- it
  duplicated the validator already on TrainableBotorchSurrogate.
- #6: add a `likelihood` field to PairwiseGPSurrogate ("probit" default,
  or "logit"). PairwiseGP has no noise prior to expose, but the
  probit/logit likelihood choice is now configurable.
- #7: drop the PairwiseGPSurrogate._predict override and inherit it from
  BotorchSurrogate -- PairwiseGP.posterior ignores observation_noise, so
  the base implementation is equivalent.
- #9: remove the runtime isinstance(preferences, pd.DataFrame) check; the
  fit() type hint already documents the contract.

Comments #5 (features_to_idx_mapper) and #8 (labcode vs index) are parked
for further discussion with the reviewer.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
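The probit/logit choice mentioned in the commit message above selects the link function that turns a latent utility difference into a preference probability. The sketch below shows the textbook forms of both links using only the standard library. It is an illustration under assumptions: the function names are hypothetical, and BoTorch's pairwise likelihood classes may scale the utility difference differently.

```python
import math


def probit_preference(utility_a: float, utility_b: float) -> float:
    """P(a preferred over b) under a probit link: standard normal CDF of the difference."""
    diff = utility_a - utility_b
    return 0.5 * (1.0 + math.erf(diff / math.sqrt(2.0)))


def logit_preference(utility_a: float, utility_b: float) -> float:
    """P(a preferred over b) under a logit link: logistic sigmoid of the difference."""
    diff = utility_a - utility_b
    return 1.0 / (1.0 + math.exp(-diff))


# Equal utilities give a coin flip under both links.
print(probit_preference(0.3, 0.3), logit_preference(0.3, 0.3))  # 0.5 0.5

# Both links are symmetric: P(a over b) + P(b over a) = 1.
p_ab = probit_preference(1.0, 0.0)
p_ba = probit_preference(0.0, 1.0)
```

The two links agree on ordering and differ mainly in their tails: the logit link assigns more probability to "surprising" comparisons, which can make it more tolerant of noisy preference labels.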
@jduerholt
Contributor

I made some more changes yesterday evening; you will find them in main.

Jimbo994 and others added 3 commits May 22, 2026 14:15
Add `get_feature_indices` to the PairwiseTrainableSurrogate mixin and pass
it as `features_to_idx_mapper` in `_fit_pairwise`, so feature-specific
kernels (a collection of sub-kernels each acting on a subset of inputs)
resolve their feature keys to datapoint-tensor columns.

PairwiseGPSurrogate does not derive from TrainableBotorchSurrogate, so the
method is defined on the mixin rather than inherited; it is structurally
identical to TrainableBotorchSurrogate.get_feature_indices (noted in its
docstring).

Adds a mixed continuous/categorical feature-specific-kernel test.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
# Conflicts:
#	bofire/data_models/priors/interval.py
#	tests/bofire/data_models/specs/prior_constraints.py
…ces (#5)

Per reviewer discussion, avoid duplicating `get_feature_indices`. It needs
`inputs`, `categorical_encodings` and `engineered_features`; the first two
are already on `BotorchSurrogate`, so `engineered_features` is hoisted to
join them:

- `engineered_features` field + `validate_aggregations` move from
  `TrainableSurrogate` to the `BotorchSurrogate` data model; the duplicate
  field declaration on `LinearDeterministicSurrogate` is removed.
- functional `BotorchSurrogate.__init__` sets `engineered_features`;
  redundant assignments drop from `TrainableBotorchSurrogate`,
  `LinearDeterministicSurrogate` and `PairwiseGPSurrogate`.
- `get_feature_indices` moves to functional `BotorchSurrogate`; the copies
  on `TrainableBotorchSurrogate` and the `PairwiseTrainableSurrogate` mixin
  are removed. `SingleTaskGP` and `PairwiseGP` now inherit one shared copy.

This supersedes the mixin-local copy from commit bb4d1fd. The
`CategoricalDeterministicSurrogate` spec gains the inherited
`engineered_features` field.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@Jimbo994
Collaborator Author

Jimbo994 commented May 22, 2026

@jduerholt this is ready for another round of review after the tests complete.

Contributor

@jduerholt jduerholt left a comment

Looks almost done, only thing missing besides my comments is to do proper registration in data_models/priors/_register.py

Comment thread bofire/data_models/surrogates/pairwise_gp.py
Comment thread bofire/data_models/surrogates/pairwise_gp.py
return v

@model_validator(mode="after")
def validate_aggregations(self):
Contributor

Can you rename this to validate_engineered_features? validate_aggregations is legacy.

Collaborator Author

Done. I had to rename it in linear_deterministic_surrogate too.

inputs: Inputs
outputs: Outputs
predict: Callable[..., pd.DataFrame]
input_preprocessing_specs: InputTransformSpecs
Contributor

Are the input_preprocessing_specs also provided from there? I do not think so. Have a look.

Contributor

Also check for categorical_encodings and the rest here.

Collaborator Author

input_preprocessing_specs, categorical_encodings, and engineered_features are inherited from BotorchSurrogate and Surrogate. The scaler is now set in PairwiseGPSurrogate.

@Jimbo994
Collaborator Author

Looks almost done, only thing missing besides my comments is to do proper registration in data_models/priors/_register.py

We place priors and constraints on the lengthscale and the outputscale, which belong to the RBFKernel and the ScaleKernel respectively, so they should already be handled in data_models/priors/_register.py. We do not place any constraints or priors on the likelihood, so I think no further changes are needed here. Or am I missing something?

@jduerholt jduerholt merged commit c418a59 into main May 26, 2026
10 of 11 checks passed
@Jimbo994 Jimbo994 deleted the feature/pairwise branch May 26, 2026 10:18