Conversation

MarcBresson

Description

Add a feature that encodes class labels when they are not already the integers 0 to n_classes - 1 (for example, string labels).

Current behaviour

from sklearn.datasets import make_classification
import numpy as np
from xgboost import XGBClassifier

X, y = make_classification(n_samples=100, n_features=20, n_informative=10, n_redundant=10, n_classes=3, random_state=42)

labels = np.array(["class 0", "class 1", "class 2"])
y_named = labels[y]
model = XGBClassifier()
model.fit(X, y_named)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 10
      8 y_named = labels[y]
      9 model = XGBClassifier()
---> 10 model.fit(X, y_named)

File ~/Documents/xgboost/.venv/lib/python3.13/site-packages/xgboost/core.py:729, in require_keyword_args.<locals>.throw_if.<locals>.inner_f(*args, **kwargs)
    727 for k, arg in zip(sig.parameters, args):
    728     kwargs[k] = arg
--> 729 return func(**kwargs)

File ~/Documents/xgboost/.venv/lib/python3.13/site-packages/xgboost/sklearn.py:1641, in XGBClassifier.fit(self, X, y, sample_weight, base_margin, eval_set, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights)
   1636     expected_classes = self.classes_
   1637 if (
   1638     classes.shape != expected_classes.shape
   1639     or not (classes == expected_classes).all()
   1640 ):
-> 1641     raise ValueError(
   1642         f"Invalid classes inferred from unique values of `y`.  "
   1643         f"Expected: {expected_classes}, got {classes}"
   1644     )
   1646 params = self.get_xgb_params()
   1648 if callable(self.objective):

ValueError: Invalid classes inferred from unique values of `y`.  Expected: [0 1 2], got ['class 0' 'class 1' 'class 2']
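
For context, the kind of encoding this PR is about can be sketched with NumPy alone. This is only an illustration of the idea, not the code in this PR: the sample labels and the `fake_predictions` array are made up, and no model is trained. scikit-learn's `LabelEncoder` offers the same encode/decode round trip as a ready-made transformer.

```python
import numpy as np

# Hypothetical labels, mirroring the `y_named` array from the snippet above.
y_named = np.array(["class 1", "class 0", "class 2", "class 1", "class 0"])

# np.unique returns the sorted unique labels and, with return_inverse=True,
# the index of each sample's label within that sorted array.
classes, y_encoded = np.unique(y_named, return_inverse=True)

# The encoded labels are now the integers 0..n_classes-1, which is what the
# check in the traceback expects.
expected_classes = np.arange(len(classes))
assert (np.unique(y_encoded) == expected_classes).all()

# Integer predictions can be mapped back to the original labels by indexing.
fake_predictions = np.array([2, 0, 1])  # stand-in for model.predict(...) output
decoded = classes[fake_predictions]
```

With the current behaviour, passing `y_encoded` instead of `y_named` to `fit` should avoid the `ValueError`, at the cost of decoding predictions by hand.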

New behaviour

from sklearn.datasets import make_classification
import numpy as np
from xgboost import XGBClassifier

X, y = make_classification(n_samples=100, n_features=20, n_informative=10, n_redundant=10, n_classes=3, random_state=42)

labels = np.array(["class 0", "class 1", "class 2"])
y_named = labels[y]
model = XGBClassifier()
model.fit(X, y_named)

Output: None (no error is raised)

@trivialfis trivialfis self-requested a review October 6, 2025 15:45
@trivialfis
Member

Thank you for the feature addition. For the sklearn interface, we need to consider some other things for consistency:

  • Model serialization. Is the model still valid if it's saved and loaded?
  • Custom objective/metrics.

Related: #11256
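
To make the serialization point concrete: if labels are encoded internally, the label mapping has to survive a save/load cycle, otherwise a reloaded model can only return integer classes. The sketch below shows that round trip with plain JSON and NumPy. It is an assumption-laden illustration, not how XGBoost or this PR stores anything: the `"classes"` key and the prediction array are hypothetical.

```python
import json

import numpy as np

# Hypothetical: class labels learned during fit, in encoded order.
classes = np.array(["class 0", "class 1", "class 2"])

# Persist the mapping next to the model (the "classes" key is made up here).
payload = json.dumps({"classes": classes.tolist()})

# Later, after loading: restore the mapping from the saved payload.
restored = np.array(json.loads(payload)["classes"])

# Decoding only works if the restored mapping matches the original one.
round_trip_ok = bool((restored == classes).all())

encoded_predictions = np.array([2, 0, 1])  # stand-in for integer predictions
decoded = restored[encoded_predictions]
```

A similar question applies to custom objectives and metrics: they would presumably receive the encoded integers rather than the original labels, which is worth stating explicitly in the documentation.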
