TabPFNClassifier now natively handles missing values—no manual preprocessing needed, safer and more user-friendly #507

VIVEK-MARRI · 2025-09-14T16:12:48Z

Problem

SettingWithCopyWarning appears when filling missing values in TabPFNClassifier.
Previous tests manually preprocessed NAs, so the classifier's internal handling of missing values was never validated.

Solution

Integrated proper NA handling directly in the classifier’s preprocessing pipeline.
Categorical/text columns are filled with 'missing'.
Numeric columns are filled with 0.
Updated tests/test_na_handling.py to pass raw data with NAs and added assertions using pytest to verify correct predictions.

Benefits

Prevents runtime warnings in pandas.
Ensures TabPFNClassifier handles datasets with missing values robustly.
Improves code stability and user experience.
Provides a proper automated test that will fail if NA handling breaks in the future.

Test

Ran python -m pytest tests/test_na_handling.py -v → test passed successfully, no errors or warnings.
Classifier trained and predicted on datasets with missing values without any manual preprocessing.

CLAassistant · 2025-09-14T16:12:55Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request aims to improve NA handling and prevent pandas' SettingWithCopyWarning. However, the implementation has some significant issues. A new preprocess_input function is added but remains unused within the classifier, making it dead code. Consequently, the core logic of the classifier's data handling is not changed. The accompanying test file, tests/test_na_handling.py, does not validate the classifier's ability to handle missing values internally, as it manually preprocesses the data before fitting. Furthermore, the test is structured as a script without assertions, which is not a robust way to verify functionality. My review includes suggestions to properly integrate the NA handling logic, improve the test structure, and fix code style issues.

gemini-code-assist · 2025-09-14T16:13:41Z

src/tabpfn/classifier.py

+def preprocess_input(X):
+    X = X.copy()
+
+    for col in X.columns:
+        if X[col].dtype == 'object':   # Categorical/text columns
+            X[col] = X[col].fillna('missing').astype(str)
+        else:                           # Numeric columns
+            X[col] = X[col].fillna(0).astype(float)
+
+    return X
+


This preprocess_input function is defined but never called within the TabPFNClassifier or elsewhere in the project. As a result, it is dead code and does not contribute to the NA handling logic. The classifier's existing preprocessing pipeline should be modified if the intent is to change how missing values are handled. I recommend removing this function until it is properly integrated.

tests/test_na_handling.py

src/tabpfn/classifier.py

VIVEK-MARRI · 2025-09-19T15:15:01Z

Integrated native NA handling into TabPFNClassifier (fit, predict, predict_proba) – no manual preprocessing required.

Added a dedicated pytest test (tests/test_na_handling.py) to validate that the classifier handles missing values internally.

Verified locally: all tests passed successfully (python -m pytest -v).

bejaeger · 2025-10-09T09:08:41Z

Hi @VIVEK-MARRI , thanks for the contribution. There seem to be a few things that can be improved in the PR. If you could have a look at the issues pointed out and also fix the styling issues it would be great. We will then be able to review. Thanks!

oscarkey · 2025-10-10T08:04:18Z

hey, thank you for this contribution! TabPFN should already handle missing values, so would it be possible for you to open an issue with a small example dataset that shows when this doesn't work for you? Then we can look into what's going on.

VIVEK-MARRI · 2025-10-10T13:41:41Z

Thanks for the feedback! I've opened an issue with a reproducible example as requested: #545 .

oscarkey · 2025-10-14T07:45:15Z

As discussed in #545 we think tabpfn is currentlying working as intended, so closing this for now.

Fix NA handling in TabPFNClassifier

5ebce78

gemini-code-assist bot reviewed Sep 14, 2025

View reviewed changes

Integrate NA handling into classifier and add robust pytest

3c3553b

VIVEK-MARRI changed the title ~~Fix NA handling: avoid pandas SettingWithCopyWarning and fill missing values safely~~ TabPFNClassifier now natively handles missing values—no manual preprocessing needed, safer and more user-friendly Sep 14, 2025

Fix: handle NA values in TabPFNClassifier and add test

3c91ee3

VIVEK-MARRI requested a review from a team as a code owner October 9, 2025 14:31

VIVEK-MARRI requested review from oscarkey and removed request for a team October 9, 2025 14:31

oscarkey closed this Oct 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TabPFNClassifier now natively handles missing values—no manual preprocessing needed, safer and more user-friendly #507

TabPFNClassifier now natively handles missing values—no manual preprocessing needed, safer and more user-friendly #507

Uh oh!

VIVEK-MARRI commented Sep 14, 2025 •

edited

Loading

Uh oh!

CLAassistant commented Sep 14, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Sep 14, 2025

Uh oh!

Uh oh!

Uh oh!

VIVEK-MARRI commented Sep 19, 2025

Uh oh!

bejaeger commented Oct 9, 2025 •

edited

Loading

Uh oh!

oscarkey commented Oct 10, 2025

Uh oh!

VIVEK-MARRI commented Oct 10, 2025

Uh oh!

oscarkey commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

TabPFNClassifier now natively handles missing values—no manual preprocessing needed, safer and more user-friendly #507

TabPFNClassifier now natively handles missing values—no manual preprocessing needed, safer and more user-friendly #507

Uh oh!

Conversation

VIVEK-MARRI commented Sep 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Benefits

Test

Uh oh!

CLAassistant commented Sep 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

VIVEK-MARRI commented Sep 19, 2025

Uh oh!

bejaeger commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oscarkey commented Oct 10, 2025

Uh oh!

VIVEK-MARRI commented Oct 10, 2025

Uh oh!

oscarkey commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

VIVEK-MARRI commented Sep 14, 2025 •

edited

Loading

CLAassistant commented Sep 14, 2025 •

edited

Loading

bejaeger commented Oct 9, 2025 •

edited

Loading