Conversation

a18rhodes

Adds support for columnwise fillna.

Only supports homogeneous column and replacement value dtypes, raises ValueError otherwise. Tests added.
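For context, the row-wise fill this PR enables can already be approximated with the transpose workaround discussed later in this thread. A minimal sketch (the frame and fill values here are made up for illustration; all columns are float to satisfy the homogeneous-dtype requirement):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, np.nan]})

# Fill each row's NaNs with a per-row value by transposing,
# filling column-wise (which pandas already supports), and
# transposing back.
row_fill = {0: 10.0, 1: 20.0}  # fill value keyed by row label
result = df.T.fillna(value=row_fill).T
# row 0 NaNs become 10.0, row 1 NaNs become 20.0
```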

@a18rhodes
Author

This is my first PR on an open source project, so I'm still a little unfamiliar with the workflow.

I may have merged in too many changes by mistake; I intended to only update generic.py, the relevant tests, and the what's new doc. If I need to update anything to reduce the scope I'm happy to make whatever changes are needed.

@WillAyd
Member

WillAyd commented Sep 22, 2025

Hey @a18rhodes no worries - we can fix this up.

Looking at the commit history, I think you accidentally picked up commits from other users, which is causing the extra files to show as modified. Here are the errant commit hashes:

e817930
2b25842
08d21d7

Essentially you will want to remove those from your branch. This link shows two ways of achieving that:

https://www.abrahamberg.com/blog/git-remove-commits-from-branch-after-push-reset-revert-or-rebase/

I'll leave it up to you to try either approach.
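For illustration, the revert route can be sketched in a throwaway repo (the commits here are made up; on the real branch you would revert the three hashes above instead of `HEAD`):

```shell
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=a@b.c
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=a@b.c
echo one > file.txt && git add file.txt && git commit -qm "good"
echo two >> file.txt && git add file.txt && git commit -qm "errant"

# Option 1: revert the errant commit (adds a history-preserving "undo" commit)
git revert --no-edit HEAD

# Option 2 would instead drop the commit entirely with `git rebase -i HEAD~2`
# (delete the errant commit's line in the editor) and then update the remote
# branch with `git push --force-with-lease`.
git log --oneline
```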

Once you have those commits fixed, you will want to pull the changes from the main branch of pandas, which you can do with:

git remote add upstream git@github.com:pandas-dev/pandas.git  # only need to do once, can also use https
git pull upstream main

while on your working branch.

In terms of resolving any conflicts, just try to pay close attention to what git is telling you. It can definitely be intimidating, but usually a conflict just needs your input: git will tell you "hey, you made changes X, Y, Z while another user made different changes to that same file." There is never a single answer on how to resolve that, so just give it your best shot, and if you have questions you can highlight them here.

@a18rhodes
Author

Thanks @WillAyd! I think I resolved the issue, it looks like I have the correct 3 files now and have dropped the other user's commits.

Member

@WillAyd left a comment


Nice work overall

@a18rhodes force-pushed the fillna-colwise-4515 branch 2 times, most recently from 94ce8be to f3d404f on September 24, 2025 05:03
@a18rhodes
Author

I did some benchmarking to find the best approach. As it turns out, frames with nrows >> ncols perform better with apply; otherwise, the transpose approach is better (@kdebrab pointed out the performance issues of the transpose approach in the issue, with the apply approach as the suggested solution).

import pandas as pd
import numpy as np

def approach_transpose(df_to_fill, value):
    """vectorized approach using transpose."""
    return df_to_fill.T.fillna(value=value).T

def approach_apply(df_to_fill, value):
    """DataFrame.apply() on columns (axis=0)."""
    return df_to_fill.apply(lambda col: col.fillna(value=value))

for num_rows, num_cols in ((10, 10_000), (1_000, 1_000), (10_000, 10)):
    print(f"Setting up sample data for {num_rows:_}x{num_cols:_}...")
    data = np.random.rand(num_rows, num_cols)
    data[data < 0.25] = np.nan  # Make 25% of the data NaN
    df = pd.DataFrame(data)
    update_data = {i: float(i) for i in range(num_rows)}  # just use the row number for fill
    print(f"\tCreated a {num_rows:_}x{num_cols:_} DataFrame and a fill dictionary.")
    print(f"\t----Testing the .T (transpose) approach on {num_rows:_}x{num_cols:_}----", end="\n\t")
    %timeit c = approach_transpose(df, update_data)
    print(f"\t----Testing the df.apply() approach on {num_rows:_}x{num_cols:_}----", end="\n\t")
    %timeit c = approach_apply(df, update_data)
    print()
Setting up sample data for 10x10_000...
        Created a 10x10_000 DataFrame and a fill dictionary.
        ----Testing the .T (transpose) approach on 10x10_000----
        2.23 ms ± 226 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        ----Testing the df.apply() approach on 10x10_000----
        1.8 s ± 44.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Setting up sample data for 1_000x1_000...
        Created a 1_000x1_000 DataFrame and a fill dictionary.
        ----Testing the .T (transpose) approach on 1_000x1_000----
        147 ms ± 6.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        ----Testing the df.apply() approach on 1_000x1_000----
        296 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Setting up sample data for 10_000x10...
        Created a 10_000x10 DataFrame and a fill dictionary.
        ----Testing the .T (transpose) approach on 10_000x10----
        1.88 s ± 78.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        ----Testing the df.apply() approach on 10_000x10----
        13.5 ms ± 273 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

f"(value.dtype={value_dtype} vs {dtypes[0]})"
)
nrows, ncols = self.shape
if nrows > 1000 * ncols:
Member

I appreciate what you are trying to do here, but this heuristic is pretty arbitrary; it's better to just keep with the transpose method (at smaller data volumes, the difference is likely negligible anyway).

Member

Ah, I posted this before seeing your benchmarks. Even still, I don't think it's that important to optimize for extremely wide dataframes like that. We can leave that to another PR if we even care about it.

Author

Fair point; as you mentioned, the heuristic is pretty arbitrary, based on my machine's performance, etc.

I'll update to use just the transpose method.

Member

@WillAyd left a comment

Generally this lgtm. @mroeschke any feedback?

df.fillna(df.max(axis=1), axis=1, inplace=True)
expected = DataFrame(
{
"a": [1.0, 1, 2, 3, 4],
Member

Ultra nit, but can you represent these all as float values? I had to do a double take on why only the first row of data would be floats, when in actuality all of the data is float and this just relies on implicit conversion.

I think it would be less confusing to just write all values as 1.0, 2.0, 3.0, 3.0, 4.0.

Author

Good point, not sure why I got lazy on this of all things. I will update it.

Comment on lines 7124 to 7128
dtypes = [result[col].dtype for col in result.columns]
if len(set(dtypes)) > 1:
raise ValueError(
"All columns must have the same dtype, but got dtypes: "
f"{dict(zip(result.columns, dtypes))}"
Member

Suggested change
dtypes = [result[col].dtype for col in result.columns]
if len(set(dtypes)) > 1:
raise ValueError(
"All columns must have the same dtype, but got dtypes: "
f"{dict(zip(result.columns, dtypes))}"
unique_dtypes = np.unique(self._mgr.get_dtypes())
if len(unique_dtypes) > 1:
raise ValueError(
"All columns must have the same dtype, but got dtypes: "
f"{list(unique_dtypes)}"

Author

sorry, not sure how I missed this suggestion. will implement in my next commit.

Author

On further inspection, the np.unique approach fails for non-homogeneous column dtypes due to a "<" comparison between mixed types (see the test "test_fillna_dict_series_axis_1_mismatch_cols").

Contributor

np.unique approach fails

Pandas has its own unique function; check if that works.

see test "test_fillna_dict_series_axis_1_mismatch_cols"

Also, when you amend and force-push, it makes it hard to see the results of previous CI runs.
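A quick illustration of the suggestion above (hypothetical data): np.unique sorts its input, so mixed dtypes hit the "<" comparison failure, while pd.unique is hash-based and preserves order of appearance.

```python
import numpy as np
import pandas as pd

mixed = np.array([1, "a", 1, "a"], dtype=object)

# np.unique sorts, so comparing int and str raises TypeError
try:
    np.unique(mixed)
    raised = False
except TypeError:
    raised = True

# pd.unique hashes instead of sorting, so mixed dtypes are fine
print(raised, pd.unique(mixed))
```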

Author

I'll give that a shot.

Yeah, that's a good point. I've been trying to keep a clean commit history since I have had so many commits, but it probably adds to the confusion in some cases. That said, I never committed the failing code; I was just pointing out that a new unit test I added fails when trying np.unique.

Comment on lines 7130 to 7134
if (value_dtype := np.asarray(value).dtype) != dtypes[0]:
raise ValueError(
"Dtype mismatch for value "
f"(value.dtype={value_dtype} vs {dtypes[0]})"
)
Member

Can you try removing this case and seeing if a similar validation still hits? I'm hoping the fillna after the transpose enforces this.

Author

I checked into this. Dropping this check results in no fill happening at all, rather than an exception of any type being raised.
(Note these results are without the check in question.)

    def test_fillna_dict_series_axis_1_value_mismatch_with_cols(self):
        df = DataFrame(
            {
                "a": [np.nan, 1, 2, np.nan, np.nan],
                "b": [1, 2, 3, np.nan, np.nan],
                "c": [np.nan, 1, 2, 3, 4],
            }
        )
        with pytest.raises(Exception):
            print()
            print(df.fillna(Series({"a": "abc", "b": "def", "c": "hij"}), axis=1))
pytest ./pandas/tests/frame/methods/test_fillna.py --ff -x
+ /root/virtualenvs/pandas-dev/bin/ninja
[1/1] Generating write_version_file with a custom command
==================================== test session starts ====================================
platform linux -- Python 3.11.13, pytest-8.4.2, pluggy-1.6.0
PyQt5 5.15.11 -- Qt runtime 5.15.17 -- Qt compiled 5.15.14
rootdir: /workspaces/pandas
configfile: pyproject.toml
plugins: hypothesis-6.139.2, localserver-0.9.0.post0, anyio-4.10.0, qt-4.5.0, cov-7.0.0, xdist-3.8.0, cython-0.3.1
collected 65 items                                                                          
run-last-failure: rerun previous 1 failure first

pandas/tests/frame/methods/test_fillna.py 
     a    b    c
0  NaN  1.0  NaN
1  1.0  2.0  1.0
2  2.0  3.0  2.0
3  NaN  NaN  3.0
4  NaN  NaN  4.0
F

========================================= FAILURES ==========================================
____________ TestFillNA.test_fillna_dict_series_axis_1_value_mismatch_with_cols _____________

self = <pandas.tests.frame.methods.test_fillna.TestFillNA object at 0x70da9893a790>

    def test_fillna_dict_series_axis_1_value_mismatch_with_cols(self):
        df = DataFrame(
            {
                "a": [np.nan, 1, 2, np.nan, np.nan],
                "b": [1, 2, 3, np.nan, np.nan],
                "c": [np.nan, 1, 2, 3, 4],
            }
        )
>       with pytest.raises(Exception):
E       Failed: DID NOT RAISE <class 'Exception'>

pandas/tests/frame/methods/test_fillna.py:503: Failed
------------------- generated xml file: /workspaces/pandas/test-data.xml --------------------
=================================== slowest 30 durations ====================================
0.03s call     pandas/tests/frame/methods/test_fillna.py::TestFillNA::test_fillna_dict_series_axis_1_value_mismatch_with_cols
0.01s teardown pandas/tests/frame/methods/test_fillna.py::TestFillNA::test_fillna_dict_series_axis_1_value_mismatch_with_cols

(1 durations < 0.005s hidden.  Use -vv to show these durations.)
================================== short test summary info ==================================
FAILED pandas/tests/frame/methods/test_fillna.py::TestFillNA::test_fillna_dict_series_axis_1_value_mismatch_with_cols - Failed: DID NOT RAISE <class 'Exception'>
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================== 1 failed in 0.62s =====================================

Member

cc @jbrockmendel do you know of a function that achieves something equivalent to the check above (maybe can_hold_element)?

Member

yah, can_hold_element seems like the right thing here (though I also like the idea of "just let it fall through"). Potential trouble is that for EAs, can_hold_element always returns True.
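For reference, a minimal sketch of how can_hold_element behaves on numpy-backed data (it is an internal pandas helper, so the import path may shift between versions; the array here is made up for illustration):

```python
import numpy as np
from pandas.core.dtypes.cast import can_hold_element

arr = np.array([1.0, 2.0, np.nan])  # float64 column data

# A float fits losslessly into a float64 array
print(can_hold_element(arr, 3.5))

# A string does not, which is the dtype-mismatch case discussed above
print(can_hold_element(arr, "abc"))
```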

Successfully merging this pull request may close these issues.

column-wise fillna with Series/dict NotImplemented