ValueError: The output generated by func have different column names than the ones provided by get_feature_names_out. Got output with columns names: ['x0', 'x1', 'x2', 'x3', 'x4'] #43

Description

@emmanuel-contreras

Hello, I am running into the error below when trying to run the TPM example provided in the docstrings.

ValueError: The output generated by `func` have different column names than the ones provided by `get_feature_names_out`. 
Got output with columns names: ['x0', 'x1', 'x2', 'x3', 'x4'] and 
`get_feature_names_out` returned: ['Gene_1', 'Gene_2', 'Gene_3', 'Gene_4', 'Gene_5'].
 The column names can be overridden by setting `set_output(transform='pandas')` or 
`set_output(transform='polars')` such that the column names are set to the names provided by `get_feature_names_out`.

This is the code I am running from the example:

from rnanorm.datasets import load_toy_data
from rnanorm import TPM
dataset = load_toy_data()
dataset.exp
#          Gene_1  Gene_2  Gene_3  Gene_4  Gene_5
#Sample_1     200     300     500    2000    7000
#Sample_2     400     600    1000    4000   14000
#Sample_3     200     300     500    2000   17000
#Sample_4     200     300     500    2000    2000
tpm = TPM(gtf=dataset.gtf_path).set_output(transform="pandas")
tpm.fit_transform(dataset.exp)

I also tried running the example code from issue #20, which produces the same error message:

from rnanorm import TPM
import pandas as pd
df = pd.DataFrame([[200, 400, 400], [300, 300, 800]], index=["Sample1", "Sample2"], columns=["Gene1", "Gene2", "Gene3"])
gene_lengths = pd.Series([100, 100, 200], index=["Gene1", "Gene2", "Gene3"])
df
#          Gene1  Gene2  Gene3
# Sample1    200    400    400
# Sample2    300    300    800

# In [6]: gene_lengths
# Gene1    100
# Gene2    100
# Gene3    200
# dtype: int64

TPM(gene_lengths=gene_lengths).set_output(transform="pandas").fit_transform(df)
# Out[7]:
#             Gene1     Gene2     Gene3
# Sample1  250000.0  500000.0  250000.0
# Sample2  300000.0  300000.0  400000.0

The error occurs when running tpm.fit_transform(dataset.exp). This is in a new conda environment with Python 3.13, pandas 2.3.2, rnanorm 2.2.0, and scikit-learn 1.7.2. As you can see, the error occurs even with set_output(transform="pandas") set.
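
In case it helps with reproducing the environment, here is a small sketch for printing the installed versions using only the standard library. The helper name `installed_versions` is made up for this example, and the package list assumes the usual distribution names (`scikit-learn` rather than `sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            found[name] = None
    return found


for name, ver in installed_versions(["rnanorm", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```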

Thank you
