Error when running SemiBin2 (normalization?) #159

eperezv · 2024-03-08T12:35:21Z

Hello,

I'm running SemiBin2 on my dataset with the multi_easy_bin option. Everything seemed to work properly until training failed with an error that appears to be related to normalization. Any idea what causes it and/or how to address it?

Thank you

(SemiBin) eduardo@eduardo-PC:/data$ SemiBin2 multi_easy_bin -i contigs.flt.fna -b mapped/*.sort.bam -o semibin2_output --separator _ -p 18
[2024-03-08 10:19:33,306] INFO: Binning for short_read
[2024-03-08 10:19:33,306] INFO: SemiBin will run in self supervised mode
[2024-03-08 10:19:34,370] INFO: Running with GPU.
[2024-03-08 10:19:34,370] INFO: Performing multi-sample binning
[2024-03-08 10:19:34,371] INFO: Generating training data...
[2024-03-08 10:20:17,377] INFO: Calculating coverage for every sample.
[2024-03-08 11:31:04,311] INFO: Processed: mapped/C101.sort.bam
[2024-03-08 11:37:05,271] INFO: Processed: mapped/C102.sort.bam
[2024-03-08 11:37:05,272] INFO: Processed: mapped/C103.sort.bam
[2024-03-08 11:37:05,272] INFO: Processed: mapped/C111.sort.bam
[2024-03-08 11:37:05,272] INFO: Processed: mapped/C112.sort.bam
[2024-03-08 11:37:05,272] INFO: Processed: mapped/C113.sort.bam
[2024-03-08 11:37:05,272] INFO: Processed: mapped/C11.sort.bam
[2024-03-08 11:41:28,363] INFO: Processed: mapped/C12.sort.bam
[2024-03-08 11:50:46,311] INFO: Processed: mapped/C13.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C161.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C162.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C163.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C171.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C172.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C173.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C181.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C182.sort.bam
[2024-03-08 11:50:46,312] INFO: Processed: mapped/C183.sort.bam
[2024-03-08 11:53:33,711] INFO: Processed: mapped/C191.sort.bam
[2024-03-08 12:03:55,805] INFO: Processed: mapped/C192.sort.bam
[2024-03-08 12:08:27,854] INFO: Processed: mapped/C193.sort.bam
[2024-03-08 12:33:25,614] INFO: Processed: mapped/C1.sort.bam
[2024-03-08 12:38:03,411] INFO: Processed: mapped/C21.sort.bam
[2024-03-08 12:38:03,411] INFO: Processed: mapped/C22.sort.bam
[2024-03-08 12:38:03,411] INFO: Processed: mapped/C23.sort.bam
[2024-03-08 12:38:03,411] INFO: Processed: mapped/C2.sort.bam
[2024-03-08 12:38:03,412] INFO: Processed: mapped/C31.sort.bam
[2024-03-08 12:38:03,412] INFO: Processed: mapped/C32.sort.bam
[2024-03-08 12:38:03,412] INFO: Processed: mapped/C33.sort.bam
[2024-03-08 12:38:03,412] INFO: Processed: mapped/C3.sort.bam
[2024-03-08 12:42:33,510] INFO: Processed: mapped/C81.sort.bam
[2024-03-08 12:42:33,510] INFO: Processed: mapped/C82.sort.bam
[2024-03-08 12:42:33,510] INFO: Processed: mapped/C83.sort.bam
[2024-03-08 12:42:33,510] INFO: Processed: mapped/C91.sort.bam
[2024-03-08 12:44:14,776] INFO: Processed: mapped/C92.sort.bam
[2024-03-08 12:48:07,180] INFO: Processed: mapped/C93.sort.bam
[2024-03-08 12:48:07,180] INFO: Processed: mapped/CE1.sort.bam
[2024-03-08 12:48:07,180] INFO: Processed: mapped/CE2.sort.bam
[2024-03-08 12:48:07,180] INFO: Processed: mapped/CE3.sort.bam
[2024-03-08 13:12:59,818] INFO: Training model and clustering for S1CNODE.
[2024-03-08 13:12:59,820] INFO: Start training from a single sample.
[2024-03-08 13:13:00,438] INFO: Training model...
  0%|                                                                                                           | 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/eduardo/miniconda3/envs/SemiBin/bin/SemiBin2", line 33, in <module>
    sys.exit(load_entry_point('SemiBin==2.1.0', 'console_scripts', 'SemiBin2')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eduardo/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin-2.1.0-py3.12.egg/SemiBin/main.py", line 1563, in main2
    multi_easy_binning(
  File "/home/eduardo/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin-2.1.0-py3.12.egg/SemiBin/main.py", line 1326, in multi_easy_binning
    training(logger, None, args.num_process,
  File "/home/eduardo/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin-2.1.0-py3.12.egg/SemiBin/main.py", line 1103, in training
    model = train_self(logger,
            ^^^^^^^^^^^^^^^^^^
  File "/home/eduardo/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin-2.1.0-py3.12.egg/SemiBin/self_supervised_model.py", line 77, in train_self
    train_data_depth = normalize(train_data_depth, axis=1, norm='l1')
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eduardo/miniconda3/envs/SemiBin/lib/python3.12/site-packages/scikit_learn-1.4.1.post1-py3.12-linux-x86_64.egg/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/eduardo/miniconda3/envs/SemiBin/lib/python3.12/site-packages/scikit_learn-1.4.1.post1-py3.12-linux-x86_64.egg/sklearn/preprocessing/_data.py", line 1925, in normalize
    X = check_array(
        ^^^^^^^^^^^^
  File "/home/eduardo/miniconda3/envs/SemiBin/lib/python3.12/site-packages/scikit_learn-1.4.1.post1-py3.12-linux-x86_64.egg/sklearn/utils/validation.py", line 1072, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 39)) while a minimum of 1 is required by the normalize function.


psj1997 · 2024-03-11T12:49:54Z

It seems this is still the error that occurs when combining the k-mer features and the abundance features. Can you have a look at the files SemiBin generated for every sample (data.csv, data_split.csv, cov.csv)? How many columns are in these files?

Thanks!
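For anyone following along, a minimal sketch of one way to answer this, using only the Python standard library. The helper name `summarize_csv` is hypothetical and not part of SemiBin; it only assumes the files are ordinary comma-separated files with a single header row.

```python
import csv
from itertools import islice


def summarize_csv(path, n_rows=5):
    """Return (number of header columns, first n_rows data rows) of a CSV file.

    Hypothetical helper for inspection only; it is not part of SemiBin.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # at most n_rows data rows
    return len(header), rows
```

Calling `summarize_csv("data_split.csv")` on each per-sample file would give the column count asked for here. An empty list of rows would mean the file holds a header and no data.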

eperezv · 2024-03-11T14:37:24Z

I see a folder containing the FASTA files and files like C1.sort.bam_21_data.cov.csv and C1.sort.bam_21_data_split_cov.csv. There are also separate folders for each sample that may contain what you are asking for.
data.csv contains 176 columns (i.e., one with no header, 135 columns named 1, 2, 3... and then another 39 columns named like mapped/C1.sort.bam_cov).
data_split.csv: same as before, but only the headers.
data_cov.csv contains 40 columns (one with numbers plus 39 for my samples, named the same as before: mapped/C1...).

psj1997 · 2024-03-11T14:50:22Z

Can you show the first five rows of data.csv, data_split.csv, data_csv.csv and cov_split.csv?

eperezv · 2024-03-11T15:14:50Z

I don't have exactly the files you indicate, but these are the ones I have (per sample)

data.csv

data_split.csv

data_cov.csv

data_split_cov.csv

psj1997 · 2024-03-11T15:23:03Z

Can you check the first column of data_split_cov.csv? Do its entries look like '1581622_1', '1581622_2'? Thanks!
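A rough sketch of how that check could be scripted instead of done by eye. It assumes, based only on the example names above, that split entries end in `_1` or `_2`; the function name `split_suffix_report` is hypothetical and not part of SemiBin.

```python
import csv
import re

# Assumption taken from the example names above: split entries end in _1 or _2.
SPLIT_SUFFIX = re.compile(r"_[12]$")


def split_suffix_report(path):
    """Count first-column entries that do and do not end in _1 or _2."""
    with_suffix = 0
    without_suffix = 0
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            if SPLIT_SUFFIX.search(row[0]):
                with_suffix += 1
            else:
                without_suffix += 1
    return with_suffix, without_suffix
```

A result such as `(0, N)` would indicate that none of the names carry the suffix. Note that contig names which themselves end in `_1` or `_2` would be counted as matches, so the result is only a hint.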

eperezv · 2024-03-11T15:25:40Z

There is no _1, _2... Only what's shown.
