Mismatching numbers of read names were observed #76
Comments
Hi, This is usually caused by not having separate forward and reverse files, but judging from your command line I'm guessing that isn't the case here. However, it seems you are using 'cleaned' reads. I suspect that your cleaned reads file contains (rare) pairs of sequences that share the same name once everything after the first space is excluded. If that is true, then this command (untested) would show some:
TLDR: Just use raw reads if possible. Thanks for your interest in singlem.
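A minimal sketch of the kind of check described above, assuming standard four-line FASTQ records. This is not the command originally posted; the function names `fastq_names` and `duplicated_names` are hypothetical and chosen only for illustration. It truncates each read name at the first whitespace and reports any name that occurs more than once in a single file:

```python
import io
from collections import Counter
from typing import Dict, Iterator, TextIO


def fastq_names(handle: TextIO) -> Iterator[str]:
    """Yield each read name, truncated at the first whitespace.

    Assumes four-line FASTQ records (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        # Skip the sequence, separator and quality lines of this record.
        for _ in range(3):
            handle.readline()
        header = header.rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"Expected a FASTQ header starting with '@', got: {header!r}")
        fields = header[1:].split(None, 1)
        yield fields[0] if fields else ""


def duplicated_names(handle: TextIO) -> Dict[str, int]:
    """Return the truncated names seen more than once, with their counts."""
    counts = Counter(fastq_names(handle))
    return {name: n for name, n in counts.items() if n > 1}


# Tiny made-up example: two records share the name 'r1' before the first space.
example = io.StringIO(
    "@r1 1:N:0\nACGT\n+\nIIII\n"
    "@r1 2:N:0\nACGT\n+\nIIII\n"
    "@r2 1:N:0\nACGT\n+\nIIII\n"
)
for name, n in sorted(duplicated_names(example).items()):
    print(f"{name}\t{n}")
```

To run it on a real file, open the forward (or reverse) FASTQ file and pass the handle to `duplicated_names`. All names are held in memory, so a file with tens of millions of reads will need a correspondingly large amount of RAM.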
Hello Ben, I am encountering a similar error using a single file of interleaved reads. I have reproduced the full error log below. Looking at the documentation returned by Looking at the relevant line in Is that a bug that can be fixed? Or can singleM/graftM not operate on single files of interleaved reads? (Also, these are raw reads, but they come from JGI, and my understanding is that JGI performs some QC before releasing the data. Is that also a problem?) Thanks,
Hi @dmitrisvetlov, I believe your error is different, because mfqe is reporting exactly double the number of reads that it was expecting. I think this is a straight-up bug you found, introduced into graftm when mfqe replaced fxtract. However, I would recommend updating to the v1.0 beta (see https://wwood.github.io/singlem/Installation ) as the new version is much better. I will release it on bioconda when it gets out of beta, which should be within a few weeks. However, in the new version you cannot yet provide interleaved reads. It might be possible to deinterleave them on the fly and stream the forward and reverse e.g. via I think the easiest thing here is just to deinterleave the reads before providing them to singlem, unless there is some strong reason not to? Thanks, ben
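Deinterleaving can be done with a number of existing tools. As a rough sketch of what the step involves, assuming four-line FASTQ records that strictly alternate forward then reverse (the function name `deinterleave` is hypothetical and not part of singlem or graftm):

```python
import io
from typing import TextIO


def deinterleave(interleaved: TextIO, forward: TextIO, reverse: TextIO) -> int:
    """Split an interleaved FASTQ stream into forward and reverse streams.

    Assumes four-line records in strict forward/reverse alternation.
    Returns the number of read pairs written.
    """
    pairs = 0
    while True:
        record1 = [interleaved.readline() for _ in range(4)]
        if not record1[0]:
            break  # clean end of file
        record2 = [interleaved.readline() for _ in range(4)]
        # An empty final line means a record was cut short or has no mate.
        if not record1[3] or not record2[3]:
            raise ValueError("Truncated record or odd number of records in interleaved input")
        forward.writelines(record1)
        reverse.writelines(record2)
        pairs += 1
    return pairs


# Tiny made-up example with two read pairs.
src = io.StringIO(
    "@a/1\nAC\n+\nII\n@a/2\nGT\n+\nII\n"
    "@b/1\nAA\n+\nII\n@b/2\nTT\n+\nII\n"
)
fwd, rev = io.StringIO(), io.StringIO()
print(deinterleave(src, fwd, rev), "pairs written")
```

With real data, the three handles would be opened files, and the two output files could then be given to `singlem pipe` through `--forward` and `--reverse`.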
Hi Ben, Thanks for letting me know. I am using the workaround of deinterleaving the reads via Dmitri
Hello!
I am trying to use singlem for multiple samples using the following:
singlem pipe --forward /scratch/vilardi.k/katiefiles/clean_reads2/all_fq/${i}.clean.R1.fq --reverse /scratch/vilardi.k/katiefiles/clean_reads2/all_fq/${i}.clean.R2.fq --otu_table ${i}_otu_table.csv --threads 24
Some samples turn out fine and I get an OTU table as output. However, for some samples I get the following error:
03/13/2021 03:29:00 PM INFO: Using as input 1 different sequence files e.g. /scratch/vilardi.k/katiefiles/clean_reads2/all_fq/Filtered_W2_4_TCCATTGC-AGGTAGGA_L00M.clean.R1.fq
03/13/2021 03:29:00 PM INFO: Searching with 14 SingleM package(s)
03/13/2021 03:29:00 PM INFO: Searching for reads matching 28 different protein HMM(s)
Traceback (most recent call last):
File "/home/vilardi.k/.conda/envs/singlem/bin/singlem", line 513, in <module>
known_sequence_taxonomy = args.known_sequence_taxonomy)
File "/home/vilardi.k/.conda/envs/singlem/lib/python3.6/site-packages/singlem/pipe.py", line 45, in run
otu_table_object = self.run_to_otu_table(**kwargs)
File "/home/vilardi.k/.conda/envs/singlem/lib/python3.6/site-packages/singlem/pipe.py", line 176, in run_to_otu_table
search_result = self._search(hmms, forward_read_files, reverse_read_files)
File "/home/vilardi.k/.conda/envs/singlem/lib/python3.6/site-packages/singlem/pipe.py", line 860, in _search
run(hmms, graftm_protein_search_directory, True)
File "/home/vilardi.k/.conda/envs/singlem/lib/python3.6/site-packages/singlem/pipe.py", line 848, in run
extern.run(cmd)
File "/home/vilardi.k/.conda/envs/singlem/lib/python3.6/site-packages/extern/__init__.py", line 41, in run
raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command graftM graft
returned non-zero exit status 1.
mfqe] Iterating input FASTQ file\n[2021-03-13T21:11:51Z INFO mfqe] Extracted 21918 reads from 17218560 total\nthread 'main' panicked at 'Mismatching numbers of read names were observed. Expected:\n[21915]\nbut found\n[21918]', src/main.rs:333:9\nnote: run with RUST_BACKTRACE=1 environment variable to display a backtrace.\n"STDOUT was: b''\n'STDOUT was: b''
(The graftM graft command is super long, so I am just posting the main parts of the error output.)
I am not sure how to prevent 'Mismatching numbers of read names'.