Binned populations from appraise not matching numbers extracted from genomes #166
Upon some digging, this seems to arise because the markers from the MAGs match spurious clusters in the metagenomes, perhaps due to running SingleM on the raw reads. Applying a prevalence filter makes the numbers more plausible. Adding such a filter as an option to `singlem summarise` when combining multiple samples might be useful.
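A toy sketch of the mechanism suggested above, assuming a simple Hamming-distance matching rule with a hypothetical `max_mismatches` threshold (this is not SingleM's `appraise --imperfect` implementation, whose matching logic may differ). When several metagenome sequences, such as error-containing variants of one species, each lie within the threshold of the same MAG sequence, all of them count as binned. The number of unique binned sequences can therefore exceed the number of unique MAG sequences.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def binned_sequences(metagenome_seqs, mag_seqs, max_mismatches):
    """Metagenome sequences within `max_mismatches` of at least one MAG sequence.

    Toy matching rule (an assumption, not SingleM's implementation):
    only equal-length sequences are compared.
    """
    return {
        m for m in metagenome_seqs
        if any(len(m) == len(g) and hamming(m, g) <= max_mismatches
               for g in mag_seqs)
    }


if __name__ == "__main__":
    mag_seqs = {"ACGTACGTAC"}
    metagenome_seqs = {
        "ACGTACGTAC",  # exact match
        "ACGTACGTAA",  # 1 mismatch, e.g. a sequencing error
        "TCGTACGTAC",  # 1 mismatch
        "ACGAACGTAC",  # 1 mismatch
        "GGGGGGGGGG",  # unrelated sequence
    }
    binned = binned_sequences(metagenome_seqs, mag_seqs, max_mismatches=1)
    print(len(mag_seqs), len(binned))  # 1 4
```

Here a single MAG sequence accounts for four distinct binned metagenome sequences, which is the kind of many-to-one inflation a prevalence filter would reduce.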
I'm a little confused about exactly which commands and analysis you've run. Typically you wouldn't cluster the OTU tables before using appraise; instead, just specify `--imperfect`. What are "spurious clusters from the metagenomes"?
Hi Ben, I have 120 raw metagenomes and 1700+ MAGs, so I split them into batches and ran singlem pipe
Then I concatenated the results using singlem summarise
As the OTU table, especially the one from the raw metagenomes, was quite large, I clustered the markers at the default identity. Clustering the OTU table containing all 59 markers never finished, so I split each marker and clustered them separately, for example
Upon inspecting this `raw_reads_otu_table_clustered.tsv` file, I observed several sequences within each marker that were detected in at most one sample. These are what I termed spurious clusters: perhaps singleton clusters arising from sequencing error? (I used the raw reads with no trimming.) I followed the same steps as above for the MAGs, and then ran appraise as follows:
As mentioned above, when I look at a particular marker, say S3.13 ribosomal S9, I see a total of 1434 fragments identified from my 1767 MAGs (95% gANI). When I appraise these against the clustered OTU table from the raw metagenomes, the binned OTU table contains 6402 fragments. For this gene, the OTU table from the raw reads contained roughly 36,000 clusters (about 5-8x the number of unique bacteria/archaea one would expect in these samples, going by our amplicon data). When I apply a 5% prevalence filter to the clustered markers from the raw metagenomes (keeping sequences detected in at least 6 of the 120 samples), appraise reports 1520 binned fragments instead. While I understand that there is no requirement to cluster the raw reads (one can instead just use `appraise --imperfect`), I did so to obtain a species-level OTU table for my community analysis. I assume clustering before appraise should also give similar results?
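The prevalence filter described above (at least 5% of 120 samples, i.e. 6 samples) can be sketched as follows. This is a minimal illustration, not a SingleM feature; it assumes the OTU table is tab-separated with `gene`, `sample` and `sequence` columns, and the function names are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict


def read_otu_table(handle):
    """Parse a tab-separated OTU table into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def prevalence_filter(rows, n_samples, percent=5):
    """Keep only (gene, sequence) pairs seen in at least `percent`% of samples.

    Assumes each row has 'gene', 'sample' and 'sequence' columns.
    """
    # An integer percent keeps the threshold exact: 120 * 5 / 100 == 6.0
    min_samples = math.ceil(n_samples * percent / 100)
    samples_per_seq = defaultdict(set)
    for row in rows:
        samples_per_seq[(row["gene"], row["sequence"])].add(row["sample"])
    keep = {key for key, samples in samples_per_seq.items()
            if len(samples) >= min_samples}
    return [row for row in rows if (row["gene"], row["sequence"]) in keep]


if __name__ == "__main__":
    # Toy table: one sequence seen in 6 samples, another in a single sample.
    lines = ["gene\tsample\tsequence"]
    lines += [f"S3.13\ts{i}\tAAAA" for i in range(6)]
    lines.append("S3.13\ts0\tCCCC")
    rows = read_otu_table(io.StringIO("\n".join(lines) + "\n"))
    kept = prevalence_filter(rows, n_samples=120, percent=5)
    print(len(rows), len(kept))  # 7 6
```

Counting distinct samples per (gene, sequence) pair, rather than summing rows, means a sequence that appears several times in one sample still counts as present in only that sample.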
Hi Ben,
Apologies for the multiple questions. I ran singlem separately on my metagenome reads and my MAGs. I then separated each marker and clustered them using the default species-level identity.
When I look at a particular marker, say S3.13 ribosomal S9, I see a total of 1434 fragments identified from my 1767 MAGs (95% gANI). Next, I ran appraise on the clustered markers from the raw reads and the MAGs using the `--imperfect` option with default identity settings, and separated the binned and unaccounted populations. The binned populations now contain 6402 unique fragments. I am a bit lost on how to interpret this.
Regards, Adi