Unable to set PATH for data #115

Open
thulasis opened this issue Nov 29, 2022 · 11 comments

@thulasis

Hi
I am trying to install SingleM on our local system in a conda environment, and I have run into two issues.

  1. With the singlem bin directory:
    conda activate singlem
    cd bin
    export PATH=$PWD:$PATH
    singlem -h
    This works and displays the script's options.

But after deactivating and re-activating the conda environment,
singlem -h displays
"singlem command not found"

  2. With the data:
    After running
    singlem data --output-directory /tmp/dbs
    I added this line to ~/.bashrc:
    export SINGLEM_METAPACKAGE_PATH='/tmp/dbs/S3.0.5.metapackage20220806.smpkg.zb/payload_directory'

When I run
singlem data --verify-only

the following error appears:
11/29/2022 04:32:07 PM INFO: SingleM v1.0.0beta2
11/29/2022 04:32:07 PM INFO: Acquiring SingleM packages from environment variable
11/29/2022 04:32:07 PM INFO: Retrieval successful. Location of backpack is: /tmp/dbs/S3.0.5.metapackage20220806.smpkg.zb/payload_directory
Traceback (most recent call last):
  File "/home/swmed.org/s212810/miniconda3/envs/singlem/bin/singlem", line 1084, in <module>
    Metapackage.verify(output_directory = args.output_directory)
  File "/home/swmed.org/s212810/miniconda3/envs/singlem/bin/../singlem/metapackage.py", line 133, in verify
    backpack = zenodo_backpack.acquire(env_var_name=DATA_ENVIRONMENT_VARIABLE, version=DATA_DEFAULT_VERSION)
  File "/home/swmed.org/s212810/miniconda3/envs/singlem/lib/python3.9/site-packages/zenodo_backpack/__init__.py", line 121, in acquire
    if version != zb.data_version_string():
  File "/home/swmed.org/s212810/miniconda3/envs/singlem/lib/python3.9/site-packages/zenodo_backpack/__init__.py", line 71, in data_version_string
    return self.contents[DATA_VERSION]
KeyError: 'data_version'

Please help me with this.

Thanks,
Tulasi

@wwood
Owner

wwood commented Nov 30, 2022

Hi,

Thanks for giving it a go.

For (1), conda activate does not add that directory to PATH (unless you e.g. create a script in etc/conda/activate.d of the conda env directory that does this). This issue will go away once there is a proper bioconda package.
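
A minimal sketch of the activate.d approach mentioned above. The function name write_activate_hook, the hook file name singlem_path.sh and the example checkout path are assumptions made for illustration; they are not part of SingleM or conda. Conda sources any *.sh file found in etc/conda/activate.d of an environment each time that environment is activated.

```shell
# write_activate_hook ENV_PREFIX BIN_DIR
# Writes a small hook script that conda sources on every
# `conda activate` of the environment rooted at ENV_PREFIX.
# (Function and file names are hypothetical.)
write_activate_hook() {
    env_prefix="$1"
    bin_dir="$2"
    hook_dir="$env_prefix/etc/conda/activate.d"
    mkdir -p "$hook_dir"
    # The single-quoted format keeps $PATH literal, so it is expanded
    # at activation time rather than when the hook is written.
    printf 'export PATH="%s:$PATH"\n' "$bin_dir" > "$hook_dir/singlem_path.sh"
}

# Example (run with the singlem env active; the checkout path is an assumption):
# write_activate_hook "$CONDA_PREFIX" "$HOME/singlem/bin"
```

After writing the hook, deactivating and re-activating the environment should make singlem -h work without a manual export.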

For (2) this bug is fixed in the main branch and 1.0.0beta3, which I just pushed. Basically you need to specify

export SINGLEM_METAPACKAGE_PATH='/tmp/dbs/S3.0.5.metapackage20220806.smpkg.zb'

i.e. don't have the /payload_directory bit. You don't need to redownload the data; the previous version was just telling you the wrong export of SINGLEM_METAPACKAGE_PATH.
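
For anyone who already has the older value in ~/.bashrc, here is a small sketch that strips the trailing /payload_directory component from a path. The helper name normalize_metapackage_path is hypothetical; it only uses standard shell parameter expansion.

```shell
# normalize_metapackage_path PATH
# Prints PATH with one trailing slash and a trailing
# "/payload_directory" component removed, if present.
# (Helper name is hypothetical.)
normalize_metapackage_path() {
    mp_path="${1%/}"
    printf '%s\n' "${mp_path%/payload_directory}"
}

# Example:
# export SINGLEM_METAPACKAGE_PATH="$(normalize_metapackage_path "$SINGLEM_METAPACKAGE_PATH")"
# singlem data --verify-only
```

Paths that do not end in /payload_directory pass through unchanged, so the helper can be applied repeatedly.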

HTH, ben

@thulasis
Author

Hi Ben,

I fixed problem (1) manually by transferring the files into the conda environment, and it worked.
The second one is working now after the fix.

BTW, I am running this on nanopore reads, using the "singlem pipe --sequences" option. I think that is the right approach; if not, please let me know.

Thanks,
Tulasi

@wwood
Owner

wwood commented Nov 30, 2022

Glad the first 2 issues went away.

I've not really tested singlem on nanopore datasets, so your mileage may vary. I suspect it might work OK, but I'm keen to see if e.g. the profiles from nanopore roughly match the profiles from Illumina sequencing. There are certainly a number of improvements I can think of that might be suitable.

One thing you might want to try is pipe --hmmsearch-package-assignment. Nanopore reads will be long enough to break the default assumption that a read only encodes the window from at most 1 gene - that flag removes that assumption at a small cost to runtime.
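
A hedged sketch of the two runs being compared. Only --sequences and --hmmsearch-package-assignment come from this thread; the --otu-table option, the output file names and the helper name singlem_pipe_commands are assumptions, so check singlem pipe -h for the installed version. The helper prints the commands rather than running them, so they can be reviewed first.

```shell
# singlem_pipe_commands READS
# Prints (does not run) a default run and an hmmsearch run for one
# input file. Output file names are hypothetical.
singlem_pipe_commands() {
    reads="$1"
    # Derive a sample name: drop the directory and all extensions.
    sample="$(basename "$reads")"
    sample="${sample%%.*}"
    printf 'singlem pipe --sequences %s --otu-table %s.default.tsv\n' \
        "$reads" "$sample"
    printf 'singlem pipe --sequences %s --hmmsearch-package-assignment --otu-table %s.hmmsearch.tsv\n' \
        "$reads" "$sample"
}

# Example:
# singlem_pipe_commands barcode01.fastq.gz
```

Keeping the two outputs in separate files makes it straightforward to compare the default and hmmsearch results afterwards.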

Let me know how you go, if you don't mind?

Thanks,
ben

@thulasis
Author

thulasis commented Dec 1, 2022

Hi Ben,

Thanks for the suggestions. I already tested these reads on EPI2ME. You are correct, the mileage is low. I tried pipe --hmmsearch-package-assignment, but there was not much improvement in the classification of OTUs.

I am getting these messages on screen when running with the hmmsearch option:

12/01/2022 11:32:37 AM INFO: SingleM v1.0.0beta2
12/01/2022 11:32:37 AM INFO: Retrieval successful. Location of backpack is: /tmp/dbs/S3.0.5.metapackage20220806.smpkg.zb
12/01/2022 11:32:37 AM INFO: Loaded 59 SingleM packages
12/01/2022 11:32:37 AM INFO: Using as input 1 different sequence files e.g. barcode01.fastq.gz
12/01/2022 11:32:37 AM INFO: Filtering sequence files through DIAMOND blastx
12/01/2022 11:55:01 AM INFO: Finished DIAMOND prefilter phase
12/01/2022 11:55:01 AM INFO: Assigning sequences to SingleM packages with HMMSEARCH ..
12/01/2022 11:55:01 AM INFO: Searching with 59 SingleM package(s)
12/01/2022 11:55:01 AM INFO: Searching for reads matching 77 different protein HMM(s)
12/01/2022 11:55:15 AM INFO: Finished search phase
12/01/2022 11:55:15 AM INFO: Running separate alignments in GraftM..
12/01/2022 11:56:48 AM INFO: Finished extracting aligned sequences
12/01/2022 11:56:48 AM INFO: Running taxonomic assignment ..
12/01/2022 11:56:48 AM INFO: Assigning taxonomy by singlem query ..

Does this show that anything changed from the default run options?
Of course, the sequence data is not great, as it is 99% host DNA.

Thanks,
Tulasi

@wwood
Owner

wwood commented Dec 1, 2022

Hi,

That output looks right, though I cannot tell from it how many reads are being picked up. Of course, if there are near-zero microbial reads then community profiling isn't really possible. Thanks for keeping me up to date.

When you say

"not much improvement in the classification of OTUs."

what does that mean? Poor taxonomic assignment, or a low number of reads included?

Thanks, ben

@thulasis
Author

thulasis commented Dec 1, 2022

Hi Ben,

What I meant is that with the default settings I got an OTU table with 90 data points, while with the hmmsearch option I got 77 data points in the CSV file.

The taxonomic assignment is the same in both files and almost the same as that from Nanopore's EPI2ME pipeline. This was expected, as the sample has 99.9% host DNA reads.
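
A minimal sketch of how the data point counts could be reproduced from the two output files. It assumes each OTU table is a text file with one header line followed by one row per data point; the helper name count_data_points and the file names in the example are hypothetical.

```shell
# count_data_points FILE
# Prints the number of non-empty rows after the header line.
# (Assumes a single header line; helper name is hypothetical.)
count_data_points() {
    awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$1"
}

# Example (file names are assumptions):
# count_data_points barcode01.default.tsv
# count_data_points barcode01.hmmsearch.tsv
```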

Thanks,
Tulasi

@wwood
Owner

wwood commented Dec 2, 2022

Thanks for letting me know. I actually find those results a bit surprising. I would have thought you'd get more data points with the hmmsearch option.

Would you mind running without the hmmsearch option but using --archive-otu-table, and then sending me that file by email (perhaps zipped to save space) or some other way? That file will include the raw reads that ultimately go into the output CSV file, and will help me debug what seems to be going awry.

Thanks, ben

@thulasis
Author

thulasis commented Dec 2, 2022

Hi Ben,

Yeah, sure. I am running it now. I have attached the result file here. Do you need me to send the initial raw reads as well?
We have 10 GB of data for each barcode. If you would like that too, I will share it with you on our lamella cloud.

barcode01_archive.txt

Thanks,
Tulasi

@wwood
Owner

wwood commented Dec 2, 2022 via email

@thulasis
Author

thulasis commented Dec 6, 2022

Thanks Ben

@aljazdzy

I had a similar issue and I'm also running Nanopore reads, so I am very interested in any outcomes that may have come out of this.
