add taxonomic analysis and human reads removal wf #192

Open
wants to merge 2 commits into
base: main
from

@PlushZ PlushZ commented Apr 18, 2023

@bebatut @wm75 This is a workflow for Taxonomic Analysis of SARS-CoV-2 Wastewater Samples with Human Read Removal. It would be the first part of a metagenomic data variant analysis.

github-actions bot commented Mar 5, 2024

Test Results (powered by Planemo)

Test Summary

Test State    Count
Total         1
Passed        0
Error         1
Failure       0
Skipped       0
Errored Tests
  • ❌ Taxonomic-Analysis-of-SARS-CoV-2-Wastewater-Samples-with-Human-Read-Removal.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: SARS-CoV-2 reference genome:

        • step_state: scheduled
      • Step 2: Paired Collection:

        • step_state: scheduled
      • Step 3: toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.20.1+galaxy0:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • ln -s '/tmp/tmpvp1vnw0g/files/d/3/0/dataset_d30a72ce-4a42-4ef0-8b6c-48da70a66eec.dat' 'SRR12596170_fastq.fastq.gz' && ln -s '/tmp/tmpvp1vnw0g/files/4/5/7/dataset_4576cabe-e687-45d1-bb99-c0f24069cb39.dat' 'SRR12596170_fastq_R2.fastq.gz' &&    fastp  --thread ${GALAXY_SLOTS:-1} --report_title 'fastp report for SRR12596170_fastq.fastq.gz'   -i 'SRR12596170_fastq.fastq.gz' -o first.fastq.gz  -I 'SRR12596170_fastq_R2.fastq.gz' -O second.fastq.gz       --detect_adapter_for_pe                                          &&  mv first.fastq.gz '/tmp/tmpvp1vnw0g/job_working_directory/000/6/outputs/dataset_9b59d290-9dee-469a-8932-9966a0a4d674.dat' && mv second.fastq.gz '/tmp/tmpvp1vnw0g/job_working_directory/000/6/outputs/dataset_91e7a8e9-be16-4bb8-9811-c2fbacf66e9a.dat'

            Exit Code:

            • 0

            Standard Error:

            • Detecting adapter sequence for read1...
              No adapter detected for read1
              
              Detecting adapter sequence for read2...
              No adapter detected for read2
              
              Read1 before filtering:
              total reads: 167374
              total bases: 12425652
              Q20 bases: 11614277(93.4702%)
              Q30 bases: 11368920(91.4956%)
              
              Read2 before filtering:
              total reads: 167374
              total bases: 12265824
              Q20 bases: 11350710(92.5393%)
              Q30 bases: 11085935(90.3807%)
              
              Read1 after filtering:
              total reads: 155144
              total bases: 11496170
              Q20 bases: 10977759(95.4906%)
              Q30 bases: 10795701(93.9069%)
              
              Read2 aftering filtering:
              total reads: 155144
              total bases: 11348252
              Q20 bases: 10851896(95.6261%)
              Q30 bases: 10659366(93.9296%)
              
              Filtering result:
              reads passed filter: 310288
              reads failed due to low quality: 24088
              reads failed due to too many N: 372
              reads failed due to too short: 0
              reads with adapter trimmed: 1800
              bases trimmed due to adapters: 37303
              
              Duplication rate: 12.3851%
              
              Insert size peak (evaluated by paired-end reads): 116
              
              JSON report: fastp.json
              HTML report: fastp.html
              
              fastp --thread 1 --report_title fastp report for SRR12596170_fastq.fastq.gz -i SRR12596170_fastq.fastq.gz -o first.fastq.gz -I SRR12596170_fastq_R2.fastq.gz -O second.fastq.gz --detect_adapter_for_pe 
              fastp v0.20.1, time used: 7 seconds
              

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fastqsanger.gz"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filter_options {"length_filtering_options": {"disable_length_filtering": false, "length_limit": null, "length_required": null}, "low_complexity_filter": {"complexity_threshold": null, "enable_low_complexity_filter": false}, "quality_filtering_options": {"disable_quality_filtering": false, "n_base_limit": null, "qualified_quality_phred": null, "unqualified_percent_limit": null}}
              output_options {"report_html": true, "report_json": true}
              overrepresented_sequence_analysis {"overrepresentation_analysis": false, "overrepresentation_sampling": null}
              read_mod_options {"base_correction_options": {"correction": false}, "cutting_by_quality_options": {"cut_by_quality3": false, "cut_by_quality5": false, "cut_mean_quality": null, "cut_window_size": null}, "polyg_tail_trimming": {"__current_case__": 1, "poly_g_min_len": null, "trimming_select": ""}, "polyx_tail_trimming": {"__current_case__": 1, "polyx_trimming_select": ""}, "umi_processing": {"umi": false, "umi_len": null, "umi_loc": "", "umi_prefix": ""}}
              single_paired {"__current_case__": 2, "adapter_trimming_options": {"adapter_sequence1": "", "adapter_sequence2": "", "disable_adapter_trimming": false}, "global_trimming_options": {"trim_front1": null, "trim_front2": null, "trim_tail1": null, "trim_tail2": null}, "paired_input": {"values": [{"id": 1, "src": "dce"}]}, "single_paired_selector": "paired_collection"}
          • Job 2:

            • Job state is ok

            Command Line:

            • ln -s '/tmp/tmpvp1vnw0g/files/5/c/4/dataset_5c41e464-70bf-4ca3-ab35-b684ecf945fc.dat' 'SRR12596172_fastq.fastq.gz' && ln -s '/tmp/tmpvp1vnw0g/files/f/3/c/dataset_f3c2d737-1ea5-4bbe-a418-86cfe41c8882.dat' 'SRR12596172_fastq_R2.fastq.gz' &&    fastp  --thread ${GALAXY_SLOTS:-1} --report_title 'fastp report for SRR12596172_fastq.fastq.gz'   -i 'SRR12596172_fastq.fastq.gz' -o first.fastq.gz  -I 'SRR12596172_fastq_R2.fastq.gz' -O second.fastq.gz       --detect_adapter_for_pe                                          &&  mv first.fastq.gz '/tmp/tmpvp1vnw0g/job_working_directory/000/7/outputs/dataset_00406541-0045-4332-bc59-40fba66ebfb7.dat' && mv second.fastq.gz '/tmp/tmpvp1vnw0g/job_working_directory/000/7/outputs/dataset_28b581d3-871a-40f8-93dc-7ccd60fe556e.dat'

            Exit Code:

            • 0

            Standard Error:

            • Detecting adapter sequence for read1...
              No adapter detected for read1
              
              Detecting adapter sequence for read2...
              No adapter detected for read2
              
              Read1 before filtering:
              total reads: 167374
              total bases: 12425652
              Q20 bases: 11614277(93.4702%)
              Q30 bases: 11368920(91.4956%)
              
              Read2 before filtering:
              total reads: 167374
              total bases: 12265824
              Q20 bases: 11350710(92.5393%)
              Q30 bases: 11085935(90.3807%)
              
              Read1 after filtering:
              total reads: 155144
              total bases: 11496170
              Q20 bases: 10977759(95.4906%)
              Q30 bases: 10795701(93.9069%)
              
              Read2 aftering filtering:
              total reads: 155144
              total bases: 11348252
              Q20 bases: 10851896(95.6261%)
              Q30 bases: 10659366(93.9296%)
              
              Filtering result:
              reads passed filter: 310288
              reads failed due to low quality: 24088
              reads failed due to too many N: 372
              reads failed due to too short: 0
              reads with adapter trimmed: 1800
              bases trimmed due to adapters: 37303
              
              Duplication rate: 12.3851%
              
              Insert size peak (evaluated by paired-end reads): 116
              
              JSON report: fastp.json
              HTML report: fastp.html
              
              fastp --thread 1 --report_title fastp report for SRR12596172_fastq.fastq.gz -i SRR12596172_fastq.fastq.gz -o first.fastq.gz -I SRR12596172_fastq_R2.fastq.gz -O second.fastq.gz --detect_adapter_for_pe 
              fastp v0.20.1, time used: 7 seconds
              

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fastqsanger.gz"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filter_options {"length_filtering_options": {"disable_length_filtering": false, "length_limit": null, "length_required": null}, "low_complexity_filter": {"complexity_threshold": null, "enable_low_complexity_filter": false}, "quality_filtering_options": {"disable_quality_filtering": false, "n_base_limit": null, "qualified_quality_phred": null, "unqualified_percent_limit": null}}
              output_options {"report_html": true, "report_json": true}
              overrepresented_sequence_analysis {"overrepresentation_analysis": false, "overrepresentation_sampling": null}
              read_mod_options {"base_correction_options": {"correction": false}, "cutting_by_quality_options": {"cut_by_quality3": false, "cut_by_quality5": false, "cut_mean_quality": null, "cut_window_size": null}, "polyg_tail_trimming": {"__current_case__": 1, "poly_g_min_len": null, "trimming_select": ""}, "polyx_tail_trimming": {"__current_case__": 1, "polyx_trimming_select": ""}, "umi_processing": {"umi": false, "umi_len": null, "umi_loc": "", "umi_prefix": ""}}
              single_paired {"__current_case__": 2, "adapter_trimming_options": {"adapter_sequence1": "", "adapter_sequence2": "", "disable_adapter_trimming": false}, "global_trimming_options": {"trim_front1": null, "trim_front2": null, "trim_tail1": null, "trim_tail2": null}, "paired_input": {"values": [{"id": 4, "src": "dce"}]}, "single_paired_selector": "paired_collection"}
      • Step 4: toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/2.1.1+galaxy1:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • kraken2 --threads ${GALAXY_SLOTS:-1} --db '/cvmfs/data.galaxyproject.org/managed/kraken2_databases/kraken2_viral_db'    --paired '/tmp/tmpvp1vnw0g/files/d/3/0/dataset_d30a72ce-4a42-4ef0-8b6c-48da70a66eec.dat' '/tmp/tmpvp1vnw0g/files/4/5/7/dataset_4576cabe-e687-45d1-bb99-c0f24069cb39.dat'   --confidence '0.0' --minimum-base-quality '0' --minimum-hit-groups '2'    --report '/tmp/tmpvp1vnw0g/job_working_directory/000/8/outputs/dataset_bd438a34-cc3d-44c8-9c3d-ac96c62dec9b.dat'     > '/tmp/tmpvp1vnw0g/job_working_directory/000/8/outputs/dataset_75337e88-8056-4f07-97ce-0342825854d6.dat'

            Exit Code:

            • 0

            Standard Error:

            • Loading database information... done.
              167374 sequences (24.69 Mbp) processed in 1.468s (6841.7 Kseq/m, 1009.31 Mbp/m).
                5662 sequences classified (3.38%)
                161712 sequences unclassified (96.62%)
              

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fastqsanger.gz"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              confidence "0.0"
              dbkey "?"
              kraken2_database "viral2019-03"
              min_base_quality "0"
              minimum_hit_groups "2"
              quick false
              report {"create_report": true, "report_minimizer_data": false, "report_zero_counts": false, "use_mpa_style": false}
              single_paired {"__current_case__": 0, "input_pair": {"values": [{"id": 1, "src": "dce"}]}, "single_paired_selector": "collection"}
              split_reads false
              use_names false
          • Job 2:

            • Job state is ok

            Command Line:

            • kraken2 --threads ${GALAXY_SLOTS:-1} --db '/cvmfs/data.galaxyproject.org/managed/kraken2_databases/kraken2_viral_db'    --paired '/tmp/tmpvp1vnw0g/files/5/c/4/dataset_5c41e464-70bf-4ca3-ab35-b684ecf945fc.dat' '/tmp/tmpvp1vnw0g/files/f/3/c/dataset_f3c2d737-1ea5-4bbe-a418-86cfe41c8882.dat'   --confidence '0.0' --minimum-base-quality '0' --minimum-hit-groups '2'    --report '/tmp/tmpvp1vnw0g/job_working_directory/000/9/outputs/dataset_87d986a5-c189-45a3-a5a0-5cfd386b4bf5.dat'     > '/tmp/tmpvp1vnw0g/job_working_directory/000/9/outputs/dataset_26b4a608-3368-413e-a783-758b86df10cc.dat'

            Exit Code:

            • 0

            Standard Error:

            • Loading database information... done.
              167374 sequences (24.69 Mbp) processed in 1.475s (6806.6 Kseq/m, 1004.13 Mbp/m).
                5662 sequences classified (3.38%)
                161712 sequences unclassified (96.62%)
              

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fastqsanger.gz"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              confidence "0.0"
              dbkey "?"
              kraken2_database "viral2019-03"
              min_base_quality "0"
              minimum_hit_groups "2"
              quick false
              report {"create_report": true, "report_minimizer_data": false, "report_zero_counts": false, "use_mpa_style": false}
              single_paired {"__current_case__": 0, "input_pair": {"values": [{"id": 4, "src": "dce"}]}, "single_paired_selector": "collection"}
              split_reads false
              use_names false
      • Step 5: toolshed.g2.bx.psu.edu/repos/iuc/read_it_and_keep/read_it_and_keep/0.2.2+galaxy0:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • ln -s '/tmp/tmpvp1vnw0g/files/7/e/7/dataset_7e74610b-a616-4717-9b1c-dba7a2ba6168.dat' ref_untrimmed.fasta && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/read_it_and_keep/1563b58905f4/read_it_and_keep/trim_reference.py' ref_untrimmed.fasta ref.fasta && ln -s '/tmp/tmpvp1vnw0g/files/9/b/5/dataset_9b59d290-9dee-469a-8932-9966a0a4d674.dat' read1 && ln -s '/tmp/tmpvp1vnw0g/files/9/1/e/dataset_91e7a8e9-be16-4bb8-9811-c2fbacf66e9a.dat' read2 && readItAndKeep --tech illumina --ref_fasta ref.fasta --min_map_length 50 --min_map_length_pc 50.0  --reads1 read1 --reads2 read2 -o output

            Exit Code:

            • 0

            Standard Error:

            • Processed 100000 reads (or read pairs)
              

            Standard Output:

            • Input reads file 1	155144
              Input reads file 2	155144
              Kept reads 1	24959
              Kept reads 2	24959
              

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              adv {"enumerate_names": false, "min_map_length": "50", "min_map_length_pc": "50.0"}
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              reads {"__current_case__": 1, "paired_reads": {"values": [{"id": 7, "src": "dce"}]}, "read_type": "paired_collection"}
              ref_source {"__current_case__": 0, "ref_fasta": {"values": [{"id": 1, "src": "hda"}]}, "source": "history"}
              sequencing_tech "illumina"
              trim_reference true
          • Job 2:

            • Job state is ok

            Command Line:

            • ln -s '/tmp/tmpvp1vnw0g/files/7/e/7/dataset_7e74610b-a616-4717-9b1c-dba7a2ba6168.dat' ref_untrimmed.fasta && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/read_it_and_keep/1563b58905f4/read_it_and_keep/trim_reference.py' ref_untrimmed.fasta ref.fasta && ln -s '/tmp/tmpvp1vnw0g/files/0/0/4/dataset_00406541-0045-4332-bc59-40fba66ebfb7.dat' read1 && ln -s '/tmp/tmpvp1vnw0g/files/2/8/b/dataset_28b581d3-871a-40f8-93dc-7ccd60fe556e.dat' read2 && readItAndKeep --tech illumina --ref_fasta ref.fasta --min_map_length 50 --min_map_length_pc 50.0  --reads1 read1 --reads2 read2 -o output

            Exit Code:

            • 0

            Standard Error:

            • Processed 100000 reads (or read pairs)
              

            Standard Output:

            • Input reads file 1	155144
              Input reads file 2	155144
              Kept reads 1	24959
              Kept reads 2	24959
              

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              adv {"enumerate_names": false, "min_map_length": "50", "min_map_length_pc": "50.0"}
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              reads {"__current_case__": 1, "paired_reads": {"values": [{"id": 8, "src": "dce"}]}, "read_type": "paired_collection"}
              ref_source {"__current_case__": 0, "ref_fasta": {"values": [{"id": 1, "src": "hda"}]}, "source": "history"}
              sequencing_tech "illumina"
              trim_reference true
      • Step 6: toolshed.g2.bx.psu.edu/repos/devteam/kraken2tax/Kraken2Tax/1.1:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is running

            Command Line:

            • awk '{ print $2, $3 }' OFS="\t" "/tmp/tmpvp1vnw0g/files/b/d/4/dataset_bd438a34-cc3d-44c8-9c3d-ac96c62dec9b.dat" | taxonomy-reader "/cvmfs/data.galaxyproject.org/managed/ncbi_taxonomy/ncbi-2015-10-05/names.dmp" "/cvmfs/data.galaxyproject.org/managed/ncbi_taxonomy/ncbi-2015-10-05/nodes.dmp" 1 > "/tmp/tmpvp1vnw0g/job_working_directory/000/12/outputs/dataset_61c4bc80-cc86-41a5-98f3-7577945e8bf7.dat"

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tabular"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              ncbi_taxonomy "ncbi-2015-10-05"
              read_name "2"
              tax_id 5bc3b5997ae36e38
          • Job 2:

            • Job state is error

            Command Line:

            • awk '{ print $2, $3 }' OFS="\t" "/tmp/tmpvp1vnw0g/files/8/7/d/dataset_87d986a5-c189-45a3-a5a0-5cfd386b4bf5.dat" | taxonomy-reader "/cvmfs/data.galaxyproject.org/managed/ncbi_taxonomy/ncbi-2015-10-05/names.dmp" "/cvmfs/data.galaxyproject.org/managed/ncbi_taxonomy/ncbi-2015-10-05/nodes.dmp" 1 > "/tmp/tmpvp1vnw0g/job_working_directory/000/13/outputs/dataset_11ecc157-8193-468d-a152-8f217a33da1b.dat"

            Exit Code:

            • 127

            Standard Error:

            • /tmp/tmpvp1vnw0g/job_working_directory/000/13/tool_script.sh: line 9: taxonomy-reader: command not found
              

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tabular"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              ncbi_taxonomy "ncbi-2015-10-05"
              read_name "2"
              tax_id 5bc3b5997ae36e38
      • Step 7: toolshed.g2.bx.psu.edu/repos/crs4/taxonomy_krona_chart/taxonomy_krona_chart/2.7.1:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is paused

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "taxonomy"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              combine_inputs false
              dbkey "?"
              root_name "Root"
              type_of_data {"__current_case__": 0, "input": {"values": [{"id": 8, "src": "hdca"}]}, "max_rank": "8", "type_of_data_selector": "taxonomy"}
      • Step 8: toolshed.g2.bx.psu.edu/repos/crs4/taxonomy_krona_chart/taxonomy_krona_chart/2.7.1:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is paused

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "taxonomy"
              __workflow_invocation_uuid__ "4dc0b42fdb4311eeaad6fbf0711a9142"
              chromInfo "/tmp/tmpvp1vnw0g/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              combine_inputs true
              dbkey "?"
              root_name "Root"
              type_of_data {"__current_case__": 0, "input": {"values": [{"id": 8, "src": "hdca"}]}, "max_rank": "8", "type_of_data_selector": "taxonomy"}
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • b718e6261924da2c
      • history_state

        • error
      • invocation_id

        • b718e6261924da2c
      • invocation_state

        • scheduled
      • workflow_id

        • b718e6261924da2c
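
Exit code 127 with `taxonomy-reader: command not found` (Step 6, Job 2) is what a shell reports when an executable cannot be resolved on PATH. Below is a minimal sketch of a pre-flight check for that situation, assuming a Python interpreter is available in the job environment. `missing_executables` is a hypothetical helper written for illustration; it is not part of Galaxy, Planemo, or the Kraken2Tax tool.

```python
import shutil

# Executables that the Step 6 command line in the log above invokes.
REQUIRED = ["awk", "taxonomy-reader"]


def missing_executables(names, path=None):
    """Return the names that cannot be resolved on PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_executables(REQUIRED)
    if missing:
        # A shell would exit with 127 ("command not found") for these.
        print("not on PATH:", ", ".join(missing))
    else:
        print("all required executables found")
```

Running this inside the same environment that executes the tool script would show whether `taxonomy-reader` is actually installed there, which may help separate a dependency-resolution problem from a problem in the workflow itself.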

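The read counts reported by the jobs that did finish can be cross-checked with a little arithmetic. The sketch below uses only numbers copied from the fastp, Kraken2, and read_it_and_keep logs above; the variable names are illustrative and the percentages are derived here, not taken from any tool output.

```python
# Numbers copied from the job logs above (identical for both jobs of each step).
fastp_input_pairs = 167374
fastp_passed = 310288
fastp_low_quality = 24088
fastp_too_many_n = 372
fastp_too_short = 0

# fastp's "Filtering result" counts individual reads, so the categories
# should add up to twice the number of input read pairs.
accounted = fastp_passed + fastp_low_quality + fastp_too_many_n + fastp_too_short
assert accounted == 2 * fastp_input_pairs

kraken_total = 167374
kraken_classified = 5662
kraken_unclassified = 161712
assert kraken_classified + kraken_unclassified == kraken_total
classified_pct = round(100 * kraken_classified / kraken_total, 2)

# read_it_and_keep: pairs kept after mapping against the SARS-CoV-2 reference.
riak_input_pairs = 155144
riak_kept_pairs = 24959
kept_pct = round(100 * riak_kept_pairs / riak_input_pairs, 2)

print(f"fastp reads accounted for: {accounted}")
print(f"Kraken2 classified: {classified_pct}%")
print(f"read_it_and_keep kept: {kept_pct}%")
```

The Kraken2 percentage computed this way matches the 3.38% printed in its log, and the read_it_and_keep input (155144 pairs) matches the fastp post-filtering count, which suggests the upstream steps behaved consistently before Step 6 failed.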
2 participants