typos
nsheff committed Aug 21, 2023
1 parent bec7e98 commit 5f27afb
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/specification.md
@@ -20,7 +20,7 @@ This specification is in **DRAFT** form. This is **NOT YET AN APPROVED GA4GH spe

Reference sequences are fundamental to genomic analysis. To make their analysis reproducible and efficient, we require tools that can identify, store, retrieve, and compare reference sequences. The primary goal of the *Sequence Collections* (seqcol) project is **to standardize identifiers for collections of sequences**. Seqcol can be used to identify genomes, transcriptomes, or proteomes -- anything that can be represented as a collection of sequences. In brief, the project specifies 3 procedures:

- 1. **An algorithm for encoding sequence identifiers from collections.** The GA4GH standard [refget](http://samtools.github.io/hts-specs/refget.html) specifies a way to compute deterministic sequence identifiers from individual sequences themselves. Seqcol uses refget identifiers and adds functionality to wrap them into collections. Secol also handles sequence attributes, such as their names, lengths, or topologies. Seqcol identifiers are defined by a hash algorithm, rather than an accession authority, and are thus de-centralized and usable for many purposes, including private or new sequence collections, cases without connection to a central database, or validation of sequence collection content and provenance.
+ 1. **An algorithm for encoding sequence identifiers from collections.** The GA4GH standard [refget](http://samtools.github.io/hts-specs/refget.html) specifies a way to compute deterministic sequence identifiers from individual sequences themselves. Seqcol uses refget identifiers and adds functionality to wrap them into collections. Seqcol also handles sequence attributes, such as their names, lengths, or topologies. Seqcol identifiers are defined by a hash algorithm, rather than an accession authority, and are thus de-centralized and usable for many purposes, including private or new sequence collections, cases without connection to a central database, or validation of sequence collection content and provenance.
2. **A lookup API to retrieve a collection given an identifier.** Seqcol also specifies a RESTful API to enable retrieving the sequence collections given an identifier. This allows one to retrieve the exact reference genome used for an analysis.
3. **A comparison API to assess compatibility of two collections.** Finally, seqcol also provides a standardized method of comparing the contents of two sequence collections. This comparison function can be used to determine if analysis results that used different references genomes may still be compatible.
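The idea of a hash-defined, decentralized identifier in item 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm the specification defines: `sha512t24u` follows the GA4GH-style truncated digest (SHA-512, first 24 bytes, base64url), while `toy_collection_digest` and its choice of attributes and JSON serialization are hypothetical and exist only to show how per-sequence digests plus attributes such as names and lengths could be wrapped into one collection-level identifier.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """GA4GH-style truncated digest: SHA-512, keep 24 bytes, base64url-encode."""
    truncated = hashlib.sha512(data).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def toy_collection_digest(names, sequences):
    """Hypothetical collection digest; illustration only, not the seqcol algorithm."""
    record = {
        "lengths": [len(seq) for seq in sequences],
        "names": list(names),
        # Each sequence is represented by its own digest, not its raw content.
        "sequences": [sha512t24u(seq.upper().encode("ascii")) for seq in sequences],
    }
    # Assumption: a compact, key-sorted JSON string stands in for a canonical form.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return sha512t24u(canonical.encode("utf-8"))


if __name__ == "__main__":
    print(toy_collection_digest(["chr1", "chr2"], ["ACGT", "GGCC"]))
```

Because the identifier is derived only from the content, two parties who never contact a central database still compute the same digest for the same collection, and any change to a name, length, or sequence yields a different one.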
