From 5f27afb300c106bee9e1bc8e9bd95c1bf3918d0b Mon Sep 17 00:00:00 2001
From: nsheff
Date: Mon, 21 Aug 2023 11:34:59 -0400
Subject: [PATCH] typos

---
 docs/specification.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/specification.md b/docs/specification.md
index 8376045..eadeb4d 100644
--- a/docs/specification.md
+++ b/docs/specification.md
@@ -20,7 +20,7 @@ This specification is in **DRAFT** form. This is **NOT YET AN APPROVED GA4GH spe
 Reference sequences are fundamental to genomic analysis. To make their analysis reproducible and efficient, we require tools that can identify, store, retrieve, and compare reference sequences. The primary goal of the *Sequence Collections* (seqcol) project is **to standardize identifiers for collections of sequences**. Seqcol can be used to identify genomes, transcriptomes, or proteomes -- anything that can be represented as a collection of sequences. In brief, the project specifies 3 procedures:
-1. **An algorithm for encoding sequence identifiers from collections.** The GA4GH standard [refget](http://samtools.github.io/hts-specs/refget.html) specifies a way to compute deterministic sequence identifiers from individual sequences themselves. Seqcol uses refget identifiers and adds functionality to wrap them into collections. Secol also handles sequence attributes, such as their names, lengths, or topologies. Seqcol identifiers are defined by a hash algorithm, rather than an accession authority, and are thus de-centralized and usable for many purposes, including private or new sequence collections, cases without connection to a central database, or validation of sequence collection content and provenance.
+1. **An algorithm for encoding sequence identifiers from collections.** The GA4GH standard [refget](http://samtools.github.io/hts-specs/refget.html) specifies a way to compute deterministic sequence identifiers from individual sequences themselves. Seqcol uses refget identifiers and adds functionality to wrap them into collections. Seqcol also handles sequence attributes, such as their names, lengths, or topologies. Seqcol identifiers are defined by a hash algorithm, rather than an accession authority, and are thus de-centralized and usable for many purposes, including private or new sequence collections, cases without connection to a central database, or validation of sequence collection content and provenance.
 2. **A lookup API to retrieve a collection given an identifier.** Seqcol also specifies a RESTful API to enable retrieving the sequence collections given an identifier. This allows one to retrieve the exact reference genome used for an analysis.
 3. **A comparison API to assess compatibility of two collections.** Finally, seqcol also provides a standardized method of comparing the contents of two sequence collections. This comparison function can be used to determine if analysis results that used different references genomes may still be compatible.