Commit f6fe95c

update glossary (#2)
1 parent f29423f commit f6fe95c

2 files changed (+90, -137 lines)

book/08-addendum/best-practices.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -27,7 +27,7 @@ This page outlines the best-practices for metadata and data, used in the EMO-BON
 2. ## Identifiers
 
 - ### Observatory IDs
-Format: lowercase with hyphens
+Format: lowercase with hyphens
 **Examples**:
 - `bpns`
 - `hcmr-1`
```
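The `Format: lowercase with hyphens` rule for observatory IDs can be checked mechanically. The sketch below is a minimal Python validator built on our own assumptions, not the project's check: an ID is one or more runs of lowercase letters or digits joined by single hyphens, with no leading, trailing, or doubled hyphens. Allowing digits is inferred from the `hcmr-1` example. `OBS_ID_PATTERN` and `is_valid_obs_id` are hypothetical names.

```python
import re

# Assumed pattern (not an official EMO-BON rule): lowercase alphanumeric runs
# joined by single hyphens. Digits are allowed because of the `hcmr-1` example.
OBS_ID_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_obs_id(obs_id: str) -> bool:
    """Return True if obs_id matches the assumed observatory ID format."""
    # fullmatch requires the whole string to conform, not just a prefix.
    return OBS_ID_PATTERN.fullmatch(obs_id) is not None


if __name__ == "__main__":
    for candidate in ["bpns", "hcmr-1", "HCMR-1", "hcmr_1", "-bpns", "hcmr--1"]:
        print(f"{candidate!r}: {is_valid_obs_id(candidate)}")
```

Using `fullmatch` instead of `match` matters here. With `match`, a string such as `hcmr-1_x` would pass on its valid prefix `hcmr-1`.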

book/08-addendum/glossary.md

Lines changed: 89 additions & 136 deletions
```diff
@@ -4,177 +4,130 @@ title: Glossary
 
 A comprehensive list of terms and definitions used throughout the EMO-BON project.
 
-## Physical Concepts
+## Terms - physical concepts
 
-### Partner
+- ### Partner
+An EMO-BON member, which is typically but not exclusively an institute.
 
-An EMO-BON member, which is typically but not exclusively an institute.
+- ### Station
+An EMO-BON Station. Stations collect EMO-BON samples. Stations may have multiple observatories, and each observatory can involve contributions from one or more partners.
 
-### Station
+- ### Observatory
+An EMO-BON organisational unit linked to the collection of a specific sample type (e.g., water column, soft sediment) from a fixed, pre-determined location. While *technically* an observatory is tied to a sample type, this distinction is often ignored in casual use since the observatory's base name (obs_id) is the same for all sample types.
 
-An EMO-BON Station. Stations collect EMO-BON samples. Stations may have multiple observatories, and each observatory can involve contributions from one or more partners.
+> Definition may need an update once the ARMS units are fully incorporated in EMO-BON.
 
-### Observatory
+- ### Sampling Event
+A sampling action performed at a particular observatory at a specific time, resulting in the collection of one or more samples.
 
-An EMO-BON organisational unit linked to the collection of a specific sample type (e.g., water column, soft sediment) from a fixed, pre-determined location. While *technically* an observatory is tied to a sample type, this distinction is often ignored in casual use since the observatory's base name (obs_id) is the same for all sample types.
+- ### Material Sample
+Refers to a material sample collected during a sampling event. Each unique material sample has a unique material sample ID. Also used to refer to the sample material that was sequenced, where the physical sample no longer exists but it is virtually present via its sample ID.
 
-:::{note}
-Definition may need an update once the ARMS units are fully incorporated in EMO-BON.
-:::
+## Terms - digital concepts
 
-### Sampling Event
+- ### Logsheet
+The spreadsheets in which the observatories write their sample and event data. The source spreadsheets are on the EMO-BON Google Drive, from where they are harvested as CSV into EMO-BON's GitHub space. The "transformed" logsheets are those that have been subjected to a date-range selection and a QC.
```
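The Logsheet entry describes "transformed" logsheets as harvested CSV that has gone through a date-range selection and a QC. The Python sketch below illustrates both steps on CSV text. It is hedged throughout. The column names `source_mat_id` and `collection_date`, the ISO `YYYY-MM-DD` date format, and the QC rule (required fields non-empty, date parseable) are all assumptions made for this example. None of them is taken from the actual EMO-BON transformation.

```python
from __future__ import annotations

import csv
import io
from datetime import date

# Hypothetical column names; real EMO-BON logsheets may use different headers.
DATE_COLUMN = "collection_date"
REQUIRED_COLUMNS = ("source_mat_id", DATE_COLUMN)


def parse_iso_date(value: str) -> date | None:
    """Parse a YYYY-MM-DD string, returning None if it is malformed."""
    try:
        return date.fromisoformat(value.strip())
    except ValueError:
        return None


def transform_logsheet(csv_text: str, start: date, end: date) -> list[dict[str, str]]:
    """Keep rows that pass a minimal QC and fall within [start, end], inclusive."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Stand-in QC, part 1: every required field must be present and non-empty.
        if any(not (row.get(col) or "").strip() for col in REQUIRED_COLUMNS):
            continue
        # Stand-in QC, part 2: the date must parse.
        sampled_on = parse_iso_date(row[DATE_COLUMN])
        if sampled_on is None:
            continue
        # Date-range selection.
        if start <= sampled_on <= end:
            kept.append(row)
    return kept


if __name__ == "__main__":
    example = (
        "source_mat_id,collection_date\n"
        "sample-1,2021-06-01\n"
        "sample-2,2019-12-31\n"
        ",2021-07-01\n"
        "sample-4,not-a-date\n"
    )
    rows = transform_logsheet(example, date(2021, 1, 1), date(2021, 12, 31))
    print([row["source_mat_id"] for row in rows])
```

On the example input, only `sample-1` is kept. `sample-2` falls outside 2021, the third row has no sample ID, and `sample-4` has a date that cannot be parsed.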
```diff
 
-A sampling action performed at a particular observatory at a specific time, resulting in the collection of one or more samples.
+- ### EMO-BON Data
+The content of the logsheets, which are filled by the observatories to describe their collected samples; the sequences in ENA; the outputs from bioinformatics processing.
 
-### Material Sample
+> Once ARMS units are incorporated, this will also include ARMS images.
 
-Refers to a material sample collected during a sampling event. Each unique material sample has a unique material sample ID. Also used to refer to the sample material that was sequenced, where the physical sample no longer exists but it is virtually present via its sample ID.
+- ### EMO-BON Metadata
+The data that is used specifically to describe EMO-BON data, performing the function of allowing discovery, understanding, organising, cataloguing, etc. Metadata are recorded in the ro-crate-metadata.json files; they are added to ENA accessions; they are in files in the EMO-BON repos governance-data, sequencing-data, observatory-profile, among others.
 
-## Digital Concepts
+- ### EMO-BON Record
+A digital representation of a sampling event, capturing the relevant data and metadata associated with it. There is no fixed idea of what is included in an EMO-BON record, as that depends on the system that these records are being held in; for example, EMO-BON records in EurOBIS and in Blue Cloud will not necessarily be the same.
 
-### Logsheet
+- ### Catalogue Asset
+The smallest unit of "EMO-BON dataset" that goes into a dataset's metadata catalogue, i.e., a specific "EMO-BON record" in a specific catalogue. Can be a single data file or a set of files.
 
-The spreadsheets in which the observatories write their sample and event data. The source spreadsheets are on the EMO-BON Google Drive, from where they are harvested as CSV into EMO-BON's GitHub space. The "transformed" logsheets are those that have been subjected to a date-range selection and a QC.
+- ### EMO-BON Repository
+A GitHub repository that contains EMO-BON data and metadata.
 
-### EMO-BON Data
+A GitHub repository represents a storage location for files and their version history, managed using Git version control which allows users to track changes, collaborate with others, and maintain a complete record of the project's development over time.
 
-The content of the logsheets, which are filled by the observatories to describe their collected samples; the sequences in ENA; the outputs from bioinformatics processing.
+- ### EMO-BON RO-Crate
+EMO-BON RO-Crates contain data associated with: logsheets from observatories, MetaGOflow runs, sequencing metadata. Usually, our RO-Crates are single repositories, but for some, one repository contains multiple RO-Crates. An RO-Crate is manifest via a ro-crate-metadata.json file.
 
-:::{note}
-Once ARMS units are incorporated, this will also include ARMS images.
-:::
+An RO-Crate is a collection of data files, metadata, and contextual information that organizes research data in a structured format, enabling easy sharing, reuse, and understanding in both machine-readable and human-readable forms.
 
-### EMO-BON Metadata
+- ### ro-crate-metadata.json
+A ro-crate-metadata.json file that describes the contents of an EMO-BON RO-Crate.
 
-The data that is used specifically to describe EMO-BON data, performing the function of allowing discovery, understanding, organising, cataloguing, etc. Metadata are recorded in the ro-crate-metadata.json files; they are added to ENA accessions; they are in files in the EMO-BON repos governance-data, sequencing-data, observatory-profile, among others.
+A ro-crate-metadata.json file is a JSON-LD file that provides a detailed description of the contents and structure of an RO-Crate. It maps relationships between files and their metadata, ensuring traceability, context, and purpose for data within research workflows.
```
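The RO-Crate and ro-crate-metadata.json entries describe a JSON-LD file that links a crate's files to their metadata. The Python sketch below builds a minimal document of this kind, following our reading of the RO-Crate 1.1 specification. It contains:

- a metadata descriptor whose `about` points at the root dataset `./`;
- a root `Dataset` that lists its files through `hasPart`;
- one `File` entity per path.

The function name, the example paths, and the selection of properties are illustrative assumptions. Real EMO-BON crates carry considerably richer metadata.

```python
from __future__ import annotations

import json


def build_crate_metadata(name: str, description: str, date_published: str,
                         license_url: str, files: list[str]) -> dict:
    """Return a minimal RO-Crate 1.1 metadata document as a JSON-serialisable dict."""
    return {
        # Context mapping RO-Crate terms (largely schema.org) to IRIs.
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: states that this file describes the root dataset "./".
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root data entity: the crate itself, linking to each file via hasPart.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": path} for path in files],
            },
            # One data entity per file listed in hasPart.
            *({"@id": path, "@type": "File"} for path in files),
        ],
    }


if __name__ == "__main__":
    crate = build_crate_metadata(
        name="Example logsheet crate",
        description="Hypothetical crate wrapping one harvested logsheet.",
        date_published="2024-01-01",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        files=["logsheets/water_column.csv"],
    )
    print(json.dumps(crate, indent=2))
```

In this design every entity in the flat `@graph` list is referenced by its `@id`, so relationships such as `hasPart` and `about` are expressed as `{"@id": ...}` links rather than by nesting objects.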
```diff
 
-### EMO-BON Record
+- ### Sequence
+A DNA string. Specifically, we mean (raw) sequences as produced from the material samples by Genoscope and held on their cloud drive and then archived on ENA.
 
-A digital representation of a sampling event, capturing the relevant data and metadata associated with it. There is no fixed idea of what is included in an EMO-BON record, as that depends on the system that these records are being held in; for example, EMO-BON records in EurOBIS and in Blue Cloud will not necessarily be the same.
+- ### Processed Sequences / OTUs / ASVs
+These are sequences that have been processed by a bioinformatics code to a stage where they can be/have been compared to taxonomic reference libraries.
 
-### Catalogue Asset
-
-The smallest unit of "EMO-BON dataset" that goes into a dataset's metadata catalogue, i.e., a specific "EMO-BON record" in a specific catalogue. Can be a single data file or a set of files.
-
-### EMO-BON Repository
-
-A GitHub repository that contains EMO-BON data and metadata.
-
-A GitHub repository represents a storage location for files and their version history, managed using Git version control which allows users to track changes, collaborate with others, and maintain a complete record of the project's development over time.
-
-### EMO-BON RO-Crate
-
-EMO-BON RO-Crates contain data associated with: logsheets from observatories, MetaGOflow runs, sequencing metadata. Usually, our RO-Crates are single repositories, but for some, one repository contains multiple RO-Crates. An RO-Crate is manifest via a ro-crate-metadata.json file.
-
-An RO-Crate is a collection of data files, metadata, and contextual information that organizes research data in a structured format, enabling easy sharing, reuse, and understanding in both machine-readable and human-readable forms.
-
-### ro-crate-metadata.json
-
-A ro-crate-metadata.json file that describes the contents of an EMO-BON RO-Crate.
-
-A ro-crate-metadata.json file is a JSON-LD file that provides a detailed description of the contents and structure of an RO-Crate. It maps relationships between files and their metadata, ensuring traceability, context, and purpose for data within research workflows.
-
-### Sequence
-
-A DNA string. Specifically, we mean (raw) sequences as produced from the material samples by Genoscope and held on their cloud drive and then archived on ENA.
-
-### Processed Sequences / OTUs / ASVs
-
-These are sequences that have been processed by a bioinformatics code to a stage where they can be/have been compared to taxonomic reference libraries.
-
-- **OTU**: Operational Taxonomic Unit - a group of similar sequences
-- **ASV**: Amplicon Sequence Variant - unique biological sequences
+- **OTU**:
+Operational Taxonomic Unit - a group of similar sequences
+- **ASV**:
+Amplicon Sequence Variant - unique biological sequences
```
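The OTU and ASV bullets contrast "a group of similar sequences" with "unique biological sequences". The toy Python sketch below makes that contrast concrete, under strong simplifying assumptions:

- ASVs are approximated by exact dereplication. Real ASV methods also model and remove sequencing errors.
- OTUs are formed by greedy clustering around the most abundant sequences, using position-by-position identity on equal-length reads.

The 97% default threshold is a widely used convention, not a setting taken from EMO-BON's pipelines. All function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def dereplicate(reads: list[str]) -> Counter:
    """Collapse identical reads into unique sequences with abundances (ASV-like)."""
    return Counter(reads)


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(uniques: Counter, threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy OTU clustering: abundant sequences become centroids; others join the
    first centroid they match at >= threshold identity."""
    otus: dict[str, list[str]] = {}
    for seq, _count in uniques.most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]  # no centroid close enough: seed a new OTU
    return otus


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 5 + ["ACGTACGTAA"] * 2 + ["TTTTACGTAC"] * 3
    asvs = dereplicate(reads)
    otus = greedy_otus(asvs, threshold=0.9)
    print(f"{len(asvs)} ASV-like unique sequences, {len(otus)} OTUs")
    for centroid, members in otus.items():
        print(centroid, members)
```

The example uses `threshold=0.9` on 10-base toy reads. With these settings, the three unique sequences collapse into two OTUs: `ACGTACGTAA` differs from the dominant `ACGTACGTAC` at only one position, whereas `TTTTACGTAC` differs at three.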
```diff
 
 ## Technical Terms
 
-### GitHub Actions
-
-An automation and CI/CD platform built into GitHub that allows workflows to be triggered by repository events.
-
-### DVC (Data Version Control)
-
-A version control system for data and machine learning models that works alongside Git, storing large files in remote storage while tracking metadata in Git.
-
-### SPARQL
+- ### GitHub Actions
+An automation and CI/CD platform built into GitHub that allows workflows to be triggered by repository events.
 
-A query language for RDF databases, used to query the EMO-BON knowledge graph.
+- ### DVC (Data Version Control)
+A version control system for data and machine learning models that works alongside Git, storing large files in remote storage while tracking metadata in Git.
 
-### RDF (Resource Description Framework)
+- ### SPARQL
+A query language for RDF databases, used to query the EMO-BON knowledge graph.
 
-A standard for representing information about resources in the form of subject-predicate-object triples.
+- ### RDF (Resource Description Framework)
+A standard for representing information about resources in the form of subject-predicate-object triples.
 
-### Turtle (TTL)
+- ### Turtle (TTL)
+A textual syntax for RDF that is more human-readable than other RDF formats.
```
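The RDF and Turtle entries describe subject-predicate-object triples and a human-readable syntax for writing them. The Python sketch below uses only the standard library to serialise a few triples to Turtle. Triples that share a subject are grouped with `;`. The `ex:` namespace and the terms `ex:MaterialSample`, `ex:collectedDuring`, and `ex:label` are invented for illustration and are not EMO-BON vocabulary. A production pipeline would more likely rely on an established RDF library such as rdflib.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Hypothetical namespace for illustration only; not an EMO-BON IRI scheme.
PREFIXES = {
    "ex": "https://example.org/emobon/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


@dataclass(frozen=True)
class Literal:
    """A plain string literal (no language tag or datatype, for simplicity)."""
    value: str


def render(term: str | Literal) -> str:
    """Prefixed names pass through unchanged; literals are quoted and escaped."""
    if isinstance(term, Literal):
        escaped = (term.value.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
        return f'"{escaped}"'
    return term


def to_turtle(triples: list[tuple[str, str, str | Literal]]) -> str:
    """Serialise triples to Turtle, grouping predicates of one subject with ';'."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    by_subject: dict[str, list[str]] = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append(f"{predicate} {render(obj)}")
    for subject, pairs in by_subject.items():
        lines.append(f"{subject} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    triples = [
        ("ex:sample-1", "rdf:type", "ex:MaterialSample"),
        ("ex:sample-1", "ex:collectedDuring", "ex:event-1"),
        ("ex:sample-1", "ex:label", Literal("water column sample")),
    ]
    print(to_turtle(triples))
```

Grouping predicates under one subject with `;` is what makes Turtle more readable than formats that repeat the subject on every line. The three triples above print as a single statement about `ex:sample-1`.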
```diff
 
-A textual syntax for RDF that is more human-readable than other RDF formats.
+- ### SHACL (Shapes Constraint Language)
+A language for validating RDF data against a set of conditions (shapes).
 
-### SHACL (Shapes Constraint Language)
+- ### JSON-LD (JSON for Linking Data)
+A JSON-based format for representing linked data, used in ro-crate-metadata.json files.
 
-A language for validating RDF data against a set of conditions (shapes).
+- ### ENA (European Nucleotide Archive)
+One of the world's primary repositories for nucleotide sequence data, operated by EMBL-EBI.
 
-### JSON-LD (JSON for Linking Data)
+- ### MetaGOflow
+A bioinformatics workflow for processing metagenomic and metabarcoding sequence data.
 
-A JSON-based format for representing linked data, used in ro-crate-metadata.json files.
+- ### UDAL (Universal Data Access Layer)
+A unified interface for querying EMO-BON data across different sources and formats.
 
-### ENA (European Nucleotide Archive)
-
-One of the world's primary repositories for nucleotide sequence data, operated by EMBL-EBI.
-
-### MetaGOflow
-
-A bioinformatics workflow for processing metagenomic and metabarcoding sequence data.
-
-### UDAL (Universal Data Access Layer)
-
-A unified interface for querying EMO-BON data across different sources and formats.
-
-### S3 (Simple Storage Service)
-
-Amazon's object storage service, used by EMO-BON for storing large data files.
-
-## Standards and Ontologies
-
-### SOSA (Sensor, Observation, Sample, and Actuator)
-
-An ontology for modeling observations, samples, and sampling.
-
-### Dublin Core
-
-A metadata standard providing a core set of vocabulary elements for describing resources.
-
-### Darwin Core
-
-A standard for sharing biodiversity information, particularly species occurrences.
-
-### MIxS (Minimum Information about any Sequence)
-
-A standard for describing genomic, metagenomic, and marker gene sequences.
-
-### Schema.org
-
-A collaborative project providing schemas for structured data on web pages.
+- ### S3 (Simple Storage Service)
+Amazon's object storage service, used by EMO-BON for storing large data files.
 
 ## Acronyms
 
-- **ARMS**: Autonomous Reef Monitoring Structures
-- **ASV**: Amplicon Sequence Variant
-- **CI/CD**: Continuous Integration / Continuous Deployment
-- **CSV**: Comma-Separated Values
-- **DVC**: Data Version Control
-- **eDNA**: Environmental DNA
-- **EMBRC**: European Marine Biological Resource Centre
-- **EMO-BON**: European Marine Omics Biodiversity Observation Network
-- **ENA**: European Nucleotide Archive
-- **GH**: GitHub
-- **JSON-LD**: JSON for Linking Data
-- **OTU**: Operational Taxonomic Unit
-- **QC**: Quality Control
-- **RDF**: Resource Description Framework
-- **RO-Crate**: Research Object Crate
-- **S3**: Simple Storage Service
-- **SHACL**: Shapes Constraint Language
-- **SOSA**: Sensor, Observation, Sample, and Actuator
-- **SPARQL**: SPARQL Protocol and RDF Query Language
-- **TTL**: Turtle (RDF format)
-- **UDAL**: Universal Data Access Layer
-- **URI**: Uniform Resource Identifier
-- **VRE**: Virtual Research Environment
+| Acronym | Meaning |
+|----------|-----------------------------------------------------------|
+| ARMS | Autonomous Reef Monitoring Structures |
+| ASV | Amplicon Sequence Variant |
+| CI/CD | Continuous Integration / Continuous Deployment |
+| CSV | Comma-Separated Values |
+| DVC | Data Version Control |
+| eDNA | Environmental DNA |
+| EMBRC | European Marine Biological Resource Centre |
+| EMO-BON | European Marine Omics Biodiversity Observation Network |
+| ENA | European Nucleotide Archive |
+| GH | GitHub |
+| JSON-LD | JSON for Linking Data |
+| OTU | Operational Taxonomic Unit |
+| QC | Quality Control |
+| RDF | Resource Description Framework |
+| RO-Crate | Research Object Crate |
+| S3 | Simple Storage Service |
+| SHACL | Shapes Constraint Language |
+| SOSA | Sensor, Observation, Sample, and Actuator |
+| SPARQL | SPARQL Protocol and RDF Query Language |
+| TTL | Turtle (RDF format) |
+| UDAL | Universal Data Access Layer |
+| URI | Uniform Resource Identifier |
+| VRE | Virtual Research Environment |
+
```