A comprehensive list of terms and definitions used throughout the EMO-BON project.

-## Physical Concepts
+## Terms - physical concepts

-### Partner
+-### Partner
+An EMO-BON member, which is typically but not exclusively an institute.

-An EMO-BON member, which is typically but not exclusively an institute.
+-### Station
+An EMO-BON Station. Stations collect EMO-BON samples. Stations may have multiple observatories, and each observatory can involve contributions from one or more partners.

-### Station
+-### Observatory
+An EMO-BON organisational unit linked to the collection of a specific sample type (e.g., water column, soft sediment) from a fixed, pre-determined location. While *technically* an observatory is tied to a sample type, this distinction is often ignored in casual use since the observatory's base name (obs_id) is the same for all sample types.

-An EMO-BON Station. Stations collect EMO-BON samples. Stations may have multiple observatories, and each observatory can involve contributions from one or more partners.
+> Definition may need an update once the ARMS units are fully incorporated in EMO-BON.

-### Observatory
+-### Sampling Event
+A sampling action performed at a particular observatory at a specific time, resulting in the collection of one or more samples.

-An EMO-BON organisational unit linked to the collection of a specific sample type (e.g., water column, soft sediment) from a fixed, pre-determined location. While *technically* an observatory is tied to a sample type, this distinction is often ignored in casual use since the observatory's base name (obs_id) is the same for all sample types.
+-### Material Sample
+Refers to a material sample collected during a sampling event. Each unique material sample has a unique material sample ID. Also used to refer to the sample material that was sequenced, where the physical sample no longer exists but it is virtually present via its sample ID.

-:::{note}
-Definition may need an update once the ARMS units are fully incorporated in EMO-BON.
-:::
+## Terms - digital concepts

-### Sampling Event
+-### Logsheet
+The spreadsheets in which the observatories write their sample and event data. The source spreadsheets are on the EMO-BON Google Drive, from where they are harvested as CSV into EMO-BON's GitHub space. The "transformed" logsheets are those that have been subjected to a date-range selection and a QC.

-A sampling action performed at a particular observatory at a specific time, resulting in the collection of one or more samples.
+-### EMO-BON Data
+The content of the logsheets, which are filled by the observatories to describe their collected samples; the sequences in ENA; the outputs from bioinformatics processing.

-### Material Sample
+> Once ARMS units are incorporated, this will also include ARMS images.

-Refers to a material sample collected during a sampling event. Each unique material sample has a unique material sample ID. Also used to refer to the sample material that was sequenced, where the physical sample no longer exists but it is virtually present via its sample ID.
+-### EMO-BON Metadata
+The data that is used specifically to describe EMO-BON data, performing the function of allowing discovery, understanding, organising, cataloguing, etc. Metadata are recorded in the ro-crate-metadata.json files; they are added to ENA accessions; they are in files in the EMO-BON repos governance-data, sequencing-data, observatory-profile, among others.

-## Digital Concepts
+-### EMO-BON Record
+A digital representation of a sampling event, capturing the relevant data and metadata associated with it. There is no fixed idea of what is included in an EMO-BON record, as that depends on the system that these records are being held in; for example, EMO-BON records in EurOBIS and in Blue Cloud will not necessarily be the same.

-### Logsheet
+-### Catalogue Asset
+The smallest unit of "EMO-BON dataset" that goes into a dataset's metadata catalogue, i.e., a specific "EMO-BON record" in a specific catalogue. Can be a single data file or a set of files.

-The spreadsheets in which the observatories write their sample and event data. The source spreadsheets are on the EMO-BON Google Drive, from where they are harvested as CSV into EMO-BON's GitHub space. The "transformed" logsheets are those that have been subjected to a date-range selection and a QC.
+-### EMO-BON Repository
+A GitHub repository that contains EMO-BON data and metadata.

-### EMO-BON Data
+A GitHub repository represents a storage location for files and their version history, managed using Git version control which allows users to track changes, collaborate with others, and maintain a complete record of the project's development over time.

-The content of the logsheets, which are filled by the observatories to describe their collected samples; the sequences in ENA; the outputs from bioinformatics processing.
+-### EMO-BON RO-Crate
+EMO-BON RO-Crates contain data associated with: logsheets from observatories, MetaGOflow runs, sequencing metadata. Usually, our RO-Crates are single repositories, but for some, one repository contains multiple RO-Crates. An RO-Crate is manifest via a ro-crate-metadata.json file.

-:::{note}
-Once ARMS units are incorporated, this will also include ARMS images.
-:::
+An RO-Crate is a collection of data files, metadata, and contextual information that organizes research data in a structured format, enabling easy sharing, reuse, and understanding in both machine-readable and human-readable forms.

-### EMO-BON Metadata
+-### ro-crate-metadata.json
+A ro-crate-metadata.json file that describes the contents of an EMO-BON RO-Crate.

-The data that is used specifically to describe EMO-BON data, performing the function of allowing discovery, understanding, organising, cataloguing, etc. Metadata are recorded in the ro-crate-metadata.json files; they are added to ENA accessions; they are in files in the EMO-BON repos governance-data, sequencing-data, observatory-profile, among others.
+A ro-crate-metadata.json file is a JSON-LD file that provides a detailed description of the contents and structure of an RO-Crate. It maps relationships between files and their metadata, ensuring traceability, context, and purpose for data within research workflows.
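
As a rough illustration of how a ro-crate-metadata.json file links files to their metadata, the sketch below parses a minimal crate that follows the general RO-Crate 1.1 layout (a metadata descriptor, a root dataset, and one file entity). The crate contents, the file name `logsheet.csv`, and the helper `list_crate_files` are invented for this example and are not taken from any EMO-BON repository.

```python
import json

# Hypothetical minimal crate: names and file ids are placeholders for illustration.
CRATE_JSON = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example crate",
      "hasPart": [{"@id": "logsheet.csv"}]
    },
    {
      "@id": "logsheet.csv",
      "@type": "File",
      "name": "Example logsheet",
      "encodingFormat": "text/csv"
    }
  ]
}
"""


def list_crate_files(metadata):
    """Return (file id, name) pairs for the parts listed on the root dataset."""
    # Index every entity in the flattened JSON-LD graph by its @id.
    entities = {entity["@id"]: entity for entity in metadata["@graph"]}
    # The metadata descriptor points at the root dataset through "about".
    descriptor = entities["ro-crate-metadata.json"]
    root = entities[descriptor["about"]["@id"]]
    # Follow each hasPart reference to the entity that describes the file.
    return [
        (part["@id"], entities[part["@id"]].get("name", ""))
        for part in root.get("hasPart", [])
    ]


crate_files = list_crate_files(json.loads(CRATE_JSON))
print(crate_files)
```

The lookup by `@id` is the part that matters: every relationship in the file is a reference to another entity in the same `@graph`, which is what lets a crate tie each data file to its description.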

-### EMO-BON Record
+-### Sequence
+A DNA string. Specifically, we mean (raw) sequences as produced from the material samples by Genoscope and held on their cloud drive and then archived on ENA.

-A digital representation of a sampling event, capturing the relevant data and metadata associated with it. There is no fixed idea of what is included in an EMO-BON record, as that depends on the system that these records are being held in; for example, EMO-BON records in EurOBIS and in Blue Cloud will not necessarily be the same.
+-### Processed Sequences / OTUs / ASVs
+These are sequences that have been processed by a bioinformatics code to a stage where they can be/have been compared to taxonomic reference libraries.

-### Catalogue Asset
-
-The smallest unit of "EMO-BON dataset" that goes into a dataset's metadata catalogue, i.e., a specific "EMO-BON record" in a specific catalogue. Can be a single data file or a set of files.
-
-### EMO-BON Repository
-
-A GitHub repository that contains EMO-BON data and metadata.
-
-A GitHub repository represents a storage location for files and their version history, managed using Git version control which allows users to track changes, collaborate with others, and maintain a complete record of the project's development over time.
-
-### EMO-BON RO-Crate
-
-EMO-BON RO-Crates contain data associated with: logsheets from observatories, MetaGOflow runs, sequencing metadata. Usually, our RO-Crates are single repositories, but for some, one repository contains multiple RO-Crates. An RO-Crate is manifest via a ro-crate-metadata.json file.
-
-An RO-Crate is a collection of data files, metadata, and contextual information that organizes research data in a structured format, enabling easy sharing, reuse, and understanding in both machine-readable and human-readable forms.
-
-### ro-crate-metadata.json
-
-A ro-crate-metadata.json file that describes the contents of an EMO-BON RO-Crate.
-
-A ro-crate-metadata.json file is a JSON-LD file that provides a detailed description of the contents and structure of an RO-Crate. It maps relationships between files and their metadata, ensuring traceability, context, and purpose for data within research workflows.
-
-### Sequence
-
-A DNA string. Specifically, we mean (raw) sequences as produced from the material samples by Genoscope and held on their cloud drive and then archived on ENA.
-
-### Processed Sequences / OTUs / ASVs
-
-These are sequences that have been processed by a bioinformatics code to a stage where they can be/have been compared to taxonomic reference libraries.
-
-
-**OTU**: Operational Taxonomic Unit - a group of similar sequences
An automation and CI/CD platform built into GitHub that allows workflows to be triggered by repository events.
-
-### DVC (Data Version Control)
-
-A version control system for data and machine learning models that works alongside Git, storing large files in remote storage while tracking metadata in Git.
-
-### SPARQL
+-### GitHub Actions
+An automation and CI/CD platform built into GitHub that allows workflows to be triggered by repository events.

-A query language for RDF databases, used to query the EMO-BON knowledge graph.
+-### DVC (Data Version Control)
+A version control system for data and machine learning models that works alongside Git, storing large files in remote storage while tracking metadata in Git.

-### RDF (Resource Description Framework)
+-### SPARQL
+A query language for RDF databases, used to query the EMO-BON knowledge graph.

-A standard for representing information about resources in the form of subject-predicate-object triples.
+-### RDF (Resource Description Framework)
+A standard for representing information about resources in the form of subject-predicate-object triples.
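
To make the subject-predicate-object idea concrete, here is a minimal sketch that stores a few triples as plain Python tuples and filters them with a wildcard pattern, which is loosely what a single SPARQL triple pattern does. It uses no RDF library; the `http://example.org/` namespace, the predicate names `collectedDuring` and `atObservatory`, and the helper `match` are placeholders invented for the example, not terms from the EMO-BON knowledge graph.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Placeholder namespace for illustration only; not an EMO-BON IRI.
EX = "http://example.org/"
# The rdf:type predicate from the RDF vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each entry is one (subject, predicate, object) statement.
graph: List[Triple] = [
    (EX + "sample1", RDF_TYPE, EX + "MaterialSample"),
    (EX + "sample1", EX + "collectedDuring", EX + "event1"),
    (EX + "event1", RDF_TYPE, EX + "SamplingEvent"),
    (EX + "event1", EX + "atObservatory", EX + "obsA"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    pattern = (s, p, o)
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Roughly analogous to the SPARQL pattern: ex:sample1 ex:collectedDuring ?event
sample_events = match(graph, s=EX + "sample1", p=EX + "collectedDuring")
# Roughly analogous to: ?thing rdf:type ?class
typed = match(graph, p=RDF_TYPE)
print(sample_events)
print(len(typed))
```

A real triple store adds indexing, joins across several patterns, and typed literals; the sketch only shows the shape of the data and of a single pattern match.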

-### Turtle (TTL)
+-### Turtle (TTL)
+A textual syntax for RDF that is more human-readable than other RDF formats.

-A textual syntax for RDF that is more human-readable than other RDF formats.
+-### SHACL (Shapes Constraint Language)
+A language for validating RDF data against a set of conditions (shapes).

-### SHACL (Shapes Constraint Language)
+-### JSON-LD (JSON for Linking Data)
+A JSON-based format for representing linked data, used in ro-crate-metadata.json files.

-A language for validating RDF data against a set of conditions (shapes).
+-### ENA (European Nucleotide Archive)
+One of the world's primary repositories for nucleotide sequence data, operated by EMBL-EBI.

-### JSON-LD (JSON for Linking Data)
+-### MetaGOflow
+A bioinformatics workflow for processing metagenomic and metabarcoding sequence data.

-A JSON-based format for representing linked data, used in ro-crate-metadata.json files.
+-### UDAL (Universal Data Access Layer)
+A unified interface for querying EMO-BON data across different sources and formats.

-### ENA (European Nucleotide Archive)
-
-One of the world's primary repositories for nucleotide sequence data, operated by EMBL-EBI.
-
-### MetaGOflow
-
-A bioinformatics workflow for processing metagenomic and metabarcoding sequence data.
-
-### UDAL (Universal Data Access Layer)
-
-A unified interface for querying EMO-BON data across different sources and formats.
-
-### S3 (Simple Storage Service)
-
-Amazon's object storage service, used by EMO-BON for storing large data files.
-
-## Standards and Ontologies
-
-### SOSA (Sensor, Observation, Sample, and Actuator)
-
-An ontology for modeling observations, samples, and sampling.
-
-### Dublin Core
-
-A metadata standard providing a core set of vocabulary elements for describing resources.
-
-### Darwin Core
-
-A standard for sharing biodiversity information, particularly species occurrences.
-
-### MIxS (Minimum Information about any Sequence)
-
-A standard for describing genomic, metagenomic, and marker gene sequences.
-
-### Schema.org
-
-A collaborative project providing schemas for structured data on web pages.
+-### S3 (Simple Storage Service)
+Amazon's object storage service, used by EMO-BON for storing large data files.