Using CSVW to annotate Soil Observation Data #181
Replies: 3 comments 3 replies
Hi Paul, CSVW does look like a nice lightweight solution for table annotation. It seems very similar to YAML and JSON in many respects, and comparing the three could be really interesting. Why did you pick CSVW rather than YAML or JSON here? It seems like an obscure choice. This triggered a lot of other thoughts for me about metadata and what we ask of data providers, which I'll go into in other comments to keep the reply threads cleaner. -Kathe
No metadata, only data
I'm increasingly coming to the view that there is no metadata, only data, when considering researcher-provided information. This has led to us adopting a generalized data tuple. Most of the metadata in this case then becomes the vocabulary that supports these fields rather than information from the researcher. I'll admit that we are just starting to really test this generalized format in production, but it has prototyped well so far. This puts the burden of stitching data together from multiple sources on the re-use team, not on the original data providers.
Unstandardized standards
Data providers shouldn't be using templates or standardized data models, especially for research results. From a philosophical perspective, research is all about novelty, which often translates into a new non-standard measurement, sample prep, or treatment. Drop-down terms/codes then become problematic when the exact variant isn't there. In addition, the level of detail that needs to be captured varies for each synthesis end use. All of this leads to very frustrated data providers in my experience.
Instead we should honor how researchers already share data: tables, figures, methods, and protocols. Researchers organize their data according to their mental model of the system, are already trained to write reproducible documentation (methods sections), and many even create protocols for lab technicians or colleagues to follow when collecting the data. Digitizing and linking these data into coherent collections is then fit for purpose for the reanalysis.
All that being said, I do have one ask of our data-providing colleagues: please use a flat text file rather than some proprietary spreadsheet/database! So I guess I do have standards here :)
Hi team, let me pitch an idea that we're investigating for sharing and later harmonizing soil observation data ("we" being a cluster of ISRIC and Wageningen University, within the scope of the Soilwise HE project).
Background
Many researchers share their data in tabular formats (Excel, CSV, DBF), usually in combination with a report or readme in which the individual fields are explained (feature of interest, observed property, unit of measure, procedure). Our thought was that if we can encourage researchers to use a machine-readable format for the readme, machines would be able to combine the table and its description into a rich data structure.
W3C CSVW (CSV on the Web) seems to be a standardized way to cover this scenario.
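To make the idea concrete, here is a minimal sketch of what a CSVW annotation for a small soil table could look like, built as a Python dict and printed as JSON. The file name `soil_observations.csv`, the column names, and the descriptions are made up for illustration; only `@context`, `url`, `tableSchema`, `columns`, `name`, `titles`, `datatype`, `primaryKey` and `dc:description` are taken from the CSVW metadata vocabulary.

```python
import json

# Hypothetical table "soil_observations.csv" with columns:
# sample_id, depth_cm, ph_h2o (all names are illustrative assumptions).
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "soil_observations.csv",
    "tableSchema": {
        "columns": [
            {
                "name": "sample_id",
                "titles": "sample_id",
                "datatype": "string",
                "dc:description": "Identifier of the soil sample",
            },
            {
                "name": "depth_cm",
                "titles": "depth_cm",
                "datatype": "decimal",
                "dc:description": "Sampling depth in centimetres",
            },
            {
                "name": "ph_h2o",
                "titles": "ph_h2o",
                "datatype": "decimal",
                "dc:description": "pH measured in a soil-water suspension",
            },
        ],
        "primaryKey": "sample_id",
    },
}

# By CSVW convention this JSON would sit next to the data file,
# e.g. as "soil_observations.csv-metadata.json".
print(json.dumps(metadata, indent=2))
```

The readme content (what each field means) ends up in the same file that tells a parser how to read the table, which is the combination described above.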
Approach
We identified csvwlib as a lean solution for working with CSVW-annotated data. We used the tool to set up a number of data experiments in the soilwise repository. Example 3 shows nicely how a CSV is converted to SOSA triples using the CSVW annotations. We also have an approach using the schema.org ontology. Another experimental script will be able to transform the graph to a relational database following the ISO 28258 structure. Also interesting is a SHACL validation step that can validate any SOSA graph.
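For readers unfamiliar with SOSA, the following is a rough, hand-rolled sketch of the kind of output such a conversion produces. It does not use csvwlib and is not the code from the repository; the base IRI, the sample data, and the `ANNOTATION` layout are assumptions made for illustration. Only the SOSA terms (`Observation`, `hasFeatureOfInterest`, `observedProperty`, `usedProcedure`, `hasSimpleResult`) come from the W3C ontology.

```python
import csv
import io

SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "https://example.org/soil/"  # hypothetical base IRI

# Made-up sample data standing in for a researcher's CSV file.
CSV_TEXT = "sample_id,ph_h2o\nS1,6.4\nS2,5.9\n"

# Hypothetical per-column annotation (the information a readme would carry).
ANNOTATION = {
    "ph_h2o": {
        "observed_property": BASE + "property/pH",
        "procedure": BASE + "procedure/pH-H2O",
    }
}


def iri(value):
    """Wrap an IRI in angle brackets, N-Triples style."""
    return "<" + value + ">"


def rows_to_triples(csv_text, annotation):
    """Emit one sosa:Observation (five triples) per annotated cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sample = iri(BASE + "sample/" + row["sample_id"])
        for column, info in annotation.items():
            obs = iri(BASE + "observation/" + row["sample_id"] + "-" + column)
            result = '"' + row[column] + '"^^' + iri(XSD_DECIMAL)
            triples += [
                " ".join([obs, iri(RDF_TYPE), iri(SOSA + "Observation"), "."]),
                " ".join([obs, iri(SOSA + "hasFeatureOfInterest"), sample, "."]),
                " ".join([obs, iri(SOSA + "observedProperty"), iri(info["observed_property"]), "."]),
                " ".join([obs, iri(SOSA + "usedProcedure"), iri(info["procedure"]), "."]),
                " ".join([obs, iri(SOSA + "hasSimpleResult"), result, "."]),
            ]
    return triples


for line in rows_to_triples(CSV_TEXT, ANNOTATION):
    print(line)
```

Each measured cell becomes its own observation node, so the feature of interest, property, and procedure travel with every value instead of living in a separate readme.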
Findings
Although the approach seems fully valid and capable of handling the case, we noticed that it is challenging for soil scientists to compile a CSVW annotations file. So we are designing tooling to support that activity: a VBA tool within Excel, a web tool, and an LLM-based tool. We are also considering an intermediate step in which scientists add the required information (feature of interest, observed property, unit of measure, procedure) in a basic CSV format.
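As a sketch of how that intermediate step might work, the snippet below reads a small field-description CSV and turns it into a CSVW-style metadata structure. The layout of the field-description file, the sample rows, and the `https://example.org/soil-vocab#` property names are all assumptions; whether absolute-IRI properties like these are the best way to carry unit and procedure in CSVW is an open design question, not something settled here.

```python
import csv
import io
import json

# Hypothetical vocabulary for the four descriptive fields.
EX = "https://example.org/soil-vocab#"

# Made-up field-description file, one row per data column.
FIELDS_CSV = """column,feature_of_interest,observed_property,unit,procedure
ph_h2o,soil horizon,pH,pH units,1:5 soil-water suspension
clay,soil horizon,clay content,%,pipette method
"""


def fields_to_csvw(fields_csv, data_url):
    """Build a CSVW-style metadata dict from a field-description CSV."""
    columns = []
    for row in csv.DictReader(io.StringIO(fields_csv)):
        columns.append(
            {
                "name": row["column"],
                "titles": row["column"],
                # Extra properties keyed by absolute IRIs (hypothetical terms).
                EX + "featureOfInterest": row["feature_of_interest"],
                EX + "observedProperty": row["observed_property"],
                EX + "unit": row["unit"],
                EX + "procedure": row["procedure"],
            }
        )
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": data_url,
        "tableSchema": {"columns": columns},
    }


csvw_metadata = fields_to_csvw(FIELDS_CSV, "soil_observations.csv")
print(json.dumps(csvw_metadata, indent=2))
```

The scientist only fills in a five-column table; the JSON structure is generated, which sidesteps hand-writing the annotations file.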
Welcoming your thoughts/ideas, bye Paul.