A data challenge that I wrestle with is integrating multiple streams of linked or related data. An example would be a research effort that involves collecting environmental samples and then running a series of analyses on those samples. The analyses could be field measurements, manual lab procedures, instrument runs, or many others, in any combination. The outcome of each step or analysis must be related to other outcomes or workflows. I use custom web applications and databases to address this, but that approach is complicated and a lot of work. It would be great if there were a platform or tool for such workflows that was generalizable enough to cover a wide array of situations and use cases.
Some of the tidyr functions in R (notably gather, spread, separate, and unite, together with base R merge) could be helpful with this. The big challenge will be that the links that glue things together for merging are often fuzzier than one might like. I've got one dataset where some data are reported to the year, month, and day, but other related data are only reported to the nearest year and month. Coding of stations can also be inconsistent. An interesting problem!
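As a rough illustration of the mismatched-resolution problem described above, here is a minimal sketch in plain Python (not the R/tidyr route mentioned in the comment). It coarsens daily records to year and month before joining them to monthly records, and normalizes station codes first. All station names, field names, and values are hypothetical, and the normalization rule (trim whitespace, uppercase) is only an assumption about what "inconsistent coding" might look like.

```python
from datetime import date

# Hypothetical data: samples reported to the day...
daily_samples = [
    {"station": "ST-01 ", "date": date(2015, 6, 3), "nitrate": 0.42},
    {"station": "st-01", "date": date(2015, 6, 17), "nitrate": 0.51},
    {"station": "ST-02", "date": date(2015, 7, 9), "nitrate": 0.38},
]
# ...and related data reported only to the year and month.
monthly_flow = [
    {"station": "ST-01", "year": 2015, "month": 6, "mean_flow": 12.5},
    {"station": "ST-02", "year": 2015, "month": 8, "mean_flow": 7.1},
]


def station_key(code):
    """Normalize a station code (assumed rule: trim and uppercase)."""
    return code.strip().upper()


def merge_on_month(daily, monthly):
    """Left-join daily rows to monthly rows on (station, year, month)."""
    lookup = {
        (station_key(r["station"]), r["year"], r["month"]): r for r in monthly
    }
    merged = []
    for row in daily:
        key = (station_key(row["station"]), row["date"].year, row["date"].month)
        match = lookup.get(key)
        # Keep every daily row; unmatched rows get None so gaps stay visible.
        merged.append({**row, "mean_flow": match["mean_flow"] if match else None})
    return merged


merged = merge_on_month(daily_samples, monthly_flow)
```

Keeping unmatched rows with an explicit `None` is a deliberate choice here: it makes the fuzzy links visible for review instead of silently dropping records that failed to join.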
NEON uses "named locations", date/time, sample ID, and sample class to link together this type of data on the OS side. For the IS side we use a "measurement stream", which is a combination of sensor (e.g., all air temperature sensors are given a unique ID and also an ID for the part number that they all share), sensor stream (i.e., temperature or pressure, etc.), and named location. NEON's isn't necessarily the best framework, but whatever is used, a consistent ontology and database to track terms and IDs is essential.
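To make the idea of composite linking keys concrete, here is a minimal Python sketch. It is not NEON's actual schema: the class names, field names, and example IDs are all hypothetical, chosen only to mirror the combinations described above (sensor, sensor stream, and named location for instrument data; named location, date/time, sample ID, and sample class for samples).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementStream:
    """Hypothetical key for instrument data."""
    sensor_id: str       # unique ID of the individual sensor
    part_number: str     # ID shared by all sensors of the same model
    stream: str          # e.g. "temperature" or "pressure"
    named_location: str


@dataclass(frozen=True)
class SampleKey:
    """Hypothetical key for a collected sample."""
    named_location: str
    collected_at: str    # ISO 8601 date/time string
    sample_id: str
    sample_class: str


# Frozen dataclasses are hashable, so they can serve directly as dict keys.
readings = {}


def record(stream, timestamp, value):
    """Store one reading under its measurement stream."""
    readings.setdefault(stream, []).append((timestamp, value))


def streams_at(location):
    """Return every measurement stream registered at a named location."""
    return [s for s in readings if s.named_location == location]


air_temp = MeasurementStream("S-0001", "PN-200", "temperature", "TOWER-A")
record(air_temp, "2016-05-01T12:00:00Z", 18.4)
# An equal key built elsewhere lands in the same stream.
record(MeasurementStream("S-0001", "PN-200", "temperature", "TOWER-A"),
       "2016-05-01T12:30:00Z", 18.9)

soil = SampleKey("TOWER-A", "2016-05-01T11:45:00Z", "SMP-42", "soil")
linked_streams = streams_at(soil.named_location)
```

Because both key types carry the named location, a sample can be related to the sensor streams at the same place without any string parsing, which is the practical payoff of tracking terms and IDs consistently.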