FCE LTER scientists often concatenate multiple years of data into a single dataset before submitting it to me. Each year of data may have been managed in a slightly different way, e.g. with different date formats, different codes for missing data, different site codes, etc. It would be nice if there were a tool that could parse an Excel file and report which date formats are found, what the unique site names or species names are, which missing value codes are used (based on a known set of likely candidates), whether there are seemingly empty cells that contain only spaces, whether there are extraneous characters in cells that are not under a column header, and so on. It would also be nice to have code that makes educated guesses about the dataset and produces an EML attribute list as a starting point for documenting the table.
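To make the request more concrete, here is a minimal Python sketch of the kind of report described above. It is only an illustration under several assumptions: the cells have already been read into lists of strings (for example from a CSV export of the Excel sheet), the date patterns and missing value codes are guessed candidate lists that would need extending, and the names `DATE_PATTERNS`, `MISSING_CODES`, `profile_column`, and `profile_table` are made up for this example. It does not attempt the EML attribute list.

```python
import re
from collections import Counter

# Assumed candidate date patterns; extend these for your own data.
DATE_PATTERNS = {
    "YYYY-MM-DD": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "N/N/YYYY": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "DD-Mon-YYYY": re.compile(r"^\d{1,2}-[A-Za-z]{3}-\d{4}$"),
}
# Assumed list of likely missing value codes.
MISSING_CODES = {"NA", "N/A", "NaN", "NULL", ".", "-999", "-9999"}


def profile_column(values):
    """Summarize one column of raw cell strings."""
    date_formats = Counter()
    missing_codes = Counter()
    whitespace_only = 0
    uniques = set()
    for raw in values:
        if raw is None or raw == "":
            continue  # truly empty cell
        text = raw.strip()
        if text == "":
            whitespace_only += 1  # looks empty but holds spaces
            continue
        if text in MISSING_CODES:
            missing_codes[text] += 1
            continue
        for name, pattern in DATE_PATTERNS.items():
            if pattern.match(text):
                date_formats[name] += 1
                break
        uniques.add(text)
    return {
        "date_formats": dict(date_formats),
        "missing_codes": dict(missing_codes),
        "whitespace_only": whitespace_only,
        "unique_values": sorted(uniques),
    }


def profile_table(header, rows):
    """Profile every named column and list cells beyond the header width."""
    extraneous = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        for col_index, cell in enumerate(row):
            if col_index >= len(header) and cell and cell.strip():
                extraneous.append((row_number, col_index + 1, cell))
    report = {}
    for col_index, name in enumerate(header):
        column = [row[col_index] if col_index < len(row) else "" for row in rows]
        report[name] = profile_column(column)
    return report, extraneous


# Made-up example rows, not real FCE LTER data.
header = ["site", "date", "temp"]
rows = [
    ["SRS1", "2001-03-04", "21.5"],
    ["srs1", "03/05/2001", "-9999"],
    ["SRS2", "2001-03-06", "  ", "stray note"],
    ["SRS2", "06-Mar-2002", "NA"],
]
report, extraneous = profile_table(header, rows)
for column_name, summary in report.items():
    print(column_name, summary)
print("cells outside the header:", extraneous)
```

In this made-up example the site column would list both `SRS1` and `srs1` as separate values, which is exactly the kind of inconsistency I would like flagged before the data reach me.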
This kind of problem rears its head in almost every data-ish field in which I've participated, and working on approaches to the specific problems you mention would be very practical. OpenRefine (http://openrefine.org/) is a non-scripting tool that could help with some of them.