Code to report problem cells in Excel spreadsheets #6

vanderbi · 2018-06-01T23:18:23Z

FCE LTER scientists often concatenate multiple years of data into a single dataset to submit to me. Each year of data may have been managed in a slightly different way, e.g. using different date formats, different codes for missing data, different site codes, etc. It would be nice if there was a tool that could parse an Excel file and report which date formats are found, what the list of unique site names or species names is, what missing value codes are used (based on a known set of likely candidates), if there are empty cells with spaces in them, if there are some extraneous characters in the file not under column headers, and so on. It would also be nice to have code that would make educated guesses about the dataset and produce an EML attribute list as a starting point for documenting the table.

dpkode · 2018-06-11T01:59:45Z

This kind of problem rears its head in almost every data-ish field in which I've participated and working on approaches to address the specific problems like the ones you mention would be very practical. OpenRefine (http://openrefine.org/) is a non-scripting tool that could be helpful with some of these problems.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code to report problem cells in Excel spreadsheets #6

Code to report problem cells in Excel spreadsheets #6

vanderbi commented Jun 1, 2018

dpkode commented Jun 11, 2018

Code to report problem cells in Excel spreadsheets #6

Code to report problem cells in Excel spreadsheets #6

Comments

vanderbi commented Jun 1, 2018

dpkode commented Jun 11, 2018