Code to report problem cells in Excel spreadsheets #6

Open
vanderbi opened this issue Jun 1, 2018 · 1 comment

Comments

vanderbi commented Jun 1, 2018

FCE LTER scientists often concatenate multiple years of data into a single dataset before submitting it to me. Each year may have been managed slightly differently, e.g. with different date formats, missing-data codes, or site codes. It would be nice to have a tool that could parse an Excel file and report which date formats it contains, the unique site names or species names, which missing value codes are used (based on a known set of likely candidates), whether there are seemingly empty cells that contain only spaces, whether there are extraneous characters outside the columns that have headers, and so on. It would also be nice to have code that makes educated guesses about the dataset and produces an EML attribute list as a starting point for documenting the table.
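
To make the request concrete, here is a rough Python sketch of what such a report might look like. It is only an illustration under several assumptions: the sheet has already been read into a header list plus rows of strings (a reader such as openpyxl could supply these, but real Excel dates may arrive as datetime objects rather than text), and the names `MISSING_CODES`, `DATE_PATTERNS`, `profile_column`, `profile_table`, and `guess_kind` are all hypothetical. The candidate codes and date patterns are placeholders to be replaced with whatever the contributors actually use.

```python
import re
from collections import Counter

# Hypothetical candidate codes; replace with the ones your contributors use.
MISSING_CODES = {"NA", "N/A", "NaN", "-9999", "-999", ".", "NULL", "ND"}

# Hypothetical date patterns (label -> regex), tried in this order.
DATE_PATTERNS = {
    "YYYY-MM-DD": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "M/D/YYYY": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "M/D/YY": re.compile(r"^\d{1,2}/\d{1,2}/\d{2}$"),
    "YYYYMMDD": re.compile(r"^\d{8}$"),
}

def date_format(value):
    """Return the label of the first matching date pattern, or None."""
    for label, pattern in DATE_PATTERNS.items():
        if pattern.match(value):
            return label
    return None

def profile_column(values):
    """Summarize one column, given its cells as strings ('' means empty)."""
    report = {"date_formats": Counter(), "missing_codes": Counter(),
              "whitespace_only": 0, "padded": 0, "unique_values": set()}
    for raw in values:
        if raw == "":
            continue
        stripped = raw.strip()
        if stripped == "":
            report["whitespace_only"] += 1  # looks empty but holds spaces
            continue
        if stripped != raw:
            report["padded"] += 1  # leading or trailing whitespace
        if stripped in MISSING_CODES:
            report["missing_codes"][stripped] += 1
            continue
        fmt = date_format(stripped)
        if fmt:
            report["date_formats"][fmt] += 1
        report["unique_values"].add(stripped)
    return report

def profile_table(header, rows):
    """Profile each headed column; collect non-blank cells with no header."""
    columns = {name: [] for name in header if name.strip()}
    stray = []  # (row number, column number, value), both 1-based
    for r, row in enumerate(rows, start=2):  # row 1 is the header
        for c, cell in enumerate(row):
            if c < len(header) and header[c].strip():
                columns[header[c]].append(cell)
            elif cell.strip():
                stray.append((r, c + 1, cell))
    return {n: profile_column(v) for n, v in columns.items()}, stray

def guess_kind(report):
    """Crude first guess at a column's kind, to be reviewed by a person."""
    values = report["unique_values"]
    if not values:
        return "unknown"
    if all(date_format(v) for v in values):
        return "dateTime"
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        return "text"

# Made-up example data: three years' habits mixed into one table.
header = ["Date", "Site", "Temp"]
rows = [
    ["2017-01-05", "SRS1", "24.1"],
    ["1/6/2017", "SRS-1", "-9999"],
    ["2017-01-07", "SRS1 ", " ", "check this"],
]
report, stray = profile_table(header, rows)
kinds = {name: guess_kind(col) for name, col in report.items()}
```

For this made-up table the report shows two date formats in `Date`, two spellings of the same site in `Site`, one `-9999` code and one spaces-only cell in `Temp`, and one stray cell to the right of the headed columns. The `kinds` guesses are only a rough starting point for an attribute list; mapping them to actual EML attribute definitions (units, measurement scale, and so on) would still need to be done by hand.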

dpkode commented Jun 11, 2018

This kind of problem rears its head in almost every data-ish field I've participated in, and working on approaches to the specific problems you mention would be very practical. OpenRefine (http://openrefine.org/) is a non-scripting tool that could help with some of them.
