Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest data w/multiple timestamp formats; detect; QA/QC; store as ISO #10

Open
jamesrco opened this issue Jun 10, 2018 · 3 comments
Open

Comments

@jamesrco
Copy link
Member

I often find I'm given multiple, timestamped datasets that belong to the same or complementary projects. Future users of these data (including me) often want to compare/evaluate these data side-by-side since they're related, but issues with the timestamps sometimes make this a huge pain. I end up spending a large amount of time manually reformatting timestamps or cajoling them into the "right" format. I've found a lot of the existing tools in R, Python, and MATLAB which purport to assist with these kinds of tasks to be lacking.

The data I receive often have one or more of the following features:

  1. Timestamps are in different formats completely
  2. Some times are in GMT; others are local to a particular time zone (and often, this time zone is not even explicitly specified)
  3. Missing timestamps, or timestamps simply missing one component of the timestamp
  4. Data messed up by Microsoft Excel in which points collected "on the hour" produce entries that omit minutes and seconds

I propose developing a set of tools in R and/or Python to assist in ingesting/formatting, and storing timestamps in a common format. Could make use of some existing functionality in existing R or Python packages; our product could even been a modification/contribution to the existing package code.

@jpshanno
Copy link
Member

Some of the timestamp issues are taken care of lubridate in R. I think the function is parse_datetime(), it lets you specify multiple timestamp formats and set how many levels of detail can be truncated before throwing an error.

@jamesrco
Copy link
Member Author

Thanks @jpshanno -- I have played around with lubridate quite a bit but will give it another look! I've found the primary issue is often that information such as the correct time zone/offset is easily lost when different packages or functions are used for interconversion of POSIXlt and POSIXct objects. Unfortunately, I've found the use of multiple functions/packages necessary since they don't all play nice with ISO 8601... something I've found surprising.

@jpshanno
Copy link
Member

Yeah, multiple packages can definitely wreak some havoc, maybe we can find a solution this week.

If I remember right the functionality was a little buried, but I have some lubridate code that uses the multiple formats that I can share with you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants