Handle timestamp HHMMSS using processqueue.py -p #96

Open
dnadeau-lanl opened this issue Nov 20, 2021 · 2 comments
Labels
enhancement New feature or feature request help wanted Extra attention is needed

Comments

@dnadeau-lanl
Contributor

Use more than the daily date when processing hourly files. Some files are received every hour, but the process queue script discards them and only the newest file is processed.

Relation to an issue

#38

Proposed enhancement

Process multiple files within a single day.

For example, the following files are not processed correctly by dbprocessing.
The inspector extracts the day of year (288) and converts it into a date for unixtime. Even if we pass the HHMMSS to unix_start_time and unix_stop_time, the files are still discarded when running the process queue script.

 STP6_2020288230002_SENH_VC34.0
 STP6_2020288220002_SENH_VC34.0
 STP6_2020288210002_SENH_VC34.0
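A minimal sketch of recovering the full timestamp from these filenames, assuming the middle field is laid out as YYYYDDDHHMMSS (four-digit year, three-digit day of year, then HHMMSS). The regex and the `parse_file_time` helper are hypothetical and are not part of dbprocessing's inspector API. It shows that the three files carry distinct timestamps but share a single date.

```python
import datetime
import re

# Hypothetical pattern: assumes a 13-digit YYYYDDDHHMMSS field between underscores
TIME_FIELD_RE = re.compile(r"_(\d{13})_")

def parse_file_time(filename):
    """Return the full UTC timestamp encoded in the filename (assumed layout)."""
    match = TIME_FIELD_RE.search(filename)
    if match is None:
        raise ValueError("no YYYYDDDHHMMSS field in " + filename)
    # %j is the day of year; 2020 is a leap year, so day 288 is 2020-10-14
    return datetime.datetime.strptime(match.group(1), "%Y%j%H%M%S")

files = [
    "STP6_2020288230002_SENH_VC34.0",
    "STP6_2020288220002_SENH_VC34.0",
    "STP6_2020288210002_SENH_VC34.0",
]
times = [parse_file_time(f) for f in files]
print(times[0])                        # 2020-10-14 23:00:02
print(len(set(times)))                 # 3 distinct timestamps
print(len({t.date() for t in times}))  # but only 1 distinct date
```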

When running processqueue.py -p, we get the following message and only the last file is retained:

https://github.com/spacepy/dbprocessing/blob/master/dbprocessing/dbprocessing.py#L368

  • Was not newest version in buildChildren

Proposed code

https://github.com/spacepy/dbprocessing/blob/master/dbprocessing/DButils.py#L683

dates = [file.utc_start_time, file.utc_stop_time]
latest = self.getFilesByProductTime(product_id, dates, newest_version=True)
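To illustrate why looking files up by their full start/stop times (rather than by date) keeps hourly files apart, here is a small self-contained sketch. It does not reproduce the internals of `getFilesByProductTime`; `FileRecord`, `by_date`, and `by_time` are hypothetical stand-ins, and the HH:00:02 to HH:59:59 spans are assumed for illustration.

```python
import datetime
from typing import List, NamedTuple

class FileRecord(NamedTuple):
    # Hypothetical stand-in for a row of the file table
    filename: str
    utc_start_time: datetime.datetime
    utc_stop_time: datetime.datetime

def by_date(records: List[FileRecord], day: datetime.date) -> List[FileRecord]:
    # Date-only lookup: every file that starts on `day` matches
    return [r for r in records if r.utc_start_time.date() == day]

def by_time(records: List[FileRecord], start: datetime.datetime,
            stop: datetime.datetime) -> List[FileRecord]:
    # Time-range lookup: only files whose [start, stop] span overlaps the query match
    return [r for r in records
            if r.utc_start_time <= stop and r.utc_stop_time >= start]

# Assumed spans: each hourly file runs from HH:00:02 to HH:59:59
records = [
    FileRecord("STP6_2020288%02d0002_SENH_VC34.0" % h,
               datetime.datetime(2020, 10, 14, h, 0, 2),
               datetime.datetime(2020, 10, 14, h, 59, 59))
    for h in (21, 22, 23)
]
target = records[1]  # the 22:00:02 file
print(len(by_date(records, target.utc_start_time.date())))                 # 3
print(len(by_time(records, target.utc_start_time, target.utc_stop_time)))  # 1
```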

https://github.com/spacepy/dbprocessing/blob/master/dbprocessing/dbprocessing.py#L296-L301

start_time = sq.utc_start_time
stop_time = sq.utc_stop_time

https://github.com/spacepy/dbprocessing/blob/master/dbprocessing/runMe.py#L294-L295
Comment out these lines:

#if isinstance(utc_file_date, datetime.datetime):
#    utc_file_date = utc_file_date.date()
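The effect of those two lines can be seen in isolation with a minimal sketch. `file_date_key` is a hypothetical helper that mimics the truncation, with a flag to skip it; the explicit `datetime.datetime` check matters because `datetime.datetime` is a subclass of `datetime.date`.

```python
import datetime

def file_date_key(utc_file_date, truncate=True):
    """Mimic the lines above: optionally drop the time of day."""
    if truncate and isinstance(utc_file_date, datetime.datetime):
        utc_file_date = utc_file_date.date()
    return utc_file_date

stamps = [datetime.datetime(2020, 10, 14, h, 0, 2) for h in (21, 22, 23)]
print(len({file_date_key(s) for s in stamps}))                  # 1: all collapse to 2020-10-14
print(len({file_date_key(s, truncate=False) for s in stamps}))  # 3: hours kept distinct
```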

Alternatives

There are currently no alternatives.

I tried using "RUN" and "FILE" as output_timebase instead of "DAILY". With the changes above, I am currently processing with output_timebase="FILE". I would like to use output_timebase="RUN" with no output_product.

OS, Python version, and dependency version information:

Version of dbprocessing

spacepy/master branch

Closure condition

This issue should be closed when:

  1. the code can handle two files with the same date but different timestamps (HHMMSS)
  2. tests are written and updated to handle these cases. Existing tests will fail, since they check only the date in the format YYYY-MM-DD without checking the timestamp (HHMMSS)
dnadeau-lanl added the enhancement New feature or feature request label Nov 20, 2021
dnadeau-lanl added the help wanted Extra attention is needed label Nov 20, 2021
@jtniehof
Member

This is going to need substantial work, since it requires a change to a key dbprocessing design concept: product plus utc_file_date plus version is unique, and (similarly) there is only one newest version for a given utc_file_date and product.
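A minimal sketch of why that design concept drops hourly files once their utc_file_date is a plain date. The dict-based records, the `SENH_VC34` product name, and both helpers are hypothetical illustrations, not dbprocessing's schema or code.

```python
import datetime
from collections import Counter

def newest_per_key(files):
    """Keep only the highest version for each (product, utc_file_date) pair."""
    best = {}
    for f in files:
        key = (f["product"], f["utc_file_date"])
        if key not in best or f["version"] > best[key]["version"]:
            best[key] = f
    return sorted(best.values(), key=lambda f: f["filename"])

def duplicate_keys(files):
    """Return (product, utc_file_date, version) triples that occur more than once."""
    counts = Counter((f["product"], f["utc_file_date"], f["version"]) for f in files)
    return [k for k, n in counts.items() if n > 1]

hourly = [
    {"filename": "STP6_2020288%02d0002_SENH_VC34.0" % h, "product": "SENH_VC34",
     "utc_file_date": datetime.date(2020, 10, 14), "version": (1, 0, 0)}
    for h in (21, 22, 23)
]
print(len(newest_per_key(hourly)))  # 1: two of the three hourly files are dropped
print(len(duplicate_keys(hourly)))  # 1: the uniqueness assumption is violated
```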

The utc_file_date is intentionally distinct from the utc_start_time and utc_stop_time. utc_file_date is the "characteristic" date of the file (generally represented in the filename), and utc_start_time/utc_stop_time are the actual first and last timestamps in the file. In some cases they don't line up perfectly (ECT L0 and L0.5 files in particular). This is where #97 falls apart.
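A small sketch of that distinction, using the field names from the comment but made-up values: a file whose characteristic date is 2020-10-14 can still contain a record from just before midnight on the previous day, so deriving the file date from utc_start_time would put it on the wrong day. `FileTimes` is a hypothetical container, not a dbprocessing class.

```python
import datetime
from dataclasses import dataclass

@dataclass
class FileTimes:
    # Field names follow the comment above; the values below are illustrative only
    utc_file_date: datetime.date       # "characteristic" date, usually from the filename
    utc_start_time: datetime.datetime  # first timestamp actually in the file
    utc_stop_time: datetime.datetime   # last timestamp actually in the file

f = FileTimes(
    utc_file_date=datetime.date(2020, 10, 14),
    utc_start_time=datetime.datetime(2020, 10, 13, 23, 59, 58),
    utc_stop_time=datetime.datetime(2020, 10, 15, 0, 0, 3),
)
print(f.utc_start_time.date() == f.utc_file_date)  # False: data spills into the previous day
```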

One of the things we had in mind was to extend the timebase support to MONTHLY and YEARLY. HOURLY doesn't fit quite the same mold, but is possible. It will certainly require database changes.

So I think there's a lot of preparing-the-ground work for this:

@jtniehof
Member

I'm thinking that if utc_file_date changes to something like utc_file_start or utc_file_period_start, this is probably pretty doable. I'll keep calling it utc_file_date for now, but it would represent the "characteristic" start of the file, i.e. what "should" be in it. So for a DAILY file, it would be the same, YYYY-MM-DD 00:00:00, with the idea that anything before YYYY-MM-DD+1 00:00:00 "belongs" in there (for the sake of doing searches for input files). HOURLY would have, say, YYYY-MM-DD 00:00:00, but then that file covers anything before YYYY-MM-DD 01:00:00. MONTHLY is an obvious extension.
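A minimal sketch of the "characteristic start" idea, assuming each timebase defines a half-open period [start, end) and a timestamp belongs to a file when it falls in that period. `period_bounds` and `belongs` are hypothetical helpers, and WEEKLY is left out, as in the comment.

```python
import datetime

def period_bounds(t, timebase):
    """Return the half-open [start, end) period of `timebase` that contains t."""
    if timebase == "HOURLY":
        start = t.replace(minute=0, second=0, microsecond=0)
        end = start + datetime.timedelta(hours=1)
    elif timebase == "DAILY":
        start = t.replace(hour=0, minute=0, second=0, microsecond=0)
        end = start + datetime.timedelta(days=1)
    elif timebase == "MONTHLY":
        start = t.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Roll over to the first day of the next month (December -> January)
        if start.month == 12:
            end = start.replace(year=start.year + 1, month=1)
        else:
            end = start.replace(month=start.month + 1)
    elif timebase == "YEARLY":
        start = t.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
        end = start.replace(year=start.year + 1)
    else:
        raise ValueError("unsupported timebase: " + timebase)
    return start, end

def belongs(t, file_start, timebase):
    """True if timestamp t falls in the period that starts the file at file_start."""
    start, end = period_bounds(file_start, timebase)
    return start <= t < end

t = datetime.datetime(2020, 10, 14, 22, 0, 2)
print(*period_bounds(t, "HOURLY"))  # 2020-10-14 22:00:00 2020-10-14 23:00:00
print(*period_bounds(t, "DAILY"))   # 2020-10-14 00:00:00 2020-10-15 00:00:00
print(belongs(datetime.datetime(2020, 10, 14, 23, 0, 2), t, "HOURLY"))  # False
```

Using half-open periods means a timestamp exactly on a boundary (e.g. 23:00:00) belongs to the next file, never to both.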

WEEKLY gets tricky, but can be punted as we don't need it right now.

You don't have anything weird that requires, say, two-hour files, do you?
