
auto_screen fails #56

Closed
dwr-psandhu opened this issue Oct 4, 2024 · 10 comments
@dwr-psandhu
Contributor

  • version: Sept 27th, 2024 commit
  • Python version: 3.11
  • Operating System: windows

Description

auto_screen fails

auto_screen --config %SCREEN_CONFIG% --fpath %SRCDIR% --dest screened --plot_dest plots --params ssc

Relevant failure stack trace

{'method': 'median_test_twoside', 'args': {'level': 5, 'filt_len': 7, 'quantiles': [0.005, 0.995]}}
Traceback (most recent call last):
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Scripts\auto_screen-script.py", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\auto_screen.py", line 566, in main
    auto_screen(
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\auto_screen.py", line 266, in auto_screen
    screened = screener(
               ^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\auto_screen.py", line 74, in screener
    anomaly = method(ts_process, **args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\vtools\functions\error_detect.py", line 279, in median_test_twoside
    filt = dds.rolling(filt_len,center=True).apply(lambda x: np.nanmedian(x[medseq]),raw=True,engine='numba').compute()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask_expr\_collection.py", line 481, in compute
    return DaskMethodsMixin.compute(out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask\base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask\base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask_expr\_expr.py", line 1043, in _combined_parts
    raise NotImplementedError(msg)
NotImplementedError: Partition size is less than overlapping window size. Try using ``df.repartition`` to increase the partition size.
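The dask error above surfaces when the series handed to the rolling median is shorter than the rolling window, which (as discovered later in this thread) can happen when a raw file has little or no data. A minimal, hypothetical guard could check the usable data length against `filt_len` before running the test at all; the function name and return convention here are illustrative, not part of `dms_datastore` or `vtools`:

```python
import statistics

def median_test_guard(values, filt_len):
    """Return rolling medians over windows of length filt_len, or None
    when the series is too short for even one full window (the situation
    that produced the dask NotImplementedError above).

    None entries stand in for missing (NaN) samples and are dropped.
    """
    clean = [v for v in values if v is not None]
    if len(clean) < filt_len:
        # Not enough data to form a single window; skip screening
        # instead of letting the rolling computation fail downstream.
        return None
    return [statistics.median(clean[i:i + filt_len])
            for i in range(len(clean) - filt_len + 1)]
```

A series of two points against `filt_len=7` (roughly the failing configuration) would then be skipped rather than crashing the whole `auto_screen` run.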
@dwr-psandhu
Contributor Author

Now it fails on the formatted step.

y:\jenkins_repo_staging\continuous>call d:\ProgramData\miniconda3\condabin\conda activate dms_datastore   & call usgs_multi --fpath formatted 
2024-10-04 02:50:40 - Entering process_multivariate_usgs
Traceback (most recent call last):
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Scripts\usgs_multi-script.py", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 278, in main
    process_multivariate_usgs(fpath=fpath,pat=pat,rescan=True)
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 159, in process_multivariate_usgs
    df = usgs_multivariate(pat,'usgs_subloc_meta_new.csv')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 113, in usgs_multivariate
    for s in series:
             ^^^^^^
UnboundLocalError: cannot access local variable 'series' where it is not associated with a value
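The `UnboundLocalError` above is the classic pattern where a name is only bound inside a conditional branch, so a later loop can see it unbound when no branch ran. A small sketch of the failure mode and the usual fix (binding the name up front); the function and record shapes are illustrative, not the actual `usgs_multi.py` code:

```python
def usgs_multivariate_sketch(records):
    """Illustrates the bug pattern: if `series` were only assigned inside
    the `if` below, an input with no matching records would leave the name
    unbound and `for s in series:` would raise UnboundLocalError.
    Initializing it before the loop makes the empty case well-defined."""
    series = []  # fix: always bind the name, even when nothing matches
    for rec in records:
        if rec.get("agency") == "usgs":
            series.append(rec)
    for s in series:
        yield s
```

With the initialization in place, an empty or non-matching input simply yields nothing instead of crashing the pipeline stage.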

@dwr-psandhu
Contributor Author

stale issue. closing

@dwr-psandhu
Contributor Author

@water-e @esatel This issue is back again. The auto screening is failing, but the problem may have started in an earlier stage.

Performing: median_oneside_forward
ccy default ph
step:
{'method': 'median_test_oneside', 'label': 'median_oneside_forward', 'args': {'level': 5, 'filt_len': 5, 'quantiles': [0.03, 0.97], 'reverse': False}}
Traceback (most recent call last):
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Scripts\auto_screen-script.py", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\auto_screen.py", line 566, in main
    auto_screen(
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\auto_screen.py", line 266, in auto_screen
    screened = screener(
               ^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\auto_screen.py", line 74, in screener
    anomaly = method(ts_process, **args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\vtools\functions\error_detect.py", line 140, in median_test_oneside
    res = (dds.ts - dds.pred).compute()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask_expr\_collection.py", line 480, in compute
    return DaskMethodsMixin.compute(out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask\base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask\base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dask_expr\_expr.py", line 1042, in _combined_parts
    raise NotImplementedError(msg)
NotImplementedError: Partition size is less than overlapping window size. Try using ``df.repartition`` to increase the partition size.
script returned exit code 1

@dwr-psandhu
Contributor Author

Looks like there was a failure in downloading the raw data; the auto screening is just where the failure surfaced.
@water-e we need to implement tests and checks for each stage so we don't run subsequent stages like reformat or screening if the download has issues, e.g. empty files.
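The per-stage gating suggested above could look something like the following pre-flight check: a stage only runs if its input directory exists, is non-empty, and contains no trivially small files. The function name and the 128-byte threshold are made up for illustration; the actual minimum used by `write_ts` is described below as "very small":

```python
import os

def stage_inputs_ok(dirpath, min_size=128):
    """Hypothetical pre-flight check for a pipeline stage: succeed only
    when the input directory exists, is non-empty, and every file meets
    a minimum size (a stand-in for the 'empty raw file' case above)."""
    if not os.path.isdir(dirpath):
        return False
    names = os.listdir(dirpath)
    if not names:
        return False
    return all(os.path.getsize(os.path.join(dirpath, n)) >= min_size
               for n in names)
```

A Jenkins stage could then skip reformat and screening (and fail loudly) when this returns False, instead of crashing deep inside a dask rolling computation.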

@water-e
Contributor

water-e commented Jan 24, 2025

There should be checking of empty files for all or most providers. An empty raw file should be discovered and removed in write_ts if it doesn't meet a minimum file size (which is very small). I'm surprised reformat doesn't catch it as well. Are there specifics to this?

@dwr-psandhu
Contributor Author

No indication beyond the screening of the ph and do files failing because there was no data; see the traceback above.

I am pasting the possibly relevant info from that traceback below:

Performing: median_oneside_forward
ccy default ph
step:
{'method': 'median_test_oneside', 'label': 'median_oneside_forward', 'args': {'level': 5, 'filt_len': 5, 'quantiles': [0.03, 0.97], 'reverse': False}}
....

@water-e
Contributor

water-e commented Jan 24, 2025

It is a vague message and situation, and we will probably have to fix that if we want to make progress. Are there logging messages or prints? Is there an opportunity to print out a file name? We need to know which file we are talking about, and I don't see that. Additionally, it would actually be an advantage if this did NOT get restarted; I fear the missing file will just succeed on the second pass.

The fact that it results in a hard stop is something we could probably adjust with try/except handling, but it takes care. The problem is that this kind of cheating compounds over time -- if I gloss over something quirky, I usually just don't get the file forever after.
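The "soft fail" idea above, done with the care it needs, might look like this: wrap each file's screening in try/except so one bad series is logged and skipped rather than hard-stopping the run, while keeping a record of what failed so the problem is not silently glossed over. The function, its arguments, and the failure list are all illustrative, not existing `auto_screen` API:

```python
def screen_all(files, screen_one, log=print):
    """Sketch: screen each file independently, collect failures instead
    of letting the first exception abort the whole batch. Returning the
    failure list lets the caller decide whether to fail the pipeline."""
    failures = []
    for f in files:
        try:
            screen_one(f)
        except Exception as exc:
            # Log the file name so a vague downstream error (like the
            # dask message above) can be traced to a specific input.
            log(f"screening failed for {f}: {exc}")
            failures.append(f)
    return failures
```

Exiting nonzero when `failures` is non-empty would preserve the hard stop at the pipeline level while still telling us exactly which file was at fault.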

@water-e
Contributor

water-e commented Jan 24, 2025

Generally speaking, I feel like we should be protected against writing trivially small files. However, I just made up the required number of lines -- maybe the median test requires a particular minimum number of values. Let me know if you need me to look for file info to add so we get a better message. I don't think we can fix the last incident, but we might avoid the next one.

@dwr-psandhu
Contributor Author

I looked into the station on the dashboard, and it looks like this station just started reporting ph, so this might be a genuine case of a sensor coming online.
https://dwrbdodatastore.azurewebsites.net/repoui?sdate=2025-01-01T00%3A00&edate=2025-01-24T21%3A55&repo_level=%5B%27screened%27%5D&selections=ccy%7C%7Cph

@water-e
Contributor

water-e commented Jan 26, 2025

Is this still an issue? I see that the station is mentioned in the traceback, so that was the part I was missing. However, the screened file for 2025 seems to be in continuous/repo/screened. Did you take an action?

As mentioned above, these things are hard to tackle without a reproducible case. For reformat and maybe other tasks I used something called /quarantine: if a process got stuck, I put the offending file there.

Otherwise, if it works the next time we run it, the offending file is wiped out and we never figure out what went wrong.
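The /quarantine idea above can be sketched as a small helper that moves a file that caused a processing failure into a quarantine directory, preserving it for debugging instead of letting the next run overwrite it. The function name and directory handling are illustrative, following the convention described in the comment rather than any existing `dms_datastore` code:

```python
import os
import shutil

def quarantine(fpath, qdir="quarantine"):
    """Move a file that caused a processing failure into qdir so the
    evidence survives the next pipeline run. Returns the new path."""
    os.makedirs(qdir, exist_ok=True)
    dest = os.path.join(qdir, os.path.basename(fpath))
    shutil.move(fpath, dest)
    return dest
```

Calling this from the except branch of the screening loop would give a reproducible case for exactly the situation described in this thread.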
