spool.select errors out when fed an array/when samples parameter is False #447

Open
aissah opened this issue Oct 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@aissah
Collaborator

aissah commented Oct 11, 2024

Description

When the samples parameter is set to True, the subspool returned by spool.select contains no patches. The behavior seems to differ between file formats. With my PRODML files, I can work around this using a suggestion from @d-chambers: set the samples parameter to False and modify the query accordingly. However, when I try this with some of the dascore example patches, I get an error whenever the query is an array rather than a pair of numbers giving the start and end of the select range.

Example

set-up
import dascore as dc
import numpy as np

mem_spool = dc.examples.random_spool()
dir_spool = dc.examples.spool_to_directory(mem_spool)
spool = dc.spool(dir_spool)

distance_coords = spool[0].coords.get_array('distance')
select_distance = distance_coords[np.arange(0, 298)]

This produces an error:
sub_spool = spool.select(distance=(select_distance))
print(sub_spool[0])

This does not produce an error:
start, end = 0, 100
sub_spool = spool.select(distance=(start, end))
print(sub_spool[0])

In the case of the PRODML files, the above cases work, but this case produces an error:
select_channels = np.arange(0, 298)
sub_spool = spool.select(distance=(select_channels), samples=True)
print(sub_spool[0])

Expected behavior

Select data from some channels or distances into a new spool.

Versions

  • OS: Windows 11
  • DasCore Version: 0.1.3
  • Python Version: 3.11

@aissah added the bug label Oct 11, 2024

@d-chambers
Contributor

This produces an error:
sub_spool = spool.select(distance=(select_distance))
print(sub_spool[0])

Ok, so select_distance here is an array right? Currently DASCore can't do that because we don't know anything beyond the start/stop range of a Patch's coords until we load the patch.

Is this the same issue with using samples in spool.select you mentioned earlier?
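
To illustrate the limitation described above, here is a minimal sketch of an index that stores only the start/stop extent of each patch. It is not DASCore's implementation; IndexRow, rows_overlapping, and the patch names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRow:
    """Hypothetical index entry: only the coordinate extent is stored."""

    name: str
    distance_min: float
    distance_max: float


def rows_overlapping(index, start, end):
    """Return rows whose [min, max] extent overlaps the requested range."""
    return [
        row
        for row in index
        if row.distance_max >= start and row.distance_min <= end
    ]


index = [
    IndexRow("patch_a", 0.0, 299.0),
    IndexRow("patch_b", 300.0, 599.0),
]

# A (start, end) query can be answered from the extents alone.
hits = [row.name for row in rows_overlapping(index, 0, 100)]
print(hits)  # ['patch_a']

# An array query such as [0.0, 1.0, 2.5] needs the actual coordinate
# values to decide which samples match, and the index does not hold them.
```

A range query only needs the two numbers kept per patch, which is why it can be resolved before any file is read, while an array query cannot.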

@aissah
Collaborator Author

aissah commented Oct 18, 2024

Yes, select_distance is an array and this is the same issue we talked about. @ahmadtourei mentioned that a way of doing this might be to sub-select in the individual patches as they are accessed. Do you agree with that?
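
The idea of sub-selecting in the individual patches as they are accessed could look roughly like the sketch below. It assumes only that the spool is iterable and that each patch has a select method; select_lazily and FakePatch are hypothetical names, and FakePatch is a stand-in so the sketch runs without DASCore or any data files.

```python
def select_lazily(spool, **select_kwargs):
    """Yield each patch with ``select`` applied only when it is accessed.

    ``spool`` is any iterable of objects exposing a ``select`` method.
    """
    for patch in spool:
        yield patch.select(**select_kwargs)


class FakePatch:
    """Stand-in for a patch so the sketch runs without any data files."""

    def __init__(self, distance):
        self.distance = list(distance)

    def select(self, distance):
        # Keep only the coordinate values that appear in the query array.
        wanted = set(distance)
        return FakePatch(d for d in self.distance if d in wanted)


fake_spool = [FakePatch(range(0, 10)), FakePatch(range(5, 15))]
selected = [p.distance for p in select_lazily(fake_spool, distance=[4, 5, 6])]
print(selected)  # [[4, 5, 6], [5, 6]]
```

Because select_lazily is a generator, nothing is selected until a patch is actually requested. Whether the real patch select accepts an array query in a given DASCore version is something to verify before relying on this.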

@ahmadtourei
Collaborator

import dascore as dc
import numpy as np

Hey @aissah, as I mentioned, you have found a bug in spool.select.

I tested this on a directory spool and got the following ParameterError related to the samples argument:

---------------------------------------------------------------------------
ParameterError                            Traceback (most recent call last)
Cell In[28], line 10
      7 sp = dc.spool("/u/pa/nb/tourei/scratch/dascore_ambient_noise_pipeline/Kafadar_data_dasdae/")
      9 sub_sp = sp.select(distance=(1, 3), samples=True)
---> 10 sub_sp[0]

File /wendianHome/u/pa/nb/tourei/dascore/dascore/core/spool.py:387, in DataFrameSpool.__getitem__(self, item)
    381     out = self.new_from_df(
    382         df=new_df,
    383         instruction_df=new_inst,
    384         source_df=new_source,
    385     )
    386 else:  # a single index was used, should return a single patch
--> 387     out = self._unbox_patch(self._get_patches_from_index(item))
    388 return out

File /wendianHome/u/pa/nb/tourei/dascore/dascore/core/spool.py:416, in DataFrameSpool._get_patches_from_index(self, df_ind)
    414 assert not df1.empty
    415 joined = df1.join(source.drop(columns=df1.columns, errors="ignore"))
--> 416 return self._patch_from_instruction_df(joined)

File /wendianHome/u/pa/nb/tourei/dascore/dascore/core/spool.py:426, in DataFrameSpool._patch_from_instruction_df(self, joined)
    423 for patch_kwargs in df_dict_list:
    424     # convert kwargs to format understood by parser/patch.select
    425     kwargs = _convert_min_max_in_kwargs(patch_kwargs, joined)
--> 426     patch = self._load_patch(kwargs)
    427     # If the limits of the source patch were not modified, we can just
    428     # use the select kwargs. This is important for missing coordinates
    429     # (NaN values) to not get trimmed out.
    430     if kwargs.get("_modified"):

File /wendianHome/u/pa/nb/tourei/dascore/dascore/clients/dirspool.py:129, in DirectorySpool._load_patch(self, kwargs)
    127 final_kwargs = dict(kwargs)
    128 final_kwargs.update(self._select_kwargs)
--> 129 patch = dc.read(**final_kwargs)[0]
    130 return patch

File /wendianHome/u/pa/nb/tourei/dascore/dascore/io/core.py:633, in read(path, file_format, file_version, time, distance, **kwargs)
    631 required_type = fiber_io.read._required_type
    632 path = man.get_resource(required_type)
--> 633 out = fiber_io.read(
    634     path,
    635     file_version=file_version,
    636     time=time,
...
    674 ):
    675     msg = "When samples=True, values must be integers."
--> 676     raise ParameterError(msg)

ParameterError: When samples=True, values must be integers.

We need to fix this. In the meantime, as I suggested, you can either apply the select method to each patch (in a for loop, as a preprocessing step) or apply select on the spool with the samples argument set to False and adjust your select range accordingly. I hope this clarifies our earlier conversation.
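
For the second workaround, adjusting the select range means turning sample indices into coordinate values before calling select with samples left as False. A rough sketch of that conversion follows; samples_to_values is a hypothetical helper, the 0.5 spacing is made up, and treating both indices as inclusive endpoints is an assumption about the intended range.

```python
import operator


def samples_to_values(coord_values, start, stop):
    """Map inclusive sample indices to the coordinate values they point at."""
    # operator.index rejects floats, so only true integer indices pass.
    start, stop = operator.index(start), operator.index(stop)
    if not 0 <= start <= stop < len(coord_values):
        raise IndexError("sample range falls outside the coordinate")
    return coord_values[start], coord_values[stop]


# Hypothetical distance coordinate with 0.5 m channel spacing.
distance = [i * 0.5 for i in range(300)]
value_range = samples_to_values(distance, 1, 3)
print(value_range)  # (0.5, 1.5)
```

The resulting pair could then be passed as the distance query, for example spool.select(distance=value_range), using the coordinate array obtained from one of the patches as in the set-up at the top of this issue.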

@ahmadtourei
Collaborator

related to #436
