spool.select errors out when fed an array/when samples parameter is False #447
Comments
OK, so is this the same issue with using samples in spool.select that you mentioned earlier?
Yes, select_distance is an array, and this is the same issue we talked about. @ahmadtourei mentioned that one way of doing this might be to sub-select in the individual patches as they are accessed. Do you agree with that?
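The idea of sub-selecting in the individual patches as they are accessed can be sketched roughly as below. This is only an illustration with stand-in data, not how dascore implements spools: `LazySubSelection` and `select_one` are hypothetical names, and each "patch" is a plain list.

```python
class LazySubSelection:
    """Apply a selection to each item only when it is accessed (hypothetical helper)."""

    def __init__(self, items, select_one):
        self._items = items            # any indexable container of patch-like objects
        self._select_one = select_one  # callable applied to a single item

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # The selection runs here, at access time, not when the wrapper is built.
        return self._select_one(self._items[index])

    def __iter__(self):
        for index in range(len(self)):
            yield self[index]


# Stand-in data: each "patch" is a plain list of values along the distance axis.
patches = [[10, 11, 12, 13], [20, 21, 22, 23]]
channels = [0, 2]  # sample indices to keep

lazy = LazySubSelection(patches, lambda patch: [patch[i] for i in channels])
first = lazy[0]
```

With real patches, `select_one` would presumably wrap a call to the patch-level select; whether that call accepts an array query is an assumption to check against the installed dascore version.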
Hey @aissah, as I mentioned, you have found a bug in I have tested this on a directory spool and got the following:
---------------------------------------------------------------------------
ParameterError Traceback (most recent call last)
Cell In[28], line 10
7 sp = dc.spool("/u/pa/nb/tourei/scratch/dascore_ambient_noise_pipeline/Kafadar_data_dasdae/")
9 sub_sp = sp.select(distance=(1, 3), samples=True)
---> 10 sub_sp[0]
File /wendianHome/u/pa/nb/tourei/dascore/dascore/core/spool.py:387, in DataFrameSpool.__getitem__(self, item)
381 out = self.new_from_df(
382 df=new_df,
383 instruction_df=new_inst,
384 source_df=new_source,
385 )
386 else: # a single index was used, should return a single patch
--> 387 out = self._unbox_patch(self._get_patches_from_index(item))
388 return out
File /wendianHome/u/pa/nb/tourei/dascore/dascore/core/spool.py:416, in DataFrameSpool._get_patches_from_index(self, df_ind)
414 assert not df1.empty
415 joined = df1.join(source.drop(columns=df1.columns, errors="ignore"))
--> 416 return self._patch_from_instruction_df(joined)
File /wendianHome/u/pa/nb/tourei/dascore/dascore/core/spool.py:426, in DataFrameSpool._patch_from_instruction_df(self, joined)
423 for patch_kwargs in df_dict_list:
424 # convert kwargs to format understood by parser/patch.select
425 kwargs = _convert_min_max_in_kwargs(patch_kwargs, joined)
--> 426 patch = self._load_patch(kwargs)
427 # If the limits of the source patch were not modified, we can just
428 # use the select kwargs. This is important for missing coordinates
429 # (NaN values) to not get trimmed out.
430 if kwargs.get("_modified"):
File /wendianHome/u/pa/nb/tourei/dascore/dascore/clients/dirspool.py:129, in DirectorySpool._load_patch(self, kwargs)
127 final_kwargs = dict(kwargs)
128 final_kwargs.update(self._select_kwargs)
--> 129 patch = dc.read(**final_kwargs)[0]
130 return patch
File /wendianHome/u/pa/nb/tourei/dascore/dascore/io/core.py:633, in read(path, file_format, file_version, time, distance, **kwargs)
631 required_type = fiber_io.read._required_type
632 path = man.get_resource(required_type)
--> 633 out = fiber_io.read(
634 path,
635 file_version=file_version,
636 time=time,
...
674 ):
675 msg = "When samples=True, values must be integers."
--> 676 raise ParameterError(msg)
ParameterError: When samples=True, values must be integers.
We need to fix this. Meanwhile, I suggested for now you apply
Related to #436
Description
When the samples parameter is set to True, the subspool returned by spool.select contains no patches. The behavior seems to differ between file formats. My files are in PRODML format, and I can work around the problem using a suggestion from @d-chambers: set the samples parameter to False and modify the query accordingly. However, when I try this with some of the dascore example patches, I get an error whenever the query is an array rather than a pair of numbers giving the start and end of the select range.
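The workaround of setting samples to False and modifying the query amounts to translating a range of sample indices into the matching coordinate values. A minimal sketch of that translation follows, using a plain list as a stand-in for the distance coordinate; `sample_range_to_values` is a hypothetical helper, not a dascore function.

```python
def sample_range_to_values(coord_values, first_index, last_index):
    """Return (start, end) coordinate values for an inclusive sample-index range."""
    if not 0 <= first_index <= last_index < len(coord_values):
        raise IndexError("sample range is outside the coordinate array")
    return coord_values[first_index], coord_values[last_index]


# Stand-in distance coordinate: 300 channels spaced 1.5 units apart.
distance_values = [i * 1.5 for i in range(300)]
start, end = sample_range_to_values(distance_values, 0, 297)
```

The resulting pair could then be passed as `spool.select(distance=(start, end))`, the form reported to work in the example below, assuming the end of the range is treated as inclusive. This only covers contiguous channel ranges, not arbitrary arrays of channels.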
Example
Set-up:
import dascore as dc
import numpy as np
mem_spool = dc.examples.random_spool()
dir_spool = dc.examples.spool_to_directory(mem_spool)
spool = dc.spool(dir_spool)
distance_coords = spool[0].coords.get_array('distance')
select_distance = distance_coords[np.arange(0, 298)]
This produces an error:
sub_spool = spool.select(distance=(select_distance))
print(sub_spool[0])
This does not produce an error:
start, end = 0, 100
sub_spool = spool.select(distance=(start, end))
print(sub_spool[0])
In the case of the PRODML files, the above cases work, but this case produces an error:
select_channels = np.arange(0, 298)
sub_spool = spool.select(distance=(select_channels), samples=True)
print(sub_spool[0])
Expected behavior
Select data from some channels or distances into a new spool.
Versions