
Accessing non-scalar, non-innate features in RTDC_Hierarchy objects leads to Error (#165)

Closed · maxschloegel opened this issue Feb 4, 2022 · 7 comments · Fixed by #169

maxschloegel (Contributor) commented Feb 4, 2022

Registering a non-scalar feature (i.e. a temporary feature or a plugin feature) and then accessing it on an RTDC_Hierarchy object leads to *** TypeError: Boolean indexing array has incompatible shape.

This seems to be due to the indexing behaviour of RTDC_Hierarchy instances in their __getitem__() method: the filter is applied as a boolean mask, and boolean indexing of higher-dimensional np.ndarrays follows more complex rules than in the scalar case.
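
To illustrate with a minimal, self-contained sketch (the 4x4 "images" and the filter mask below are made up, not dclab data): plain numpy accepts a 1-D boolean mask along the first axis of an N-D array, whereas an HDF5 dataset, as returned for non-scalar features, rejects that kind of selection with exactly this TypeError. Converting the mask to integer indices avoids the problem:

```python
import numpy as np

# Toy stand-in for a non-scalar feature: 5 events, each a 4x4 "image"
# (shapes and values are made up for illustration).
images = np.arange(5 * 4 * 4).reshape(5, 4, 4)
mask = np.array([True, False, True, False, True])  # hierarchy filter

# Plain numpy accepts a 1-D boolean mask along the first axis ...
filtered_bool = images[mask]

# ... but an h5py dataset rejects this selection on N-D data with
# "TypeError: Boolean indexing array has incompatible shape".
# Converting the mask to integer indices side-steps the restriction:
idxs = np.where(mask)[0]            # array([0, 2, 4])
filtered_idx = images[idxs]

assert np.array_equal(filtered_bool, filtered_idx)
print(filtered_idx.shape)           # (3, 4, 4)
```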

maxschloegel (Contributor, Author) commented

One option to solve this issue is to replace the boolean mask used in the __getitem__() method with an array of integer indices obtained via np.where(mask).

This can be done by changing:

            item = self.hparent[key]
            return item[self.hparent.filter.all]

to

            item = self.hparent[key]
            idxs = np.where(self.hparent.filter.all)
            return item[idxs]

The "only" downside is that the non-scalar features are then no longer returned as HDF5 datasets (unlike the other non-scalar features) but as np.ndarrays, which means the filtered array is loaded completely into memory.
Since this is an edge case and I will only need it once, for exporting hierarchy datasets as HDF5 datasets (where the hierarchy datasets are not very large), this might not be an issue; on the other hand, it is not a 100% clean solution.
What do you think, @paulmueller?

maxschloegel (Contributor, Author) commented

While running the tests, I had the feeling that they took a bit longer than usual. I will time the tests for the original and the new code and compare the results. I don't want to introduce inefficient code here.

paulmueller (Member) commented Feb 7, 2022

Another option would be to introduce a wrapper class that returns a lazy object which can then be indexed via __getitem__ - similar to how it should work with image and mask.

EDIT: I think you would want to (a) generalize the ChildImage, ChildImageBG and ChildMask classes into one class (pass the feature name to __init__) and (b) use the plugin feature's "feature shape" metadata (it should not contain NaNs; otherwise raise a NotImplementedError) to determine whether we have to return N-dimensional data.

I did this for data export like so: https://github.com/ZELLMECHANIK-DRESDEN/dclab/blob/0.39.14/dclab/rtdc_dataset/writer.py#L180-L207

maxschloegel (Contributor, Author) commented

OK, I implemented a generalized ChildNonScalar class that covers non-scalar PluginFeatures, TemporaryFeatures, and the innate features "image", "image_bg" and "mask".
Since the issue is not specific to PluginFeatures but also occurred with TemporaryFeatures, I wanted to use the shape or ndim of ch.hparent[key]. However, that broke the TDMS tests (specifically for ImageColumn: in one test the TDMS dataset did not contain the feature "image", so the "image" list was empty and its shape could not be computed). We can talk about that at some point.
Due to these problems, I now test it slightly differently:

def __getitem__(self, key):
    # contour, image, and traces are added automatically
    # to `self._events` in `self.apply_filter`.
    if key in self._events:
        return self._events[key]
    elif key in self.features_scalar or key in dfn.FEATURES_NON_SCALAR:
        item = self.hparent[key]
        return item[self.hparent.filter.all]
    else:
        self._events.setdefault(key, ChildNonScalar(self, key))
        return self[key]

Here, or key in dfn.FEATURES_NON_SCALAR was included to deal with the TDMS problem.

I am not 100% happy with this solution, because I basically had to add or key in dfn.FEATURES_NON_SCALAR just for the TDMS problem, but tracking down the root cause of that issue felt like quite a deep rabbit hole.

maxschloegel (Contributor, Author) commented

For completeness' sake, here is the implementation of ChildNonScalar:

class ChildNonScalar(ChildBase):
    def __init__(self, child, key):
        super(ChildNonScalar, self).__init__(child)
        self.key = key

    def __getitem__(self, idx):
        # map the index within the filtered child dataset to the
        # corresponding index in the parent dataset
        pidx = map_indices_child2parent(child=self.child,
                                        child_indices=idx)
        hp = self.child.hparent
        return hp[self.key][pidx]

    @property
    def shape(self):
        # number of events after filtering plus the per-event shape
        hp = self.child.hparent
        return tuple([len(self)] + list(hp[self.key][0].shape))

Other suggestions for the name are more than welcome!
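
For anyone reading along, the lazy-view idea can be sketched standalone like this (LazyNDView and the simplified map_indices_child2parent below are illustrative stand-ins, not dclab API; the real function resolves indices via the parent's filter object):

```python
import numpy as np

def map_indices_child2parent(mask, child_indices):
    # Simplified stand-in: positions where the mask is True are the
    # parent indices of the child's events, in order.
    return np.where(mask)[0][child_indices]

class LazyNDView:
    """Minimal sketch of the ChildNonScalar idea: index lazily into
    the parent data instead of copying the filtered array into memory."""

    def __init__(self, parent_data, mask):
        self.parent_data = parent_data  # could be an HDF5 dataset
        self.mask = mask

    def __len__(self):
        return int(np.sum(self.mask))

    def __getitem__(self, idx):
        pidx = map_indices_child2parent(self.mask, idx)
        return self.parent_data[pidx]

    @property
    def shape(self):
        return tuple([len(self)] + list(self.parent_data[0].shape))

# Usage with made-up data: 5 events, each a 4x4 "image".
images = np.arange(5 * 4 * 4).reshape(5, 4, 4)
mask = np.array([True, False, True, False, True])
view = LazyNDView(images, mask)
print(view.shape)  # (3, 4, 4)
assert np.array_equal(view[1], images[2])
```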

paulmueller (Member) commented Feb 8, 2022

Maybe ChildNDArray (to set it apart from ChildContour and ChildTrace) would be a better name.

The tdms thing looks like a bug and should probably be handled in a separate new issue. It might make sense to extend that not-100%-clean case you added by checking that the root parent of the hierarchy child is a tdms dataset and linking to the new issue (until it is fixed).

Regarding temporary features, maybe the user should specify the dimension here as well. There should be a general consensus on how to address the dimensionality of features in dclab. Maybe we will have to make sure that either (A) users are forced to specify the number of dimensions or (B) dclab gets smarter in figuring out the dimensionality. It is probably smart to turn this into another new issue. (dclab became smarter when I fixed #166, at least when it comes to data export to HDF5).
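
A sketch of what option (A) could look like (ndim_from_feature_shape and the "feature shape" tuple passed to it are hypothetical, not existing dclab API):

```python
import math

def ndim_from_feature_shape(feature_shape):
    """Derive the dimensionality of a feature from its "feature shape"
    metadata, e.g. (80, 250) for an image feature. Any NaN entry means
    the per-event shape is unknown, in which case we refuse to guess."""
    if feature_shape is None:
        raise NotImplementedError("feature shape metadata missing")
    if any(isinstance(s, float) and math.isnan(s) for s in feature_shape):
        raise NotImplementedError("feature shape contains NaNs")
    # +1 accounts for the event axis of the full dataset
    return len(feature_shape) + 1

print(ndim_from_feature_shape((80, 250)))  # 3
```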

maxschloegel (Contributor, Author) commented

ChildNDArray sounds good, I will use it instead.
The tdms bug was addressed in PR #167.

Regarding the registration of features happening in two different places (apply_filter() and __getitem__()), I created another issue, #168. For now I will implement the changes needed to resolve the current issue; #168 addresses improving the code's cleanliness.
