DataArray.__eq__ with str dtype #4

Open
kwgoodman opened this issue Jul 24, 2010 · 5 comments
@kwgoodman
Contributor

This looks good:

>> x = DataArray([1, 2])
>> x == 1
   DataArray([ True, False], dtype=bool)
   (None,)

This doesn't (should return DataArrays):

>> x = DataArray(['a', 'b'])
>> x == 'a'
   array([ True, False], dtype=bool)
>> x == 1
   False

@fperez
Collaborator

fperez commented Jul 29, 2010

This is what numpy does:

In [38]: x = np.array([1,2])

In [39]: x  == 1
Out[39]: array([ True, False], dtype=bool)

In [40]: x = np.array(['a', 'b'])

In [41]: x == 'a'
Out[41]: array([ True, False], dtype=bool)

In [42]: x == 1
Out[42]: False

@fperez
Collaborator

fperez commented Jul 29, 2010

We may not be able to change the fact that the last case collapses to a plain boolean False instead of returning an array, but we should at least fix the second example so that it returns a DataArray and not a plain array.

@terhorst
Contributor

terhorst commented Jun 2, 2011

Hrm, this seems like a potential headache. If you dig through the NumPy source, comparisons between character arrays follow a special code path that is separate from the rest of the comparison logic (see compare_chararrays() in numpy/core/src/multiarray/multiarraymodule.c).

One potential fix would be to override __eq__ in DataArray; not sure if this is a good idea.
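
For concreteness, here is a minimal sketch of what such an override might look like. It does not use datarray itself: `LabeledArray` and its `names` attribute are hypothetical stand-ins for DataArray and its axis labels, and the re-wrapping logic is an assumption about how a fix could work, not the project's actual code.

```python
import numpy as np


class LabeledArray(np.ndarray):
    """Hypothetical stand-in for DataArray: an ndarray subclass with axis names."""

    def __new__(cls, data, names=None):
        obj = np.asarray(data).view(cls)
        obj.names = names
        return obj

    def __array_finalize__(self, obj):
        # Runs for views and ufunc results; carry the labels over from the source.
        if obj is None:
            return
        self.names = getattr(obj, "names", None)

    def __eq__(self, other):
        result = np.ndarray.__eq__(self, other)
        # Re-wrap only when NumPy handed back a base array. Scalars and
        # NotImplemented are passed through untouched.
        if isinstance(result, np.ndarray) and not isinstance(result, LabeledArray):
            result = result.view(LabeledArray)
            result.names = self.names
        return result


x = LabeledArray(["a", "b"], names=("t",))
r = x == "a"
print(type(r).__name__, r.names)
```

The override only touches the array-valued case, so a comparison that NumPy reduces to a scalar (like the `x == 1` example above) is left as it is.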

@terhorst
Contributor

terhorst commented Jun 2, 2011

Here is another quirk:

>>> A = DataArray([[1, 2, 3], [4, 5, 6]])
>>> B = DataArray([[1, 2, 3], [4, 5, 6]], 'ab')
>>> C = DataArray([[1, 2, 3], [4, 5, 6]], 'cd')
>>> A == B
DataArray(array([[ True,  True,  True],
       [ True,  True,  True]], dtype=bool),
('a', 'b'))
>>> A == C
DataArray(array([[ True,  True,  True],
       [ True,  True,  True]], dtype=bool),
('c', 'd'))
>>> B == C
False

@fperez
Collaborator

fperez commented Jun 3, 2011

I'm not too crazy about the idea of overriding __eq__: the more special methods we override, the trickier merging back with numpy will be. What I don't understand is why we end up with a base array on output for chararrays. Even if numpy takes a different codepath for char arrays, it should still honor the policy of using our finalizers to return our own class instead of the base one, no? I may be mistaken, but this sounds to me more like a numpy problem than a datarray one...
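
One way to check this independently of datarray is a bare ndarray subclass that defines nothing but a finalizer. The sketch below is only a probe: `Probe` and `eq_result_type` are hypothetical names, and the script prints what the installed NumPy returns rather than assuming an answer, since the result for string dtypes may differ between NumPy versions.

```python
import numpy as np


class Probe(np.ndarray):
    """Hypothetical ndarray subclass that only defines a finalizer."""

    def __new__(cls, data):
        return np.asarray(data).view(cls)

    def __array_finalize__(self, obj):
        # NumPy calls this whenever it builds a Probe (view, slice, ufunc output).
        self.finalized = True


def eq_result_type(arr, other):
    """Return the class name of `arr == other` without assuming what it is."""
    return type(arr == other).__name__


print("numpy", np.__version__)
print("int array == 1   ->", eq_result_type(Probe([1, 2]), 1))
print("str array == 'a' ->", eq_result_type(Probe(["a", "b"]), "a"))
```

If the string case reports `ndarray` while the integer case reports `Probe`, the subclass is being dropped inside NumPy's comparison path, which would support treating this as a numpy issue.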
