Skip to content

Undocumented change to Set.from{Asc,Desc}List in containers-0.8? #1118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jonathanknowles opened this issue Mar 2, 2025 · 15 comments
Closed

Comments

@jonathanknowles
Copy link
Contributor

jonathanknowles commented Mar 2, 2025

There seems to be a small difference in the behaviour of Set.from{Asc,Desc}List between containers-0.7 and containers-0.8, specifically when these functions are applied to lists with duplicate elements.

  • In containers-0.7 these functions keep the first of any duplicates.
  • In containers-0.8 these functions keep the last of any duplicates.

I'm guessing the change was introduced in #1083, which fixes #1032.

As I understand it, this change makes the behaviour of Set.from{Asc,Desc}List more consistent with the equivalent Map.from{Asc,Desc}List functions, which seems very reasonable!

Though perhaps we could add something about this to the changelog for 0.8? (Even though this behaviour was previously unspecified!)

Perhaps something like this:

Breaking changes

  • The fromAscList and fromDescList functions for Set, when applied to lists with duplicate elements, now keep the last of any duplicates. This is a change from the behaviour of containers-0.7, where these functions would keep the first of any duplicates. The new behaviour is now consistent with the equivalent functions for Map, which also keep the last of any key-value mappings that share a duplicate key.

If people are happy with this wording (or similar), I'd be happy to create a PR to add it.

I ran into this while looking at nonempty-containers, which is affected by this change. See:

I'm guessing that other downstream libraries could also be affected. (But I don't currently have other examples to share.)

@treeowl
Copy link
Contributor

treeowl commented Mar 2, 2025

Uh oh. That's not good. Does anyone know where the change was introduced? It may have been an accident.

@meooow25
Copy link
Contributor

meooow25 commented Mar 2, 2025

Does anyone know where the change was introduced?

I'm guessing the change was introduced in #1083, which fixes #1032.

That is correct.

I do not want to document and commit to this behavior (as I wrote in #1032 (comment)), which is why I did not add a changelog entry for it. Map and Set functions currently do not specify what happens when keys compare equal but aren't really equal (#587). This situation could be improved if someone figures out a reasonable convention to be applied throughout, but I think piecemeal fixes to specify the current behavior will just be confusing. Until then, it is best to leave it unspecified.

Since nonempty-containers is testing unspecified behavior, it should not be surprised to see a change. As another example, one could test that takeWhileAntitone behaves a particular way for a predicate that is not monotonic. Then result would be consistent but unspecified, and may change from version to version if internals change.

@treeowl
Copy link
Contributor

treeowl commented Mar 2, 2025

@meooow25 , I thought we had documentation for at least some functions that indicates the behavior in these cases. Or is that only for Map? I can check later.

@meooow25
Copy link
Contributor

meooow25 commented Mar 2, 2025

Yes I recall seeing it for Data.Set.intersection, but I did not check why it was added there.

For Map we do specify which value is kept if the same key maps to different values which are combined in some way, but we do not specify the behavior for keys.

@jonathanknowles
Copy link
Contributor Author

jonathanknowles commented Mar 4, 2025

I do not want to document and commit to this behavior (as I wrote in #1032 (comment)), which is why I did not add a changelog entry for it. Map and Set functions currently do not specify what happens when keys compare equal but aren't really equal (#587). This situation could be improved if someone figures out a reasonable convention to be applied throughout, but I think piecemeal fixes to specify the current behavior will just be confusing. Until then, it is best to leave it unspecified.

@meooow25 Many thanks for commenting on this—I really appreciate it.

I can definitely understand the preference to keep this unspecified.

Perhaps this an example of Hyrum's Law 😄?

With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviours of your system
will be depended on by somebody.

Having said that, I noticed that Data.Map.splitRoot has a disclaimer like this:

No guarantee is made as to the sizes of the pieces; an internal, but deterministic process determines this. However, it is guaranteed that the pieces returned will be in ascending order.

Do you think it would be worth adding a similar disclaimer to the documentation for Set.fromAscList and Set.fromDescList?

Perhaps something like:

  Build a set from an ascending list in linear time.
+ If the list contains duplicate elements, no guarantee is made as to which of the duplicate elements will be retained.

@meooow25
Copy link
Contributor

Perhaps this an example of Hyrum's Law 😄?

With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviours of your system
will be depended on by somebody.

It's certainly possible😄, but I would call a change "breaking" only if it breaks the behavior promised in the contract.

Do you think it would be worth adding a similar disclaimer to the documentation for Set.fromAscList and Set.fromDescList?

Perhaps something like:

Build a set from an ascending list in linear time.

  • If the list contains duplicate elements, no guarantee is made as to which of the duplicate elements will be retained.

Personally I wouldn't mind if it said that. But one could argue that the same disclaimer should be added to every function with this issue, which seems like a step too far given that there are many of them and we haven't taken a stance on the right behavior for them.

(btw sorry it took a while to respond, I took a break to focus on other things)

@jonathanknowles jonathanknowles changed the title Undocumented breaking change for Set.from{Asc,Desc}List in containers-0.8? Undocumented change for Set.from{Asc,Desc}List in containers-0.8? Mar 16, 2025
@jonathanknowles jonathanknowles changed the title Undocumented change for Set.from{Asc,Desc}List in containers-0.8? Undocumented change to Set.from{Asc,Desc}List in containers-0.8? Mar 16, 2025
@jonathanknowles
Copy link
Contributor Author

jonathanknowles commented Mar 17, 2025

I would call a change "breaking" only if it breaks the behavior promised in the contract.

My apologies -- I've adjusted the title of this issue to remove "breaking".

Personally I wouldn't mind if it said that. But one could argue that the same disclaimer should be added to every function with this issue, which seems like a step too far given that there are many of them and we haven't taken a stance on the right behavior for them.

That's a fair point. Do we know how many functions this actually affects?

Looking over the Set interface, it seems that many functions requiring de-duplication already include a public statement about how they handle duplicates:

  • insert :: Ord a => a -> Set a -> Set a
    If the set already contains an element equal to the given value, it is replaced with the new value.

  • alterF :: (Ord a, Functor f) => (Bool -> f Bool) -> a -> Set a -> f (Set a)
    Note that unlike insert, alterF will not replace an element equal to the given value.

  • union :: Ord a => Set a -> Set a -> Set a
    The union of two sets, preferring the first set when equal elements are encountered.

  • intersection :: Ord a => Set a -> Set a -> Set a
    Elements of the result come from the first set.

That leaves the following functions currently without an explicit statement:

  • fromList :: Ord a => [a] -> Set a
  • fromAscList :: Eq a => [a] -> Set a
  • fromDescList :: Eq a => [a] -> Set a
  • map :: Ord b => (a -> b) -> Set a -> Set b

We could clarify intentions about their behaviour with statements like:

  • For fromList, fromAscList, and fromDescList:
    If the list contains duplicate elements, no guarantee is made as to which will be retained.
  • For map:
    If the function produces duplicate values, no guarantee is made as to which will be retained.

Justification: Since some functions already describe their de-duplication strategy while others don't, it might not be obvious to users whether this difference is intentional. This could lead to incorrect assumptions about which elements are preferred, based on similar statements made for other functions.

Perhaps adding explicit "no guarantee" statements would resolve this inconsistency and help to clarify the intended behaviour for users? This could also serve as a record of the current policy for those reviewing/maintaining the source code.

(btw sorry it took a while to respond, I took a break to focus on other things)

No worries -- I hope you were able to enjoy your break!

@meooow25
Copy link
Contributor

  • insert :: Ord a => a -> Set a -> Set a
    If the set already contains an element equal to the given value, it is replaced with the new value.

  • alterF :: (Ord a, Functor f) => (Bool -> f Bool) -> a -> Set a -> f (Set a)
    Note that unlike insert, alterF will not replace an element equal to the given value.

Oh, we already have an inconsistency 🤦

That leaves the following functions currently without an explicit statement:

  • fromList :: Ord a => [a] -> Set a

  • fromAscList :: Eq a => [a] -> Set a

  • fromDescList :: Eq a => [a] -> Set a

  • map :: Ord b => (a -> b) -> Set a -> Set b

This may be the list for Set, but recall that Map has the same problem with keys (#587).

Justification: Since some functions already describe their de-duplication strategy while others don't, it might not be obvious to users whether this difference is intentional. This could lead to incorrect assumptions about which elements are preferred, based on similar statements made for other functions.

That's a good point.

Perhaps adding explicit "no guarantee" statements would resolve this inconsistency and help to clarify the intended behaviour for users? This could also serve as a record of the current policy for those reviewing/maintaining the source code.

Sure, let's go with that, I'll get off the fence. It's a mess, but at least it will be a well-documented mess.

@jonathanknowles
Copy link
Contributor Author

Sure, let's go with that, I'll get off the fence. It's a mess, but at least it will be a well-documented mess.

👍🏻

Thank-you @meooow25 for considering this. I've created a PR here: #1127

Would definitely appreciate feedback -- particularly if anyone can think of a better way to word these statements!

@jwaldmann
Copy link
Contributor

is the meaning of "duplicate" clear enough? If the element/Key type is polymorphic (not Int), there could be non-standard Eq/Ord instances. I think that current implementation only uses Ord, never Eq. Is this part of the API contract? Actually, I am surprised to find:

ghci> import Data.Set
ghci> newtype K = K Int deriving Show
ghci> instance Eq K where (==) = error "not implemented"
ghci> deriving instance Ord K

ghci> fromAscList [K 1, K 2, K 2, K 3]
fromList *** Exception: not implemented
CallStack (from HasCallStack):
  error, called at <interactive>:3:28 in interactive:Ghci3

ghci> fromList [K 1, K 2, K 2, K 3]
fromList [K 1,K 2,K 3]

@jonathanknowles
Copy link
Contributor Author

jonathanknowles commented Mar 25, 2025

@jwaldmann wrote:

I think that current implementation only uses Ord, never Eq. Is this part of the API contract?

Set.fromAscList only has an Eq constraint:

fromAscList :: Eq a => [a] -> Set a

Internally, it only uses == to test for duplicates:

fromAscList xs = ascLinkAll (Foldable.foldl' next Nada xs)
where
next stk !y = case stk of
Push x l stk'
| y == x -> Push y l stk'
| Tip <- l -> ascLinkTop stk' 1 (singleton x) y
| otherwise -> Push y Tip stk
Nada -> Push y Tip stk

Of course, this only works if the precondition for fromAscList is satisfied, namely that the input list is already in ascending order.


As for Set.fromList (which does require Ord):

Internally this repeatedly applies insertB, which relies on Ord.compare:

insertB :: Ord a => a -> SetBuilder a -> SetBuilder a
insertB !y b = case b of
BAsc stk -> case stk of
Push x l stk' -> case compare y x of
LT -> BSet (insert y (ascLinkAll stk))
EQ -> BAsc (Push y l stk')
GT -> case l of
Tip -> BAsc (ascLinkTop stk' 1 (singleton x) y)
Bin{} -> BAsc (Push y Tip stk)
Nada -> BAsc (Push y Tip Nada)
BSet m -> BSet (insert y m)

@jwaldmann
Copy link
Contributor

Oh right, those types make sense. Still, they imply different meanings for "duplicate". I am not sure whether this needs to be spelled out in the docs.

@jonathanknowles
Copy link
Contributor Author

jonathanknowles commented Mar 25, 2025

@jwaldmann Regarding:

is the meaning of "duplicate" clear enough?
they imply different meanings for "duplicate". I am not sure whether this needs to be spelled out in the docs.

I think this is a valid point.

For from{Asc,Desc}List, I believe we only have one valid choice, which is to define "duplication" in terms of Eq.==. We could phrase it like this:

If the list contains duplicate elements, no guarantee is made as to which will be retained. Note that elements x and y are considered duplicates if and only if x == y.

What we might write for fromList seems less obvious, as different implementations of this function could rely on different functions (from Eq or Ord), including compare.

One approach could be to explicitly state whichever function the implementation uses:

If the list contains duplicate elements, no guarantee is made as to which will be retained. Note that elements x and y are considered duplicates if and only if compare x y == EQ.

@meooow25
Copy link
Contributor

is the meaning of "duplicate" clear enough?

That seems fine to me, we don't need to spell it out further.

Still, they imply different meanings for "duplicate".

I think we can safely assume Eq and Ord instances agree on equality. While it's not a law, it is "expected to hold" according to Ord docs.

@meooow25
Copy link
Contributor

Closing this given the Set behavior is now documented. (I'm still thinking about the bigger question)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants