Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename MultiSet to Bag, MultiDict => MultiMap? #17

Open
joshlemer opened this issue Apr 25, 2019 · 16 comments
Open

Rename MultiSet to Bag, MultiDict => MultiMap? #17

joshlemer opened this issue Apr 25, 2019 · 16 comments

Comments

@joshlemer
Copy link
Contributor

Bag is a lot less verbose than MultiSet, and MultiDict is also pretty awkward to try and say aloud.

@julienrf
Copy link
Collaborator

The name MultiDict was chosen to not conflict with the existing MultiMap.

Is Bag an established name for sets containing possibly multiple occurrences of their elements?

@joshlemer
Copy link
Contributor Author

Bag does appear to be an established name:

https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset
https://en.wikipedia.org/wiki/Multiset
https://commons.apache.org/proper/commons-collections/userguide.html#Bags

but you're likely right, that MultiSet is more commonly used, I'm not sure.

That's too bad about the MultiMap conflict.

@joshlemer
Copy link
Contributor Author

This does raise the question of why we need a mutable.MultiMap in stdlib as well as a mutable.MultiDict in collections-contrib. Is there an end game of deprecating the stdlib version?

@julienrf
Copy link
Collaborator

I’m in favor of deprecating mutable.MultiMap but anyways we need to make both solutions cohabit together during the transition period.

@Ichoran
Copy link

Ichoran commented Apr 29, 2019

Do we need both a counted set and a set which retains original copies? Although both can occasionally be useful, I would tend to say that most uses of the latter would just as well be served by a MultiMap. It's only the former that is really awkward and inefficient.

However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So 👎 for MultiSet to Bag naming, assuming that MultiSet is a counted set. If it is not a counted set, 👎 for having it at all.

Also 👎 to introducing a name collision between the old MultiMap trait and a new MultiMap that may have both immutable and mutable variants. MultiDict is awkward, though. In principle, 👍 for finding something better. ManyMap or KeyBag perhaps. If KeyBag, then I guess MultiSet as a plain Bag is okay. Or maybe have all three: CountedBag, Bag, and KeyBag. But they will all be distinct APIs, I suspect, so maybe Bag is again bad.

@julienrf
Copy link
Collaborator

@joshlemer
Copy link
Contributor Author

joshlemer commented Apr 30, 2019

Thanks @Ichoran ! Some points:

However, I don't think the counted set is very intuitively a Bag. That's not how physical bags work: you might not care that apples are different, but you get the same one out that you put in. So 👎 for MultiSet to Bag naming

So if I interpret correctly, you're saying that Bag is unintuitive, because when you insert an element into it, you expect that exact object (up to reference identity) should be found in the resulting Bag. This critique is fair enough in isolation but in Scala-land the ethos is generally that we don't care about reference equality and only care about value equality. For instance, you might have made the same critique of Sets -- that you expect that when you put an apple in a Set and get it back out, you get the same one you put in.

Now for some quick points for the name change:

  • Bag has 3 letters vs MultiSet's 8, so it's much more convenient to type, which makes a pretty big difference when we treat collection apply as "literal" syntax (Bag(...)). The length savings is the same difference as Seq vs Sequence for instance. This may seem like a silly point, but anecdotally, I've even heard of people using Seq over List just to save one char.

  • While MultiSet is one accepted term and possible the most popular, Bag is not too much less popular, and avoids potentially confusing people into thinking that a MultiSet is some kind of special case of Set (a la HashSet/TreeSet/etc), when in fact MultiSets/Bags have nothing to do with Sets in the Iterable hierarchy, and do not behave like Sets.

  • I think that the Bag analogy very nicely visualizes what it is we're describing. A bag is an unordered collection of data, and a bag is equal to an other bag if and only if the two contain the same elements (including arity), ignoring order. The term Bag really drives home that order is ignored, IMO, because bags are all jumbled up/unstructured inside.

@Ichoran
Copy link

Ichoran commented Apr 30, 2019

You can't get elements out of sets, and if you manage to, you do get one that you put in (if you put in duplicates, you obviously only get one of them out).

The bag is acting more like a bank: you deposit an item, and you get out an equivalent item, but not necessarily the one you put in. (Though it's weirder if counted--it will give you back the same item multiple times, rather than drawing separate instances from a vault of available instances.)

The point about the length of the name is a good one, though. That may be important enough to override all the other concerns.

@Ichoran
Copy link

Ichoran commented Apr 30, 2019

Oh, also, next time my groceries end up packed all jumbled up/unstructured, I will remind myself that it is because they were put in a bag 😂

@julienrf
Copy link
Collaborator

julienrf commented May 1, 2019

I’m convinced that Bag is a good name for counted sets. But the problem is more on the side of MultiDict. I’m not sure that KeyBag makes sense here. A MultiDict (or MultiMap) associates a set of values to a key.

@joshlemer
Copy link
Contributor Author

@julienrf I also don't think KeyBag is an improvement over MultiDict. MultiMap is probably best, save for the name collision. If I had to think of a different name, and I do prefer to go with practical/"blue collar" names over academic ones, I might consider something like OneToMany but I'm not sure that's necessarily any better than MultiDict (though it has exact same length). 🤷‍♂

joshlemer added a commit to joshlemer/scala-collection-contrib that referenced this issue Jul 3, 2019
joshlemer added a commit to joshlemer/scala-collection-contrib that referenced this issue Jul 3, 2019
joshlemer pushed a commit to joshlemer/scala-collection-contrib that referenced this issue Oct 16, 2019
@er1c
Copy link

er1c commented Jun 30, 2020

Copied from #98

TLDR; I think it should be named SetMultiMap

I think it may make the most sense to rename MultiDict to SetMultiMap, and potentially make MultiMap into a generic trait.

  1. It helps remove the @Deprecation warning and name conflict with the 2.12/2.13 MultiMap
  2. "MultiMap" is a more universally used terminology for this type of collection. (e.g. Guava has ArrayListMultimap, ListMultimap, SetMultimap, and etc; C++ STL is std::multimap; Rust is MultiMap, Apache Commons is MultiMap, etc)
  3. The existing name also does support or convey the collection type of the key's values. (e.g. it currently implements Set, but there is no way to extend it to use an Array, Vector or List instead)
  4. The code still uses MultiMap in various locations (like in the hashcode!), and the CC[_] is MapFactory (e.g. not DictFactory)
  5. There are no other Dictionary terms in any of the Scala collections. It's Map for any scala code, only the java converters have the mention of a dictionary. MultiDict isn't scala idiomatic.

@joshlemer
Copy link
Contributor Author

@er1c I agree that MultiMap and MultiSet (I would prefer Bag) should be abstract data types (akin to Seq, Map, Set). I also tried to point this out in #25. I think that everyone agrees that 'MultiDict' is much worse a name than MultiMap, but it was named that way to avoid collisions with the stdlib implementation.

I agree that there would be valid usecases for MultiMaps where the underlying value collection is a Set or a Seq or even a Bag. Perhaps the best way to achieve this would be to utilize the existing Iterable Factories so that users can pass in whatever kind of collection they like? Maybe this is possible:

trait MultiMap[K, V, C[_]] { ... }

object MultiMap {
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}

// user code
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2, 1 -> 3)

Perhaps some reasonable default value collection should be chosen (probably Set?) so that users can still initialize a MultiMap without specifying the value:

object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
  /// implement factory...

  // additional method
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = ???
}


// user code:

MultiMap(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Set]
MultiMap.withValuesIn(List)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, List]
MultiMap.withValuesIn(Bag)(1 -> 1, 1 -> 2) // MultiMap[Int, Int, Bag]

If that route is taken then I would suggest making a static factory for each of the most common collection types so that we avoid allocating on each MultiMap creation, so:

object MultiMap extends Factory[(K, V), MultiMap[K, V, Set]] {
  /// implement factory...

  // additional method
  private final val ListMultiMapFactory = ???
  private final vall SetMultiMapFactory = ???
  private final val VectorMultiMapFactory = ???
  def withValuesIn[C[_]](factory: IterableFactory[C]): Factory[(K, V), MultiMap[K, V, C] = factory match {
    case List => ListMultiMapFactory
    case Vector => VectorMultiMapFactory
    case Set => SetMultiMapFactory
    case other => new MultiMapFactory(other)
  }
}

Care should also ideally be taken to design the api so that users that just want to use the default collection don't have to specify the higher kinded type parameter everywhere. This is the same principle that was applied to the new views which make them much more ergonomic (so, View[Int] rather than View[Int, List[Int]]). Maybe MultiMap[K, V] extends a more general trait like MultiMapWithValues[K, V, C[_]].

Or maybe I'm just over complicating things, just giving ideas.

@er1c
Copy link

er1c commented Jun 30, 2020

trait MultiMap[K, V, C[_]] { ... }
👍

This is exactly what I was thinking. I think the only other option is MultiMap[K, V, C <: Iterable[V]]

My brain starts going into CanBuildFrom implicits when thinking about it, so I probably need to brush up more on the 2.13 collections to see a better builder strategy :)

@SethTisue
Copy link
Member

abandoned draft PR: #24

@joshlemer joshlemer changed the title Consider renaming MultiSet => Bag, MultiDict => MultiMap? Rename MultiSet to Bag, MultiDict => MultiMap? Feb 10, 2021
@som-snytt
Copy link
Contributor

Bagwell.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants