-
Notifications
You must be signed in to change notification settings - Fork 25
Quick sort and related Arrays utilities #58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The latter doesn’t work well in CI here because the /verificationLogger options cause extra, unpredictable output lines
src/Comparison.dfy
Outdated
| cmp(t0, t1) | ||
| } | ||
|
|
||
| predicate Complete??(t0: T, t1: T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of the ?? suffix. We may want to engage in some bikeshedding about style here before we introduce it. Based on my experience even Valid? below is not common, Valid is much more common and is even baked into the language through {:autocontracts}.
I'm not necessarily to using ? in user-defined constructs, I'd just like to agree on concrete guidance in the style guide on it. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I see where the idea came from at least - you almost want to overload Complete? to accept either a pair or a set.
Could this be CompleteOnPair perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the ?? was due to have two layers. Happy with calling them Complete and Complete' (the OCaml style) or Complete1 (the Lisp style) or even Complete_ or Complete2. CompletePair is long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's go with removing all of the single ? suffixes (except for Le? and Ge? above), and replacing all of the double ?? suffixes with a ' instead.
So in this case Complete?? will become Complete', and Complete? will become Complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So naturally I've focussed most of my reviewing capacity on arguing about question marks. :) I do actually think it's a really important point for readability though, so I'd love to nail that down first before I look more deeply at the proofs.
| reads arr | ||
| ensures t in arr[lo..hi] | ||
| { | ||
| arr[lo] // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either implement this or drop the function and just inline this choice of pivot above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's implement it — it's not hard and it's an important part of making the algorithm safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per offline conversation we decided not to block on this, but in that case we should add a comment on QuickSort that it doesn't yet have the intended O(n log n) worst case runtime yet (which is what @cpitclaudel was referring to as "safe").
src/Collections/Arrays/Arrays.dfy
Outdated
| } | ||
| } | ||
|
|
||
| lemma Sortable_Slice(arr: array<T>, lo: int, hi: int, lo': int, hi': int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| lemma Sortable_Slice(arr: array<T>, lo: int, hi: int, lo': int, hi': int) | |
| lemma SortableSlice(arr: array<T>, lo: int, hi: int, lo': int, hi': int) |
Only instance of an underscore I see here and it really sticks out.
Also consider SortableSubslice instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SortableSlice or SortableSubslice both makes sense putting Subslice describes it better.
| import opened Comparison | ||
| import opened Wrappers | ||
|
|
||
| trait Sorter<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely needs an example in examples to demonstrate using it. I assume you have to not only provide a concrete class that implements Sorter<T> and fills in Cmp, but perhaps even has to invoke a lemma or two from the trait in order to prove that it works.
Is it possible to also provide a PredicateSorter class that is constructed around a (T, T) -> Cmp function value? Perhaps it could even be a partial function if you also provide a ghost set of all values you intend to sort with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, I'd strongly prefer to have an example, but the PredicateSorter idea can wait for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yikes, I went to write an example just to play around with it and immediately got the A class may only extend a trait in the same module, unless the parent trait is annotated with {:termination false} error. That means this is currently useless unless we add {:termination false}.
Unfortunately we should treat that as a hard blocker given there's already been resistance to putting {:termination false} on any standard library code: #22 And especially given the discovery of dafny-lang/dafny#2500 I'm not inclined to push back on that.
We can think about whether there's a good alternative for this code that doesn't use traits, but it might be better to stick with this if it's good UX and at least wait for the solution to dafny-lang/dafny#2500.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not right: the proper API to use is PredicateSorter, which is a regular class, defined right below Sorter. That doesn't require :termination false.
Original code here, with examples: https://github.com/dafny-lang/compiler-bootstrap/blob/4f616822209828e48cf63d3da66ee1c010f689d4/src/Utils/Lib.Sort.dfy#L396-L420
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cpitclaudel, I see the issue now is this PR doesn't include the PredicateSorter class. We should just add that as well.
| const Le? := this != Gt | ||
| const Ge? := this != Lt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely like this use of ? at least, since it's a very natural extension of the built-in Lt?/Eq?/Gt? discriminator syntax.
| Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts) | ||
| } | ||
|
|
||
| predicate Sorted(sq: seq<T>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't this have a ? as well then?
I'll be happiest if we come up with a simple rule for where ? belongs. I definitely don't think it should be every predicate, especially since changing Valid to Valid? will be very disruptive for arguably not a lot of benefit. It looks like the pattern here is if the predicate is "about" the receiver rather than the arguments, but that feels very fuzzy to me, and I don't find it improves readability personally.
Perhaps to start, the straw-person proposal could be to only use it in symbols that aren't used like function calls, such as the Le? and Ge? constants above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment.
src/Comparison.dfy
Outdated
| cmp(t0, t1) | ||
| } | ||
|
|
||
| predicate Complete??(t0: T, t1: T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I see where the idea came from at least - you almost want to overload Complete? to accept either a pair or a set.
Could this be CompleteOnPair perhaps?
| requires 0 <= lo <= hi <= arr.Length | ||
| reads arr | ||
| { | ||
| multiset(arr[lo..hi]) == old(multiset(arr[lo..hi])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember adding this to almost every every function method in MergeSort.dfy will need to replace that with shuffled instead.
Co-authored-by: Robin Salkeld <[email protected]>
| import opened Comparison | ||
| import opened Wrappers | ||
|
|
||
| trait Sorter<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, I'd strongly prefer to have an example, but the PredicateSorter idea can wait for a future PR.
src/Comparison.dfy
Outdated
| cmp(t0, t1) | ||
| } | ||
|
|
||
| predicate Complete??(t0: T, t1: T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's go with removing all of the single ? suffixes (except for Le? and Ge? above), and replacing all of the double ?? suffixes with a ' instead.
So in this case Complete?? will become Complete', and Complete? will become Complete.
src/Comparison.dfy
Outdated
| {} | ||
|
|
||
| } | ||
| } No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | |
| } |
src/Comparison.dfy
Outdated
| } | ||
|
|
||
| predicate {:opaque} Valid?(ts: set<T>) { | ||
| Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts) | |
| Complete?(ts) && */ Transitive?(ts) |
Just following the general rule of not checking in commented out code. We could put a comment saying something like "we want to include Antisymmetric(ts) as well but it isn't necessary and makes verification timeout" (for example, I don't actually know in this case).
| } | ||
|
|
||
| predicate Complete??(t0: T, t1: T) { | ||
| cmp(t0, t1) == cmp(t1, t0).Flip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It threw me a bit that I didn't recognize this isn't one of the standard properties of total orderings, until I realized that "Transitive" really meant "cmp(...).Le? is a transitive binary relation" and similarly for "Reflexive". Is "complete" a standard term for this property of this kind of comparator?
I expect at the point where we try to connect Relations to this version, we'll have a lemma that proves that C.Valid(s) ==> Relations.TotalOrdering((t1, t2) => C.cmp(t1, t2).Le?)
src/Collections/Arrays/Arrays.dfy
Outdated
| ensures Shuffled(arr, lo, hi) | ||
| {} | ||
|
|
||
| twostate predicate SameElements(arr: array<T>, lo: int, hi: int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider UnchangedExceptSliceShuffled instead.
| reads arr | ||
| ensures t in arr[lo..hi] | ||
| { | ||
| arr[lo] // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per offline conversation we decided not to block on this, but in that case we should add a comment on QuickSort that it doesn't yet have the intended O(n log n) worst case runtime yet (which is what @cpitclaudel was referring to as "safe").
| Complete?(ts) && /* Antisymmetric?(ts) && */ Transitive?(ts) | ||
| } | ||
|
|
||
| predicate Sorted(sq: seq<T>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I've realized this won't work as is and can't be made to work safely with current Dafny limitations - see my first last comment. We'll have to pause on this for a while.
| cmp(t0, t1) | ||
| } | ||
|
|
||
| predicate CompleteonPair(t0: T, t1: T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| predicate CompleteonPair(t0: T, t1: T) { | |
| predicate Complete'(t0: T, t1: T) { |
CompleteonPair was my earlier not-as-good suggestion, let's do Complete' instead :)
|
|
||
| function method Cmp(t0: T, t1: T): Cmp | ||
|
|
||
| twostate predicate UnchangedSlice(arr: array<T>, lo: int, hi: int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just food for thought, or a potential learning exercise: It might make all of this more readable if we had an actual ArraySlice datatype to pass around containing these values. It's a great example of the classic Design Pattern of introducing a type to hold collections of values that are always passed around together. Something like:
datatype ArraySlice_<T> = ArraySlice_(arr: array<T>, lo: int, hi: int) {
predicate Valid() {
0 <= lo <= hi <= arr.Length
}
twostate predicate Unchanged()
requires Valid()
reads arr
{
arr[lo..hi] == old(arr[lo..hi])
}
// etc.
}
type ArraySlice<T> = a: ArraySlice_<T> | a.Valid() witness *AFAICT this would be useful even if we only used it in ghost code to avoid any runtime cost (and even then it's possible we could optimize the wrapper away in the future, just as we now do for single-field datatypes).
| import opened Comparison | ||
| import opened Wrappers | ||
|
|
||
| trait Sorter<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yikes, I went to write an example just to play around with it and immediately got the A class may only extend a trait in the same module, unless the parent trait is annotated with {:termination false} error. That means this is currently useless unless we add {:termination false}.
Unfortunately we should treat that as a hard blocker given there's already been resistance to putting {:termination false} on any standard library code: #22 And especially given the discovery of dafny-lang/dafny#2500 I'm not inclined to push back on that.
We can think about whether there's a good alternative for this code that doesn't use traits, but it might be better to stick with this if it's good UX and at least wait for the solution to dafny-lang/dafny#2500.
Fix #54
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.