Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve names only once #2169

Open
facundominguez opened this issue Apr 25, 2023 · 12 comments
Open

Resolve names only once #2169

facundominguez opened this issue Apr 25, 2023 · 12 comments

Comments

@facundominguez
Copy link
Collaborator

facundominguez commented Apr 25, 2023

#2303 adds a minimal test showing the problem.

Currently, names in specifications are resolved multiple times. Name resolution determines for each name in a spec, what is the Haskell definition it is associated to. Type and data constructors exist both in the logic and in Haskell, and signatures with refinement types are also linked to Haskell definitions.

These names are resolved once when compiling the module containing the specs, and they are resolved again when compiling a module that transitively imports the module containing the specs. This is not ideal for the duplicated work, but it is also problematic because the implementation doesn't ensure that names would resolve in the same way on all of the attempts.

Resolving names requires looking up an identifier in an environment storing all the things that GHC knows to be in scope. When resolving names for transitive dependencies, a single environment is used to resolve the names of all of the transitively imported modules. In order to disambiguate names that could refer to one of many definitions in Haskell, the resolution algorithm considers whether the name has a qualifier, and what is the name in which the spec has been defined. But this strategy not always yields the same resolution as obtained when compiling the imported module, which uses, understandably, a smaller environment to resolve the names.

This issue is about making name resolution more predictable. The current workaround is to manually fully qualify names in specifications, although there are restrictions to do this when it comes to data specifications.

Part of the solution would likely be storing specifications in interface files after their names have been fully qualified with the module of origin of the thing referred by the name.

Another related concern is that variable names should also include the package of origin, in case multiple packages define modules with the same names.

There is work in progress in liquidhaskell and liquid-fixpoint.

@kd1729
Copy link

kd1729 commented Mar 3, 2024

Hi @facundominguez, I would like to take up this idea for GSOC. I am making the proposal, how can I get it reviewed?

@facundominguez
Copy link
Collaborator Author

Hello @kd1729! You can email it to me. I'm also reachable via the liquidhaskell slack.

@ninioArtillero
Copy link

Greetings @facundominguez , I'm also interested in the idea! Would get in touch :)

@facundominguez
Copy link
Collaborator Author

facundominguez commented Sep 26, 2024

A little bit of analysis follows.

To do name resolution, the current implementation changes the representation of variables from Symbol (in essence a string) to GHC's Var (which is roughly a Name paired with a Type, and perhaps its unfoldings). Name is better than Symbol because it can tell the origin of an identifier, and can disambiguate names that come even from modules with the same name.

Until here, we are good. But then Liquid Haskell discards the Vars to keep Symbols as identifiers before storing the specs in interface files. Vars have multiple references to various bits of information that make them impractical to serialize. But saving Symbols is also problematic, because it forces resolving names again when the specs are read from the interface files.

A possible solution is to use Name instead of Symbol before storing the specs. This allows to recover the Vars from the Names regardless of the particular environment in which the lookup is made.

Alternative: always qualify the Symbols before storing specs

If the symbols are qualified with their module of origin, it should be possible to recover a Var from it. However, this has some disadvantages:

  1. the name might not be available in the environment. This requires more legwork in LH to find the corresponding Var
  2. the name might be ambiguous if there are multiple modules with the same name
  3. the LH developer is responsible for getting every name correctly qualified, it is easier to ensure this by using Name and enrolling the compiler in detecting omissions.

Implementation strategy

This plan aims to serialize Names instead of Symbols.

A goal of the following stages is to keep tests passing all the way to completion. The first two stages are refactorings that prepare the scene to do the behavior changes in small increments during stage 3.

Stage 1

Before serializing a spec, use a pair (Symbol, Name) to identify each variable (instead of just Symbol). In this way we serialize more than we need, and when reading the spec, LH can easily discard the Name in the second component.

Stage 2

Convert the deserialized Names to Vars, and propagate the Vars to all the places where their Symbols are used, but keep using the Symbols. Sticking to the Symbols makes this a refactoring that shouldn't affect the behavior of LH still.

It is unclear to me if Names can be converted to Vars without using the IO monad in the GHC API. If the IO monad is needed, they might need to be converted upfront.

To integrate these changes, the spec of the current module, which starts with Symbols (that is the BareSpec), might need to be augmented to have each symbol accompanied with the corresponding Var. Otherwise, the spec of the current module and the imported specs might not be able to be combined.

Stage 3

One usage site at a time, stop using the Symbol and start using the corresponding Var. See that tests continue to pass after every modification. When all usage sites have been changed, stop serializing the Symbols when storing specs.

Stage 4

Remove redundant imports from tests. I have a few in the stitch-lh benchmark, and more in another code base (an experiment to integrate inline-r with LH).

Tangent: Recovering Vars might be unnecessary for imported specs

Vars are useful to check that a function has a spec compatible with its type, and they might be useful to reflect unfoldings. This is helpful to do for the spec of the current module, but not for the spec of imported modules, where checks have already been done when verifying those modules.

It is likely that LH is needlessly recovering Vars in these cases. Stopping this, however, will require keeping the imported specs separate from the specs of the current module. At the moment they are all put together in a single soup called TargetSpec in the code.

I'd leave this for another issue, though.

@ranjitjhala
Copy link
Member

I like this plan to use Name -- I think it will be a big improvement!

One thing that's unclear to me though -- you write:

This allows to recover the Vars from the Names regardless of the particular environment in which the lookup is made.

meaning, specifically, that the "disadvantages" 1, 2 (for fully qualifying Symbol) do not apply to Name, yes?

@facundominguez
Copy link
Collaborator Author

meaning, specifically, that the "disadvantages" 1, 2 (for fully qualifying Symbol) do not apply to Name, yes?

That's my understanding. A Name is not strictly required to solve (1), but then LH has to imitate GHC to resolve the name, and having a Name should require less ceremony.

@gergoerdi
Copy link
Contributor

Tangent: Recovering Vars might be unnecessary for imported specs

Vars are useful to check that a function has a spec compatible with its type, and they might be useful to reflect unfoldings. This is helpful to do for the spec of the current module, but not for the spec of imported modules, where checks have already been done when verifying those modules.

It is likely that LH is needlessly recovering Vars in these cases. Stopping this, however, will require keeping the imported specs separate from the specs of the current module. At the moment they are all put together in a single soup called TargetSpec in the code.

This is definitely the way to go, because going from Name to Id is anything but straightforward. You'll end up having to look up the Id with the matching name in the imported module contents. Much better if you can make do with just the Name.

@facundominguez
Copy link
Collaborator Author

facundominguez commented Oct 24, 2024

#2407 is a first iteration of the plan as shared above. It implements the first 3 stages for type constructors in refinement types. I'm hoping this work can be imitated without much innovation for the other features of LH that require names.

#2411 repeats the work for names of data constructors in data type specifications. It also fixes #2092 mentioned below.

I can think of: type aliases, expression aliases, names in expressions, asserted specs, assumed specs, reflections and rewrite rules, measures, expressions in predicates. Probably there are a few others that I'm missing now.

@gergoerdi
Copy link
Contributor

Possibly related: #2092

@facundominguez
Copy link
Collaborator Author

facundominguez commented Nov 28, 2024

Ten PRs later, perhaps it is time to give a bit of an update.

The plan has worked pretty well for the namespace of Haskell identifiers. Name resolution is now remembered for any annotation that refers to a Haskell name when specs are imported. And I could implement that piece-wise for each annotation.

Things are still in progress with the logic namespace. This is the namespace where predicates in refinement types grab names from. It contains measures, inline functions, reflected functions, and sometimes local variables.

My first try is to build an environment of logic names, and use it to resolve all the names that appear in the AST of refinement types. The resolved names are easily persisted and recovered when importing specs. However, liquid-fixpoint can only handle strings (symbols) , so I'm converting the resolved names to strings in a non-ambiguous (injective) transformation, before sending the queries to liquid-fixpoint.

But I'm still exploring the details. One recent lesson is that liquidhaskell assumes in quite a few places that logic names are rendered as symbols in exactly the same way as the Haskell names they correspond to, and I was not considering this so far.

@ranjitjhala
Copy link
Member

ranjitjhala commented Dec 4, 2024 via email

@facundominguez
Copy link
Collaborator Author

facundominguez commented Dec 13, 2024

At the moment resolution of all names is persisted except for type constructors used in qualifier sorts. In addition I'd like to revise type and expression aliases to use import aliases as the other names.

Also, #2463 will add a new define annotation that we might need to handle as well.

After these are handled, we could start retiring the old environments that LH is building to qualify names. I already managed to remove the Qualify class and instances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants