Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record function returned types #78

Merged
merged 10 commits into from
Oct 5, 2020
Merged

Conversation

aktech
Copy link
Member

@aktech aktech commented Aug 24, 2020

This is an attempt to implement #15 to record function return types.

It delays the call to log_call method and saves the arguments as an attribute (log_call_args) of the Stack class. The Stack now has a previous_stack attribute to check it the previous stack opname was CALL_METHOD, so that delayed call to log_call can be executed with the known returned value.

TODO:

  • Tracing for all op codes except:
    STORE_ATTR: doesn’t returns anything as it sets a value
    DELETE_SUBSCR: doesn’t returns anything
    STORE_SUBSCR: doesn’t returns anything
    BUILD_TUPLE_UNPACK: will return tuple
    BUILD_LIST_UNPACK: will always return list
    BUILD_SET_UNPACK: will always return set
    BUILD_TUPLE_UNPACK_WITH_CALL: will always return tuple
    COMPARE_OP: will always return boolean
  • Do not trace exception return types
  • Union of return types if a function returns more than one type (for e.g: np.add(1, 1) and np.add(1, 1.2))
  • Return type hint in the final output

@aktech aktech requested a review from saulshanabrook August 24, 2020 13:57
@saulshanabrook
Copy link
Contributor

Wow, this is great! I'll take a look.

Copy link
Contributor

@saulshanabrook saulshanabrook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks like the right approach! Thank you.

I am curious though how the particular implementation might evolve as we add support for all the other opcodes as well.

I.e. I am thinking instead of storing the whole previous stack we might end up with just storing the opcode and some other structure?

But I think it's hard to know the scope of how general/specific it all should be until we add support for all other opcodes.

How would you feel about continue to work off this same branch on those? I think it's probably easiest to for me to continue reviewing it all as one big PR instead of merging in and then updating it.

@@ -42,7 +42,7 @@ def __main__():


def parse_line(
n: int, function: object, params=None, bound_params=None,
n: int, function: object, params=None, bound_params=None, return_type=None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now. We will have to add something to the signature to represent the return type and then we should display it as a return type annotation, see PEP 484:

def greeting(name: str) -> str:
    return 'Hello ' + name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be anything at the moment, as we don't have a specialised object. Do you mean to say adding Any hint?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah for most opcodes we don't k ow the return type yet, but as you implement them, we will...

The whole "how do we type things and then combine them" is another part of the code, totally separate from the recording.

We can use the ObjectOutput for places we don't know the return type yet. It is the same as typing.Any.

record_api/core.py Show resolved Hide resolved
@aktech
Copy link
Member Author

aktech commented Aug 24, 2020

I.e. I am thinking instead of storing the whole previous stack we might end up with just storing the opcode and some other structure?

That is totally possible, I started with whole stack in case I need something else, currently I am only using the opname attribute.

How would you feel about continue to work off this same branch on those?

That sounds good to me. 👍

@saulshanabrook
Copy link
Contributor

saulshanabrook commented Aug 24, 2020

Ah a couple other things to think about (don't need to worry about these now, but wanted to get them down now that I remember them):

  • We should try this on a function that raises an exception. I think in that case, we should skip tracing that call to the function.
  • We could/should record from numpy import arange as getattr(numpy, "arange"). And a from numpy import * as a bunch of getattr one for each of __all__ on the module or all values on the modules otherwise? Then, if we also record the return types for each of those as the values, we will be able to understand where people imported things from (Put types in parent modules #6) as well as see where ufuncs were defined (We don't know where ufuncs are from! #70)! The analysis step of this could be done in a follow up PR though.

@aktech aktech force-pushed the record-returned-value branch from 297f6c1 to cf8c27c Compare August 28, 2020 12:41
@aktech
Copy link
Member Author

aktech commented Aug 28, 2020

We should try this on a function that raises an exception. I think in that case, we should skip tracing that call to the function.

Wouldn't that halt the execution of the tracing anyways?

@saulshanabrook
Copy link
Contributor

Wouldn't that halt the execution of the tracing anyways?

Not necessarily, if it is caught and execution continues.

@aktech
Copy link
Member Author

aktech commented Aug 28, 2020

Not necessarily, if it is caught and execution continues.

Oh, I see what you mean. By the way I have added the tracing for normal (non method) function calls too.

*((kwargs,) if kwargs else ()),
return_type=type(tos),
)

# special case subscr b/c we only check first arg, not both
def op_BINARY_SUBSCR(self):
self.process((self.TOS1,), op.getitem, (self.TOS1, self.TOS))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we record the return type for any of some these special methods as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.

@aktech aktech force-pushed the record-returned-value branch from c09670e to 0066ca7 Compare September 15, 2020 21:34
@aktech
Copy link
Member Author

aktech commented Sep 15, 2020

I have included the tracing for all the functions, where it made sense to, here are notes for the ones that didn't make it

  • STORE_ATTR: doesn’t returns anything as it sets a value
  • DELETE_SUBSCR: doesn’t returns anything
  • STORE_SUBSCR: doesn’t returns anything
  • BUILD_TUPLE_UNPACK: will return tuple
  • BUILD_LIST_UNPACK: will always return list
  • BUILD_SET_UNPACK: will always return set
  • BUILD_TUPLE_UNPACK_WITH_CALL: will always return tuple
  • COMPARE_OP: will always return boolean

Let me know, if that doesn't makes sense. I have added the signature for the return types as well.

TODO: Merge return types (Union of return types if a function returns more than one type)

@aktech
Copy link
Member Author

aktech commented Sep 16, 2020

The test failure is due to this issue: #83

@saulshanabrook
Copy link
Contributor

@aktech That list sounds right on, thank you so much for this! I'll work on fixing the CI.

@aktech
Copy link
Member Author

aktech commented Sep 25, 2020

@saulshanabrook I have added the return types union as well.

@saulshanabrook
Copy link
Contributor

Thank you for working on this! Looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants