Neural_PaMpeR: build the database again with each line tagged with proof obligations. #178
hi @yutakang, is this the data set we spoke about? Can I help you make it? |
Hi @brando90,
Yes. Parts of the data-extraction code for PaMpeR are outdated for Isabelle2020. Unfortunately, I did not record the date on which I downloaded the AFP articles to build the database I used for the CICM paper. So I have to build the database again, so that I can match each line of the database with the string representing each proof goal. I can use a big server machine in Innsbruck, so computational resources should be fine. |
ok cool let me know if there is something I can do to help to get the proof obligations. Thanks! |
Hi @brando90 , I produced a sample datapoint for this lemma. This sample datapoint is expressed in 4 lines:
Each line representing this datapoint starts with the location information (file name and line number). What do you think? |
Hi Yutaka, that looks good for that goal. I do want to keep the hardcoded vector representation that you came up with. The most important thing is that for every node in the proof tree we have its raw formula. So if we go from assm1 g1 to assm2 g2 using tac, then we want:
the next set of goals would be a data point for the next tactic (not included in my example). I think your example might be too simple to see if it's robust but I think it's ok. What we want is to know what tactic was done to prove a specific goal with a specific local context. If the tactic took an argument, we'd also like that to be part of the target e.g.
Does that make sense? |
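A minimal sketch of the data point described above, in Python. The class names `ProofState` and `TacticStep`, the field names, and the location format are hypothetical and are not PaMpeR's actual output format; the sketch only illustrates keeping the raw assumptions and goal before and after a tactic, with the tactic's argument stored as part of the target.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProofState:
    """Raw (string) form of one proof-tree node: local context plus goal."""
    assumptions: List[str]
    goal: str


@dataclass
class TacticStep:
    """One data point: state before, tactic applied, state after."""
    location: str                       # hypothetical format, e.g. "Example.thy:10"
    before: ProofState
    tactic: str
    tactic_args: List[str] = field(default_factory=list)
    after: Optional[ProofState] = None  # None when the tactic closes the goal

    def target(self) -> str:
        """Prediction target: tactic name followed by its arguments, if any."""
        return " ".join([self.tactic, *self.tactic_args])


# Placeholder strings mirror the names used in the comment above.
step = TacticStep(
    location="Example.thy:10",
    before=ProofState(assumptions=["assm1"], goal="g1"),
    tactic="tac",
    tactic_args=["arg"],
    after=ProofState(assumptions=["assm2"], goal="g2"),
)
print(step.target())
```

Keeping the arguments in a separate list means a model can be trained to predict either the tactic name alone or the full invocation from the same record.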
@yutakang actually, perhaps this is the best thing:
I still think the arguments to the tactic should be saved. For future work, I think even just saving the entire proof tree from a proof script would be best. That way ML researchers can extract what they want (e.g. parents, steps, anything from the proof tree). Perhaps that's for a different discussion, but I'd be curious how easy this is to do for Isabelle from an engineering perspective. |
by proof tree I mean the tree from the tactics (not from the "logicians" perspective) |
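To make the tactic-level reading of "proof tree" concrete, here is a rough Python sketch. The `Node` class and the `steps` helper are hypothetical illustrations, not something Isabelle or PaMpeR provides: each node holds a goal and the tactic applied to it, and its children are the subgoals that tactic produced.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    goal: str
    tactic: str = ""  # tactic applied to this goal; "" if none was applied
    children: List["Node"] = field(default_factory=list)


def steps(node: Node) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (goal, tactic, resulting subgoals) in pre-order for every
    node to which a tactic was applied."""
    if node.tactic:
        yield node.goal, node.tactic, [c.goal for c in node.children]
    for child in node.children:
        yield from steps(child)


# A toy tree: tac1 splits g1 into g2 and g3, each closed by one tactic.
tree = Node("g1", "tac1", [Node("g2", "tac2"), Node("g3", "tac3")])
pairs = list(steps(tree))
```

With the whole tree saved, per-step data points, parent links, or any other view can be derived offline by a traversal like this one.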
this is ultimately the proof I am suggesting:
|
Summary of discussion: We want to create a data set that is as complete as possible (mainly because it's hard to recover missing information once the data set has been created in an offline setting). I thus suggest we have something like this:
does this sound good? Is there anything missing? |
@yutakang what is the difference between A1 vs assms? |
hi @yutakang! How are you? How is the data set we discussed going? Regards, Brando |
Hi @brando90 , Sorry, my only brother was dying, which was causing lots of problems. I am back in Singapore and resuming Isabelle-related work in my spare time. By the way, the notification from GitHub sometimes doesn't work for me. Best regards, |
No worries Yutaka, family is really important. Recently my father also had a terrible complication, likely as severe as yours. It's not too late to create the data set we were chatting about, but it might take me longer to implement agents to benchmark it or get a team to try that. Let me know if you do have time for that. Regards, Brando |
btw, I recently made my repo public in case someone wants to help us out: brando90/isabelle-gym#30 |
Hi @brando90 , I am sorry about your father's situation. I have recovered a little and started working on this repository in my spare time. So, probably I can extract the dataset again. I check this repository every once in a while. |
Sounds good!
Feel free to message me when your data set is ready. No pressure!
Hope you and your family are doing better; you should make that the priority. This can wait.
Regards, Brando
… On Jul 18, 2021, at 11:57 AM, Yutaka Ng ***@***.***> wrote:
Hi @brando90 <https://github.com/brando90> ,
I am sorry about your father's situation.
I have recovered a little and started working on this repository in my spare time.
Josef at CTU in Prague told me that I can use machines to extract data.
So, probably I can extract the dataset again.
I check this repository every once in a while.
But if you don't receive a reply from me, please send an email to ***@***.*** or tweet at YutakangE.
|