Neural_PaMpeR: build the database again with each line tagged with proof obligations. #178

yutakang · 2020-09-10T05:51:12Z

No description provided.

brando90 · 2020-10-05T22:33:46Z

hi @yutakang, is this the data set we spoke about? Can I help you make it?

yutakang · 2020-10-07T13:40:29Z

hi @yutakang, is this the data set we spoke about?

Yes.
I created a new directory Neural_PaMpeR at https://github.com/data61/PSL/tree/master/Neural_PaMpeR

Bits and pieces of the code base for the data extraction for PaMpeR are outdated for Isabelle2020.
So, I have to update them.

And unfortunately I did not record on what date I downloaded the AFP articles to build the database I used for the CICM paper.

So, I have to build the database again, so that I can match each line of database with the string representing each proof goal.

I can use a big server machine in Innsbruck, so in terms of computational resources I think it's alright.

brando90 · 2020-10-07T22:33:41Z

Hi @brando90,

hi @yutakang, is this the data set we spoke about?

Yes.
I created a new directory Neural_PaMpeR at https://github.com/data61/PSL/tree/master/Neural_PaMpeR

Bits and pieces of the code base for the data extraction for PaMpeR are outdated for Isabelle2020.
So, I have to update them.

And unfortunately I did not record on what date I downloaded the AFP articles to build the database I used for the CICM paper.

So, I have to build the database again, so that I can match each line of database with the string representing each proof goal.

I can use a big server machine in Innsbruck, so in terms of computational resources I think it's alright.

ok cool let me know if there is something I can do to help to get the proof obligations.

Thanks!

yutakang · 2020-10-27T06:56:46Z

Hi @brando90 ,

I produced a sample datapoint for this lemma.

This sample datapoint is expressed in 4 lines:

1 line for the results of the assertions,
3 lines for the strings of proof obligations. (1 for the first subgoal and 2 for the local assumptions (a.k.a. chained facts))

Each line representing this datapoint starts with the location information (file name and line number).
The lines representing first goals are tagged with "First_Subgoal", and the lines representing chained facts are tagged with "Chained_Fact".

What do you think?

brando90 · 2020-11-09T23:04:51Z

Hi @brando90 ,

I produced a sample datapoint for this lemma.

This sample datapoint is expressed in 4 lines:

1 line for the results of the assertions,

3 lines for the strings of proof obligations. (1 for the first subgoal and 2 for the local assumptions (a.k.a. chained facts))

Each line representing this datapoint starts with the location information (file name and line number).
The lines representing first goals are tagged with "First_Subgoal", and the lines representing chained facts are tagged with "Chained_Fact".

What do you think?

Hi yutaka,

that looks good for that goal. I do want to keep the hardcoded vector representation that you came up with. The the most important is that for every node in the proof tree that we have it's raw formula. So if we go from assm1 g1 to amms2 g2 using tac then we want:

(assm1 g1, Tac)

the next set of goals would be a data point for the next tactic (not included in my example). I think your example might be too simple to see if it's robust but I think it's ok. What we want is to know what tactic was done to prove a specific goal with a specific local context.

If the tactic took an argument, we'd also like that to be part of the target e.g.

(assm1 g1, Tac arg)

Does that make sense?

brando90 · 2020-11-12T19:16:51Z

@yutakang actually perhaps this the best thing:

I think what tactician suggests is better: "tactica1, is saved into the database accompanied both by the proof states before and after tactic execution, in this case, <Γ_a1 |- σ1, tactic_a1, Γ_a2 |- σ2" is good.

I still think the arguments to the tactic should be saved.

For future work, I think event just saving the entire proof tree from a proof script would be best. That way ML researchers can extract what they want (e.g. parents, steps, anything from the proof tree but perhaps that's for a different discussion but I'd be curious on how easy from an engineering perspective this is to do for Isabelle)

brando90 · 2020-11-17T15:53:49Z

by proof tree I mean the tree from the tactics (not from the "logicians" perspective)

brando90 · 2020-11-17T15:55:38Z

this is ultimately the proof I am suggesting:

(current_goal, assumptions, tactic arguments, resulting obligations)

brando90 · 2020-11-17T16:28:48Z

@yutakang

Summary of discussion:

We want to create a data set that is as complete as possible (mainly because its hard to recover information that is missing once it's created in an offline setting). I suggest thus we have something like this:

data = (A1==>G1, assms, Tac args args_names, A2==>G2)

does this sound good? Is there anything missing?

brando90 · 2020-11-23T23:08:40Z

@yutakang what is the difference between A1 vs assms?

brando90 · 2020-12-21T23:00:46Z

hi @yutakang! How are you? How is the data set we discussed going?

Regards, Brando

yutakang · 2021-04-13T16:59:02Z

Hi @brando90 ,

Sorry, my only brother was dying, which was causing lots of problems.
To handle the situation I limited my time for research to the lowest possible level, which was just enough not to be fired... at least so far.

I am back in Singapore and resuming Isabelle-related work in my spare time.
Sorry, it might be already too late for you and your folks by now.

By the way, the notification from GitHub sometimes doesn't work for me.
In case you don't receive a reply from me, please send an email at [email protected] or tweet at YutakangE.

Best regards,
Yutaka

brando90 · 2021-06-22T15:20:47Z

Hi @brando90 ,

Sorry, my only brother was dying, which was causing lots of problems.
To handle the situation I limited my time for research to the lowest possible level, which was just enough not to be fired... at least so far.

I am back in Singapore and resuming Isabelle-related work in my spare time.
Sorry, it might be already too late for you and your folks by now.

By the way, the notification from GitHub sometimes doesn't work for me.
In case you don't receive a reply from me, please send an email at [email protected] or tweet at YutakangE.

Best regards,
Yutaka

No Worries Yutaka, family is really important. Recently my father also had a terrible complication - likely as severe as yours.

It's not to late to create the data set we were chatting about but it might take me longer to implement agents to benchmark it or get a team to try that. Let me know if you do have time for that.

Regards, Brando

brando90 · 2021-06-22T15:21:39Z

btw, I did recently made my repo public in case someone wanted to help us out: brando90/isabelle-gym#30

yutakang · 2021-07-18T16:57:11Z

Hi @brando90 ,

I am sorry about your father's situation.

I have recovered a little and started working on this repository in my spare time.
Josef at CTU in Prague told me that I can use machines to extract data.

So, probably I can extract the dataset again.

I check this repository every once in a while.
But if you don't receive a reply from me, please send an email at [email protected] or tweet at YutakangE.

brando90 · 2021-07-19T13:57:09Z

Sounds good! Feel free to message me when your data set is ready. No pressure! Hope you and your family is doing better and you should put that in the priority. This can wait. Regards, Brando

…

On Jul 18, 2021, at 11:57 AM, Yutaka Ng ***@***.***> wrote: Hi @brando90 <https://github.com/brando90> , I am sorry about your father's situation. I have recovered a little and started working on this repository in my spare time. Josef at CTU in Prague told me that I can use machines to extract data. So, probably I can extract the dataset again. I check this repository every once in a while. But if you don't receive a reply from me, please send an email at ***@***.*** ***@***.***> or tweet at YutakangE. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#178 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAOE6LXZLGNKMR2BRBUWZE3TYMBXDANCNFSM4REIYK5Q>.

yutakang self-assigned this Sep 10, 2020

yutakang changed the title ~~PaMpeR2: build the database again with each line tagged with proof obligations.~~ Neural_PaMpeR: build the database again with each line tagged with proof obligations. Oct 7, 2020

yutakang pinned this issue Oct 12, 2020

brando90 mentioned this issue Nov 12, 2020

Make datasets. brando90/isabelle-gym#30

Open

yutakang unpinned this issue Nov 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Neural_PaMpeR: build the database again with each line tagged with proof obligations. #178

Neural_PaMpeR: build the database again with each line tagged with proof obligations. #178

yutakang commented Sep 10, 2020

brando90 commented Oct 5, 2020

yutakang commented Oct 7, 2020

brando90 commented Oct 7, 2020

yutakang commented Oct 27, 2020

brando90 commented Nov 9, 2020 •

edited

Loading

brando90 commented Nov 12, 2020

brando90 commented Nov 17, 2020

brando90 commented Nov 17, 2020

brando90 commented Nov 17, 2020 •

edited

Loading

brando90 commented Nov 23, 2020

brando90 commented Dec 21, 2020

yutakang commented Apr 13, 2021

brando90 commented Jun 22, 2021

brando90 commented Jun 22, 2021

yutakang commented Jul 18, 2021

brando90 commented Jul 19, 2021 via email

Neural_PaMpeR: build the database again with each line tagged with proof obligations. #178

Neural_PaMpeR: build the database again with each line tagged with proof obligations. #178

Comments

yutakang commented Sep 10, 2020

brando90 commented Oct 5, 2020

yutakang commented Oct 7, 2020

brando90 commented Oct 7, 2020

yutakang commented Oct 27, 2020

brando90 commented Nov 9, 2020 • edited Loading

brando90 commented Nov 12, 2020

brando90 commented Nov 17, 2020

brando90 commented Nov 17, 2020

brando90 commented Nov 17, 2020 • edited Loading

brando90 commented Nov 23, 2020

brando90 commented Dec 21, 2020

yutakang commented Apr 13, 2021

brando90 commented Jun 22, 2021

brando90 commented Jun 22, 2021

yutakang commented Jul 18, 2021

brando90 commented Jul 19, 2021 via email

brando90 commented Nov 9, 2020 •

edited

Loading

brando90 commented Nov 17, 2020 •

edited

Loading