Outside #187

sjanssen2 · 2023-03-07T16:39:58Z

This PR is huge. Sorry. It adds automatic outside grammar generation to gapc, enabled via the parameter --outside_grammar ALL.

The current implementation can "consume" a subword i,j from the user-provided input sequence(s) until every symbol between i and j is part of a candidate (the inside direction). With this PR, we reverse the direction: given i,j we "consume" characters 0...i and j...n until we reach 0 and n (the outside direction). In practice, this allows us to conceptually split all candidates at a given i,j and compute inside (already possible) and outside (new) parts, which combined produce complete candidates. With this, we can easily compute e.g. posteriors or other useful information about the candidate space.
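
To make the last point concrete, here is a minimal C++ sketch of how inside and outside values might be combined into a posterior for one non-terminal. It assumes a probabilistic scoring algebra in which values multiply along a candidate and sum over alternatives. The table layout and the names `Table` and `posterior` are illustrative assumptions and not part of gapc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: table[i][j] holds the value of one non-terminal
// for the subword i,j (0 <= i <= j <= n). Not gapc's actual table classes.
using Table = std::vector<std::vector<double>>;

// Posterior that a complete candidate uses this non-terminal on i,j:
// inside part times outside part, normalised by the total over all
// candidates (assumed to be the inside value of the axiom at (0, n)).
double posterior(const Table &inside, const Table &outside,
                 std::size_t i, std::size_t j, double total) {
  assert(total > 0.0);
  assert(i <= j && j < inside.size() && j < outside.size());
  return inside[i][j] * outside[i][j] / total;
}
```

In the textbook inside-outside setting, the outside value of the axiom at (0, n) is the neutral element 1, so its posterior at (0, n) evaluates to 1 when `total` is the axiom's inside value.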

I've tried to encapsulate the code in three phases and to offload most of the new functions into src/outside. Of course, the changes also touch existing inside parts and thus require changes to existing code, which I tried to keep minimal. The phases are:

- grammar_transformation: converts the user-defined inside grammar into a grammar that additionally contains outside rules, which reflect the structure of the inside grammar but operate from inside to outside.
- middle_end: the new direction requires running indices from i,j towards 0 and n. Thus, moving boundaries and loops require reverse order. Code in middle_end produces these different non-terminal functions.
- codegen: the result of the algorithm is no longer the single DP cell (0,n) for the axiom, but the values of every (tabulated) NT at every position i,j. Thus, codegen produces a function print_insideoutside_report_fn to report these many values (also consider multi-track grammars with more than two dimensions). To limit the output, a user can define which NTs shall be reported via the gapc parameter --outside_grammar X, where X is one non-terminal. Repeated use of --outside_grammar with different X will lead to multiple NTs being reported. If the user provides ALL as non-terminal, all NTs will be reported.

We run multiple semantic checks to warn users if outside grammar generation is not meaningful (e.g. the empty word cannot be parsed) or if algebra functions would use mixed data types.

The new functionality shall work with CYK, CYK + OpenMP (single track only) and Unger code generation, as well as with checkpointing and multi-track grammars.

semantic checks:
- grammar cannot parse the empty word
- are all requested NTs in the grammar, for --outside_grammar?
- does the algebra use mixed types, i.e. answer foo(answer_bar), if so outside cannot be generated as we will lack answer_bar foo(answer) for outside parts
  - complete definition of is_terminal_type for ALL types!
resolve_blocks:
- check if multi_filter really needs to become public
other ToDos
- remove #ifdef LOOPDEBUG parts
- add documentation for middle_end and codegen
- don't forget to revert actions back to master branch of gapc-test-suite!
- cyk
- cyk + checkpointing
- cyk + openMP
- cyk + openMP + checkpointing
- how to deal with non-terminals that have no choice function applied?

P.S. this PR shall replace #122

kmaibach

This is indeed a very large PR and I fear that I will not understand everything that's going on here. But I'll try to work through it, and I hope it's fine that I ask some questions while doing so.

One general question regarding grammar transformation:
Can the grammar be saved to a file or is it only reported to stdout?
Would it be useful to include that so the user can have a look at it later?
Just for my understanding: Why would I want to exclude NTs from outside transformation? Are there cases where some NTs don't influence the probabilities of the rest of a grammar? Or are those only NTs of the form X -> a ?
Not necessary for this PR but out of curiosity: Did you check the runtime of the code generation regarding grammar size (number of NTs)? Would be interesting how it changes with larger grammars.

kmaibach · 2023-06-16T14:23:46Z

rtlib/filter.hh

+template<typename alphabet, typename pos_type, typename T>
+inline bool complete_track(
+    const Basic_Sequence<alphabet, pos_type> &seq, T i, T j) {
+  return ((i == seq.n) && (j == seq.n));


What exactly is checked here?

src/grammar.cc

kmaibach · 2023-06-16T15:34:58Z

src/outside/codegen.cc

+
+
+std::list<Symbol::NT*> *NTs_to_report(const AST &ast) {
+  /* define which non-terminals shell be reported to the user


'shall' not 'shell'

kmaibach · 2023-06-16T15:35:13Z

src/outside/codegen.cc

+  /* define which non-terminals shell be reported to the user
+   * order of user arguments (--outside_grammar) shall take precedence over
+   * NTs as occurring in source code of grammar.
+   * - User shell be warned, if outside version of NT has not been generated.
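
The rule in this code comment, together with the ALL option from the PR description, could be sketched as follows. This is a simplified, self-contained illustration that uses plain strings. The function name `select_report_nts` and its parameters are assumptions made for the example and do not mirror the real `NTs_to_report`, which works on the `AST`. The order used for ALL (grammar source order) is also an assumption.

```cpp
#include <algorithm>
#include <iostream>
#include <list>
#include <string>

// Simplified stand-in for the reporting selection: plain strings instead
// of Symbol::NT pointers. All names here are hypothetical.
// requested: values of repeated --outside_grammar arguments, in user order
// generated: NTs that have an outside version, in grammar source order
std::list<std::string> select_report_nts(
    const std::list<std::string> &requested,
    const std::list<std::string> &generated) {
  // "ALL" selects every NT (assumed here: in grammar source order).
  if (std::find(requested.begin(), requested.end(), std::string("ALL")) !=
      requested.end()) {
    return generated;
  }
  std::list<std::string> report;
  for (const std::string &name : requested) {
    const bool known =
        std::find(generated.begin(), generated.end(), name) !=
        generated.end();
    const bool seen =
        std::find(report.begin(), report.end(), name) != report.end();
    if (!known) {
      // Warn instead of failing if no outside version exists.
      std::cerr << "warning: no outside version of '" << name
                << "' was generated\n";
    } else if (!seen) {
      report.push_back(name);  // user order takes precedence
    }
  }
  return report;
}
```

With `generated = {"struct", "hairpin", "stack"}`, requesting `stack` and then `struct` reports them in exactly that order, while an unknown name only triggers the warning.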


sjanssen2 · 2023-06-18T10:42:51Z

> One general question regarding grammar transformation:
> Can the grammar be saved to a file or is it only reported to stdout?
> Would it be useful to include that so the user can have a look at it later?

The grammar is NOT printed to stdout, but you can "plot" it via --plot-grammar 1. This will not report gap-L code but a visual representation. And yes, I think it is useful to inspect this automatic generation. I've added this hint to the help message of gapc.

sjanssen2 · 2023-06-18T10:44:42Z

> 2. Just for my understanding: Why would I want to exclude NTs from outside transformation? Are there cases where some NTs don't influence the probabilities of the rest of a grammar? Or are those only NTs of the form X -> a ?

You would not manually consider excluding NTs, but yes, there are "inside" productions that lack outside analogues because they have no r.h.s. non-terminals as in your example! (I guess a is a terminal)

sjanssen2 · 2023-06-18T10:47:06Z

> 3. Not necessary for this PR but out of curiosity: Did you check the runtime of the code generation regarding grammar size (number of NTs)? Would be interesting how it changes with larger grammars.

Theory says that every CFG can be transformed into Chomsky Normal Form (CNF), i.e. the max. width is 1 and each production has at most 2 r.h.s. non-terminals and thus at most 2 outside productions. Therefore, the asymptotic run time will not change; only a constant factor is added. For CNF, this yields a factor of ~3. For ADP CFGs with width > 1, this factor can be higher (it depends on the number of r.h.s. NTs), but the asymptotic run time remains the same.
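
The constant factor can be illustrated by counting alternatives. The sketch below assumes, in line with the replies above, that each inside alternative contributes one outside alternative per r.h.s. non-terminal and none if it has no r.h.s. non-terminal. The `Production` struct and both function names are hypothetical and only serve the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, minimal production: a l.h.s. name and its r.h.s.
// non-terminals (terminals are omitted because they add no outside rule).
struct Production {
  std::string lhs;
  std::vector<std::string> rhs_nts;
};

// Assumed rule: one outside alternative per r.h.s. non-terminal.
std::size_t count_outside_alternatives(const std::vector<Production> &g) {
  std::size_t total = 0;
  for (const Production &p : g) {
    total += p.rhs_nts.size();
  }
  return total;
}

// Ratio of (inside + outside) alternatives to inside alternatives.
double growth_factor(const std::vector<Production> &g) {
  assert(!g.empty());
  return static_cast<double>(g.size() + count_outside_alternatives(g)) /
         static_cast<double>(g.size());
}
```

For a CNF rule such as `A -> B C` this gives one inside plus two outside alternatives, i.e. a factor of 3, while a rule like `X -> a` contributes no outside alternative.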

sjanssen2 · 2023-06-18T10:49:02Z

> hope it's fine that I'll ask some questions

these are great questions! Please don't hesitate to ask more of those!

kmaibach · 2023-06-21T13:47:09Z

> > Just for my understanding: Why would I want to exclude NTs from outside transformation? Are there cases where some NTs don't influence the probabilities of the rest of a grammar? Or are those only NTs of the form X -> a ?
>
> You would not manually consider excluding NTs, but yes, there are "inside" productions that lack outside analogues because they have no r.h.s. non-terminals as in your example! (I guess a is a terminal)

Yes, a is a terminal.

I think I have seen it somewhere but are there checks to see if an inclusion or exclusion of certain NTs is problematic? Does the user get a warning if they don't include NTs?

sjanssen2 · 2023-06-22T08:12:08Z

Outside grammar generation is fully automatic, i.e. the user has no say in which NTs to process. Therefore, he/she cannot do anything wrong here, except designing an inside grammar that cannot parse the empty word; a corresponding warning will be reported to the user.

kmaibach · 2023-06-22T11:13:27Z

I think I completely misunderstood this passage in your explanation:

> To limit the output, a user can define which NTs shall be reported via the gapc parameter --outside_grammar X where X is one non-terminal. Repeated use of --outside_grammar with different X will lead to multiple NTs being reported. If the user provides ALL as non-terminal, all NTs will be reported.

I thought that you could exclude the NTs in the code generation, NOT (only) in the visual report. But I think the latter is what you really mean, right?

So, my question would be irrelevant then. Sorry for the confusion.

sjanssen2 · 2023-06-22T15:21:45Z

ah, that is the misunderstanding. Correct, it only controls which results are reported on stdout. One is typically not interested in all cell values of all NTs, thus one might save printing lines. But in general, we cannot know what the user is interested in.

fymue · 2023-06-22T17:31:36Z

@sjanssen2 Regarding the checkpointing test that keeps failing: I can't really tell what exactly is going wrong by just looking at the log, but I assume that the test input is simply too short for this test if it is executed on multiple threads (with OMP). I ran into some issues before whenever I checkpointed programs that had execution times close to 1s, which is the checkpointing interval used in all tests. We could maybe increase the input length a bit so the test runs a bit longer and see if it still keeps failing.

sjanssen2 · 2023-06-27T08:38:07Z

I've increased test input sequences as suggested by @fymue - seems to help :-)
Do you have further issues regarding this PR @fymue @kmaibach ?

fymue · 2023-06-27T10:04:22Z

> I've increased test input sequences as suggested by @fymue - seems to help :-)

Very good.

> Do you have further issues regarding this PR @fymue @kmaibach ?

Nothing else from my side.

kmaibach

Alright, so here are my final comments.
I looked through everything and have some remarks and questions left.

But other than that it should be it.

src/alt.cc

src/grammar.hh

testdata/gapc_filter/RFmini_data.hh

testdata/gapc_filter/ext_hmm.hh

testdata/gapc_filter/isntimes.hh

testdata/grammar_outside/elmamun.gap

kmaibach · 2023-06-30T14:04:18Z

testdata/grammar_outside/elmamun_derivatives.gap

+  formula = number(INT)
+	  | add(formula, CHAR('+'), formula)
+	  | mult(formula, CHAR('*'), formula)
+	  | nil(EMPTY)


heinz and minus are missing here

interestingly, the compiler does not complain if an algebra defines the body of an algebra function despite the fact that this algebra function is not declared in the signature! Bug or feature? I am using the behavior to smuggle in a normalization algebra function for computation of derivatives, part of #151
Here, I was testing that gapc really does not throw an error about heinz.

minus is not used in the grammar as this would violate Bellman's Principle in combination with the defined algebra functions. It's basically a leftover from copying and pasting the code from the teaching example.

kmaibach · 2023-06-30T14:06:03Z

testdata/grammar_outside/mini_complexIL.gap

+  }
+}
+
+//algebra alg_dotBracket implements sig_foldrna(alphabet = char, answer = string) {


multi-line comments would be better

or remove this unused algebra definition altogether :-)

sjanssen2 · 2023-07-04T15:21:04Z

Hi @kmaibach I hope I have addressed your latest issues appropriately. Can you check again?

sjanssen2 added 5 commits March 7, 2023 16:44

add outside subdir

d3fbbbb

set up test suite for outside

2e17ec2

activate outside mode

8fbe380

add first semantic check: can empty words be parsed

f0d55ae

activate tests

0b0091e

sjanssen2 added the WIP work in progress, do not (yet) merge label Mar 7, 2023

sjanssen2 added 24 commits March 7, 2023 18:04

adding new souce file

80be2bc

linting

2dafa78

added header guard

04f04b3

adding a test for user input NTs

f0231da

skip test if no outside was requested at all

470d36e

new semantic check for mixed types

4d88388

fct to check for being terminal or not

b72ea45

Merge branch 'master' of github.com:jlab/gapc into outside

57b455d

logic bug

af3ab16

reactivate resolve_blocks test

47c3a4d

add functionallity to resolve Alt::Blocks in given grammar

691ff7c

linter

bc8d32d

better name

24bfb49

add "resove blocks" test

d82a8e8

added nil(EMPTY) back in

c4d7c52

Merge branch 'master' of github.com:jlab/gapc into outside

12bb493

cleanup

e696d78

intermediate state

f77dbff

intermediate: outside rules present, axiom missing, complete track missing

31574fa

inject new axioms + transition to inside

7f6918d

allow table_dim re-computation

7a214b9

add modtest: complete grammar transformation

37c8a39

use correct version

6661689

add new test that checks for grammar topology

aa36abd

kmaibach reviewed Jun 16, 2023

sjanssen2 added 2 commits June 17, 2023 16:03

more elaborate explanation about complete_track filter

287028b

better explanation

8eabb33

sjanssen2 added 2 commits June 18, 2023 12:47

fix typo

f825f9d

add hint to plot-grammar

86be424

Merge branch 'master' into outside

f25951c

increase test to avoid fails on OSx

63bd6bf

fymue approved these changes Jun 27, 2023

kmaibach reviewed Jun 30, 2023

sjanssen2 added 4 commits July 4, 2023 15:50

linting

4d230d6

remove unused function

5cfa644

linting

a2c366e

remove unused algebra function

681d312

kmaibach approved these changes Jul 5, 2023

sjanssen2 added 3 commits July 5, 2023 22:33

updating debian package

524ced9

also execute outside tests on OSX

84e1c61

merge outside mod tests into "normal" workflow

d974b11

sjanssen2 merged commit df87aca into master Jul 5, 2023


