Not able to parse some real awk scripts #58

Open
xonixx opened this issue Dec 24, 2023 · 15 comments
@xonixx

xonixx commented Dec 24, 2023

I'm not sure how compatible this project aims to be. Out of curiosity I tried it on a couple of my projects: makesure and gron.awk:

$ java -jar soft/jawk-3.0.01-standalone.jar -f makesure.awk
ParserException: Expecting 277 _COLON_. Found: 264 _STRING_ ("@") (makesure.awk:233)

$ java -jar soft/jawk-3.0.01-standalone.jar -f gron.awk
ParserException: Expecting an ID. Got (258): 
     (gron.awk:82)

Both of the scripts are compatible with gawk, bwk, mawk, busybox awk & goawk.

@bertysentry
Member

Hi, sorry we didn't notice the issue you registered. Let us look into this!

@bertysentry
Member

bertysentry commented Jan 17, 2024

Hi @xonixx

We've just released version 3.1.00 of Jawk which fixes the operator precedence of the parser. It now strictly follows gawk.

gron.awk now kind of works... except we're still encountering issues when parsing JSON objects (arrays seem to be fine). Could you please pinpoint what is going on and what is actually failing?

echo [1, 2, 3] | java -jar target/jawk-3.1.00-standalone.jar -f src/test/resources/xonixx/gron.awk
json=[]
json[0]=1
json[1]=2
json[2]=3

But...

echo {"a": 1} | java -jar target/jawk-3.1.00-standalone.jar -f src/test/resources/xonixx/gron.awk
Can't parse JSON at pos 2: a: 1}

makesure.awk still doesn't work because we're missing a few gawk-specific (non-POSIX) functions, such as gettimeofday():

java -jar target/jawk-3.1.00-standalone.jar -f src/test/resources/xonixx/makesure.awk
SemanticException: function org.sentrysoftware.jawk.frontend.AwkParser$FunctionProxy@593634ad (gettimeofday) not defined (src/test/resources/xonixx/makesure.awk:717)

@xonixx
Author

xonixx commented Jan 17, 2024

Thank you @bertysentry, excellent work! Sure, I’ll check when I have time.

@xonixx
Author

xonixx commented Jan 17, 2024

Regarding the missing function gettimeofday(): yeah, this is expected. But you can still check this one by adding one more -f flag and referencing https://github.com/xonixx/makesure/blob/main/mawk_ext.awk, which contains a stub for this function.
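
To illustrate the extra -f flag approach, here is a minimal sketch using plain awk. The file names and the stub body are hypothetical (the real stub in mawk_ext.awk may look different); the sketch only shows that a function defined in one -f file can be called from a script loaded with another -f flag.

```shell
# Hypothetical sketch: file names and the stub body are assumptions,
# not the actual contents of mawk_ext.awk.
tmp=$(mktemp -d)

# Stub file providing the function that the awk implementation lacks.
cat > "$tmp/stub_ext.awk" <<'EOF'
function gettimeofday() { return 0 }
EOF

# Main script that calls the function.
cat > "$tmp/main.awk" <<'EOF'
BEGIN { print "t=" gettimeofday() }
EOF

# Each -f flag adds another source file to the same program.
stub_out=$(awk -f "$tmp/stub_ext.awk" -f "$tmp/main.awk")
echo "$stub_out"
rm -rf "$tmp"
```

The same pattern should carry over to any implementation that accepts several -f flags.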

@xonixx
Author

xonixx commented Jan 18, 2024

So here is one bug I've triaged so far:

$ java -jar soft/jawk-3.1.00-standalone.jar 'BEGIN { print !xxx }'
0
$ awk 'BEGIN { print !xxx }'
1
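
For comparison, a small sketch of how logical NOT is expected to behave in awk on an uninitialized variable and on a few constants. The variable name xxx is the one from the report; the extra operands are added here for illustration only.

```shell
# xxx is never assigned, so it counts as both "" and 0, which is false.
# The remaining operands cover numeric zero, the empty string,
# a non-empty string, and a non-zero number.
not_out=$(awk 'BEGIN { print !xxx, !0, !"", !"a", !1 }')
echo "$not_out"   # prints: 1 1 1 0 0
```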

@bertysentry
Member

@xonixx Please see #91 ;-)

@xonixx
Author

xonixx commented Jan 18, 2024

One other bug please:

$ java -jar soft/jawk-3.1.02-standalone.jar 'BEGIN { exit 17 }'; echo $?
0
$ awk 'BEGIN { exit 17 }'; echo $?
17

@bertysentry
Member

Thank you @xonixx! See #94

@xonixx
Author

xonixx commented Feb 1, 2024

One more bug with the latest version:

$ java -jar soft/jawk-3.2.00-standalone.jar 'BEGIN { D=2; print D" "(D--) }'
1 2

$ awk 'BEGIN { D=2; print D" "(D--) }'
2 2

@bertysentry
Member

Thanks @xonixx! That's a tough one... Let's see how we can fix this. It pertains to the order of execution of string concatenation. Probably...

@xonixx
Author

xonixx commented Feb 1, 2024

I believe it's a problem with the order of evaluation of sub-expressions. The actual bug in gron.awk is in form:

$ java -jar soft/jawk-3.2.00-standalone.jar 'BEGIN { D=2; f(D,D--) } function f(a,b){ print a; print b }'
1
2

$ awk 'BEGIN { D=2; f(D,D--) } function f(a,b){ print a; print b }'
2
2
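
Because the awk result above depends on the arguments being evaluated left to right, a script can sidestep the question by moving the side effect into its own statement. The following is only a sketch of that idea; the variable name prev is made up and does not come from gron.awk.

```shell
# Hypothetical rewrite: prev is an illustrative name, not taken from gron.awk.
order_out=$(awk '
function f(a, b) { print a; print b }
BEGIN {
    D = 2
    prev = D       # capture the value before the side effect
    D--            # apply the decrement as its own statement
    f(prev, prev)  # both arguments are now plain reads
}')
echo "$order_out"
```

This prints 2 twice, matching the awk output of f(D,D--), without relying on the order in which the arguments are evaluated.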

@bertysentry
Member

Thank you @xonixx for reporting this bug (about the order of evaluation of expressions). It was a massive design problem: all expressions were eval'ed in the wrong order!

It was kind of funny though:

print a++, a++, a++, a++

would produce:

3 2 1 0

😅

It's fixed now in version 3.3.00!

@xonixx
Author

xonixx commented Feb 5, 2024

Nice!

Couple more bugs with the fresh version:

Bug 1

$ java -jar soft/jawk-3.3.00-standalone.jar 'BEGIN { print c >= "0" }'
ParserException: Expecting statement terminator. Got _GE_: >= (<command-line-supplied-script>:0)

$ awk 'BEGIN { print c >= "0" }'
0

Bug 2

$ java -jar soft/jawk-3.3.00-standalone.jar 'BEGIN { x = c >= "0"; print x }'
1
$ awk 'BEGIN { x = c >= "0"; print x }'
0

@bertysentry
Member

Thank you for spotting these issues, @xonixx!

Bug 2 is actually related to #110. In AWK, the rules for when strings are converted to numbers aren't quite intuitive. Uninitialized variables are considered equal to zero (number), and "0" == 0 if "0" comes from file input or stdin. In this case, however, "0" is a string constant, so it must not be converted to the number zero, and thus the uninitialized variable is "less than" the non-empty string.
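
A minimal sketch of these comparison rules, run with a POSIX-style awk. The variable c is the one from the report; the other operands and the 0.0 input are added here for illustration.

```shell
# c is never assigned. Each comparison is parenthesized so that ">" is not
# taken as an output redirection by print.
cmp_uninit=$(awk 'BEGIN { print (c >= "0"), (c >= 0), (c == 0), (c == "") }')
echo "$cmp_uninit"   # prints: 0 1 1 1

# A field read from input that looks numeric compares as a number,
# while the same text written as a string constant compares as a string.
cmp_input=$(echo 0.0 | awk '{ print ($1 == 0), ("0.0" == 0) }')
echo "$cmp_input"    # prints: 1 0
```

Only the comparison against the string constant "0" is done as a string comparison, which is why c >= "0" yields 0 while c >= 0 yields 1.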

Bug 1 is currently a limitation: the parser doesn't know whether we're printing to a file (using the > redirection symbol) or whether the > is part of an expression. This needs to be fixed somehow.
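
The two meanings of > after print can be seen side by side in the sketch below, which also shows the usual way for a script to remove the ambiguity: parenthesizing the comparison. The output file name is hypothetical.

```shell
# Hypothetical file name; a temporary directory keeps the sketch self-contained.
tmp=$(mktemp -d)

# Unparenthesized ">" after print is a redirection: the text goes to the file.
# Parenthesized, it is a comparison whose result goes to standard output.
redir_stdout=$(awk -v out="$tmp/out.txt" 'BEGIN {
    print "to file" > out
    print (2 > 1)
}')
redir_file=$(cat "$tmp/out.txt")
echo "stdout: $redir_stdout"
echo "file:   $redir_file"
rm -rf "$tmp"
```

Here standard output receives 1 and the file receives the text "to file".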

@bertysentry
Member

@xonixx Issue #120 has been fixed! Thank you again!
