-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EBNFUnitFormat won't accept the letter "e" as a unit alias. #250
Comments
I think "e" is a special case because it also exists in the textual representation of numbers like "1.60217662e-19". If you have a unit called "e" the "1.60217662e" would make it difficult to tell apart whether it is a number or a quantity. |
I don't think it's that hard, given that you seem to be using a JavaCC parser. I've implemented a number of parsers that handle floating point numbers and variables named 'e' or 'E' using ANTLR. I can send a simple example if that would help. If you could make your JavaCC grammar (I couldn't find it in the repo), I'm pretty sure I could provide an example. |
Here's a fragment of an ANTLR grammar that will parse strings, including 'e' into a WORD token as well as full floating point numbers with exponential notation. Again, if you could provide the JavaCC grammar, I think I could add something like this so that you could distinguish 'e' in a floating point number from 'e' as a variable name. ` REAL E PLUS_OR_MINUS PLUS MINUS POINT WORD: LETTER (LETTER|DIGIT|'_'|'-'|'.'|'/'|'\')*; |
We haven't been using this for a long time, EBNFUnitParser was maintained manually. Do you think we could use the example mentioned here with something like https://github.com/phax/ph-javacc-maven-plugin? |
I've never used it, but I don't see why not. We used the maven antlr plugin as well as some custom ant tasks to build the parsers in our project. |
You think Ant is necessary? It may not be a problem either because at least in https://github.com/unitsofmeasurement/unit-api we already use it for the Java Mondule system (there could be newer Maven plugins for that soon) |
I don't know if ant is necessary. Our application did some pretty involved stuff (we used antlr to parse a proprietary grammar format, converting the proprietary grammar to a an antlr grammar and then built a parser from the generated using antlr, so we had to create some custom ant tasks). Just running JavaCC (or ANTLR) once to create parser code for the application should be fairly straightforward (with the caveat that I have not used JavaCC myself). |
This could be used as a starting point btw, it was not directly used in JSR 275 either but the codebase still exists: https://github.com/unitsofmeasurement/jsr-275/blob/master/src/main/javacc/javax/measure/unit/format/UnitParser.jj A generated output should more or less act as drop-in replacement for |
Bingo. I'll have a look. |
Thanks, btw, for anything beyond code snippets and exchanging issue tickets, because Indriya is the RI of JSR 385, to raise actual PRs you may have to join the JCP at least as Associate Member, or are you already? See https://github.com/unitsofmeasurement/unit-api/wiki/Contribution-Guide |
I am not, I'm just some random guy on the internet. I'll have a look. |
Still reviewing this grammar (and learning JavaCC), but it seems the problem is the grammar supports log(unit), ln(unit) and e^unit. While the parser accepts these constructs, it does not seem to process them correctly (they return incorrect dimensions). `import java.io.PrintStream; import tech.units.indriya.format.EBNFUnitFormat; /**
} The output of this test code is:
I don't know how to interpret [W?] (this is printing out in unicode), but the dimensions are certainly incorrect. Is supporting these transcendental units part of the spec? Would it be possible to eliminate ln, log and e^ from the unit parser? |
Still reviewing this. The problem is that defining an alias 'e' conflicts with the reserved character 'e' used for Euler's number. The parser allows constructs like 'e^identifier'. With a unit aliased 'e', the parser doesn't know what to do, and throws an error. BTW, is not listed as a reserved character in the javadoc for EBNFUnitFormat ("log", "log" and "ln" are). So how to solve?
I'll try to get some small demo parsers of each case along with a maven build script. Note that I will not try to correct the rest of the codebase to support transcendentals of units. I've asked the bosses about whether I can sign the stuff necessary to be a JSR member, will keep you posted. If I can't, I'll just send the demo code directly to Werner instead of doing a PR. |
Ok, I've prepared a repo that demonstrates how to build JavaCC UnitParser grammars using Maven. I've provided several simplified grammars that eliminate features from EBNF like exponents, logarithms, sums and the mix construct. Candidly, I don't quite understand how these constructs work in a unit parsing context - see my comments above. At the very least, they should be documented with use cases - I didn't see anything describing them in the JSR document at https://download.oracle.com/otndocs/jcp/units-2_0-pr-spec/index.html. This repo also provides a simple demo of setting up aliases using the EBNF parser. See the README. |
EBNFUnitFormat is an implementation detail, so the parsing details would not have a place in the Specification of the Unit API, if anywhere then a future version of https://github.com/unitsofmeasurement/uom-guide. |
The following code:
`import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.ResourceBundle;
import tech.units.indriya.format.EBNFUnitFormat;
import tech.units.indriya.format.SymbolMap;
import tech.units.indriya.unit.Units;
/**
Simple code to demonstrate use of aliases with the EBNFUnitFormat
*/
public class TestEBNF {
private static final String BUNDLE_NAME = "tech.units.indriya.format.messages"; //$NON-NLS-1$
/**
@param args the command line arguments
*/
public static void main(String[] args) throws UnsupportedEncodingException {
// TODO code application logic here
SymbolMap map = SymbolMap.of(ResourceBundle.getBundle(BUNDLE_NAME, Locale.ROOT));
map.alias(Units.COULOMB.multiply(1.60217662e-19), "e");
EBNFUnitFormat unitParser = EBNFUnitFormat.getInstance(map);
PrintStream utfOut = new PrintStream(System.out, true, "UTF-8");
utfOut.println("Dimension of e: " + unitParser.parse("e").getDimension());
}
}`
Fails with the following error:
`Exception in thread "main" javax.measure.format.MeasurementParseException: tech.units.indriya.internal.format.TokenException: Encountered " "e" "e "" at line 1, column 1.
Was expecting one of:
"(" ...
...
<FLOATING_POINT> ...
...
<NAT_LOG> ...
<UNIT_IDENTIFIER> ...
...
<FLOATING_POINT> ...
...
The grammar for the EBNF format indicates that a single initial character is allowed, and that all letters are allowed initial characters.
The text was updated successfully, but these errors were encountered: