diff --git a/IncrementalParsing.md b/IncrementalParsing.md new file mode 100644 index 00000000..d43d1748 --- /dev/null +++ b/IncrementalParsing.md @@ -0,0 +1,65 @@ +### Incremental parsing + +#### Basics + +We'll start by explaining how incremental parsing works for LL, then how we store that data. We are not going to talk about incremental _lexing_. + +Let's start with LL(1) and ignore semantic predicates and other things. Fundamentally, the problem of incremental parsing is one of knowing what can change about how a given parser rule processes the tokens (and the resulting output parse tree) given a set of new/deleted/changed tokens. For LL(1), this turns out to be very easy. Because LL(1) can only look ahead one token, the only token changes that can matter to a given parser rule (and the output parse tree) are changes to whatever tokens the rule looked at last time, plus 1 token forward. If no tokens have changed in that [startToken, stopToken+1] range, the rule cannot be affected (assuming it gets run). The referenced paper explains this in detail and shows how to make it work for LR parsers. Terence also explains variants of the above in a few github issues where people have asked about incremental parsing. + +ANTLR already tracks the token bounds of rules in the parse tree (startIndex/stopIndex). Thus, for LL(1) you don't even need extra information to do incremental parsing; you could simply use what exists. + +#### Making it work for LL(1) + +So how do we effect this incremental parsing for LL(1) in practice? + +For our purposes, we need the list of token changes and the previous parse tree. We guard each parser rule with a check of whether any of the changed tokens fell within the bounds of the rule (including possible lookahead) during the last parse. If so, we re-run the rule and take its output. If not, we reuse the context the parse tree has from last time (later we'll cover fixing up the token data), and seek the token stream to the stopIndex the rule had last time. This happens all the way down the rule tree as we parse top-down. + +#### Making it work for LL(k) + +Making the above work for LL(k) only requires changing the constant we add to the bounds: since k is still a constant, just add k instead of 1. + +#### Making it work for LL(\*) + +LL(\*) unfortunately adds a little bit of trickiness because the lookahead can be infinite. To make this work correctly, we need to know how far the parser _actually did_ look the last time we ran it. To account for this, we need to adjust how the token stream works a little bit. Thankfully ANTLR is well modularized, and all lookahead/lookbehind goes through the token stream via a well-defined interface. So we create an IncrementalTokenStream class and keep the information we need there: a stack of min/max token bounds in the token stream[1]. When the parser enters a rule, it pushes the current token index as the min/max onto the minmax stack. The token stream updates the min/max bounds of the top of the minmax stack whenever lookahead/lookbehind is called. When the parser exits a rule, it pops the minmax stack and sets the min/max information on the rule context. If there is a parent rule context, it unions the child interval into the parent (so that the parent ends up with a token range spanning the entire set of children). This accounts for the _actual_ lookahead or lookbehind performed during a parse.
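To make the guard check and the min/max tracking concrete, here is a minimal, self-contained sketch of both mechanisms. The names here (changedTokenInRange, canReuse, MinMaxStack) are illustrative stand-ins rather than the real API; the actual implementation lives in IncrementalParser.ts and IncrementalTokenStream.ts in this change.

```typescript
// What we remember about a rule from the previous parse: the widest token
// range it actually inspected, including lookahead past its stop token.
interface RuleSnapshot {
    minTokenIndex: number;
    maxTokenIndex: number;
}

// Binary search over a sorted list of changed token indexes: did any change
// land inside [low, high]?
function changedTokenInRange(changed: number[], low: number, high: number): boolean {
    let lo = 0;
    let hi = changed.length - 1;
    while (lo <= hi) {
        const mid = (lo + hi) >>> 1;
        if (changed[mid] < low) {
            lo = mid + 1;
        } else if (changed[mid] > high) {
            hi = mid - 1;
        } else {
            return true; // changed[mid] falls within [low, high]
        }
    }
    return false;
}

// The guard: a rule's old context is reusable only if no changed token falls
// inside the range the rule looked at last time.
function canReuse(rule: RuleSnapshot, changedTokens: number[]): boolean {
    return !changedTokenInRange(changedTokens, rule.minTokenIndex, rule.maxTokenIndex);
}

// The min/max stack: rule entry pushes the current token index, every
// lookahead widens the top entry, and rule exit pops and unions the result
// into the parent, so parents span all of their children's lookahead.
class MinMaxStack {
    private stack: Array<{ min: number; max: number }> = [];
    public enterRule(currentTokenIndex: number): void {
        this.stack.push({ min: currentTokenIndex, max: currentTokenIndex });
    }
    public sawToken(tokenIndex: number): void {
        const top = this.stack[this.stack.length - 1];
        top.min = Math.min(top.min, tokenIndex);
        top.max = Math.max(top.max, tokenIndex);
    }
    public exitRule(): { min: number; max: number } {
        const child = this.stack.pop()!;
        const parent = this.stack[this.stack.length - 1];
        if (parent !== undefined) {
            parent.min = Math.min(parent.min, child.min);
            parent.max = Math.max(parent.max, child.max);
        }
        return child;
    }
}
```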
+ [1] You can track it more exactly than this, but it is highly unlikely to ever be worth it. The main issue this affects is changes to hidden tokens, which will cause reparsing even though the parser can't see them. + +#### Adaptive parsing, SLL, etc + +None of these change anything because they also go through the proper lookahead/lookbehind interfaces. At worst, they look at too much context and we reparse a little more than we should. + +#### Predicates + +Predicates that do lookahead or lookbehind are covered by the LL(\*) method with no additional work. +Beyond that, the bounds are hopefully obvious: predicates that are idempotent and don't depend on context forward of a given rule/lookahead work fine. +Others cannot be supported (and their failure can't easily be detected). + +#### Actions, parse listeners, etc + +Parse listeners attached directly to Parsers will not see rules that are skipped. This is fixable (but unclear if it is worth it). Actions that occur during skipped rules will not occur. +Once the tree is generated, it is no different from any other tree. + +#### Tree fixup + +ANTLR tracks start/end position, line info, and source stream info in tokens, so when parse tree pieces are reused, all of that may be wrong because they point to old tokens. The text in the parse tree will still be right (by definition; otherwise the incremental parser is broken). Currently, we pass over the tree and replace old tokens with new ones. This is because updating the old tokens' offsets/source/inputstream/etc turns out to be quite difficult (ANTLR is designed for tokens to be immutable). The downside is that we have to retrieve the new tokens from the new lexer. +Tree fixup is actually the most expensive part of incremental parsing right now, and for those who only care about the text being correct, it is a waste of time. + +#### Outstanding issues + +- The (rule, startindex) map stuff can be avoided if we really want (though it's + tricky and involves trying to walk the old parse tree as we build the new one). +- The way the incremental grammar option is parsed/used in the stg file should + obviously be moved to antlr4 core. +- There is code that could be cleaned up if we included an IntervalMap data structure + (or at least a NonOverlappingIntervalList; IntervalSet does not do what we need). To avoid adding dependencies, I didn't do this, but it will likely be worth it in the future. +- We currently eagerly fix up the old parse tree in IncrementalParserData, etc. We + may want to be lazier and just do it in the parser when the context gets reused instead. +- We use the parse listener interface as an easy way to ensure we get to see entry/ + exit events at the right time. This turned out to be easier than handling + recursion/left factoring through overriding the relevant parser interface pieces. +- Top-level recursion contexts are now reused, but we don't reuse individual recursion contexts yet.
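For reference, here is roughly what driving an incremental reparse looks like from the outside. This sketch is adapted from TestIncremental.ts in this change and uses the TestIncremental1 grammar from the tests (its program rule matches a mix of identifiers and numbers); the import paths are assumptions that will vary by project layout.

```typescript
import { CharStreams } from "antlr4ts";
import { IncrementalParserData, TokenChangeType } from "antlr4ts";
import { IncrementalTokenStream } from "antlr4ts";
import { TestIncremental1Lexer } from "./gen/incremental/TestIncremental1Lexer";
import { TestIncremental1Parser } from "./gen/incremental/TestIncremental1Parser";

// First parse: no previous data, so everything is parsed from scratch.
let tokens = new IncrementalTokenStream(
    new TestIncremental1Lexer(CharStreams.fromString("foo 5555 foo")),
);
let parser = new TestIncremental1Parser(tokens);
let firstTree = parser.program();

// The user deletes " 5555". Lex the new text and describe the change at the
// token level. Note that hidden tokens (the whitespace here) must be reported
// too: they shift token indexes even though the parser never sees them.
let oldTokens = tokens.getTokens();
tokens = new IncrementalTokenStream(
    new TestIncremental1Lexer(CharStreams.fromString("foo foo")),
);
let parserData = new IncrementalParserData(
    tokens,
    [
        { changeType: TokenChangeType.REMOVED, oldToken: oldTokens[1] }, // whitespace
        { changeType: TokenChangeType.REMOVED, oldToken: oldTokens[2] }, // "5555"
    ],
    firstTree,
);

// Second parse: rules whose min/max token bounds don't overlap the change are
// reused from firstTree; everything else is re-run.
parser = new TestIncremental1Parser(tokens, parserData);
let secondTree = parser.program();
```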
+ + +#### References + +"Efficient and Flexible Incremental Parsing" by Tim Wagner and Susan Graham diff --git a/package.json b/package.json index d0b94a93..525062e9 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "antlr4ts-root", - "version": "0.5.0-dev", + "version": "0.5.1-dev", "description": "Root project for ANTLR 4 runtime for Typescript", "private": true, "main": "index.js", @@ -11,9 +11,10 @@ "buildtool": "cd tool && npm link", "unlinktool": "cd tool && npm unlink", "clean": "npm run unlink && git clean -idx", - "antlr4ts": "npm run antlr4ts-runtime-xpath && npm run antlr4ts-test-runtime && npm run antlr4ts-test-labels && npm run antlr4ts-test-pattern && npm run antlr4ts-test-rewriter && npm run antlr4ts-test-xpath && npm run antlr4ts-benchmark", + "antlr4ts": "npm run antlr4ts-runtime-xpath && npm run antlr4ts-test-runtime && npm run antlr4ts-test-labels && npm run antlr4ts-test-pattern && npm run antlr4ts-test-rewriter && npm run antlr4ts-test-xpath && npm run antlr4ts-test-incremental && npm run antlr4ts-benchmark", "antlr4ts-runtime-xpath": "cd src/tree/xpath && antlr4ts XPathLexer.g4 -DbaseImportPath=../..", "antlr4ts-test-runtime": "cd test/runtime && antlr4ts TestGrammar.g4 -DbaseImportPath=../../../../src -o gen/typescript_only", + "antlr4ts-test-incremental": "cd test/tool && antlr4ts TestIncremental1.g4 TestIncrementalJava.g4 -DbaseImportPath=../../../../src -o gen/incremental", "antlr4ts-test-labels": "cd test/runtime/TestReferenceToListLabels && antlr4ts T.g4 -no-listener -DbaseImportPath=antlr4ts -o gen", "antlr4ts-test-pattern": "cd test/tool && antlr4ts ParseTreeMatcherX1.g4 ParseTreeMatcherX2.g4 ParseTreeMatcherX3.g4 ParseTreeMatcherX4.g4 ParseTreeMatcherX5.g4 ParseTreeMatcherX6.g4 ParseTreeMatcherX7.g4 ParseTreeMatcherX8.g4 -no-listener -DbaseImportPath=../../../../src -o gen/matcher", "antlr4ts-test-rewriter": "cd test/tool && antlr4ts RewriterLexer1.g4 RewriterLexer2.g4 RewriterLexer3.g4 -DbaseImportPath=../../../../src -o gen/rewriter", @@ -77,7 +78,7 @@ "istanbul": "^0.4.5", "mocha": "^5.2.0", "mocha-typescript": "^1.1.14", - "nyc": "^13.1.0", + "nyc": "^13.3.0", "source-map-support": "^0.5.6", "std-mocks": "^1.0.1", "tslint": "^5.11.0", diff --git a/src/IncrementalParser.ts b/src/IncrementalParser.ts new file mode 100644 index 00000000..b9ef1a99 --- /dev/null +++ b/src/IncrementalParser.ts @@ -0,0 +1,173 @@ +/*! + * Copyright 2019 The ANTLR Project. All rights reserved. + * Licensed under the BSD-3-Clause license. See LICENSE file in the project root for license information. + */ + +import { IncrementalParserRuleContext } from "./IncrementalParserRuleContext"; +import { IncrementalTokenStream } from "./IncrementalTokenStream"; +import { Parser } from "./Parser"; +import { ParserRuleContext } from "./ParserRuleContext"; +import { IncrementalParserData } from "./IncrementalParserData"; +import { ParseTreeListener } from "./tree/ParseTreeListener"; + +/** + * Incremental parser implementation + * + * There are only two differences between this parser and the underlying regular + * Parser - guard rules and min/max tracking + * + * The guard rule API is used in incremental mode to know when a rule context + * can be reused. It looks for token changes in the bounds of the rule. 
* + * The min/max tracking is used to track how far ahead/behind the parser looked + * to correctly detect whether a token change can affect a parser rule in the future (i.e., when + * handed to the guard rule of the next parse). + * + * @notes See IncrementalParsing.md for more details on the theory behind this. + * In order to make this easier in code generation, we use the parse listener + * interface to do most of our work. + * + */ +export abstract class IncrementalParser extends Parser + implements ParseTreeListener { + // Current parser epoch. Incremented every time a new incremental parser is created. + private static _GLOBAL_PARSER_EPOCH: number = 0; + public static get GLOBAL_PARSER_EPOCH() { + return this._GLOBAL_PARSER_EPOCH; + } + protected incrementParserEpoch() { + return ++IncrementalParser._GLOBAL_PARSER_EPOCH; + } + public parserEpoch = -1; + + private parseData: IncrementalParserData | undefined; + constructor( + input: IncrementalTokenStream, + parseData?: IncrementalParserData, + ) { + super(input); + this.parseData = parseData; + this.parserEpoch = this.incrementParserEpoch(); + // Register ourselves as our own parse listener. Life is weird. + this.addParseListener(this); + } + + // Push the current token data onto the min max stack for the stream. + private pushCurrentTokenToMinMax() { + let incStream = this.inputStream as IncrementalTokenStream; + let token = this._input.LT(1); + incStream.pushMinMax(token.tokenIndex, token.tokenIndex); + } + + // Pop the min max stack the stream is using and return the interval. + private popCurrentMinMax() { + let incStream = this.inputStream as IncrementalTokenStream; + return incStream.popMinMax(); + } + + /** + * Decide whether a rule's previous context can be reused. + * + * This routine checks whether a given parser rule needs to be rerun, or whether we already have a context that can be + * reused for this parse. + */ + public guardRule( + parentCtx: IncrementalParserRuleContext, + state: number, + ruleIndex: number, + ): IncrementalParserRuleContext | undefined { + // If we have no previous parse data, the rule needs to be run. + if (!this.parseData) { + return undefined; + } + // See if we have seen this state before at this starting point. + let existingCtx = this.parseData.tryGetContext( + parentCtx ? parentCtx.depth() + 1 : 1, + state, + ruleIndex, + this._input.LT(1).tokenIndex, + ); + // We haven't seen it, so we need to rerun this rule. + if (!existingCtx) { + return undefined; + } + // We have seen it; see if it was affected by the token changes. + if (this.parseData.ruleAffectedByTokenChanges(existingCtx)) { + return undefined; + } + // Everything checked out; reuse the rule context - we add it to the + // parent context as enterRule would have. + let parent = this._ctx as IncrementalParserRuleContext | undefined; + // Add the current context to the parent if we have a parent. + if (parent != null) { + parent.addChild(existingCtx); + } + return existingCtx; + } + + /** + * Pop the min max stack the stream is using and union the interval + * into the passed-in context. Return the interval for the context. + * + * @param ctx Context to union the interval into. + */ + private popAndHandleMinMax(ctx: IncrementalParserRuleContext) { + let interval = this.popCurrentMinMax(); + ctx.minMaxTokenIndex = ctx.minMaxTokenIndex.union(interval); + // We return the context's interval rather than the popped interval, + // because child intervals may already have been merged into this ctx.
return ctx.minMaxTokenIndex; + } + /* + This is part of the regular Parser API. + The super method must be called. + */ + + /** + * The new recursion context is an unfortunate edge case for us. + * It changes the parent/child relationships between contexts, + * so we need to merge intervals here. + */ + public pushNewRecursionContext( + localctx: ParserRuleContext, + state: number, + ruleIndex: number, + ): void { + // This context becomes the child + let previous = this._ctx as IncrementalParserRuleContext; + // The incoming context becomes the parent + let incLocalCtx = localctx as IncrementalParserRuleContext; + incLocalCtx.minMaxTokenIndex = incLocalCtx.minMaxTokenIndex.union( + previous.minMaxTokenIndex, + ); + super.pushNewRecursionContext(localctx, state, ruleIndex); + } + + /* + These two functions are part of the ParseTreeListener API. + We do not need to call the super methods. + */ + + public enterEveryRule(ctx: ParserRuleContext) { + // During rule entry, we push a new min/max token state. + this.pushCurrentTokenToMinMax(); + let incCtx = ctx as IncrementalParserRuleContext; + incCtx.epoch = this.parserEpoch; + } + public exitEveryRule(ctx: ParserRuleContext) { + // On exit, we need to merge the min max into the current context, + // and then merge the current context interval into our parent. + + // First merge with the interval on the top of the stack. + let incCtx = ctx as IncrementalParserRuleContext; + let interval = this.popAndHandleMinMax(incCtx); + + // Now merge with our parent interval. + if (incCtx._parent) { + let parentIncCtx = incCtx._parent as IncrementalParserRuleContext; + parentIncCtx.minMaxTokenIndex = parentIncCtx.minMaxTokenIndex.union( + interval, + ); + } + } +} diff --git a/src/IncrementalParserData.ts b/src/IncrementalParserData.ts new file mode 100644 index 00000000..6224f374 --- /dev/null +++ b/src/IncrementalParserData.ts @@ -0,0 +1,396 @@ +import { CommonToken } from "./CommonToken"; +import { CommonTokenStream } from "./CommonTokenStream"; +import { IncrementalParserRuleContext } from "./IncrementalParserRuleContext"; +import { IncrementalTokenStream } from "./IncrementalTokenStream"; +import { Interval } from "./misc/Interval"; +import { ParserRuleContext } from "./ParserRuleContext"; +import { Token } from "./Token"; +import { ParseTreeListener } from "./tree/ParseTreeListener"; +import { ParseTreeWalker } from "./tree/ParseTreeWalker"; + +// This is a binary search variant, but instead of looking for a specific individual number, +// we are looking to see if any of the values in the list fall into a given range. +// Binary search through the changed token list looking for a number with a +// value >= rangeLow and <= rangeHigh. Terminate and return the index if we find +// one. Return -1 if we did not find anything. +function findChangedTokenInRange( + array: number[], + rangeLow: number, + rangeHigh: number, +) { + let low: number = 0; + let high: number = array.length - 1; + + while (low <= high) { + let mid: number = (low + high) >>> 1; + let midVal: number = array[mid]; + + if (midVal >= rangeLow) { + // If we found something in the range, terminate. + // Otherwise keep moving left. + if (midVal <= rangeHigh) { + return mid; + } + high = mid - 1; + } else { + low = mid + 1; + } + } + return -1; +} + +// Given a token index in the old token stream and an array of token offset ranges, see what the +// new token index should be.
+function findAdjustedTokenIndex(array: TokenOffsetRange[], tokenIndex: number) { + let low: number = 0; + let high: number = array.length - 1; + + while (low <= high) { + let mid: number = (low + high) >>> 1; + let midVal: TokenOffsetRange = array[mid]; + // Ranges do not overlap, so if one contains this token, it holds the correct offset. + if (tokenIndex >= midVal.interval.a) { + // If we found something in the range, terminate. + if (tokenIndex <= midVal.interval.b) { + return tokenIndex + midVal.indexOffset; + } + low = mid + 1; + } else { + high = mid - 1; + } + } + return undefined; +} + +/* This interface stores data about the offsets between tokens in the new and old stream */ + +interface TokenOffsetRange { + interval: Interval; + indexOffset: number; +} + +/** + * Definition of a token change: + * ADDED = A new token that did not exist before. + * CHANGED = A token that was in the stream before but changed in some way. + * REMOVED = A token that no longer exists in the stream. + * + * Token changes may *not* overlap. + * You also need to account for hidden tokens (but not *skipped* ones). + */ +export enum TokenChangeType { + ADDED, + CHANGED, + REMOVED, +} +export interface TokenChange { + changeType: TokenChangeType; + newToken?: CommonToken; + oldToken?: CommonToken; +} + +/** + * + * This class computes and stores data needed by the incremental parser. + * It is fairly unoptimized ATM to make things obvious and hopefully less broken. + * + * Please note: This class expects to own the parse tree passed in, + * and will modify it. + * Please clone it if you need it to remain unmodified for some reason. + */ +export class IncrementalParserData { + private tokenStream: IncrementalTokenStream; + /* This mapping gives you a range and token index offset to be applied for + that range. It is used to figure out what token in the new stream to + look at for a given token in the old stream. */ + private tokenOffsets: TokenOffsetRange[]; + + /* This is the set of tokens that changed in any way. We do not use IntervalSet ATM, + as we would need a function that operates like the binary search above. Since IntervalSet + is a port, I didn't want to modify it. */ + private changedTokens: number[] = []; + + /* This is the set of token changes that were specified by the user. */ + private tokenChanges: TokenChange[] | undefined; + + /* This maps from (depth, rule number, starting token index) to a context we've seen before. + We can't use a nice interface type here as the key because of how map equality + works in ES right now. Hopefully ES7 will fix this. */ + private ruleStartMap = new Map<string, IncrementalParserRuleContext>(); + + constructor(); + constructor( + tokenStream: IncrementalTokenStream, + tokenChanges: TokenChange[], + oldTree: IncrementalParserRuleContext, + ); + constructor( + tokenStream?: IncrementalTokenStream, + tokenChanges?: TokenChange[], + oldTree?: IncrementalParserRuleContext, + ) { + this.tokenChanges = tokenChanges; + if (tokenChanges) { + this.tokenStream = tokenStream!; + this.computeTokenOffsetRanges(oldTree!.maxTokenIndex); + this.indexAndAdjustParseTree(oldTree!); + } + } + + /** + * Take the set of token changes the user specified and convert it into two things: + * 1. A list of changed tokens. + * 2. A set of ranges that say how tokenIndexes that appear in the old stream + * will have changed in the new stream. E.g., if a token was removed, the tokens after + * it would appear at originalIndex - 1 in the new stream. + * + * @param maxOldTokenIndex The maximum token index we may see in the old stream.
* This is used as the upper bound of the last range. + */ + private computeTokenOffsetRanges(maxOldTokenIndex: number) { + if (!this.tokenChanges || this.tokenChanges.length === 0) { + return; + } + // Construct ranges for the token change offsets, and changed token intervals. + let indexOffset = 0; + let tokenOffsets: TokenOffsetRange[] = []; + for (let tokenChange of this.tokenChanges) { + let indexToPush = 0; + if (tokenChange.changeType === TokenChangeType.CHANGED) { + this.changedTokens.push(tokenChange.newToken!.tokenIndex); + // We only need to add this to changed tokens; it doesn't + // change token indexes. + continue; + } + // If a token was removed, adjust the indexes of the tokens after it. + else if (tokenChange.changeType === TokenChangeType.REMOVED) { + this.changedTokens.push( + tokenChange.oldToken!.tokenIndex + indexOffset, + ); + + // The indexes move back one to account for the removed token. + indexOffset -= 1; + indexToPush = tokenChange.oldToken!.tokenIndex; + } else if (tokenChange.changeType === TokenChangeType.ADDED) { + this.changedTokens.push(tokenChange.newToken!.tokenIndex); + // The indexes move forward one to account for the added token. + indexOffset += 1; + indexToPush = tokenChange.newToken!.tokenIndex; + } + // End the previous range at the token index right before us. + if (tokenOffsets.length !== 0) { + let lastIdx = tokenOffsets.length - 1; + let lastItem = tokenOffsets[lastIdx]; + lastItem.interval = Interval.of( + lastItem.interval.a, + indexToPush - 1, + ); + } + // Push the range this change starts at, and what the effect is on + // the index. + tokenOffsets.push({ + indexOffset, + interval: Interval.of(indexToPush, indexToPush), + }); + } + // End the final range at the max token index of the old stream. That is the + // last possible thing we may need to offset. + if (tokenOffsets.length !== 0) { + let lastIdx = tokenOffsets.length - 1; + let lastItem = tokenOffsets[lastIdx]; + lastItem.interval = Interval.of( + lastItem.interval.a, + maxOldTokenIndex, + ); + } + + this.tokenOffsets = tokenOffsets; + } + + /** + * Determine whether a given parser rule is affected by changes to the token stream. + * @param ctx Current parser context coming into a rule. + */ + public ruleAffectedByTokenChanges(ctx: IncrementalParserRuleContext) { + // If we never got passed data, reparse everything. + if (!this.tokenChanges) { + return true; + } + // However, if there are no changes, the rule is fine. + if (this.tokenChanges.length === 0) { + return false; + } + + // See if any changed token exists within our lower/upper bounds. + let start = ctx.minTokenIndex; + let end = ctx.maxTokenIndex; + let result = findChangedTokenInRange(this.changedTokens, start, end); + if (result !== -1) { + return true; + } + + return false; + } + /** + * Try to see if we have an existing context for this state, rule, and token position that may be reused. + * + * @param depth Current rule depth + * @param state Parser state number - currently ignored.
* @param ruleIndex Rule number + * @param tokenIndex Token index in the *new* token stream + */ + public tryGetContext( + depth: number, + state: number, + ruleIndex: number, + tokenIndex: number, + ) { + return this.ruleStartMap.get( + this.getKey(depth, state, ruleIndex, tokenIndex), + ); + } + + private getKeyFromContext(ctx: IncrementalParserRuleContext) { + return this.getKey( + ctx.depth(), + ctx.invokingState, + ctx.ruleIndex, + ctx.start.tokenIndex, + ); + } + // Note: state is deliberately unused for now (see tryGetContext). + private getKey( + depth: number, + state: number, + rule: number, + tokenIndex: number, + ) { + return `${depth},${rule},${tokenIndex}`; + } + /** + * Index a given parse tree and adjust the min/max ranges. + * @param tree Parser context to adjust + */ + private indexAndAdjustParseTree(tree: IncrementalParserRuleContext) { + // This is a quick way of indexing the parse tree by start. We actually + // could walk the old parse tree as the parse proceeds. This is left as + // a future optimization. We also could just allow passing in + // constructed maps if this turns out to be slow. + this.tokenStream.fill(); + let listener = new IncrementalParserData.ParseTreeProcessor(this); + ParseTreeWalker.DEFAULT.walk(listener, tree); + } + + // We use a class expression so we can access private members of IncrementalParserData. + private static ParseTreeProcessor = + /** + * This class does two things: + * 1. Acts as a simple indexer, recording the rule index and starting token index of each rule. + * 2. Adjusts the min/max token ranges for any necessary offsets. + */ + class ParseTreeProcessor implements ParseTreeListener { + private incrementalData: IncrementalParserData; + private tokenStream: IncrementalTokenStream; + private tokenOffsets: TokenOffsetRange[]; + private ruleStartMap: Map<string, IncrementalParserRuleContext>; + constructor(incrementalData: IncrementalParserData) { + this.incrementalData = incrementalData; + this.tokenStream = incrementalData.tokenStream; + this.tokenOffsets = incrementalData.tokenOffsets; + this.ruleStartMap = incrementalData.ruleStartMap; + } + + /** + * Given a token index in the old stream, figure out which token it would + * be in the new stream and return it. If we don't need token + * adjustment, return nothing. + * @param oldTokenIndex Token index in the old stream. + */ + private getAdjustedToken(oldTokenIndex: number): Token | undefined { + let newTokenIndex = findAdjustedTokenIndex( + this.tokenOffsets, + oldTokenIndex, + ); + if (newTokenIndex !== undefined) { + let syncableStream = this.tokenStream; + // We filled the stream before the walk. + return syncableStream.get(newTokenIndex); + } + return undefined; + } + + /** + * Adjust the minimum/maximum token index that appears in a rule context. + * Like other functions, this simply converts the token indexes from how they + * appear in the old stream to how they would appear in the new stream. + * + * @param ctx Parser context to adjust. + */ + private adjustMinMax(ctx: IncrementalParserRuleContext) { + let changed = false; + let newMin = ctx.minTokenIndex; + let newToken = this.getAdjustedToken(newMin); + if (newToken) { + newMin = newToken.tokenIndex; + changed = true; + } + + let newMax = ctx.maxTokenIndex; + newToken = this.getAdjustedToken(newMax); + + if (newToken) { + newMax = newToken.tokenIndex; + changed = true; + } + + if (changed) { + ctx.minMaxTokenIndex = Interval.of(newMin, newMax); + } + } + + /** + * Adjust the start/stop token indexes of a rule to take into account + * position changes in the token stream. + * + * @param ctx The rule context to adjust the start/stop tokens of.
*/ + private adjustStartStop(ctx: IncrementalParserRuleContext) { + let newToken = this.getAdjustedToken(ctx.start.tokenIndex); + if (newToken) { + ctx._start = newToken; + } + + if (ctx.stop) { + let newToken = this.getAdjustedToken(ctx.stop.tokenIndex); + if (newToken) { + ctx._stop = newToken; + } + } + } + + /** + * Main entry point for this walker. + */ + public enterEveryRule(ctx: ParserRuleContext) { + let incCtx = ctx as IncrementalParserRuleContext; + // Don't bother adjusting rule contexts that we can't possibly + // reuse. Also don't touch contexts whose epoch is still the + // default of -1. They must represent something the incremental + // parser never saw, since it sets epochs on all contexts it touches. + if (incCtx.epoch === -1) { + return; + } + let mayNeedAdjustment = + this.tokenOffsets && this.tokenOffsets.length !== 0; + if (mayNeedAdjustment) { + this.adjustMinMax(incCtx); + } + if (!this.incrementalData.ruleAffectedByTokenChanges(incCtx)) { + if (mayNeedAdjustment) { + this.adjustStartStop(incCtx); + } + let key = this.incrementalData.getKeyFromContext(incCtx); + this.ruleStartMap.set(key, incCtx); + } + } + }; +} diff --git a/src/IncrementalParserRuleContext.ts b/src/IncrementalParserRuleContext.ts new file mode 100644 index 00000000..fae4cb4f --- /dev/null +++ b/src/IncrementalParserRuleContext.ts @@ -0,0 +1,72 @@ +/*! + * Copyright 2019 The ANTLR Project. All rights reserved. + * Licensed under the BSD-3-Clause license. See LICENSE file in the project root for license information. + */ + +import { ParserRuleContext } from "./ParserRuleContext"; +import { Interval } from "./misc/Interval"; +import { RuleContext } from "./RuleContext"; + +export class IncrementalParserRuleContext extends ParserRuleContext { + /* Avoid having to recompute depth on every single depth call */ + private cachedDepth: number | undefined = undefined; + private cachedParent: RuleContext | undefined = undefined; + + // This is an epoch number that can be used to tell which pieces were + // modified during a given incremental parse. The incremental parser + // adds the current epoch number to all rule contexts it creates. + // The epoch number is incremented every time a new parser instance is created. + // Contexts the incremental parser never touched keep the default of -1. + public epoch: number = -1; + + // The interval that stores the min/max token we touched during lookahead/lookbehind + private _minMaxTokenIndex: Interval = Interval.of( + Number.MAX_SAFE_INTEGER, + Number.MIN_SAFE_INTEGER, + ); + + /** + * Get the minimum token index this rule touched. + */ + get minTokenIndex(): number { + return this._minMaxTokenIndex.a; + } + /** + * Get the maximum token index this rule touched. + */ + get maxTokenIndex(): number { + return this._minMaxTokenIndex.b; + } + /** + * Get the interval this rule touched. + */ + get minMaxTokenIndex(): Interval { + return this._minMaxTokenIndex; + } + set minMaxTokenIndex(index: Interval) { + this._minMaxTokenIndex = index; + } + + /** + * Compute the depth of this context in the parse tree. + *
+ * + */ + public depth(): number { + if ( + this.cachedParent !== undefined && + this.cachedParent === this._parent + ) { + return this.cachedDepth as number; + } + let n = 1; + if (this._parent) { + let parentDepth = this._parent.depth(); + this.cachedParent = this._parent; + this.cachedDepth = n = parentDepth + 1; + } else { + this.cachedDepth = n = 1; + } + return n; + } +} diff --git a/src/IncrementalTokenStream.ts b/src/IncrementalTokenStream.ts new file mode 100644 index 00000000..170d4406 --- /dev/null +++ b/src/IncrementalTokenStream.ts @@ -0,0 +1,80 @@ +/*! + * Copyright 2019 The ANTLR Project. All rights reserved. + * Licensed under the BSD-3-Clause license. See LICENSE file in the project root for license information. + */ + +import { CommonTokenStream } from "./CommonTokenStream"; +import { Token } from "./Token"; +import { CommonToken } from "./CommonToken"; +import { Interval } from "./misc/Interval"; + +export class IncrementalTokenStream extends CommonTokenStream { + /** + * ANTLR looks at the same tokens alot, and this avoids recalculating the + * interval when the position and lookahead number doesn't move. + */ + private lastP: number = -1; + private lastK: number = -1; + + /** + * This tracks the min/max token index looked at since the value was reset. + * This is used to track how far ahead the grammar looked, since it may be + * outside the rule context's start/stop tokens. + * We need to maintain a stack of such indices. + */ + + private minMaxStack: Interval[] = []; + + /** + * Push a new minimum/maximum token state. + * @param min Minimum token index + * @param max Maximum token index + */ + public pushMinMax(min: number, max: number) { + this.minMaxStack.push(Interval.of(min, max)); + } + + /** + * Pop the current minimum/maximum token state and return it. + */ + public popMinMax(): Interval { + if (this.minMaxStack.length === 0) { + throw new RangeError( + "Can't pop the min max state when there are 0 states", + ); + } + return this.minMaxStack.pop()!; + } + + /** + * Return the number of items on the minimum/maximum token state stack. + */ + public minMaxSize() { + return this.minMaxStack.length; + } + + /** + * This is an override of the base LT function that tracks the minimum/maximum token index looked at. + */ + public LT(k: number): Token { + let result = super.LT(k); + // Adjust the top of the minimum maximum stack if the position/lookahead amount changed. 
+ if ( + this.minMaxStack.length !== 0 && + (this.lastP !== this.p || this.lastK !== k) + ) { + let lastIdx = this.minMaxStack.length - 1; + let stackItem = this.minMaxStack[lastIdx]; + this.minMaxStack[lastIdx] = stackItem.union( + Interval.of(result.tokenIndex, result.tokenIndex), + ); + + this.lastP = this.p; + this.lastK = k; + } + return result; + } + public getTokens(): CommonToken[] { + return super.getTokens() as CommonToken[]; + } +} diff --git a/src/index.ts b/src/index.ts index 316e8483..7157055c 100644 --- a/src/index.ts +++ b/src/index.ts @@ -22,6 +22,10 @@ export * from "./Dependents"; export * from "./DiagnosticErrorListener"; export * from "./FailedPredicateException"; export * from "./InputMismatchException"; +export * from "./IncrementalParser"; +export * from "./IncrementalParserData"; +export * from "./IncrementalParserRuleContext"; +export * from "./IncrementalTokenStream"; export * from "./InterpreterRuleContext"; export * from "./IntStream"; export * from "./Lexer"; diff --git a/test/tool/TestIncremental.ts b/test/tool/TestIncremental.ts new file mode 100644 index 00000000..0a2df2e8 --- /dev/null +++ b/test/tool/TestIncremental.ts @@ -0,0 +1,625 @@ +/*! + * Copyright 2016 The ANTLR Project. All rights reserved. + * Licensed under the BSD-3-Clause license. See LICENSE file in the project root for license information. + */ + +import * as assert from "assert"; +import { suite, test as Test } from "mocha-typescript"; +import { CharStreams } from "../../src/CharStreams"; +import { CommonToken } from "../../src/CommonToken"; +import { IncrementalParser } from "../../src/IncrementalParser"; +import { + IncrementalParserData, + TokenChangeType, +} from "../../src/IncrementalParserData"; +import { IncrementalParserRuleContext } from "../../src/IncrementalParserRuleContext"; +import { IncrementalTokenStream } from "../../src/IncrementalTokenStream"; +import { XPath } from "../../src/tree/xpath/XPath"; +import { TestIncremental1Lexer } from "./gen/incremental/TestIncremental1Lexer"; +import { + DigitsContext, + IdentifierContext, + TestIncremental1Parser, +} from "./gen/incremental/TestIncremental1Parser"; +import { TestIncrementalJavaLexer } from "./gen/incremental/TestIncrementalJavaLexer"; +import { + ClassOrInterfaceModifiersContext, + ExpressionContext, + FormalParametersContext, + LiteralContext, + ModifiersContext, + TestIncrementalJavaParser, +} from "./gen/incremental/TestIncrementalJavaParser"; + +const SAMPLE_TEXT_1 = "foo 5555 foo 5555 foo"; +const EXPECTED_TREE_1 = + "(program (identifier foo) (digits 5555) (identifier foo) (digits 5555) (identifier foo))"; +// In all of our expectations, the reused pieces come first and the modified pieces are after them. 
+const CHILD_EXPECTATIONS_1: ContextExpectation[] = [ + { + class: IdentifierContext, + startTokenIndex: 0, + stopTokenIndex: 0, + tree: "(identifier foo)", + }, + { + class: DigitsContext, + startTokenIndex: 2, + stopTokenIndex: 2, + tree: "(digits 5555)", + }, + { + class: IdentifierContext, + startTokenIndex: 4, + stopTokenIndex: 4, + tree: "(identifier foo)", + }, + { + class: DigitsContext, + startTokenIndex: 6, + stopTokenIndex: 6, + tree: "(digits 5555)", + }, + { + class: IdentifierContext, + startTokenIndex: 8, + stopTokenIndex: 8, + tree: "(identifier foo)", + }, +]; +const SAMPLE_DELETED_TEXT_2 = "foo 5555 5555 foo"; +const EXPECTED_TREE_2 = + "(program (identifier foo) (digits 5555) (digits 5555) (identifier foo))"; + +const CHILD_EXPECTATIONS_2: ContextExpectation[] = [ + { + class: IdentifierContext, + startTokenIndex: 0, + stopTokenIndex: 0, + tree: "(identifier foo)", + }, + { + class: DigitsContext, + startTokenIndex: 2, + stopTokenIndex: 2, + tree: "(digits 5555)", + }, + { + class: DigitsContext, + startTokenIndex: 4, + stopTokenIndex: 4, + tree: "(digits 5555)", + }, + { + class: IdentifierContext, + startTokenIndex: 6, + stopTokenIndex: 6, + tree: "(identifier foo)", + }, +]; +const SAMPLE_ADDED_TEXT_3 = "foo 5555 foo 5555 foo foo"; +const EXPECTED_TREE_3 = + "(program (identifier foo) (digits 5555) (identifier foo) (digits 5555) (identifier foo) (identifier foo))"; + +const CHILD_EXPECTATIONS_3: ContextExpectation[] = [ + { + class: IdentifierContext, + startTokenIndex: 0, + stopTokenIndex: 0, + tree: "(identifier foo)", + }, + { + class: DigitsContext, + startTokenIndex: 2, + stopTokenIndex: 2, + tree: "(digits 5555)", + }, + { + class: IdentifierContext, + startTokenIndex: 4, + stopTokenIndex: 4, + tree: "(identifier foo)", + }, + { + class: DigitsContext, + startTokenIndex: 6, + stopTokenIndex: 6, + tree: "(digits 5555)", + }, + { + class: IdentifierContext, + startTokenIndex: 8, + stopTokenIndex: 8, + tree: "(identifier foo)", + }, + { + class: IdentifierContext, + startTokenIndex: 10, + stopTokenIndex: 10, + tree: "(identifier foo)", + }, +]; +interface ContextExpectation { + tree: string; + startTokenIndex: number; + stopTokenIndex: number; + class: any; // Instanceof requires this be an any + epoch?: number; +} + +interface XPathExpectation { + xpathRule: string; + tree: string; + class: any; + epoch?: number; +} +const JAVA_PROGRAM_1 = + '\npublic class HelloWorld {\n\n public static void main(String[] args) {\n // Prints "Hello, World" to the terminal window.\n System.out.println("Hello, World");\n }\n\n}\n'; +const JAVA_EXPECTED_TREE_1 = + '(compilationUnit (typeDeclaration (classOrInterfaceDeclaration (classOrInterfaceModifiers (classOrInterfaceModifier public)) (classDeclaration (normalClassDeclaration class HelloWorld (classBody { (classBodyDeclaration (modifiers (modifier public) (modifier static)) (memberDecl void main (methodDeclaratorRest (formalParameters ( (formalParameterDecls variableModifiers (type (classOrInterfaceType String) [ ]) (formalParameterDeclsRest (variableDeclaratorId args))) )) (methodBody (block { (blockStatement (statement (statementExpression (expression (expression (expression (expression (primary System)) . out) . 
println) ( (expressionList (expression (primary (literal "Hello, World")))) ))) ;)) }))))) }))))) )'; +const JAVA_PROGRAM_2 = + '\npublic class HelloWorld {\n\n public static void main(String[] args) {\n // Prints "Hello, World" to the terminal window.\n System.out.println("Hello");\n }\n\n}\n'; +const JAVA_EXPECTED_TREE_2 = + '(compilationUnit (typeDeclaration (classOrInterfaceDeclaration (classOrInterfaceModifiers (classOrInterfaceModifier public)) (classDeclaration (normalClassDeclaration class HelloWorld (classBody { (classBodyDeclaration (modifiers (modifier public) (modifier static)) (memberDecl void main (methodDeclaratorRest (formalParameters ( (formalParameterDecls variableModifiers (type (classOrInterfaceType String) [ ]) (formalParameterDeclsRest (variableDeclaratorId args))) )) (methodBody (block { (blockStatement (statement (statementExpression (expression (expression (expression (expression (primary System)) . out) . println) ( (expressionList (expression (primary (literal "Hello")))) ))) ;)) }))))) }))))) )'; +const JAVA_PROGRAM_2_EXPECTATIONS: XPathExpectation[] = [ + { + class: ClassOrInterfaceModifiersContext, + tree: "(classOrInterfaceModifiers (classOrInterfaceModifier public))", + xpathRule: "//classOrInterfaceModifiers", + }, + { + class: FormalParametersContext, + tree: + "(formalParameters ( (formalParameterDecls variableModifiers (type (classOrInterfaceType String) [ ]) (formalParameterDeclsRest (variableDeclaratorId args))) ))", + xpathRule: "//formalParameters", + }, + { + class: ModifiersContext, + tree: "(modifiers (modifier public) (modifier static))", + xpathRule: "//modifiers", + }, + /* This requires reusing individual recursion contexts */ + /* + { + class: ExpressionContext, + tree: "System.out.println", + xpathRule: "//statementExpression/expression/expression", + },*/ + { + class: LiteralContext, + tree: '(literal "Hello")', + xpathRule: "//expression/primary/literal", + }, +]; +// We have to disable the unsafe any warning because instanceof requires an any type on the RHS. 
+/* tslint:disable:no-unsafe-any */ +@suite +export class TestIncremental { + // Verify a set of xpath expectations against the parse tree + private verifyXPathExpectations( + parser: IncrementalParser, + parseTree: IncrementalParserRuleContext, + expectations: XPathExpectation[], + ) { + for (let expectation of expectations) { + for (let XPathMatch of XPath.findAll( + parseTree, + expectation.xpathRule, + parser, + )) { + assert.ok( + XPathMatch instanceof expectation.class, + "Class of context is wrong", + ); + + let incCtx = XPathMatch as IncrementalParserRuleContext; + assert.strictEqual( + incCtx.toStringTree(parser), + expectation.tree, + "Tree of context is wrong", + ); + if (expectation.epoch) { + assert.strictEqual( + incCtx.epoch, + expectation.epoch, + "Epoch of context is wrong", + ); + } + } + } + } + // Verify a set of context expectations against an array of contexts + private verifyContextExpectations( + parser: IncrementalParser, + data: IncrementalParserRuleContext[], + expectations: ContextExpectation[], + ) { + assert.strictEqual(expectations.length, data.length); + for (let i = 0; i < expectations.length; ++i) { + assert.ok( + data[i] instanceof expectations[i].class, + `Class of context ${i} is wrong`, + ); + assert.strictEqual( + data[i].start.tokenIndex, + expectations[i].startTokenIndex, + `Start token of context ${i} is wrong`, + ); + assert.ok(data[i].stop); + assert.strictEqual( + data[i].stop!.tokenIndex, + expectations[i].stopTokenIndex, + `Stop token of context ${i} is wrong`, + ); + assert.strictEqual( + data[i].toStringTree(parser), + expectations[i].tree, + `Tree of context ${i} is wrong`, + ); + if (expectations[i].epoch) { + assert.strictEqual( + data[i].epoch, + expectations[i].epoch, + `Epoch of context ${i} is wrong`, + ); + } + } + } + // Test that the incremental parser works in non-incremental mode. 
@Test public testBasicIncrementalParse(): void { + // Create a parser and verify the result is sane + let inputStream = CharStreams.fromString(SAMPLE_TEXT_1); + let lexer = new TestIncremental1Lexer(inputStream); + let tokenStream = new IncrementalTokenStream(lexer); + let parser = new TestIncremental1Parser(tokenStream); + let startingEpoch = parser.parserEpoch; + let firstTree = parser.program(); + + // Make sure the parse tree text is right + assert.strictEqual(firstTree.toStringTree(parser), EXPECTED_TREE_1); + + // Make sure the parse tree looks correct + assert.strictEqual(firstTree.start.tokenIndex, 0); + assert.strictEqual(firstTree.stop!.tokenIndex, 8); + assert.strictEqual(firstTree.childCount, 5); + assert.ok(firstTree instanceof IncrementalParserRuleContext); + this.verifyContextExpectations( + parser, + firstTree.children as IncrementalParserRuleContext[], + CHILD_EXPECTATIONS_1, + ); + } + + // Test that reparsing with no changes reuses the parse tree + @Test public testBasicIncrementalReparse(): void { + // Create a parser and verify the result is sane + let inputStream = CharStreams.fromString(SAMPLE_TEXT_1); + let lexer = new TestIncremental1Lexer(inputStream); + let tokenStream = new IncrementalTokenStream(lexer); + let parser = new TestIncremental1Parser(tokenStream); + let startingEpoch = parser.parserEpoch; + let firstTree = parser.program(); + + // Add the correct epoch to all expectations + for (let expectation of CHILD_EXPECTATIONS_1) { + expectation.epoch = startingEpoch; + } + + // Make sure the parse tree text is right + assert.strictEqual(firstTree.toStringTree(parser), EXPECTED_TREE_1); + + // Make sure the parse tree looks correct + assert.strictEqual(firstTree.start.tokenIndex, 0); + assert.strictEqual(firstTree.stop!.tokenIndex, 8); + assert.strictEqual(firstTree.childCount, 5); + assert.ok(firstTree instanceof IncrementalParserRuleContext); + this.verifyContextExpectations( + parser, + firstTree.children as IncrementalParserRuleContext[], + CHILD_EXPECTATIONS_1, + ); + + // Add the correct epoch to all expectations + for (let expectation of CHILD_EXPECTATIONS_1) { + expectation.epoch = startingEpoch; + } + // Reparse with no changes + inputStream = CharStreams.fromString(SAMPLE_TEXT_1); + lexer = new TestIncremental1Lexer(inputStream); + tokenStream = new IncrementalTokenStream(lexer); + let parserData = new IncrementalParserData(tokenStream, [], firstTree); + parser = new TestIncremental1Parser(tokenStream, parserData); + let secondTree = parser.program(); + // Make sure the parse tree text is right + assert.strictEqual(secondTree.toStringTree(parser), EXPECTED_TREE_1); + + // Make sure the parse tree looks correct + assert.strictEqual(secondTree.start.tokenIndex, 0); + assert.strictEqual(secondTree.stop!.tokenIndex, 8); + assert.strictEqual(secondTree.childCount, 5); + assert.ok(secondTree instanceof IncrementalParserRuleContext); + + // All data should have come from the original parse. + // Verify that by checking the root still carries the original epoch. + assert.strictEqual(secondTree.epoch, startingEpoch); + this.verifyContextExpectations( + parser, + secondTree.children as IncrementalParserRuleContext[], + CHILD_EXPECTATIONS_1, + ); + } + + // Test that reparsing with a delete reuses data not deleted.
@Test public testBasicIncrementalDeleteWithWhitespace(): void { + // Create a parser and verify the result is sane + let inputStream = CharStreams.fromString(SAMPLE_TEXT_1); + let lexer = new TestIncremental1Lexer(inputStream); + let tokenStream = new IncrementalTokenStream(lexer); + let parser = new TestIncremental1Parser(tokenStream); + let startingEpoch = parser.parserEpoch; + let firstTree = parser.program(); + // Add the correct epoch to all expectations + for (let expectation of CHILD_EXPECTATIONS_1) { + expectation.epoch = startingEpoch; + } + // Make sure the parse tree text is right + assert.strictEqual(firstTree.toStringTree(parser), EXPECTED_TREE_1); + + // Make sure the parse tree looks correct + assert.strictEqual(firstTree.start.tokenIndex, 0); + assert.strictEqual(firstTree.stop!.tokenIndex, 8); + assert.strictEqual(firstTree.childCount, 5); + assert.ok(firstTree instanceof IncrementalParserRuleContext); + this.verifyContextExpectations( + parser, + firstTree.children as IncrementalParserRuleContext[], + CHILD_EXPECTATIONS_1, + ); + // Add the correct epoch to all expectations + for (let expectation of CHILD_EXPECTATIONS_2) { + expectation.epoch = startingEpoch; + } + // Reparse with a delete + let oldTokens = tokenStream.getTokens(); + inputStream = CharStreams.fromString(SAMPLE_DELETED_TEXT_2); + lexer = new TestIncremental1Lexer(inputStream); + tokenStream = new IncrementalTokenStream(lexer); + // Note that the whitespace tokens must be marked even though they are hidden from the parser. + let parserData = new IncrementalParserData( + tokenStream, + [ + { changeType: TokenChangeType.REMOVED, oldToken: oldTokens[3] }, + { changeType: TokenChangeType.REMOVED, oldToken: oldTokens[4] }, + ], + firstTree, + ); + parser = new TestIncremental1Parser(tokenStream, parserData); + let secondEpoch = parser.parserEpoch; + let secondTree = parser.program(); + // Make sure the parse tree text is right + assert.strictEqual(secondTree.toStringTree(parser), EXPECTED_TREE_2); + + // Make sure the parse tree looks correct + assert.strictEqual(secondTree.start.tokenIndex, 0); + assert.strictEqual(secondTree.stop!.tokenIndex, 6); + assert.strictEqual(secondTree.childCount, 4); + assert.ok(secondTree instanceof IncrementalParserRuleContext); + + // The program rule spans the delete, so the root was re-run and carries + // the new epoch; the surviving children were reused (verified below). + assert.strictEqual(secondTree.epoch, secondEpoch); + this.verifyContextExpectations( + parser, + secondTree.children as IncrementalParserRuleContext[], + CHILD_EXPECTATIONS_2, + ); + } + // Test that reparsing with an add reuses the non-added parts.
@Test public testBasicIncrementalAddWithWhitespace(): void { + // Create a parser and verify the result is sane + let inputStream = CharStreams.fromString(SAMPLE_TEXT_1); + let lexer = new TestIncremental1Lexer(inputStream); + let tokenStream = new IncrementalTokenStream(lexer); + let parser = new TestIncremental1Parser(tokenStream); + let startingEpoch = parser.parserEpoch; + let firstTree = parser.program(); + // Add the correct epoch to all expectations + for (let expectation of CHILD_EXPECTATIONS_1) { + expectation.epoch = startingEpoch; + } + // Make sure the parse tree text is right + assert.strictEqual(firstTree.toStringTree(parser), EXPECTED_TREE_1); + + // Make sure the parse tree looks correct + assert.strictEqual(firstTree.start.tokenIndex, 0); + assert.strictEqual(firstTree.stop!.tokenIndex, 8); + assert.strictEqual(firstTree.childCount, 5); + assert.ok(firstTree instanceof IncrementalParserRuleContext); + this.verifyContextExpectations( + parser, + firstTree.children as IncrementalParserRuleContext[], + CHILD_EXPECTATIONS_1, + ); + + // Reparse with an add + let oldTokens = tokenStream.getTokens(); + inputStream = CharStreams.fromString(SAMPLE_ADDED_TEXT_3); + lexer = new TestIncremental1Lexer(inputStream); + tokenStream = new IncrementalTokenStream(lexer); + // Force load all tokens + tokenStream.fill(); + // Note that the whitespace tokens must be marked even though they are hidden from the parser. + let parserData = new IncrementalParserData( + tokenStream, + [ + { + changeType: TokenChangeType.ADDED, + newToken: tokenStream.get(9) as CommonToken, + }, + { + changeType: TokenChangeType.ADDED, + newToken: tokenStream.get(10) as CommonToken, + }, + ], + firstTree, + ); + parser = new TestIncremental1Parser(tokenStream, parserData); + let secondEpoch = parser.parserEpoch; + // Add the correct epoch to all expectations + for (let expectation of CHILD_EXPECTATIONS_3) { + expectation.epoch = startingEpoch; + } + CHILD_EXPECTATIONS_3[ + CHILD_EXPECTATIONS_3.length - 1 + ].epoch = secondEpoch; + + let secondTree = parser.program(); + // Make sure the parse tree text is right + assert.strictEqual(secondTree.toStringTree(parser), EXPECTED_TREE_3); + + // Make sure the parse tree looks correct + assert.strictEqual(secondTree.start.tokenIndex, 0); + assert.strictEqual(secondTree.stop!.tokenIndex, 10); + assert.strictEqual(secondTree.childCount, 6); + assert.ok(secondTree instanceof IncrementalParserRuleContext); + + // The root spans the added tokens, so it was re-run and carries the new + // epoch; everything before the addition was reused (verified below). + assert.strictEqual(secondTree.epoch, secondEpoch); + this.verifyContextExpectations( + parser, + secondTree.children as IncrementalParserRuleContext[], + CHILD_EXPECTATIONS_3, + ); + } + + // Test that the incremental parser works in non-incremental mode.
@Test public testJavaIncrementalParse(): void { + // Create a parser and verify the result is sane + let inputStream = CharStreams.fromString(JAVA_PROGRAM_1); + let lexer = new TestIncrementalJavaLexer(inputStream); + let tokenStream = new IncrementalTokenStream(lexer); + let parser = new TestIncrementalJavaParser(tokenStream); + let startingEpoch = parser.parserEpoch; + let firstTree = parser.compilationUnit(); + + // Make sure the parse tree text is right + assert.strictEqual( + firstTree.toStringTree(parser), + JAVA_EXPECTED_TREE_1, + ); + + // Make sure the parse tree looks correct + assert.strictEqual(firstTree.start.tokenIndex, 0); + assert.strictEqual(firstTree.stop!.tokenIndex, 26); + assert.ok(firstTree instanceof IncrementalParserRuleContext); + } + + // Test that reparsing with no changes reuses the parse tree + @Test public testJavaIncrementalReparse(): void { + // Create a parser and verify the result is sane + let inputStream = CharStreams.fromString(JAVA_PROGRAM_1); + let lexer = new TestIncrementalJavaLexer(inputStream); + let tokenStream = new IncrementalTokenStream(lexer); + let parser = new TestIncrementalJavaParser(tokenStream); + let startingEpoch = parser.parserEpoch; + let firstTree = parser.compilationUnit(); + + // Make sure the parse tree text is right + assert.strictEqual( + firstTree.toStringTree(parser), + JAVA_EXPECTED_TREE_1, + ); + + // Make sure the parse tree looks correct + assert.strictEqual(firstTree.start.tokenIndex, 0); + assert.strictEqual(firstTree.stop!.tokenIndex, 26); + assert.ok(firstTree instanceof IncrementalParserRuleContext); + + // Reparse with no changes + inputStream = CharStreams.fromString(JAVA_PROGRAM_1); + lexer = new TestIncrementalJavaLexer(inputStream); + tokenStream = new IncrementalTokenStream(lexer); + let parserData = new IncrementalParserData(tokenStream, [], firstTree); + parser = new TestIncrementalJavaParser(tokenStream, parserData); + let secondTree = parser.compilationUnit(); + // Make sure the parse tree text is right + assert.strictEqual( + secondTree.toStringTree(parser), + JAVA_EXPECTED_TREE_1, + ); + + // Make sure the parse tree looks correct + assert.strictEqual(secondTree.start.tokenIndex, 0); + assert.strictEqual(secondTree.stop!.tokenIndex, 26); + assert.ok(secondTree instanceof IncrementalParserRuleContext); + + // All data should have come from the original parse. + // Verify that by checking the root still carries the original epoch. + assert.strictEqual(secondTree.epoch, startingEpoch); + // Verify the first and second trees are exactly the same.
+ assert.deepStrictEqual(firstTree, secondTree); + } + // Test that reparsing with a changed token reuses everything but one piece of the parse tree + @Test public testJavaReparseWithChange(): void { + // Create a parser and verify the result is sane + let inputStream = CharStreams.fromString(JAVA_PROGRAM_1); + let lexer = new TestIncrementalJavaLexer(inputStream); + let tokenStream = new IncrementalTokenStream(lexer); + let parser = new TestIncrementalJavaParser(tokenStream); + let startingEpoch = parser.parserEpoch; + let firstTree = parser.compilationUnit(); + + // Make sure the parse tree text is right + assert.strictEqual( + firstTree.toStringTree(parser), + JAVA_EXPECTED_TREE_1, + ); + + // Make sure the parse tree looks correct + assert.strictEqual(firstTree.start.tokenIndex, 0); + assert.strictEqual(firstTree.stop!.tokenIndex, 26); + assert.ok(firstTree instanceof IncrementalParserRuleContext); + let oldTokens = tokenStream.getTokens(); + inputStream = CharStreams.fromString(JAVA_PROGRAM_2); + lexer = new TestIncrementalJavaLexer(inputStream); + tokenStream = new IncrementalTokenStream(lexer); + tokenStream.fill(); + let parserData = new IncrementalParserData( + tokenStream, + [ + { + changeType: TokenChangeType.CHANGED, + newToken: tokenStream.get(21) as CommonToken, + oldToken: oldTokens[21], + }, + ], + firstTree, + ); + parser = new TestIncrementalJavaParser(tokenStream, parserData); + let secondEpoch = parser.parserEpoch; + let secondTree = parser.compilationUnit(); + // Make sure the parse tree text is right + assert.strictEqual( + secondTree.toStringTree(parser), + JAVA_EXPECTED_TREE_2, + ); + + // Make sure the parse tree looks correct + assert.strictEqual(secondTree.start.tokenIndex, 0); + assert.strictEqual(secondTree.stop!.tokenIndex, 26); + assert.ok(secondTree instanceof IncrementalParserRuleContext); + assert.strictEqual(secondTree.epoch, secondEpoch); + // Set the epochs + for (let i = 0; i < JAVA_PROGRAM_2_EXPECTATIONS.length - 1; ++i) { + JAVA_PROGRAM_2_EXPECTATIONS[i].epoch = startingEpoch; + } + JAVA_PROGRAM_2_EXPECTATIONS[ + JAVA_PROGRAM_2_EXPECTATIONS.length - 1 + ].epoch = secondEpoch; + this.verifyXPathExpectations( + parser, + secondTree, + JAVA_PROGRAM_2_EXPECTATIONS, + ); + } +} diff --git a/test/tool/TestIncremental1.g4 b/test/tool/TestIncremental1.g4 new file mode 100644 index 00000000..42f79237 --- /dev/null +++ b/test/tool/TestIncremental1.g4 @@ -0,0 +1,12 @@ +grammar TestIncremental1; +options { + incremental = true; +} +program: (identifier | digits)+; +identifier: IDENT; +digits: DIGITS; +// We deliberately put these on a hidden channel rather than skip - it helps +// make the cases weirder by making the parser's token indexes non-contiguous. +WS: [ \t\r\n\u000C]+ -> channel(HIDDEN); +IDENT: [A-Za-z]+; +DIGITS: [0-9]+; diff --git a/test/tool/TestIncrementalJava.g4 b/test/tool/TestIncrementalJava.g4 new file mode 100644 index 00000000..9140fe13 --- /dev/null +++ b/test/tool/TestIncrementalJava.g4 @@ -0,0 +1,1252 @@ +/* + [The "BSD licence"] + Copyright (c) 2007-2008 Terence Parr + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. 
Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + 3. The name of the author may not be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +/** A Java 1.5 grammar for ANTLR v3 derived from the spec + * + * This is a very close representation of the spec; the changes + * are comestic (remove left recursion) and also fixes (the spec + * isn't exactly perfect). I have run this on the 1.4.2 source + * and some nasty looking enums from 1.5, but have not really + * tested for 1.5 compatibility. + * + * I built this with: java -Xmx100M org.antlr.Tool java.g + * and got two errors that are ok (for now): + * java.g:691:9: Decision can match input such as + * "'0'..'9'{'E', 'e'}{'+', '-'}'0'..'9'{'D', 'F', 'd', 'f'}" + * using multiple alternatives: 3, 4 + * As a result, alternative(s) 4 were disabled for that input + * java.g:734:35: Decision can match input such as "{'$', 'A'..'Z', + * '_', 'a'..'z', '\u00C0'..'\u00D6', '\u00D8'..'\u00F6', + * '\u00F8'..'\u1FFF', '\u3040'..'\u318F', '\u3300'..'\u337F', + * '\u3400'..'\u3D2D', '\u4E00'..'\u9FFF', '\uF900'..'\uFAFF'}" + * using multiple alternatives: 1, 2 + * As a result, alternative(s) 2 were disabled for that input + * + * You can turn enum on/off as a keyword :) + * + * Version 1.0 -- initial release July 5, 2006 (requires 3.0b2 or higher) + * + * Primary author: Terence Parr, July 2006 + * + * Version 1.0.1 -- corrections by Koen Vanderkimpen & Marko van Dooren, + * October 25, 2006; + * fixed normalInterfaceDeclaration: now uses typeParameters instead + * of typeParameter (according to JLS, 3rd edition) + * fixed castExpression: no longer allows expression next to type + * (according to semantics in JLS, in contrast with syntax in JLS) + * + * Version 1.0.2 -- Terence Parr, Nov 27, 2006 + * java spec I built this from had some bizarre for-loop control. + * Looked weird and so I looked elsewhere...Yep, it's messed up. + * simplified. + * + * Version 1.0.3 -- Chris Hogue, Feb 26, 2007 + * Factored out an annotationName rule and used it in the annotation rule. + * Not sure why, but typeName wasn't recognizing references to inner + * annotations (e.g. @InterfaceName.InnerAnnotation()) + * Factored out the elementValue section of an annotation reference. Created + * elementValuePair and elementValuePairs rules, then used them in the + * annotation rule. Allows it to recognize annotation references with + * multiple, comma separated attributes. + * Updated elementValueArrayInitializer so that it allows multiple elements. + * (It was only allowing 0 or 1 element). + * Updated localVariableDeclaration to allow annotations. 
Interestingly the JLS + * doesn't appear to indicate this is legal, but it does work as of at least + * JDK 1.5.0_06. + * Moved the Identifier portion of annotationTypeElementRest to annotationMethodRest. + * Because annotationConstantRest already references variableDeclarator which + * has the Identifier portion in it, the parser would fail on constants in + * annotation definitions because it expected two identifiers. + * Added optional trailing ';' to the alternatives in annotationTypeElementRest. + * Wouldn't handle an inner interface that has a trailing ';'. + * Swapped the expression and type rule reference order in castExpression to + * make it check for genericized casts first. It was failing to recognize a + * statement like "Class TYPE = (Class)...;" because it was seeing + * 'Class'. + * Changed createdName to use typeArguments instead of nonWildcardTypeArguments. + * Changed the 'this' alternative in primary to allow 'identifierSuffix' rather than + * just 'arguments'. The case it couldn't handle was a call to an explicit + * generic method invocation (e.g. this.doSomething()). Using identifierSuffix + * may be overly aggressive--perhaps should create a more constrained thisSuffix rule? + * + * Version 1.0.4 -- Hiroaki Nakamura, May 3, 2007 + * + * Fixed formalParameterDecls, localVariableDeclaration, forInit, + * and forVarControl to use variableModifier* not 'final'? (annotation)? + * + * Version 1.0.5 -- Terence, June 21, 2007 + * --a[i].foo didn't work. Fixed unaryExpression + * + * Version 1.0.6 -- John Ridgway, March 17, 2008 + * Made "assert" a switchable keyword like "enum". + * Fixed compilationUnit to disallow "annotation importDeclaration ...". + * Changed "Identifier ('.' Identifier)*" to "qualifiedName" in more + * places. + * Changed modifier* and/or variableModifier* to classOrInterfaceModifiers, + * modifiers or variableModifiers, as appropriate. + * Renamed "bound" to "typeBound" to better match language in the JLS. + * Added "memberDeclaration" which rewrites to methodDeclaration or + * fieldDeclaration and pulled type into memberDeclaration. So we parse + * type and then move on to decide whether we're dealing with a field + * or a method. + * Modified "constructorDeclaration" to use "constructorBody" instead of + * "methodBody". constructorBody starts with explicitConstructorInvocation, + * then goes on to blockStatement*. Pulling explicitConstructorInvocation + * out of expressions allowed me to simplify "primary". + * Changed variableDeclarator to simplify it. + * Changed type to use classOrInterfaceType, thus simplifying it; of course + * I then had to add classOrInterfaceType, but it is used in several + * places. + * Fixed annotations, old version allowed "@X(y,z)", which is illegal. + * Added optional comma to end of "elementValueArrayInitializer"; as per JLS. + * Changed annotationTypeElementRest to use normalClassDeclaration and + * normalInterfaceDeclaration rather than classDeclaration and + * interfaceDeclaration, thus getting rid of a couple of grammar ambiguities. + * Split localVariableDeclaration into localVariableDeclarationStatement + * (includes the terminating semi-colon) and localVariableDeclaration. + * This allowed me to use localVariableDeclaration in "forInit" clauses, + * simplifying them. + * Changed switchBlockStatementGroup to use multiple labels. This adds an + * ambiguity, but if one uses appropriately greedy parsing it yields the + * parse that is closest to the meaning of the switch statement. 
+ * Renamed "forVarControl" to "enhancedForControl" -- JLS language. + * Added semantic predicates to test for shift operations rather than other + * things. Thus, for instance, the string "< <" will never be treated + * as a left-shift operator. + * In "creator" we rule out "nonWildcardTypeArguments" on arrayCreation, + * which are illegal. + * Moved "nonWildcardTypeArguments into innerCreator. + * Removed 'super' superSuffix from explicitGenericInvocation, since that + * is only used in explicitConstructorInvocation at the beginning of a + * constructorBody. (This is part of the simplification of expressions + * mentioned earlier.) + * Simplified primary (got rid of those things that are only used in + * explicitConstructorInvocation). + * Lexer -- removed "Exponent?" from FloatingPointLiteral choice 4, since it + * led to an ambiguity. + * + * This grammar successfully parses every .java file in the JDK 1.5 source + * tree (excluding those whose file names include '-', which are not + * valid Java compilation units). + * + * June 26, 2008 + * + * conditionalExpression had wrong precedence x?y:z. + * + * February 26, 2011 + * added left-recursive expression rule + * + * Known remaining problems: + * "Letter" and "JavaIDDigit" are wrong. The actual specification of + * "Letter" should be "a character for which the method + * Character.isJavaIdentifierStart(int) returns true." A "Java + * letter-or-digit is a character for which the method + * Character.isJavaIdentifierPart(int) returns true." + */ +grammar TestIncrementalJava; +options { + incremental = true; +} +// starting point for parsing a java file +/* The annotations are separated out to make parsing faster, but must be associated with + a packageDeclaration or a typeDeclaration (and not an empty one). */ +compilationUnit + : annotations + ( packageDeclaration importDeclaration* typeDeclaration* + | classOrInterfaceDeclaration typeDeclaration* + ) + EOF + | packageDeclaration? importDeclaration* typeDeclaration* + EOF + ; + +packageDeclaration + : 'package' qualifiedName ';' + ; + +importDeclaration + : 'import' 'static'? qualifiedName ('.' '*')? ';' + ; + +typeDeclaration + : classOrInterfaceDeclaration + | ';' + ; + +classOrInterfaceDeclaration + : classOrInterfaceModifiers (classDeclaration | interfaceDeclaration) + ; + +classOrInterfaceModifiers + : classOrInterfaceModifier* + ; + +classOrInterfaceModifier + : annotation // class or interface + | ( 'public' // class or interface + | 'protected' // class or interface + | 'private' // class or interface + | 'abstract' // class or interface + | 'static' // class or interface + | 'final' // class only -- does not apply to interfaces + | 'strictfp' // class or interface + ) + ; + +modifiers + : modifier* + ; + +classDeclaration + : normalClassDeclaration + | enumDeclaration + ; + +normalClassDeclaration + : 'class' Identifier typeParameters? + ('extends' type)? + ('implements' typeList)? + classBody + ; + +typeParameters + : '<' typeParameter (',' typeParameter)* '>' + ; + +typeParameter + : Identifier ('extends' typeBound)? + ; + +typeBound + : type ('&' type)* + ; + +enumDeclaration + : ENUM Identifier ('implements' typeList)? enumBody + ; + +enumBody + : '{' enumConstants? ','? enumBodyDeclarations? '}' + ; + +enumConstants + : enumConstant (',' enumConstant)* + ; + +enumConstant + : annotations? Identifier arguments? classBody? 
+ ; + +enumBodyDeclarations + : ';' (classBodyDeclaration)* + ; + +interfaceDeclaration + : normalInterfaceDeclaration + | annotationTypeDeclaration + ; + +normalInterfaceDeclaration + : 'interface' Identifier typeParameters? ('extends' typeList)? interfaceBody + ; + +typeList + : type (',' type)* + ; + +classBody + : '{' classBodyDeclaration* '}' + ; + +interfaceBody + : '{' interfaceBodyDeclaration* '}' + ; + +classBodyDeclaration + : ';' + | 'static'? block + | modifiers memberDecl + ; + +memberDecl + : genericMethodOrConstructorDecl + | memberDeclaration + | 'void' Identifier voidMethodDeclaratorRest + | Identifier constructorDeclaratorRest + | interfaceDeclaration + | classDeclaration + ; + +memberDeclaration + : type (methodDeclaration | fieldDeclaration) + ; + +genericMethodOrConstructorDecl + : typeParameters genericMethodOrConstructorRest + ; + +genericMethodOrConstructorRest + : (type | 'void') Identifier methodDeclaratorRest + | Identifier constructorDeclaratorRest + ; + +methodDeclaration + : Identifier methodDeclaratorRest + ; + +fieldDeclaration + : variableDeclarators ';' + ; + +interfaceBodyDeclaration + : modifiers interfaceMemberDecl + | ';' + ; + +interfaceMemberDecl + : interfaceMethodOrFieldDecl + | interfaceGenericMethodDecl + | 'void' Identifier voidInterfaceMethodDeclaratorRest + | interfaceDeclaration + | classDeclaration + ; + +interfaceMethodOrFieldDecl + : type Identifier interfaceMethodOrFieldRest + ; + +interfaceMethodOrFieldRest + : constantDeclaratorsRest ';' + | interfaceMethodDeclaratorRest + ; + +methodDeclaratorRest + : formalParameters ('[' ']')* + ('throws' qualifiedNameList)? + ( methodBody + | ';' + ) + ; + +voidMethodDeclaratorRest +options { baseContext = methodDeclaratorRest; } + : formalParameters ('throws' qualifiedNameList)? + ( methodBody + | ';' + ) + ; + +interfaceMethodDeclaratorRest + : formalParameters ('[' ']')* ('throws' qualifiedNameList)? ';' + ; + +interfaceGenericMethodDecl + : typeParameters (type | 'void') Identifier + interfaceMethodDeclaratorRest + ; + +voidInterfaceMethodDeclaratorRest +options { baseContext = interfaceMethodDeclaratorRest; } + : formalParameters ('throws' qualifiedNameList)? ';' + ; + +constructorDeclaratorRest + : formalParameters ('throws' qualifiedNameList)? constructorBody + ; + +constantDeclarator + : Identifier constantDeclaratorRest + ; + +variableDeclarators + : variableDeclarator (',' variableDeclarator)* + ; + +variableDeclarator + : variableDeclaratorId ('=' variableInitializer)? + ; + +constantDeclaratorsRest + : constantDeclaratorRest (',' constantDeclarator)* + ; + +constantDeclaratorRest + : ('[' ']')* '=' variableInitializer + ; + +variableDeclaratorId + : Identifier ('[' ']')* + ; + +variableInitializer + : arrayInitializer + | expression + ; + +arrayInitializer + : '{' (variableInitializer (',' variableInitializer)* (',')? )? '}' + ; + +modifier + : annotation + | ( 'public' + | 'protected' + | 'private' + | 'static' + | 'abstract' + | 'final' + | 'native' + | 'synchronized' + | 'transient' + | 'volatile' + | 'strictfp' + ) + ; + +packageOrTypeName + : qualifiedName + ; + +enumConstantName + : Identifier + ; + +typeName + : qualifiedName + ; + +type + : classOrInterfaceType ('[' ']')* + | primitiveType ('[' ']')* + ; + +classOrInterfaceType + : Identifier typeArguments? ('.' Identifier typeArguments? 
)* + ; + +primitiveType + : 'boolean' + | 'char' + | 'byte' + | 'short' + | 'int' + | 'long' + | 'float' + | 'double' + ; + +variableModifier + : 'final' + | annotation + ; + +typeArguments + : '<' typeArgument (',' typeArgument)* '>' + ; + +typeArgument + : type + | '?' (('extends' | 'super') type)? + ; + +qualifiedNameList + : qualifiedName (',' qualifiedName)* + ; + +formalParameters + : '(' formalParameterDecls? ')' + ; + +formalParameterDecls + : variableModifiers type formalParameterDeclsRest + ; + +formalParameterDeclsRest + : variableDeclaratorId (',' formalParameterDecls)? + | '...' variableDeclaratorId + ; + +methodBody + : block + ; + +constructorBody + : block + ; + +qualifiedName + : Identifier ('.' Identifier)* + ; + +literal + : IntegerLiteral + | FloatingPointLiteral + | CharacterLiteral + | StringLiteral + | BooleanLiteral + | 'null' + ; + +// ANNOTATIONS + +annotations + : annotation+ + ; + +annotation + : '@' annotationName ( '(' ( elementValuePairs | elementValue )? ')' )? + ; + +annotationName + : Identifier ('.' Identifier)* + ; + +elementValuePairs + : elementValuePair (',' elementValuePair)* + ; + +elementValuePair + : Identifier '=' elementValue + ; + +elementValue + : expression + | annotation + | elementValueArrayInitializer + ; + +elementValueArrayInitializer + : '{' (elementValue (',' elementValue)*)? (',')? '}' + ; + +annotationTypeDeclaration + : '@' 'interface' Identifier annotationTypeBody + ; + +annotationTypeBody + : '{' (annotationTypeElementDeclaration)* '}' + ; + +annotationTypeElementDeclaration + : modifiers annotationTypeElementRest + | ';' // this is not allowed by the grammar, but apparently allowed by the actual compiler + ; + +annotationTypeElementRest + : type annotationMethodOrConstantRest ';' + | normalClassDeclaration ';'? + | normalInterfaceDeclaration ';'? + | enumDeclaration ';'? + | annotationTypeDeclaration ';'? + ; + +annotationMethodOrConstantRest + : annotationMethodRest + | annotationConstantRest + ; + +annotationMethodRest + : Identifier '(' ')' defaultValue? + ; + +annotationConstantRest + : variableDeclarators + ; + +defaultValue + : 'default' elementValue + ; + +// STATEMENTS / BLOCKS + +block + : '{' blockStatement* '}' + ; + +blockStatement + : localVariableDeclarationStatement + | classOrInterfaceDeclaration + | statement + ; + +localVariableDeclarationStatement + : localVariableDeclaration ';' + ; + +localVariableDeclaration + : variableModifiers type variableDeclarators + ; + +variableModifiers + : variableModifier* + ; + +statement + : block + | ASSERT expression (':' expression)? ';' + | 'if' parExpression statement ('else' statement)? + | 'for' '(' forControl ')' statement + | 'while' parExpression statement + | 'do' statement 'while' parExpression ';' + | 'try' block (catches finallyBlock? | finallyBlock) + | 'try' resourceSpecification block catches? finallyBlock? + | 'switch' parExpression '{' switchBlockStatementGroups '}' + | 'synchronized' parExpression block + | 'return' expression? ';' + | 'throw' expression ';' + | 'break' Identifier? ';' + | 'continue' Identifier? ';' + | ';' + | statementExpression ';' + | Identifier ':' statement + ; + +catches + : catchClause+ + ; + +catchClause + : 'catch' '(' variableModifiers catchType Identifier ')' block + ; + +catchType + : qualifiedName ('|' qualifiedName)* + ; + +finallyBlock + : 'finally' block + ; + +resourceSpecification + : '(' resources ';'? 
')' + ; + +resources + : resource (';' resource)* + ; + +resource + : variableModifiers classOrInterfaceType variableDeclaratorId '=' expression + ; + +formalParameter + : variableModifiers type variableDeclaratorId + ; + +switchBlockStatementGroups + : (switchBlockStatementGroup)* + ; + +/* The change here (switchLabel -> switchLabel+) technically makes this grammar + ambiguous; but with appropriately greedy parsing it yields the most + appropriate AST, one in which each group, except possibly the last one, has + labels and statements. */ +switchBlockStatementGroup + : switchLabel+ blockStatement* + ; + +switchLabel + : 'case' constantExpression ':' + | 'case' enumConstantName ':' + | 'default' ':' + ; + +forControl + : enhancedForControl + | forInit? ';' expression? ';' forUpdate? + ; + +forInit + : localVariableDeclaration + | expressionList + ; + +enhancedForControl + : variableModifiers type Identifier ':' expression + ; + +forUpdate + : expressionList + ; + +// EXPRESSIONS + +parExpression + : '(' expression ')' + ; + +expressionList + : expression (',' expression)* + ; + +statementExpression + : expression + ; + +constantExpression + : expression + ; + +expression + : primary + | expression '.' Identifier + | expression '.' 'this' + | expression '.' 'new' nonWildcardTypeArguments? innerCreator + | expression '.' 'super' superSuffix + | expression '.' explicitGenericInvocation + | 'new' creator + | expression '[' expression ']' + | '(' type ')' expression + | expression ('++' | '--') + | expression '(' expressionList? ')' + | ('+'|'-'|'++'|'--') expression + | ('~'|'!') expression + | expression ('*'|'/'|'%') expression + | expression ('+'|'-') expression + | expression ('<' '<' | '>' '>' '>' | '>' '>') expression + | expression ('<=' | '>=' | '>' | '<') expression + | expression 'instanceof' type + | expression ('==' | '!=') expression + | expression '&' expression + | expression '^' expression + | expression '|' expression + | expression '&&' expression + | expression '||' expression + | expression '?' expression ':' expression + | expression + ( '=' + | '+=' + | '-=' + | '*=' + | '/=' + | '&=' + | '|=' + | '^=' + | '>>=' + | '>>>=' + | '<<=' + | '%=' + ) + expression + ; + +primary + : '(' expression ')' + | 'this' + | 'super' + | literal + | Identifier + | type '.' 'class' + | 'void' '.' 'class' + | nonWildcardTypeArguments (explicitGenericInvocationSuffix | 'this' arguments) + ; + +creator + : nonWildcardTypeArguments createdName classCreatorRest + | createdName (arrayCreatorRest | classCreatorRest) + ; + +createdName + : Identifier typeArgumentsOrDiamond? ('.' Identifier typeArgumentsOrDiamond?)* + | primitiveType + ; + +innerCreator + : Identifier nonWildcardTypeArgumentsOrDiamond? classCreatorRest + ; + +arrayCreatorRest + : '[' + ( ']' ('[' ']')* arrayInitializer + | expression ']' ('[' expression ']')* ('[' ']')* + ) + ; + +classCreatorRest + : arguments classBody? + ; + +explicitGenericInvocation + : nonWildcardTypeArguments explicitGenericInvocationSuffix + ; + +nonWildcardTypeArguments + : '<' typeList '>' + ; + +typeArgumentsOrDiamond + : '<' '>' + | typeArguments + ; + +nonWildcardTypeArgumentsOrDiamond + : '<' '>' + | nonWildcardTypeArguments + ; + +superSuffix + : arguments + | '.' Identifier arguments? + ; + +explicitGenericInvocationSuffix + : 'super' superSuffix + | Identifier arguments + ; + +arguments + : '(' expressionList? 
')' + ; + +// LEXER + +// §3.9 Keywords + +ABSTRACT : 'abstract'; +ASSERT : 'assert'; +BOOLEAN : 'boolean'; +BREAK : 'break'; +BYTE : 'byte'; +CASE : 'case'; +CATCH : 'catch'; +CHAR : 'char'; +CLASS : 'class'; +CONST : 'const'; +CONTINUE : 'continue'; +DEFAULT : 'default'; +DO : 'do'; +DOUBLE : 'double'; +ELSE : 'else'; +ENUM : 'enum'; +EXTENDS : 'extends'; +FINAL : 'final'; +FINALLY : 'finally'; +FLOAT : 'float'; +FOR : 'for'; +IF : 'if'; +GOTO : 'goto'; +IMPLEMENTS : 'implements'; +IMPORT : 'import'; +INSTANCEOF : 'instanceof'; +INT : 'int'; +INTERFACE : 'interface'; +LONG : 'long'; +NATIVE : 'native'; +NEW : 'new'; +PACKAGE : 'package'; +PRIVATE : 'private'; +PROTECTED : 'protected'; +PUBLIC : 'public'; +RETURN : 'return'; +SHORT : 'short'; +STATIC : 'static'; +STRICTFP : 'strictfp'; +SUPER : 'super'; +SWITCH : 'switch'; +SYNCHRONIZED : 'synchronized'; +THIS : 'this'; +THROW : 'throw'; +THROWS : 'throws'; +TRANSIENT : 'transient'; +TRY : 'try'; +VOID : 'void'; +VOLATILE : 'volatile'; +WHILE : 'while'; + +// §3.10.1 Integer Literals + +IntegerLiteral + : DecimalIntegerLiteral + | HexIntegerLiteral + | OctalIntegerLiteral + | BinaryIntegerLiteral + ; + +fragment +DecimalIntegerLiteral + : DecimalNumeral IntegerTypeSuffix? + ; + +fragment +HexIntegerLiteral + : HexNumeral IntegerTypeSuffix? + ; + +fragment +OctalIntegerLiteral + : OctalNumeral IntegerTypeSuffix? + ; + +fragment +BinaryIntegerLiteral + : BinaryNumeral IntegerTypeSuffix? + ; + +fragment +IntegerTypeSuffix + : [lL] + ; + +fragment +DecimalNumeral + : '0' + | NonZeroDigit (Digits? | Underscores Digits) + ; + +fragment +Digits + : Digit (DigitsAndUnderscores? Digit)? + ; + +fragment +Digit + : '0' + | NonZeroDigit + ; + +fragment +NonZeroDigit + : [1-9] + ; + +fragment +DigitsAndUnderscores + : DigitOrUnderscore+ + ; + +fragment +DigitOrUnderscore + : Digit + | '_' + ; + +fragment +Underscores + : '_'+ + ; + +fragment +HexNumeral + : '0' [xX] HexDigits + ; + +fragment +HexDigits + : HexDigit (HexDigitsAndUnderscores? HexDigit)? + ; + +fragment +HexDigit + : [0-9a-fA-F] + ; + +fragment +HexDigitsAndUnderscores + : HexDigitOrUnderscore+ + ; + +fragment +HexDigitOrUnderscore + : HexDigit + | '_' + ; + +fragment +OctalNumeral + : '0' Underscores? OctalDigits + ; + +fragment +OctalDigits + : OctalDigit (OctalDigitsAndUnderscores? OctalDigit)? + ; + +fragment +OctalDigit + : [0-7] + ; + +fragment +OctalDigitsAndUnderscores + : OctalDigitOrUnderscore+ + ; + +fragment +OctalDigitOrUnderscore + : OctalDigit + | '_' + ; + +fragment +BinaryNumeral + : '0' [bB] BinaryDigits + ; + +fragment +BinaryDigits + : BinaryDigit (BinaryDigitsAndUnderscores? BinaryDigit)? + ; + +fragment +BinaryDigit + : [01] + ; + +fragment +BinaryDigitsAndUnderscores + : BinaryDigitOrUnderscore+ + ; + +fragment +BinaryDigitOrUnderscore + : BinaryDigit + | '_' + ; + +// §3.10.2 Floating-Point Literals + +FloatingPointLiteral + : DecimalFloatingPointLiteral + | HexadecimalFloatingPointLiteral + ; + +fragment +DecimalFloatingPointLiteral + : Digits '.' Digits? ExponentPart? FloatTypeSuffix? + | '.' Digits ExponentPart? FloatTypeSuffix? + | Digits ExponentPart FloatTypeSuffix? + | Digits FloatTypeSuffix + ; + +fragment +ExponentPart + : ExponentIndicator SignedInteger + ; + +fragment +ExponentIndicator + : [eE] + ; + +fragment +SignedInteger + : Sign? Digits + ; + +fragment +Sign + : [+-] + ; + +fragment +FloatTypeSuffix + : [fFdD] + ; + +fragment +HexadecimalFloatingPointLiteral + : HexSignificand BinaryExponent FloatTypeSuffix? 
+ ; + +fragment +HexSignificand + : HexNumeral '.'? + | '0' [xX] HexDigits? '.' HexDigits + ; + +fragment +BinaryExponent + : BinaryExponentIndicator SignedInteger + ; + +fragment +BinaryExponentIndicator + : [pP] + ; + +// §3.10.3 Boolean Literals + +BooleanLiteral + : 'true' + | 'false' + ; + +// §3.10.4 Character Literals + +CharacterLiteral + : '\'' SingleCharacter '\'' + | '\'' EscapeSequence '\'' + ; + +fragment +SingleCharacter + : ~['\\] + ; + +// §3.10.5 String Literals + +StringLiteral + : '"' StringCharacters? '"' + ; + +fragment +StringCharacters + : StringCharacter+ + ; + +fragment +StringCharacter + : ~["\\] + | EscapeSequence + ; + +// §3.10.6 Escape Sequences for Character and String Literals + +fragment +EscapeSequence + : '\\' [btnfr"'\\] + | OctalEscape + ; + +fragment +OctalEscape + : '\\' OctalDigit + | '\\' OctalDigit OctalDigit + | '\\' ZeroToThree OctalDigit OctalDigit + ; + +fragment +ZeroToThree + : [0-3] + ; + +// §3.10.7 The Null Literal + +NullLiteral + : 'null' + ; + +// §3.11 Separators + +LPAREN : '('; +RPAREN : ')'; +LBRACE : '{'; +RBRACE : '}'; +LBRACK : '['; +RBRACK : ']'; +SEMI : ';'; +COMMA : ','; +DOT : '.'; + +// §3.12 Operators + +ASSIGN : '='; +GT : '>'; +LT : '<'; +BANG : '!'; +TILDE : '~'; +QUESTION : '?'; +COLON : ':'; +EQUAL : '=='; +LE : '<='; +GE : '>='; +NOTEQUAL : '!='; +AND : '&&'; +OR : '||'; +INC : '++'; +DEC : '--'; +ADD : '+'; +SUB : '-'; +MUL : '*'; +DIV : '/'; +BITAND : '&'; +BITOR : '|'; +CARET : '^'; +MOD : '%'; + +ADD_ASSIGN : '+='; +SUB_ASSIGN : '-='; +MUL_ASSIGN : '*='; +DIV_ASSIGN : '/='; +AND_ASSIGN : '&='; +OR_ASSIGN : '|='; +XOR_ASSIGN : '^='; +MOD_ASSIGN : '%='; +LSHIFT_ASSIGN : '<<='; +RSHIFT_ASSIGN : '>>='; +URSHIFT_ASSIGN : '>>>='; + +// §3.8 Identifiers (must appear after all keywords in the grammar) + +Identifier + : JavaLetter JavaLetterOrDigit* + ; + +fragment +JavaLetter + : [a-zA-Z$_] // these are the "java letters" below 0xFF +// | // covers all characters above 0xFF which are not a surrogate +// ~[\u0000-\u00FF\uD800-\uDBFF] +// {Character.isJavaIdentifierStart(_input.LA(-1))}? +// | // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF +// [\uD800-\uDBFF] [\uDC00-\uDFFF] +// {Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}? + ; + +fragment +JavaLetterOrDigit + : [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF +// | // covers all characters above 0xFF which are not a surrogate +// ~[\u0000-\u00FF\uD800-\uDBFF] +// {Character.isJavaIdentifierPart(_input.LA(-1))}? +// | // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF +// [\uD800-\uDBFF] [\uDC00-\uDFFF] +// {Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}? + ; + +// +// Additional symbols not defined in the lexical specification +// + +AT : '@'; +ELLIPSIS : '...'; + +// +// Whitespace and comments +// + +WS : [ \t\r\n\u000C]+ -> skip + ; + +COMMENT + : '/*' .*? 
'*/' -> skip + ; + +LINE_COMMENT + : '//' ~[\r\n]* -> skip + ; diff --git a/tool/resources/org/antlr/v4/tool/templates/codegen/TypeScript/TypeScript.stg b/tool/resources/org/antlr/v4/tool/templates/codegen/TypeScript/TypeScript.stg index 74d7c606..4b5c19e9 100644 --- a/tool/resources/org/antlr/v4/tool/templates/codegen/TypeScript/TypeScript.stg +++ b/tool/resources/org/antlr/v4/tool/templates/codegen/TypeScript/TypeScript.stg @@ -23,6 +23,12 @@ ParserFile(file, parser, namedActions, contextSuperClass) ::= << import { ATN } from "/atn/ATN"; import { ATNDeserializer } from "/atn/ATNDeserializer"; import { FailedPredicateException } from "/FailedPredicateException"; + +import { IncrementalParser } from "/IncrementalParser"; +import { IncrementalParserData } from "/IncrementalParserData"; +import { IncrementalParserRuleContext } from "/IncrementalParserRuleContext"; +import { IncrementalTokenStream } from "/IncrementalTokenStream"; + import { NotNull } from "/Decorators"; import { NoViableAltException } from "/NoViableAltException"; import { Override } from "/Decorators"; @@ -143,7 +149,11 @@ Parser(parser, funcs, atn, sempredFuncs, superClass, isLexer=false) ::= << >> Parser_(parser, funcs, atn, sempredFuncs, ctor, superClass) ::= << + +export abstract class extends { + export abstract class extends { + = ;}; separator="\n", wrap, anchor> @@ -239,10 +249,17 @@ case : >> parser_ctor(p) ::= << + +constructor(input: IncrementalTokenStream, parseData?: IncrementalParserData) { + super(input, parseData); + this._interp = new ParserATNSimulator(._ATN, this); +} + constructor(input: TokenStream) { super(input); this._interp = new ParserATNSimulator(._ATN, this); } + >> /* This generates a private method since the actionIndex is generated, making an @@ -276,6 +293,15 @@ case : RuleFunction(currentRule,args,code,locals,ruleCtx,altLabelCtxs,namedActions,finallyAction,postamble,exceptions) ::= << // @RuleVersion() }>public (): { + + // Check whether we need to execute this rule. + let guardResult = this.guardRule(this._ctx as IncrementalParserRuleContext, this.state, .RULE_) as ; + // If we found an existing context that is valid, return it. + if (guardResult) { + this._input.seek(guardResult.stop!.tokenIndex + 1); + return guardResult; + } + let _localctx: = new (this._ctx, this.state}>); this.enterRule(_localctx, , .RULE_); @@ -391,9 +417,19 @@ LeftRecursiveRuleFunction(currentRule,args,code,locals,ruleCtx,altLabelCtxs, let _parentctx: ParserRuleContext = this._ctx; let _parentState: number = this.state; + + // Check whether we need to execute this rule. + let guardResult = this.guardRule(this._ctx as IncrementalParserRuleContext, _parentState, .RULE_) as ; + // If we found an existing context that is valid, return it. 
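+ // Seeking to just past the reused context's stop token makes the
+ // parser resume exactly where this rule left off on the last parse.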
+ if (guardResult) { + this._input.seek(guardResult.stop!.tokenIndex + 1); + return guardResult; + } + let _localctx: = new (this._ctx, _parentState}>); let _prevctx: = _localctx; let _startState: number = ; + this.enterRecursionRule(_localctx, , .RULE_, _p); @@ -757,7 +793,11 @@ CaptureNextTokenType(d) ::= " = this._input.LA(1);" StructDecl(struct,ctorAttrs,attrs,getters,dispatchMethods,interfaces,extensionMembers) ::= << + +export class extends IncrementalParserRuleContext implements { + export class extends ParserRuleContext implements { + ;}; separator="\n"> }; separator="\n"> constructor(parent: ParserRuleContext | undefined, invokingState: number}>) { diff --git a/tool/src/org/antlr/v4/TypeScriptTool.java b/tool/src/org/antlr/v4/TypeScriptTool.java index f58955d2..74e9efb9 100644 --- a/tool/src/org/antlr/v4/TypeScriptTool.java +++ b/tool/src/org/antlr/v4/TypeScriptTool.java @@ -7,6 +7,7 @@ import java.io.File; import java.io.IOException; import java.io.Writer; +import java.util.Arrays; import java.util.HashMap; import org.antlr.v4.tool.ErrorType; import org.antlr.v4.tool.Grammar; @@ -17,10 +18,11 @@ */ public class TypeScriptTool extends Tool { private boolean verbose = false; - static { Grammar.parserOptions.add("baseImportPath"); Grammar.lexerOptions.add("baseImportPath"); + Grammar.parserOptions.add("incremental"); + Grammar.lexerOptions.add("incremental"); } public TypeScriptTool() { @@ -55,8 +57,7 @@ public static void main(String[] args) { try { String logname = antlr.logMgr.save(); System.out.println("wrote " + logname); - } - catch (IOException ioe) { + } catch (IOException ioe) { antlr.errMgr.toolError(ErrorType.INTERNAL_ERROR, ioe); } } @@ -75,9 +76,12 @@ public static void main(String[] args) { @Override public Writer getOutputFileWriter(Grammar g, String fileName) throws IOException { + if (Boolean.parseBoolean(g.getOptionString("incremental"))) { + grammarOptions.put("incremental", "true"); + } if (outputDirectory != null) { // output directory is a function of where the grammar file lives - // for subdir/T.g4, you get subdir here. Well, depends on -o etc... + // for subdir/T.g4, you get subdir here. Well, depends on -o etc... File outputDir = getOutputDirectory(g.fileName); File outputFile = new File(outputDir, fileName); if (this.verbose) {