Skip to content

dotandev/ara

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ara

Ara is a lightweight compiler/interpreter project.

internal/lexer

The lexer (or "tokenizer") is the first stage of the Ara pipeline. Its job is to take raw source code (a long string of characters) and transform it into a sequence of meaningful units called Tokens.

How it Works

1. State Management (Lexer struct)

The lexer maintains several pointers to keep track of its progress through the input string:

  • input: The raw source code string.
  • position: Points to the current character being examined (ch).
  • readPosition: Points to the next character in the input (used for "peeking" or advancing).
  • ch: The actual character currently being inspected.

2. Character-by-Character Reading (readChar)

The readChar method is the "heartbeat" of the lexer. It:

  1. Checks if it has reached the end of the input.
  2. If not, it updates ch with the character at readPosition.
  3. Advances both position and readPosition.

3. The Tokenization Loop (NextToken)

When the parser asks for the next unit of code, NextToken is called. It follows these steps:

A. Skip Whitespace
It calls skipWhitespace to ignore spaces, tabs, and newlines. The lexer doesn't care about these unless they are part of a literal (like a string).

B. Identify Simple Tokens
It checks the current character (l.ch) against known single-character operators and punctuation:

  • Operators: +, -, *, /
  • Punctuation: (, ), ,
  • EOF (End Of File): A special token used to tell the parser nothing is left.

C. Identify Complex Tokens (Identifiers and Numbers)
If the character doesn't match an operator, it checks if it's the start of something more complex:

  • Identifiers: If it's a letter (or _), it calls readIdentifier. This continues reading until it hits a non-letter/non-digit character. These are used for variable names or keywords.
  • Numbers: If it's a digit, it calls readNumber. It consumes characters as long as they are digits or a decimal point ..

D. Error Handling
If a character is encountered that doesn't fit any rule (like a random symbol @ or $), it generates a TOKEN_ERROR.

Tokenization Example

If you feed the lexer the input (1 + foo), it performs the following transformations:

Character(s) Action Resulting Token
( Direct Match Token(Type: LPAREN, Literal: "(")
1 isDigit -> readNumber Token(Type: NUMBER, Literal: "1")
+ Direct Match Token(Type: PLUS, Literal: "+")
foo isLetter -> readIdentifier Token(Type: IDENT, Literal: "foo")
) Direct Match Token(Type: RPAREN, Literal: ")")
(End) Empty Input Token(Type: EOF, Literal: "")

internal/parser

Here is a breakdown of the most common ones and why we are choosing a Pratt Parser for Ara.

Common Parser Types

Parser Type How it Works Pros Cons
Recursive Descent Uses a set of recursive functions that mirror the language's grammar rules. Very easy to write and debug by hand. Struggles with operator precedence (e.g., * vs +) without creating a deep, complex hierarchy of functions.
LL/LR (Generators) Uses tools like Yacc, Bison, or ANTLR to generate code from a formal grammar file. Extremely powerful; handles very complex languages efficiently. Hard to debug; the generated code is often unreadable "spaghetti."
Pratt Parser Associates parsing functions with token types and uses "binding power" (precedence levels) to decide how tokens stick together. Perfect for expressions. Handles operator precedence and associativity elegantly with very little code. Can be slightly more abstract to understand initially than basic recursive descent.

Why Ara uses a Pratt Parser

Since Ara is an expression engine, its primary job is to evaluate mathematical and logical formulas (like 1 + 2 * 3 or a > 10 && b < 5).

In a standard recursive descent parser, you would need a separate function for every level of priority (parseAddition, parseMultiplication, parseComparison, etc.). With a Pratt Parser, we simply assign a number to each operator:

  • +: Precedence 10
  • *: Precedence 20

The parser then automatically knows that the * "pulls" the numbers closer than the + does. This makes the code much cleaner, faster, and easier to extend when we add new operators later.

Supported Operators & Features

Ara currently supports the following expressions:

  • Arithmetic: +, -, *, /
  • Comparisons: ==, !=, <, >, <=, >=
  • Logical: &&, ||
  • Booleans: true, false
  • Grouping: Parentheses ( ) for precedence control
  • Identifiers: Variable names (e.g. price, volume)
  • Numbers: Integer literals (e.g. 100, 42)

About

JIT-compiled expression engine for high-performance data processing.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors