Ara

Ara is a lightweight compiler/interpreter project.

internal/lexer

The lexer (or "tokenizer") is the first stage of the Ara pipeline. Its job is to take raw source code (a long string of characters) and transform it into a sequence of meaningful units called Tokens.

How it Works

1. State Management (`Lexer` struct)

The lexer maintains several pointers to keep track of its progress through the input string:

input: The raw source code string.
position: Points to the current character being examined (ch).
readPosition: Points to the next character in the input (used for "peeking" or advancing).
ch: The actual character currently being inspected.
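
As a rough illustration, the state listed above could be declared in Go along these lines. The field names come from the list; the field types, the `New` constructor, and the use of a zero byte to mean "no character" are assumptions made for this sketch and may not match Ara's actual source.

```go
package main

import "fmt"

// Lexer holds the scanning state. Field names follow the list above;
// the types are assumptions made for this sketch.
type Lexer struct {
	input        string // the raw source code
	position     int    // index of the character currently held in ch
	readPosition int    // index of the next character to be read
	ch           byte   // the character under inspection (0 means "none" here)
}

// New is a hypothetical constructor that loads the first character, if any.
func New(input string) *Lexer {
	l := &Lexer{input: input}
	if len(input) > 0 {
		l.ch = input[0]
		l.readPosition = 1
	}
	return l
}

func main() {
	l := New("1 + foo")
	fmt.Printf("ch=%q position=%d readPosition=%d\n", l.ch, l.position, l.readPosition)
}
```

Keeping `position` and `readPosition` separate means the lexer always knows both where the current character sits and where the next one will come from, which is what makes peeking cheap.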

2. Character-by-Character Reading (`readChar`)

The readChar method is the "heartbeat" of the lexer. It:

Checks if it has reached the end of the input.
If not, it updates ch with the character at readPosition.
Advances both position and readPosition.
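
The three steps above might look like the following in Go. This is a minimal sketch: setting `ch` to 0 at the end of the input is an assumption (the text only says the end is checked), and the struct is repeated so the example runs on its own.

```go
package main

import "fmt"

type Lexer struct {
	input        string
	position     int
	readPosition int
	ch           byte
}

// readChar loads the character at readPosition into ch, then advances
// both indexes. At the end of the input it stores 0 (an assumed sentinel).
func (l *Lexer) readChar() {
	if l.readPosition >= len(l.input) {
		l.ch = 0
	} else {
		l.ch = l.input[l.readPosition]
	}
	l.position = l.readPosition
	l.readPosition++
}

func main() {
	l := &Lexer{input: "a+b"}
	l.readChar() // load the first character
	for l.ch != 0 {
		fmt.Printf("position=%d ch=%q\n", l.position, l.ch)
		l.readChar()
	}
}
```

Note that this sketch reads one byte at a time, so it would treat a multi-byte UTF-8 character as several separate characters.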

3. The Tokenization Loop (`NextToken`)

When the parser asks for the next unit of code, NextToken is called. It follows these steps:

A. Skip Whitespace
It calls skipWhitespace to ignore spaces, tabs, and newlines. The lexer doesn't care about these unless they are part of a literal (like a string).

B. Identify Simple Tokens
It checks the current character (l.ch) against known single-character operators and punctuation:

Operators: +, -, *, /
Punctuation: (, ), ,
EOF (End Of File): A special token used to tell the parser nothing is left.

C. Identify Complex Tokens (Identifiers and Numbers)
If the character doesn't match an operator, it checks if it's the start of something more complex:

Identifiers: If it's a letter (or _), it calls readIdentifier, which keeps reading until it hits a character that is neither a letter nor a digit. Identifiers are used for variable names or keywords.
Numbers: If it's a digit, it calls readNumber, which consumes characters as long as they are digits or a decimal point (.).
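
A hedged sketch of the two helpers follows. `readIdentifier`, `readNumber`, `isLetter`, and `isDigit` are named in this README, but their bodies here are guesses at a typical implementation; in particular, treating `_` as a letter inside `isLetter` is an assumption.

```go
package main

import "fmt"

type Lexer struct {
	input        string
	position     int
	readPosition int
	ch           byte
}

func (l *Lexer) readChar() {
	if l.readPosition >= len(l.input) {
		l.ch = 0 // assumed end-of-input sentinel
	} else {
		l.ch = l.input[l.readPosition]
	}
	l.position = l.readPosition
	l.readPosition++
}

// isLetter treats ASCII letters and the underscore as letters (assumption).
func isLetter(ch byte) bool {
	return ('a' <= ch && ch <= 'z') || ('A' <= ch && ch <= 'Z') || ch == '_'
}

func isDigit(ch byte) bool { return '0' <= ch && ch <= '9' }

// readIdentifier consumes letters and digits and returns the text covered.
func (l *Lexer) readIdentifier() string {
	start := l.position
	for isLetter(l.ch) || isDigit(l.ch) {
		l.readChar()
	}
	return l.input[start:l.position]
}

// readNumber consumes digits and decimal points and returns the text covered.
func (l *Lexer) readNumber() string {
	start := l.position
	for isDigit(l.ch) || l.ch == '.' {
		l.readChar()
	}
	return l.input[start:l.position]
}

func main() {
	l := &Lexer{input: "price_2 3.14"}
	l.readChar()
	fmt.Println(l.readIdentifier())
	l.readChar() // step over the space
	fmt.Println(l.readNumber())
}
```

Taken literally, the "digits or a decimal point" rule also accepts malformed input such as `1.2.3`; whether that is rejected later is not something this sketch decides.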

D. Error Handling
If a character is encountered that doesn't fit any rule (like a random symbol @ or $), it generates a TOKEN_ERROR.

Tokenization Example

If you feed the lexer the input (1 + foo), it performs the following transformations:

| Character(s) | Action | Resulting Token |
| --- | --- | --- |
| `(` | Direct Match | `Token(Type: LPAREN, Literal: "(")` |
| `1` | `isDigit` -> `readNumber` | `Token(Type: NUMBER, Literal: "1")` |
| `+` | Direct Match | `Token(Type: PLUS, Literal: "+")` |
| `foo` | `isLetter` -> `readIdentifier` | `Token(Type: IDENT, Literal: "foo")` |
| `)` | Direct Match | `Token(Type: RPAREN, Literal: ")")` |
| (End) | Empty Input | `Token(Type: EOF, Literal: "")` |
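
The walkthrough above can be reproduced with a compact sketch of the whole lexer. The token type names `LPAREN`, `RPAREN`, `NUMBER`, `PLUS`, `IDENT`, `EOF`, and `TOKEN_ERROR` come from this README; `MINUS`, `ASTERISK`, `SLASH`, and `COMMA`, the `simpleTokens` map, the `New` constructor, and the zero-byte end marker are assumptions made for illustration, not Ara's confirmed implementation.

```go
package main

import "fmt"

// Token mirrors the Token(Type, Literal) notation used in the table above.
type Token struct {
	Type    string
	Literal string
}

// simpleTokens maps single characters to token types. LPAREN, RPAREN, and
// PLUS appear in the table above; the other names are assumptions.
var simpleTokens = map[byte]string{
	'+': "PLUS", '-': "MINUS", '*': "ASTERISK", '/': "SLASH",
	'(': "LPAREN", ')': "RPAREN", ',': "COMMA",
}

type Lexer struct {
	input        string
	position     int
	readPosition int
	ch           byte // 0 marks end of input in this sketch
}

func New(input string) *Lexer {
	l := &Lexer{input: input}
	l.readChar() // load the first character
	return l
}

func (l *Lexer) readChar() {
	if l.readPosition >= len(l.input) {
		l.ch = 0
	} else {
		l.ch = l.input[l.readPosition]
	}
	l.position = l.readPosition
	l.readPosition++
}

func (l *Lexer) skipWhitespace() {
	for l.ch == ' ' || l.ch == '\t' || l.ch == '\n' || l.ch == '\r' {
		l.readChar()
	}
}

func isLetter(ch byte) bool {
	return ('a' <= ch && ch <= 'z') || ('A' <= ch && ch <= 'Z') || ch == '_'
}

func isDigit(ch byte) bool { return '0' <= ch && ch <= '9' }

func (l *Lexer) readIdentifier() string {
	start := l.position
	for isLetter(l.ch) || isDigit(l.ch) {
		l.readChar()
	}
	return l.input[start:l.position]
}

func (l *Lexer) readNumber() string {
	start := l.position
	for isDigit(l.ch) || l.ch == '.' {
		l.readChar()
	}
	return l.input[start:l.position]
}

// NextToken follows steps A to D: skip whitespace, then try EOF, complex
// tokens, simple tokens, and finally fall back to an error token.
func (l *Lexer) NextToken() Token {
	l.skipWhitespace()
	switch {
	case l.ch == 0:
		return Token{Type: "EOF", Literal: ""}
	case isLetter(l.ch):
		return Token{Type: "IDENT", Literal: l.readIdentifier()}
	case isDigit(l.ch):
		return Token{Type: "NUMBER", Literal: l.readNumber()}
	}
	// Single-character token, or an error token if the character is unknown.
	tok := Token{Type: "TOKEN_ERROR", Literal: string(l.ch)}
	if typ, ok := simpleTokens[l.ch]; ok {
		tok.Type = typ
	}
	l.readChar()
	return tok
}

func main() {
	l := New("(1 + foo)")
	for {
		tok := l.NextToken()
		fmt.Printf("Token(Type: %s, Literal: %q)\n", tok.Type, tok.Literal)
		if tok.Type == "EOF" {
			break
		}
	}
}
```

Running `main` prints one line per row of the table, ending with the `EOF` token. The sketch handles only single-character operators; multi-character operators such as `==` or `&&` would need an extra peek step that is not shown here.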

internal/parser

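The parser is the stage that turns the lexer's tokens into a structured expression. There are several common ways to build one; here is a breakdown of the most common parser types and why we are choosing a Pratt Parser for Ara.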

Common Parser Types

| Parser Type | How it Works | Pros | Cons |
| --- | --- | --- | --- |
| Recursive Descent | Uses a set of recursive functions that mirror the language's grammar rules. | Very easy to write and debug by hand. | Struggles with operator precedence (e.g., `*` vs `+`) without creating a deep, complex hierarchy of functions. |
| LL/LR (Generators) | Uses tools like Yacc, Bison, or ANTLR to generate code from a formal grammar file. | Extremely powerful; handles very complex languages efficiently. | Hard to debug; the generated code is often unreadable "spaghetti." |
| Pratt Parser | Associates parsing functions with token types and uses "binding power" (precedence levels) to decide how tokens stick together. | Perfect for expressions. Handles operator precedence and associativity elegantly with very little code. | Can be slightly more abstract to understand initially than basic recursive descent. |

Why Ara uses a Pratt Parser

Since Ara is an expression engine, its primary job is to evaluate mathematical and logical formulas (like 1 + 2 * 3 or a > 10 && b < 5).

In a standard recursive descent parser, you would need a separate function for every level of priority (parseAddition, parseMultiplication, parseComparison, etc.). With a Pratt Parser, we simply assign a number to each operator:

+: Precedence 10
*: Precedence 20

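The parser then automatically knows that the * "pulls" the numbers closer than the + does: in 1 + 2 * 3, the 2 binds to the * (precedence 20) rather than the + (precedence 10), so the result is 1 + (2 * 3). This keeps the parser code compact and makes it easy to extend when we add new operators later.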
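
To make the binding-power idea concrete, here is a minimal Pratt-style sketch. The precedences for `+` and `*` are the ones given above; assigning the same values to `-` and `/`, the `parser` type, and the choice to return a parenthesized string instead of an AST are all assumptions made to keep the example short. It works on a pre-split token slice rather than Ara's lexer.

```go
package main

import "fmt"

// Binding powers. The values for + and * come from the text above;
// - and / are assumed to share them.
var precedence = map[string]int{"+": 10, "-": 10, "*": 20, "/": 20}

// parser walks a pre-split token slice; a real parser would pull
// tokens from the lexer instead.
type parser struct {
	tokens []string
	pos    int
}

// peek returns the current token without consuming it ("" at the end).
func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// next returns the current token and advances.
func (p *parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// parseExpression consumes operators whose precedence is greater than
// minPrec. It returns a fully parenthesized string so the grouping the
// parser chose is visible.
func (p *parser) parseExpression(minPrec int) string {
	left := p.next() // operand: a number or an identifier
	if left == "(" {
		left = p.parseExpression(0) // grouping resets the precedence floor
		p.next()                    // consume ")"
	}
	for {
		prec, isOp := precedence[p.peek()]
		if !isOp || prec <= minPrec {
			return left
		}
		op := p.next()
		// Passing prec (not prec-1) makes equal-precedence operators
		// group to the left.
		right := p.parseExpression(prec)
		left = fmt.Sprintf("(%s %s %s)", left, op, right)
	}
}

func main() {
	p := &parser{tokens: []string{"1", "+", "2", "*", "3"}}
	fmt.Println(p.parseExpression(0))
}
```

Adding a new binary operator in this style is a one-line change to the `precedence` map, which is the extensibility benefit described above. The sketch does no error handling for malformed input such as a missing `)`.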

Supported Operators & Features

Ara currently supports the following expressions:

Arithmetic: +, -, *, /
Comparisons: ==, !=, <, >, <=, >=
Logical: &&, ||
Booleans: true, false
Grouping: Parentheses ( ) for precedence control
Identifiers: Variable names (e.g. price, volume)
Numbers: Integer literals (e.g. 100, 42)
