sa-dd/Simple-Simple-C

Simple Simple C

Compiler frontend for a C-subset language with advanced array support. Features include N-dimensional arrays, dynamic resizing, built-in array operations, and more.

Project Structure

Core Components

  1. Lexer and Parser

    • ssc.l: Flex-based lexical analyzer
    • ssc.y: Bison-based parser
    • ssc_types.h: Core type definitions
  2. Abstract Syntax Tree (AST)

    • ast.hpp: AST node definitions and visitor interfaces
    • ast.cpp: Implementation of AST node operations
  3. LLVM Code Generation

    • llvmcodegen.hpp: LLVM code generator interface
    • llvmcodegen.cpp: Implementation of LLVM IR generation
    • llvmruntime.cpp: Runtime support for LLVM-generated code
    • CodeGen.h: Code generation utilities and helpers
  4. Intermediate Representation

    • IR.h: Symbol table and array operation definitions
  5. Support Files

    • compile.sh: Build automation script
    • Makefile: Project build configuration
    • Various test files (*.ssc): Example programs and test cases

Build Instructions

Prerequisites

  • LLVM development libraries (version 14.0 or later)
  • Flex (Fast Lexical Analyzer)
  • Bison (Parser Generator)
  • C++ compiler with C++17 support
  • Make build system

Building the Project

  1. Install dependencies:

    # Ubuntu/Debian
    sudo apt-get install llvm-dev flex bison build-essential
    
    # macOS
    brew install llvm flex bison
  2. Build the compiler:

    # Using make to compile the compiler
    make clean
    make
    
    # Then run the compile script, which invokes the compiler to generate C++ and LLVM IR
    chmod +x compile.sh
    ./compile.sh test.ssc
    
    ./a.out
  3. The build process will generate:

    • ssc_compiler: The main compiler executable
    • Various intermediate files (lex.yy.c, ssc.tab.c, etc.)

Language Features

1. Array Operations

  • N-dimensional array support
  • Dynamic resizing
  • Broadcasting
  • Array slicing
  • Map and reduce operations
  • Built-in functions (sum, max, min, average)
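
The slicing and built-in functions listed above can be illustrated on a plain 1-D array. This is only a sketch of the intended semantics in C++: the helper names (slice, sum, maxOf, minOf, average) and the half-open [begin, end) slice convention are assumptions for illustration, not the SSC runtime's actual interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helpers; names and conventions are assumptions, not SSC's API.

// Copy of the half-open range [begin, end) of a 1-D array.
std::vector<int> slice(const std::vector<int>& a, std::size_t begin, std::size_t end) {
    if (begin > end || end > a.size())
        throw std::out_of_range("slice bounds outside array");
    return std::vector<int>(a.begin() + static_cast<std::ptrdiff_t>(begin),
                            a.begin() + static_cast<std::ptrdiff_t>(end));
}

int sum(const std::vector<int>& a) {
    return std::accumulate(a.begin(), a.end(), 0);
}

int maxOf(const std::vector<int>& a) {
    if (a.empty()) throw std::invalid_argument("max of empty array");
    return *std::max_element(a.begin(), a.end());
}

int minOf(const std::vector<int>& a) {
    if (a.empty()) throw std::invalid_argument("min of empty array");
    return *std::min_element(a.begin(), a.end());
}

double average(const std::vector<int>& a) {
    if (a.empty()) throw std::invalid_argument("average of empty array");
    return static_cast<double>(sum(a)) / static_cast<double>(a.size());
}
```

Returning a copy from slice keeps the sketch simple; a real runtime might instead return a view over the original storage.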

2. Type System

  • Static type checking
  • Support for integers, doubles, and strings
  • Array type inference
  • Dimension checking for array operations
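
Dimension checking for element-wise operations can be sketched as a shape-compatibility test. The rule below is the NumPy-style broadcasting rule (shapes are aligned from the trailing dimension, and two extents match if they are equal or one of them is 1). Whether SSC uses exactly this rule is an assumption; the Shape alias and broadcastShape function are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

using Shape = std::vector<std::size_t>;  // hypothetical alias: one extent per dimension

// Returns the result shape of an element-wise operation on arrays of shape
// a and b, or std::nullopt when the shapes are incompatible.
// Assumed rule (NumPy-style): compare extents from the last dimension
// backwards; extents are compatible if equal or if either is 1.
std::optional<Shape> broadcastShape(const Shape& a, const Shape& b) {
    const std::size_t rank = std::max(a.size(), b.size());
    Shape out(rank, 1);
    for (std::size_t i = 0; i < rank; ++i) {
        // Extent counted from the right; missing leading dimensions act as 1.
        const std::size_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
        const std::size_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
        if (da != db && da != 1 && db != 1)
            return std::nullopt;  // dimension mismatch: report a type error
        out[rank - 1 - i] = (da == 1) ? db : da;
    }
    return out;
}
```

Under this rule a 3x3 array combines with a length-3 array (result 3x3), while a 2x3 array and a length-4 array are rejected.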

3. LLVM Integration

  • Efficient code generation using LLVM IR
  • Optimization passes
  • Cross-platform support
  • JIT compilation capability

Code Generation Pipeline

  1. Parsing: Source code → AST

    • Lexical analysis (ssc.l)
    • Syntax analysis (ssc.y)
    • AST construction (ast.cpp)
  2. Analysis:

    • Type checking
    • Dimension validation
    • Symbol resolution
  3. LLVM IR Generation:

    • AST traversal
    • LLVM IR instruction generation
    • Runtime function integration
  4. Optimization and Output:

    • LLVM optimization passes
    • Machine code generation
    • Object file or executable output
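
The AST traversal step can be sketched with a small visitor that walks an expression tree and emits LLVM IR instructions as text. This is a simplified illustration only: the node and visitor types below (Node, IntLiteral, BinaryOp, IrEmitter) are hypothetical and do not mirror ast.hpp, and a real generator would more likely build instructions through the LLVM C++ API than concatenate strings.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct IntLiteral;
struct BinaryOp;

// Hypothetical visitor interface: one visit overload per node type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const IntLiteral&) = 0;
    virtual void visit(const BinaryOp&) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor&) const = 0;
};

struct IntLiteral : Node {
    int value;
    explicit IntLiteral(int v) : value(v) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct BinaryOp : Node {
    char op;  // '+' or '*' in this sketch
    std::unique_ptr<Node> lhs, rhs;
    BinaryOp(char o, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

// Post-order traversal: operands are emitted before the instruction using them.
struct IrEmitter : Visitor {
    std::vector<std::string> lines;  // emitted instructions, in order
    std::string last;                // operand naming the most recent value
    int nextTemp = 0;

    void visit(const IntLiteral& n) override { last = std::to_string(n.value); }

    void visit(const BinaryOp& n) override {
        n.lhs->accept(*this);
        const std::string l = last;
        n.rhs->accept(*this);
        const std::string r = last;
        const std::string tmp = "%t" + std::to_string(nextTemp++);
        lines.push_back(tmp + " = " + (n.op == '+' ? "add" : "mul") + " i32 " + l + ", " + r);
        last = tmp;
    }
};

// Builds the tree for (1 + 2) * 3 and returns the emitted instructions.
std::vector<std::string> emitExample() {
    auto expr = std::make_unique<BinaryOp>(
        '*',
        std::make_unique<BinaryOp>('+', std::make_unique<IntLiteral>(1),
                                   std::make_unique<IntLiteral>(2)),
        std::make_unique<IntLiteral>(3));
    IrEmitter emitter;
    expr->accept(emitter);
    return emitter.lines;
}
```

For (1 + 2) * 3 the emitter produces an add into %t0 followed by a mul of %t0 and 3, which is the kind of unoptimized output the later optimization passes would then fold.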

Example Usage

  1. Create a source file (e.g., test.ssc):

    array int a[3][3] = {{1,2,3},{4,5,6},{7,8,9}};
    array int b = map(a, SQUARE);
    int sum = reduce(b, ADD);
    print(sum);
    
  2. Compile and run:

    make
    ./ssc_compiler test.ssc
    ./a.out
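
What the example program is expected to compute can be sketched in plain C++. This assumes that SQUARE squares each element and ADD sums the elements, which the names suggest but the README does not define; mapArray, reduceArray, and exampleResult are hypothetical names, not part of the project.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for map: applies f to every element.
std::vector<int> mapArray(const std::vector<int>& a, const std::function<int(int)>& f) {
    std::vector<int> out;
    out.reserve(a.size());
    for (int x : a) out.push_back(f(x));
    return out;
}

// Hypothetical stand-in for reduce: folds the elements with f, starting at init.
int reduceArray(const std::vector<int>& a, const std::function<int(int, int)>& f, int init) {
    return std::accumulate(a.begin(), a.end(), init, f);
}

int exampleResult() {
    // The 3x3 array from test.ssc, flattened row by row.
    const std::vector<int> a{1, 2, 3, 4, 5, 6, 7, 8, 9};
    const std::vector<int> b = mapArray(a, [](int x) { return x * x; });      // assumed SQUARE
    return reduceArray(b, [](int acc, int x) { return acc + x; }, 0);        // assumed ADD
}
```

Under these assumptions the result is the sum of the squares of 1 through 9, which is 285.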

Development and Testing

  • Use array-test.ssc for array operation testing
  • input.ssc provides basic functionality tests
  • Debug output can be enabled in both lexer and parser
  • Generated C++ output is saved in output.cpp
  • LLVM IR output is saved in output.ll

License

This project is open source and available under the MIT License.

Technical Details

  1. Language Features:

    • N-dimensional array support
    • Dynamic array resizing
    • Array operations (sum, max, min, average)
    • Broadcasting
    • Mapping and reduction
    • Array slicing
    • Serialization and deserialization
  2. Build Process:

    • Use Flex to generate the lexer (flex ssc.l)
    • Use Bison to generate the parser (bison -d ssc.y)
    • Compile the generated C files along with your main program
  3. Execution:

    • The main function in ssc.y serves as the entry point
    • It can read input from a file (if provided as an argument) or from stdin
  4. Symbol Table:

    • Implemented using std::map in C++
    • Separate tables for different data types (double, int, string)
  5. Array Implementation:

    • Multi-dimensional arrays are flattened into 1D vectors for storage
    • A separate map (arrayDimensions) keeps track of the dimensions
  6. Error Handling:

    • The yyerror function is used to report parsing errors
    • Runtime errors (e.g., out-of-bounds access) are handled using C++ exceptions
  7. Debugging:

    • Debug macros are provided in both the lexer and parser
    • Can be enabled by defining DEBUGSSC and DEBUGBISON respectively
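
The symbol table, flattened array storage, and exception-based bounds checking described above can be sketched together. Only the arrayDimensions name comes from this README; the SymbolTables struct, its other members, flatIndex, loadElement, and the row-major layout are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: one std::map per data type, keyed by variable name.
struct SymbolTables {
    std::map<std::string, int> ints;
    std::map<std::string, double> doubles;
    std::map<std::string, std::string> strings;
    std::map<std::string, std::vector<int>> arrays;                   // flattened element storage
    std::map<std::string, std::vector<std::size_t>> arrayDimensions;  // extents per array
};

// Converts N-dimensional indices to an offset into the flattened vector.
// Row-major order is assumed here; the README does not state the layout.
std::size_t flatIndex(const std::vector<std::size_t>& dims,
                      const std::vector<std::size_t>& idx) {
    if (dims.size() != idx.size())
        throw std::invalid_argument("wrong number of indices");
    std::size_t offset = 0;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        if (idx[d] >= dims[d])
            throw std::out_of_range("array index out of bounds");
        offset = offset * dims[d] + idx[d];
    }
    return offset;
}

// Looks up an array by name and reads one element; unknown names and
// out-of-bounds indices both surface as std::out_of_range.
int loadElement(const SymbolTables& t, const std::string& name,
                const std::vector<std::size_t>& idx) {
    return t.arrays.at(name).at(flatIndex(t.arrayDimensions.at(name), idx));
}
```

For a 3x3 array, index (1, 2) maps to offset 1 * 3 + 2 = 5 under the row-major assumption.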

Usage

To compile and run the project:

  1. Generate the lexer: flex ssc.l
  2. Generate the parser: bison -d ssc.y
  3. Compile the generated files along with your main program:
    make clean
    make
    chmod +x compile.sh
    ./compile.sh test.ssc
    
  4. Run the compiler:
    • With input file: ./ssc_compiler input_file.ssc
    • Interactive mode: ./ssc_compiler

Makefile Commands

The project includes a Makefile with the following commands:

make        - Compile the SSC compiler
make run    - Run the SSC compiler with the test file
make clean  - Remove compiled and intermediate files
make distclean - Remove all build files and output
make help   - Display this help message

To use these commands, simply type make followed by the desired target. For example:

make        # Compile the compiler
make run    # Run the compiler with the test file
make clean  # Clean up build files

Use make help to see a list of available commands and their descriptions.

Future Improvements

  • Implement more array operations and functions
  • Optimize memory usage for large arrays
  • Add support for user-defined functions in map and reduce operations
  • Enhance error reporting and recovery mechanisms
  • Implement LLVM code generation

LLVM Code Generation

One of the major planned improvements for this project is the implementation of LLVM code generation. This will involve:

  1. LLVM Integration: Integrating the LLVM libraries into the project to enable code generation.

  2. IR to LLVM IR Translation: Developing a module to translate our custom Intermediate Representation (IR) into LLVM IR.

  3. Optimization Passes: Implementing LLVM optimization passes to improve the generated code's performance.

  4. Code Emission: Generating executable machine code from the LLVM IR.

  5. Array Operation Optimization: Utilizing LLVM's vector operations to optimize array manipulations.

  6. JIT Compilation: Exploring the possibility of Just-In-Time (JIT) compilation for improved runtime performance.

  7. Cross-platform Support: Leveraging LLVM's cross-platform capabilities to generate code for multiple target architectures.

The addition of LLVM code generation will significantly enhance the project by:

  • Improving execution speed of the compiled programs
  • Enabling more advanced optimizations
  • Providing better cross-platform support
  • Allowing for potential future features like runtime code generation

This feature will require additional dependencies (LLVM libraries) and will likely introduce new build steps and make commands, which will be documented once implemented.
