This repository contains our latest research focused on enhancing the accuracy of large language models (LLMs) in mathematical applications.

JasonHonKL/DaC-LLM

Dac-LLM

This repository showcases our recent research on improving the accuracy of large language models (LLMs) on mathematical problems. We believe our approach outperforms earlier prompting methods, such as chain-of-thought and graph-of-thought, and reaches state-of-the-art performance.

Background

In recent years, numerous prompting methods, such as Chain of Thought, have been developed to guide large language models (LLMs) in tackling mathematical problems. However, their mathematical performance remains unsatisfactory, especially without fine-tuning or in zero-shot settings. We therefore devised an approach to improve this performance, which we call "divide and conquer." Unlike traditional applications of divide and conquer, our method pairs the model with a programming language, such as Python, and its interpreter, mimicking the way a human uses a calculator.

Algorithm (2024-12-31 Update)

Dac Algorithm

[Image: screenshot of the DaC algorithm, 2025-01-01]

Performance Analysis

[Image: screenshot of the performance analysis, 2025-01-01]

Explanation

Our algorithm targets mathematical problems, particularly computational ones rather than proof-based ones. It first checks whether the input is a mathematical problem, then divides it into subproblems until each can be solved with a Python program. Delegating the arithmetic to the interpreter significantly reduces calculation errors, and we believe the resulting performance, even without fine-tuning the model, reaches state-of-the-art levels.
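The control flow described above can be sketched roughly as follows. This is an illustrative outline only, not the code in dac.py: the function names `dac_solve` and `run_python`, the `max_depth` limit, and the five callables standing in for LLM calls (`is_math`, `is_codable`, `split`, `write_code`, `combine`) are all assumptions made for this example.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def run_python(code: str, timeout: float = 10.0) -> str:
    """Run a snippet in a fresh Python interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


def dac_solve(
    problem: str,
    *,
    is_math: Callable[[str], bool],          # hypothetical: LLM classifies the input
    is_codable: Callable[[str], bool],       # hypothetical: LLM says "solvable in one program"
    split: Callable[[str], List[str]],       # hypothetical: LLM proposes subproblems
    write_code: Callable[[str], str],        # hypothetical: LLM writes Python for a subproblem
    combine: Callable[[str, List[str]], str],  # hypothetical: LLM merges partial answers
    max_depth: int = 3,
    _depth: int = 0,
) -> Optional[str]:
    # Step 1: only mathematical problems are handled.
    if _depth == 0 and not is_math(problem):
        return None
    # Step 2: stop dividing once the (sub)problem can be coded directly.
    if is_codable(problem) or _depth >= max_depth:
        return run_python(write_code(problem))
    # Step 3: divide, solve each part recursively, then combine.
    parts = [
        dac_solve(sub, is_math=is_math, is_codable=is_codable, split=split,
                  write_code=write_code, combine=combine,
                  max_depth=max_depth, _depth=_depth + 1)
        for sub in split(problem)
    ]
    return combine(problem, parts)


# Toy stand-ins for the LLM calls, for illustration only.
toy_code = {
    "sum of 1 to 10": "print(sum(range(1, 11)))",
    "square root of 144": "import math; print(math.isqrt(144))",
}
answer = dac_solve(
    "sum of 1 to 10 and square root of 144",
    is_math=lambda p: True,
    is_codable=lambda p: p in toy_code,
    split=lambda p: p.split(" and "),
    write_code=lambda p: toy_code[p],
    combine=lambda p, parts: ", ".join(parts),
)
print(answer)  # 55, 12
```

Running the generated code in a separate interpreter process keeps the numeric work out of the model's text generation, which is the "calculator" role the method assigns to Python.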

Performance

Experiments with other models and datasets are still in progress. The results below were obtained with the DeepSeek v3 model; some of the result materials are provided.

| Method | Score |
| --- | --- |
| Direct Prompting | 271 / 500 |
| Our Method (Multiple Attempts) | 441 / 500 |
| Our Method (One Attempt) | 407 / 500 |
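For easier comparison, the raw counts above can be converted to accuracy percentages. The short snippet below only restates the reported scores; the variable names are chosen for this example.

```python
# Reported scores out of 500 problems (DeepSeek v3).
TOTAL = 500
scores = {
    "Direct Prompting": 271,
    "Our Method (Multiple Attempts)": 441,
    "Our Method (One Attempt)": 407,
}

# Accuracy in percent, rounded to one decimal place.
accuracy = {method: round(100 * n / TOTAL, 1) for method, n in scores.items()}

# Gain over direct prompting, in percentage points.
baseline = scores["Direct Prompting"]
gain = {
    method: round(100 * (n - baseline) / TOTAL, 1)
    for method, n in scores.items()
    if method != "Direct Prompting"
}

for method, pct in accuracy.items():
    print(f"{method}: {pct}%")
```

This gives 54.2% for direct prompting, 88.2% for the method with multiple attempts, and 81.4% with a single attempt, that is, gains of 34.0 and 27.2 percentage points respectively.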

Usage

Your dataset should be a JSON file containing problems and their corresponding answers. Python must also be installed.

Create a JSON file

[
    {
      "problem": "Calculate the area of a circle with radius 5",
      "answer": "78.54"
    },
    {
      "problem": "Find the sum of numbers from 1 to 10",
      "answer": "55"
    },
    {
      "problem": "What is the square root of 144?",
      "answer": "12"
    }
]
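Before running the program, it may help to check that a dataset file matches this shape. The following is a minimal sketch, not part of the repository: `validate_dataset` and `load_dataset` are hypothetical helper names, and the assumption that both fields must be strings is taken from the example above.

```python
import json
import math
from typing import Dict, List


def validate_dataset(entries: object) -> List[Dict[str, str]]:
    """Check that entries is a list of {"problem": str, "answer": str} objects."""
    if not isinstance(entries, list):
        raise ValueError("dataset must be a JSON array")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        for key in ("problem", "answer"):
            if not isinstance(entry.get(key), str):
                raise ValueError(f"entry {i}: '{key}' must be a string")
    return entries


def load_dataset(path: str) -> List[Dict[str, str]]:
    """Read a dataset file from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_dataset(json.load(f))


# In-memory sample mirroring the first entry of the example file.
sample = json.loads(
    '[{"problem": "Calculate the area of a circle with radius 5", "answer": "78.54"}]'
)
dataset = validate_dataset(sample)

# Sanity-check the stored answer: pi * 5**2, formatted to two decimals.
area = f"{math.pi * 5 ** 2:.2f}"
assert area == dataset[0]["answer"]
```

Answers are stored as strings, so any numeric comparison has to settle on a rounding convention first, as the two-decimal check here does.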

You can create the JSON file with our generator. You only need to edit the config.json file to use this program.

python dac.py

Demo

https://www.youtube.com/watch?v=t5CKDndcRSk

Author

Hon Kit Long u3608018 [at] connect.hku.hk
Au Chi Kin Kinson chikinau03 [at] gmail.com
Liu Peter Hong Zhi liuhongzhi3000 [at] gmail.com
Cho Chung Hei u3605966 [at] connect.hku.hk

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

