# LazyLM


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Introduction

In the rapidly evolving landscape of artificial intelligence, large
language models (LLMs) have emerged as powerful tools for a wide range
of applications, from content creation to complex problem-solving. We’ve
seen widespread adoption of language models as “assistants” in domains
such as healthcare and education. However,
in educational contexts, these models often fall short of providing an
optimal learning experience. They tend to generate complete solutions
upfront, robbing students of the opportunity to engage in the
step-by-step reasoning process that is crucial for deep understanding.

This raises an important question:

> Can we get these systems to evaluate in a ‘Socratic’ way?

In other words, can we develop systems that encourage students to
*think* step-by-step, rather than generating complete solutions upfront?

While there has been significant work on making LLMs reason step-by-step
(e.g., chain-of-thought prompting), to our knowledge, there isn’t an
existing framework to build systems that allow users to think through
problems step-by-step while having the LLM assist in a pedagogical way.
This is where the concept of “lazy evaluation” in language models comes
into play, offering a more Socratic approach to AI-assisted tasks.

This work demonstrates an approach to building a framework for forcing
models to evaluate in a lazy way, drawing inspiration from functional
programming concepts.

## Motivation

The motivation behind implementing lazy evaluation in language models
stems from several key observations:

- **Pedagogical Effectiveness**: Traditional tutoring methods often
involve guiding students through problems step-by-step, allowing them
to think critically and make connections on their own. AI tutors
should aim to replicate this approach rather than simply providing
answers.

- **Resource Efficiency**: Generating complete solutions upfront is
computationally expensive, especially for complex problems. A lazy
evaluation approach can significantly reduce resource usage by
generating only the necessary information on demand.

- **Adaptability**: Students have varying levels of understanding and
may require different amounts of guidance. A lazy evaluation system
can adapt to each student’s needs, providing more or less detail as
required.

- **Engagement**: By revealing information gradually, we can maintain
student engagement and encourage active participation in the
problem-solving process.

- **Real-world Problem Solving**: In many real-world scenarios,
solutions are not immediately apparent and must be approached
incrementally. Training students to think in this way prepares them
for challenges beyond the classroom.

## Understanding Lazy Evaluation

### A Note on Lazy Evaluation in Programming

The concept of lazy evaluation is well-established in functional
programming languages, where an expression is evaluated only when its
value (or terminal) is needed. This is also known as call-by-need. The
contrast to this evaluation strategy is what’s called “eager” or strict
evaluation, which evaluates all of the subexpressions of an expression
regardless of whether their values are used. For example, the expression:

``` haskell
func :: Int -> Int -> Int
func a b = a
```

will always just return its first argument:

``` haskell
ghci> func (2+2) 100
4
```

We get `4` because the function returns that expression, and GHCi must
evaluate it in order to print it to the screen.

However, what about this?

``` haskell
ghci> func (2 + 2) (100^1000000)
```

In an eager (or strict) language like Python, the evaluation of that
function call would look something like this:

```
func((2+2), (100^1000000))
func(4, (100^1000000))
func(4, 10000000000000000000000000000.....)
4
```

That is a lot of wasted computation just for a function that returns the
first argument!

Programming languages that support lazy evaluation have an evaluation
strategy that looks more like this:

```
func((2+2), (100^1000000))
(2 + 2)
4
```

Our language “knows” that we don’t need the second argument, so why
bother evaluating it?
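To make the contrast concrete, here is the same function in Python,
together with one way to emulate call-by-need by passing thunks
(zero-argument callables). This is just an illustrative sketch of the two
evaluation strategies, not part of LazyLM:

``` python
def func(a, b):
    return a

# Eager: Python fully evaluates both arguments before the call,
# so the enormous power is computed and then thrown away.
func(2 + 2, 100**1000000)

def lazy_func(a_thunk, b_thunk):
    return a_thunk()  # b_thunk is never called, so it is never evaluated

# "Lazy": wrapping each argument in a lambda delays evaluation
# until (unless) the function actually forces it.
lazy_func(lambda: 2 + 2, lambda: 100**1000000)  # => 4, with no wasted work
```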

### Lazy Evaluation in the Context of Language Models

Language models are inherently eager: given a prompt, they will continue
to generate tokens until reaching an end-of-sequence token. However,
this behavior is not always desirable, especially in educational
contexts. For example, given a prompt such as:

“What is the derivative of `2x^3 + x^2 + 2x + 1`? Give me the solution
step-by-step”

A language model will generate the entire sequence of steps in one go.
Something like this:

```
Given function: f(x) = 2x^3 + x^2 + 2x + 1

Step 1: Differentiate each term separately using the power rule and constant rule.
The power rule states that the derivative of x^n is nx^(n-1).
The constant rule states that the derivative of a constant is 0.

a) Differentiate 2x^3:
   d/dx(2x^3) = 2 * 3x^(3-1) = 6x^2
b) Differentiate x^2:
   d/dx(x^2) = 2x^(2-1) = 2x
c) Differentiate 2x:
   d/dx(2x) = 2
d) Differentiate 1:
   d/dx(1) = 0

Step 2: Combine the results from each term.
f'(x) = 6x^2 + 2x + 2 + 0

Step 3: Simplify the expression.
f'(x) = 6x^2 + 2x + 2

Therefore, the derivative of 2x^3 + x^2 + 2x + 1 is 6x^2 + 2x + 2.
```

For an application meant to assist students with learning this content,
a full trace like this is likely suboptimal for facilitating a
productive learning environment.
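For concreteness, a one-shot request like the following produces exactly
this kind of complete trace. This is a minimal sketch using the
`anthropic` Python SDK; the model id and parameters are illustrative,
and any chat-completion client would behave the same way:

``` python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model id
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "What is the derivative of 2x^3 + x^2 + 2x + 1? "
                   "Give me the solution step-by-step.",
    }],
)

# The model evaluates eagerly: the entire reasoning trace
# arrives in a single response.
print(response.content[0].text)
```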

More abstractly, we can frame the problem as a problem initialization, a
sequence of steps, and a final solution:

`sp_1 -> sp_2 -> sp_3 -> ... -> sp_n`

where `sp_i` is the sub-problem at step `i` for problem `p`.

With this light definition in place, we can frame the desired evaluation
strategy like this:

`past(sp_1) -> curr(sp_2) -> future(sp_3 -> ... -> sp_n)`

where `past` contains the steps that have already been evaluated, `curr`
is the step currently being evaluated, and `future` holds all of the
steps yet to be evaluated.
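Python generators happen to give exactly this past/curr/future shape,
which makes the framing easy to see in code. The hard-coded steps below
are purely for illustration:

``` python
def solution_steps():
    """Yield sub-problems sp_1 .. sp_n one at a time (lazily)."""
    yield "Step 1: Differentiate each term using the power rule."
    yield "Step 2: Combine the results: f'(x) = 6x^2 + 2x + 2 + 0"
    yield "Step 3: Simplify: f'(x) = 6x^2 + 2x + 2"

steps = solution_steps()
past = []

curr = next(steps)  # evaluates sp_1 only; sp_2 .. sp_n remain "future"
past.append(curr)

curr = next(steps)  # sp_1 is now "past", sp_2 is "curr", sp_3 is "future"
```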

To a language model, this is all just token sequences.

```
TokenSequence_1 -> TokenSequence_2 -> TokenSequence_3 -> ... -> TokenSequence_n
   | sp_1 |           | sp_2 |           | sp_3 |        ...       | sp_n |
```

By leveraging the model’s KV cache to memoize the compute already done
on `TokenSequence`s that have been processed, we can make our evaluation
strategy look much more “lazy”:

`memoized(TokenSequence_1 -> TokenSequence_2) -> TokenSequence_3 -> future(TokenSequence_4 -> ... -> TokenSequence_n)`

This gives us roughly the desired evaluation strategy: the LLM computes
only the next sub-problem in the sequence and nothing more.
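A minimal sketch of this strategy is shown below, again using the
`anthropic` Python SDK. The class, system prompt, and model id are
illustrative assumptions rather than LazyLM’s actual API; the key idea
is that each call asks the model for only the current step, while the
growing prefix of past steps is resent verbatim (and is therefore
KV-cacheable):

``` python
import anthropic
from dataclasses import dataclass, field

LAZY_SYSTEM_PROMPT = (  # illustrative prompt, not LazyLM's actual one
    "You are a Socratic tutor. Given a problem and the steps completed so far, "
    "state ONLY the next step. Do not reveal later steps or the final answer."
)

@dataclass
class LazyState:
    problem: str                                    # problem initialization p
    steps: list[str] = field(default_factory=list)  # past(sp_1 .. sp_i)

    def context(self) -> str:
        past = "\n".join(self.steps) or "(none)"
        return f"Problem: {self.problem}\nSteps so far:\n{past}"

client = anthropic.Anthropic()

def next_step(state: LazyState) -> str:
    """Force evaluation of curr(sp_{i+1}) and nothing more."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model id
        max_tokens=256,
        system=LAZY_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": state.context()}],
    )
    step = response.content[0].text
    state.steps.append(step)  # the memoized prefix grows one step at a time
    return step

state = LazyState("What is the derivative of 2x^3 + x^2 + 2x + 1?")
print(next_step(state))  # sp_1 only; later steps stay unevaluated until asked
```

Each call advances `curr` by exactly one step, so the student controls
when, and whether, the future steps are ever evaluated.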