Gianni Crivello authored and committed Aug 21, 2024 (commit 79e4b0d)

# LazyLM

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Introduction

In the rapidly evolving landscape of artificial intelligence, large
language models (LLMs) have emerged as powerful tools for a wide range
of applications, from content creation to complex problem-solving. We’ve
seen widespread adoption of language models as “assistants” in domains
such as healthcare and education. However, in educational contexts,
these models often fall short of providing an optimal learning
experience. They tend to generate complete solutions upfront, robbing
students of the opportunity to engage in the step-by-step reasoning
that is crucial for deep understanding.

This raises an important question:

> Can we get these systems to evaluate in a ‘Socratic’ way?

In other words, can we build systems that encourage students to
*think* step-by-step, rather than handing them complete solutions
upfront?

While there has been significant work on making LLMs reason step-by-step
(e.g., chain-of-thought prompting), to our knowledge there isn’t an
existing framework for building systems that let users think through
problems step-by-step while the LLM assists in a pedagogical way. This
is where the concept of “lazy evaluation” in language models comes into
play, offering a more Socratic approach to AI-assisted tasks.

This work demonstrates one approach to building a framework that forces
models to evaluate lazily, drawing inspiration from functional
programming.

## Motivation

The motivation behind implementing lazy evaluation in language models
stems from several key observations:

- **Pedagogical Effectiveness**: Traditional tutoring methods often
  involve guiding students through problems step-by-step, allowing them
  to think critically and make connections on their own. AI tutors
  should aim to replicate this approach rather than simply providing
  answers.

- **Resource Efficiency**: Generating complete solutions upfront can be
  computationally expensive, especially for complex problems. A lazy
  evaluation approach can reduce resource usage by generating only the
  information that is actually needed, on demand.

- **Adaptability**: Students have varying levels of understanding and
  may require different amounts of guidance. A lazy evaluation system
  can adapt to each student’s needs, providing more or less detail as
  required.

- **Engagement**: By revealing information gradually, we can maintain
  student engagement and encourage active participation in the
  problem-solving process.

- **Real-world Problem Solving**: In many real-world scenarios,
  solutions are not immediately apparent and must be approached
  incrementally. Training students to think this way prepares them for
  challenges beyond the classroom.

## Understanding Lazy Evaluation

### A Note on Lazy Evaluation in Programming

The concept of lazy evaluation is well established in functional
programming languages, where an expression is evaluated only when its
value is needed. This is also known as call-by-need. The opposite
strategy is called “eager” or strict evaluation, which evaluates all of
the subexpressions of an expression regardless of whether their values
are used. For example, consider the function:

``` haskell
-- Returns its first argument; the second argument is never used.
func :: Integer -> Integer -> Integer
func a _ = a
```

This function always returns its first argument:

``` haskell
>>> func (2+2) 100
4
```

We get `4` because the function returns its first argument, `(2+2)`,
and that expression must be evaluated in order to print the result to
the screen.

But what about this?

``` haskell
>>> func (2 + 2) (100^1000000)
4
```

In an eager (or strict) language like Python, evaluating the equivalent
call would look something like this (Python writes exponentiation as
`**`):

``` text
func((2+2), (100**1000000))
func(4, (100**1000000))
func(4, 10000000000000000000000000000...)
4
```

That is a lot of wasted computation just for a function that returns the
first argument!

Programming languages that support lazy evaluation have an evaluation
strategy that looks more like this:

``` text
func (2+2) (100^1000000)
(2+2)
4
```

Our language “knows” that we don’t need the second argument, so why
bother evaluating it?

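The difference can be observed directly in GHCi. The sketch below repeats `func` and adds a hypothetical strict variant, `funcStrict` (an illustrative name, not part of LazyLM), that uses `seq` to force its second argument before returning the first, mimicking eager evaluation. Passing `undefined` as the second argument makes the contrast visible: `func (2 + 2) undefined` returns `4` because the argument is never demanded, while `funcStrict (2 + 2) undefined` raises an exception.

``` haskell
-- Returns its first argument; the second is never demanded, so it is
-- never evaluated (it may even be `undefined` or a huge computation).
func :: Integer -> Integer -> Integer
func a _ = a

-- Hypothetical strict variant: `seq` evaluates the second argument (to weak
-- head normal form, which for Integer means fully) before returning the
-- first, as an eager language would.
funcStrict :: Integer -> Integer -> Integer
funcStrict a b = b `seq` a
```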
### Lazy Evaluation in the Context of Language Models

Language models are inherently eager: given a prompt, they continue to
generate tokens until they reach an end-of-sequence token (or a length
limit). However, this behavior is not always desirable, especially in
educational contexts. For example, given a prompt such as:

“What is the derivative of `2x^3 + x^2 + 2x + 1`? Give me the solution
step-by-step”

A language model will typically generate the entire sequence of steps in
one go, something like this:

``` text
Given function: f(x) = 2x^3 + x^2 + 2x + 1

Step 1: Differentiate each term separately using the power rule and constant rule.
The power rule states that the derivative of x^n is nx^(n-1).
The constant rule states that the derivative of a constant is 0.

a) Differentiate 2x^3:
   d/dx(2x^3) = 2 * 3x^(3-1) = 6x^2
b) Differentiate x^2:
   d/dx(x^2) = 2x^(2-1) = 2x
c) Differentiate 2x:
   d/dx(2x) = 2
d) Differentiate 1:
   d/dx(1) = 0

Step 2: Combine the results from each term.
f'(x) = 6x^2 + 2x + 2 + 0

Step 3: Simplify the expression.
f'(x) = 6x^2 + 2x + 2

Therefore, the derivative of 2x^3 + x^2 + 2x + 1 is 6x^2 + 2x + 2.
```

For an application meant to help students learn this material, handing
over the full trace at once is likely suboptimal for fostering a
productive learning environment.

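As a toy illustration of the alternative, the sketch below represents each term `c*x^n` of the polynomial as a pair `(c, n)` and builds the per-term explanations as a lazy list. The names `Term`, `diffTerm`, `render`, `explainSteps`, and `poly` are hypothetical and not part of LazyLM. The point is that a step's text is only constructed when a caller asks for it, so a tutor could reveal `explainSteps poly !! i` one step at a time; laziness even lets it run on an infinite list of terms, as long as only finitely many steps are requested.

``` haskell
-- A term c*x^n is represented as the pair (c, n).
type Term = (Integer, Integer)

-- Power rule: d/dx (c*x^n) = (c*n)*x^(n-1). A constant (n = 0) becomes 0.
diffTerm :: Term -> Term
diffTerm (c, n) = (c * n, n - 1)

-- Render a term the way the worked example writes it, e.g. "2x^3" or "x^2".
render :: Term -> String
render (c, n)
  | c == 0 = "0"
  | n == 0 = show c
  | otherwise = coef ++ "x" ++ power
  where
    coef = if c == 1 then "" else show c
    power = if n == 1 then "" else "^" ++ show n

-- One explanation per term. The list is lazy: a step's text is only built
-- when it is demanded, e.g. by `explainSteps poly !! i`.
explainSteps :: [Term] -> [String]
explainSteps = map explain
  where
    explain t = "d/dx(" ++ render t ++ ") = " ++ render (diffTerm t)

-- f(x) = 2x^3 + x^2 + 2x + 1 from the example above.
poly :: [Term]
poly = [(2, 3), (1, 2), (2, 1), (1, 0)]
```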
More abstractly, we can frame each problem as a problem initialization,
a sequence of steps, and the final solution:

`sp_1 -> sp_2 -> sp_3 -> ... -> sp_n`

where `sp_i` is the sub-problem at step `i` of problem `p`.

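The chain `sp_1 -> ... -> sp_n` maps naturally onto a lazily unfolded list. In this sketch, `chain`, `diffStep`, and `SubProblem` are hypothetical helpers, not LazyLM's API: `diffStep` turns one sub-problem into the next by differentiating a single remaining term, and `chain` keeps applying it until it returns `Nothing`. Because the list is lazy, `take k (chain next sp1)` computes only the first `k` sub-problems. Note that this simple power-rule step leaves the constant's derivative as the term `(0, -1)`, i.e. `0`.

``` haskell
-- A term c*x^n is represented as (c, n).
type Term = (Integer, Integer)

-- A sub-problem: the terms still to differentiate, and the derivative
-- terms produced so far.
type SubProblem = ([Term], [Term])

-- Unfold a problem into its chain of sub-problems sp_1 -> sp_2 -> ... -> sp_n.
-- `next` yields the following sub-problem, or Nothing once the last one
-- (the final solution) has been reached.
chain :: (s -> Maybe s) -> s -> [s]
chain next sp = sp : maybe [] (chain next) (next sp)

-- One step: differentiate the next remaining term with the power rule.
diffStep :: SubProblem -> Maybe SubProblem
diffStep ([], _) = Nothing
diffStep ((c, n) : rest, done) = Just (rest, done ++ [(c * n, n - 1)])
```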
With these definitions in place, we can frame the desired evaluation
strategy like this:

`past(sp_1) -> curr(sp_2) -> future(sp_3 -> ... -> sp_n)`

where `past` holds the steps that have already been evaluated, `curr` is
the step currently being evaluated, and `future` holds all of the steps
yet to be evaluated.

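One way to represent the `past`/`curr`/`future` split in code is a list zipper. In this minimal sketch, the `Session` type and the `start` and `advance` functions are hypothetical names, not LazyLM's API. `past` stores evaluated steps with the most recent first, and `future` is an ordinary lazy (possibly infinite) list, so pending steps are not evaluated until they are needed; `advance` only forces the list far enough to find the next step.

``` haskell
-- A problem split into evaluated steps, the current step, and pending steps.
data Session a = Session
  { past   :: [a] -- already evaluated, most recent first
  , curr   :: a   -- the step currently being worked on
  , future :: [a] -- not yet evaluated (lazy, possibly infinite)
  }

-- Begin a session at the first step, if there is one.
start :: [a] -> Maybe (Session a)
start []       = Nothing
start (x : xs) = Just (Session [] x xs)

-- Move the current step into the past and pull the next one from the future.
advance :: Session a -> Maybe (Session a)
advance (Session _ _ [])        = Nothing
advance (Session ps c (f : fs)) = Just (Session (c : ps) f fs)
```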
To a language model, these are all just token sequences, one per
sub-problem:

``` text
TokenSequence_1 -> TokenSequence_2 -> TokenSequence_3 -> ... -> TokenSequence_n
|    sp_1     |    |    sp_2     |    |    sp_3     |    ...    |    sp_n     |
```

By using the model’s KV cache to memoize the computation already
performed on earlier `TokenSequence`s, we can make our evaluation
strategy look much more “lazy”:

`memoized(TokenSequence_1 -> TokenSequence_2) -> TokenSequence_3 -> future(TokenSequence_4 -> ... -> TokenSequence_n)`

This gives us roughly the desired evaluation strategy: the LLM computes
only the token sequence for the next sub-problem, and nothing more.

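The saving from memoization can be sketched with a toy cost model. Assuming (hypothetically) that the cache holds the previous prompt and that work is reused for the longest shared token prefix, only the tokens after that prefix need fresh computation. `Token`, `sharedPrefix`, `freshTokens`, and `totalFresh` are illustrative names; this models the bookkeeping only, not an actual KV cache. For three prompts of one, two, and three tokens that each extend the last, `totalFresh` counts 3 fresh tokens, versus 1 + 2 + 3 = 6 without any reuse.

``` haskell
-- Tokens are modelled as strings for illustration only.
type Token = String

-- Length of the longest common prefix of two token sequences.
sharedPrefix :: [Token] -> [Token] -> Int
sharedPrefix (x : xs) (y : ys)
  | x == y = 1 + sharedPrefix xs ys
sharedPrefix _ _ = 0

-- Tokens that still need computing when `prompt` is processed with the work
-- for `cached` already memoized (assumes only the shared prefix is reused).
freshTokens :: [Token] -> [Token] -> Int
freshTokens cached prompt = length prompt - sharedPrefix cached prompt

-- Fresh work over a session in which each prompt extends the previous one,
-- with the previous prompt serving as the cache (the first prompt starts cold).
totalFresh :: [[Token]] -> Int
totalFresh prompts = sum (zipWith freshTokens ([] : prompts) prompts)
```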