fixed a typo, added readme

Techolution · Aug 21, 2024 · 79e4b0d · 79e4b0d
1 parent 870247c
commit 79e4b0d
Showing 2 changed files with 198 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -1 +1,197 @@
-This file will be overwritten by `index.ipynb`
+# LazyLM
+
+
+<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
+
+## Introduction
+
+In the rapidly evolving landscape of artificial intelligence, large
+language models (LLMs) have emerged as powerful tools for a wide range
+of applications, from content creation to complex problem-solving. We’ve
+seen widespread adoption of language models as “assistants” in various
+domains, such as healthcare, education, and content creation. However,
+in educational contexts, these models often fall short of providing an
+optimal learning experience. They tend to generate complete solutions
+upfront, robbing students of the opportunity to engage in the
+step-by-step reasoning process that is crucial for deep understanding.
+
+This raises an important question:
+
+> Can we get these systems to evaluate in a ‘Socratic’ way?
+
+In other words, can we develop systems that encourage students to
+*think* step-by-step, rather than generating complete solutions upfront?
+
+While there has been significant work on making LLMs reason step-by-step
+(e.g., chain-of-thought prompting), to our knowledge, there isn’t an
+existing framework to build systems that allow users to think through
+problems step-by-step while having the LLM assist in a pedagogical way.
+This is where the concept of “lazy evaluation” in language models comes
+into play, offering a more Socratic approach to AI-assisted tasks.
+
+This work demonstrates an approach to building a framework for forcing
+models to evaluate in a lazy way, drawing inspiration from functional
+programming concepts.
+
+## Motivation
+
+The motivation behind implementing lazy evaluation in language models
+stems from several key observations:
+
+- **Pedagogical Effectiveness**: Traditional tutoring methods often
+  involve guiding students through problems step-by-step, allowing them
+  to think critically and make connections on their own. AI tutors
+  should aim to replicate this approach rather than simply providing
+  answers.
+
+- **Resource Efficiency**: Generating complete solutions upfront is
+  computationally expensive, especially for complex problems. A lazy
+  evaluation approach can significantly reduce resource usage by
+  generating only the necessary information on demand.
+
+- **Adaptability**: Students have varying levels of understanding and
+  may require different amounts of guidance. A lazy evaluation system
+  can adapt to each student’s needs, providing more or less detail as
+  required.
+
+- **Engagement**: By revealing information gradually, we can maintain
+  student engagement and encourage active participation in the
+  problem-solving process.
+
+- **Real-world Problem Solving**: In many real-world scenarios,
+  solutions are not immediately apparent and must be approached
+  incrementally. Training students to think in this way prepares them
+  for challenges beyond the classroom.
+
+## Understanding Lazy Evaluation
+
+### A Note on Lazy Evaluation in Programming
+
+The concept of lazy evaluation is well-established in functional
+programming languages, where an expression is evaluated only when its
+value is needed. This strategy is also known as call-by-need. Its
+counterpart is “eager” (or strict) evaluation, which evaluates all of
+the subexpressions of an expression regardless of whether their values
+are used. For example, the function:
+
+``` haskell
+func :: Integer -> Integer -> Integer
+func a b = a
+```
+
+always returns its first argument:
+
+``` haskell
+>>> func (2 + 2) 100
+4
+```
+
+We get `4` because the function returns its first argument, so the
+expression `2 + 2` has to be evaluated before the result can be printed.
+
+However, what about this?
+
+``` haskell
+>>> func (2 + 2) (100^1000000)
+```
+
+In an eager (or strict) language like Python, the evaluation of that
+function call would look something like this (keeping the `^` notation
+for exponentiation, which Python itself spells `**`):
+
+    func((2+2), (100^1000000))
+
+    func( 4, (100^1000000) )
+
+    func( 4, 10000000000000000000000000000.....)
+
+    4
+
+That is a lot of wasted computation just for a function that returns the
+first argument!
+
+Programming languages that support lazy evaluation have an evaluation
+strategy that looks more like this:
+
+    func((2+2), (100^1000000))
+
+    (2 + 2)
+
+    4
+
+Our language “knows” that we don’t need the second argument, so why
+bother evaluating it?
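+
+The difference can be observed directly. The following is a minimal
+sketch, assuming GHC and the standard `Debug.Trace` module; the names
+`first` and `second` are introduced here only for illustration. `trace`
+emits its message when the wrapped value is actually demanded, so the
+message for the second argument never appears.
+
+``` haskell
+import Debug.Trace (trace)
+
+-- Same shape as `func` above: return the first argument, ignore the second.
+func :: Integer -> Integer -> Integer
+func a _ = a
+
+main :: IO ()
+main = do
+  -- Each message is emitted (on stderr) only if the value is forced.
+  let first  = trace "evaluating first argument" (2 + 2)
+      second = trace "evaluating second argument" (100 ^ (1000000 :: Integer))
+  -- Printing forces `first`; `second` is never demanded, so it is never computed.
+  print (func first second)
+```
+
+Running this prints `4` along with the first trace message only. In a
+strict language the second argument would be computed before `func` is
+even entered.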
+
+### Lazy Evaluation in the Context of Language Models
+
+Language models are inherently eager: given a prompt, they will continue
+to generate tokens until reaching an end-of-sequence token. However,
+this behavior is not always desirable, especially in educational
+contexts. For example, given a prompt such as:
+
+“What is the derivative of `2x^3 + x^2 + 2x + 1`? Give me the solution
+step-by-step”
+
+A language model will generate the entire sequence of steps in one go.
+Something like this:
+
+    Given function: f(x) = 2x^3 + x^2 + 2x + 1
+
+    Step 1: Differentiate each term separately using the power rule and constant rule.
+    The power rule states that the derivative of x^n is nx^(n-1).
+    The constant rule states that the derivative of a constant is 0.
+
+    a) Differentiate 2x^3:
+        d/dx(2x^3) = 2 * 3x^(3-1) = 6x^2
+    b) Differentiate x^2:
+        d/dx(x^2) = 2x^(2-1) = 2x
+    c) Differentiate 2x:
+        d/dx(2x) = 2
+    d) Differentiate 1:
+        d/dx(1) = 0
+
+    Step 2: Combine the results from each term.
+    f'(x) = 6x^2 + 2x + 2 + 0
+
+    Step 3: Simplify the expression.
+    f'(x) = 6x^2 + 2x + 2
+
+    Therefore, the derivative of 2x^3 + x^2 + 2x + 1 is 6x^2 + 2x + 2.
+
+In an application meant to help students learn this material, a full
+trace like this is likely a poor fit for a productive learning
+environment.
+
+More abstractly, we can frame a problem as an initialization, a
+sequence of steps, and a final solution.
+
+`sp_1 -> sp_2 -> sp_3 -> ... -> sp_n, where sp_i is the sub problem at step i for problem p`
+
+With these light definitions in place, we can frame the desired
+evaluation strategy like this:
+
+`past(sp_1) -> curr(sp_2) -> future(sp_3 -> ... -> sp_n)`
+
+where `past` holds the steps that have already been evaluated, `curr` is
+the step currently being evaluated, and `future` holds all of the steps
+that have yet to be evaluated.
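+
+One way to picture this split is as a cursor over a list of sub
+problems. The sketch below is a hypothetical model, not the framework's
+implementation: the `View` type and the `start` and `advance` functions
+are names made up for this example, and the steps are taken from the
+derivative trace above.
+
+``` haskell
+-- Hypothetical model: each sub problem is represented by a line of text.
+type Step = String
+
+-- Steps already shown, the step in focus, and the untouched remainder.
+data View = View { past :: [Step], curr :: Step, future :: [Step] }
+
+-- Sub problems from the derivative example.
+steps :: [Step]
+steps =
+  [ "d/dx(2x^3) = 6x^2"
+  , "d/dx(x^2) = 2x"
+  , "d/dx(2x) = 2"
+  , "d/dx(1) = 0"
+  , "combine: 6x^2 + 2x + 2"
+  ]
+
+-- Focus on the first step; Nothing if there are no steps at all.
+start :: [Step] -> Maybe View
+start []       = Nothing
+start (s : ss) = Just (View [] s ss)
+
+-- Move the current step into `past` and focus on the next one.
+advance :: View -> Maybe View
+advance (View _ _ [])       = Nothing
+advance (View p c (n : ns)) = Just (View (p ++ [c]) n ns)
+
+main :: IO ()
+main = case start steps >>= advance of
+  Nothing -> putStrLn "no more steps"
+  Just v  -> do
+    putStrLn ("past: " ++ show (past v))
+    putStrLn ("curr: " ++ curr v)
+```
+
+After one `advance`, `past` holds the first sub problem, `curr` is the
+second, and nothing in `future` has been inspected.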
+
+To a language model, this is all just token sequences.
+
+    TokenSequence_1 -> TokenSequence_2 -> TokenSequence_3 -> ... -> TokenSequence_n
+    |     sp_1    |    |     sp_2    |    |     sp_3    |    ...    |     sp_n    |
+
+By using a model’s KV cache to memoize the compute already spent on
+earlier `TokenSequence`s, we can frame our evaluation strategy to look
+much more “lazy”.
+
+`memoized(TokenSequence_1 -> TokenSequence_2) -> TokenSequence_3 -> future(TokenSequence_4 -> ... -> TokenSequence_n)`
+
+This gets us roughly the desired evaluation strategy: the LLM computes
+only the next sequence it samples, as a single sub problem, and nothing
+more.
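+
+As a toy illustration of that stepping behaviour, the sketch below
+keeps the already-computed sequences in a `Cache` value and extends it
+by exactly one sequence per call. Everything here is an assumption made
+for the example: `Cache`, `script`, `sampleNext`, and `step` are
+hypothetical names, and the fixed `script` stands in for a real model,
+which would reuse cached keys and values rather than look up a list.
+
+``` haskell
+-- Hypothetical stand-ins: a token sequence is a list of tokens, and the
+-- cache is the prefix of sequences that has already been computed.
+type TokenSequence = [String]
+newtype Cache = Cache [TokenSequence]
+
+-- Fixed script that plays the role of the model's output in this toy.
+script :: [TokenSequence]
+script =
+  [ ["differentiate", "2x^3"]
+  , ["differentiate", "x^2"]
+  , ["combine", "terms"]
+  ]
+
+-- Toy "model": the next sequence depends only on how many are cached.
+sampleNext :: Cache -> Maybe TokenSequence
+sampleNext (Cache done) = case drop (length done) script of
+  []      -> Nothing
+  (s : _) -> Just s
+
+-- One lazy step: produce a single new sequence, extend the cache, stop.
+step :: Cache -> Maybe (TokenSequence, Cache)
+step c@(Cache done) = do
+  s <- sampleNext c
+  pure (s, Cache (done ++ [s]))
+
+main :: IO ()
+main = case step (Cache []) of
+  Nothing -> putStrLn "done"
+  Just (s, Cache done) -> do
+    putStrLn ("next sub problem: " ++ unwords s)
+    putStrLn ("cached sequences: " ++ show (length done))
+```
+
+Each call to `step` returns one sub problem together with the extended
+cache, so the caller decides when, or whether, the next sub problem is
+produced.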
diff --git a/nbs/index.ipynb b/nbs/index.ipynb
@@ -16,7 +16,7 @@
    "source": [
     "# LazyLM\n",
     "\n",
-    "> A prompting framework for getting foundational models to \"lazyily\" evaluate their reasoning trace."
+    "> A prompting framework for getting foundational models to \"lazily\" evaluate their reasoning trace."
    ]
   },
   {