docs: improve documentation (All-Hands-AI#3425)
* add imports to eval harness

* update out-dated custom sandbox guide

* Update docs/modules/usage/evaluation_harness.md

* remove llm pasta

* update od doc

---------

Co-authored-by: Engel Nyst <[email protected]>
xingyaoww and enyst authored Aug 16, 2024
1 parent 3f20e4e commit 5d92048
Showing 3 changed files with 32 additions and 38 deletions.
docs/modules/usage/custom_sandbox_guide.md: 29 changes (1 addition, 28 deletions)
@@ -90,34 +90,7 @@ Congratulations!

## Technical Explanation

The relevant code is defined in [ssh_box.py](https://github.com/OpenDevin/OpenDevin/blob/main/opendevin/runtime/docker/ssh_box.py) and [image_agnostic_util.py](https://github.com/OpenDevin/OpenDevin/blob/main/opendevin/runtime/docker/image_agnostic_util.py).

In particular, `ssh_box.py` checks the config object for `config.sandbox_container_image` and then attempts to retrieve the image using [get_od_sandbox_image](https://github.com/OpenDevin/OpenDevin/blob/main/opendevin/runtime/docker/image_agnostic_util.py#L72), which is defined in `image_agnostic_util.py`.

The first time a custom image is used it will not be found, so it is built; on subsequent runs the previously built image is found and returned.

The custom image is built using [_build_sandbox_image()](https://github.com/OpenDevin/OpenDevin/blob/main/opendevin/runtime/docker/image_agnostic_util.py#L29), which creates a Dockerfile using your custom image as the base and then configures the environment for OpenDevin, like this:

```python
dockerfile_content = (
    f'FROM {base_image}\n'
    'RUN apt update && apt install -y openssh-server wget sudo\n'
    'RUN mkdir -p -m0755 /var/run/sshd\n'
    'RUN mkdir -p /opendevin && mkdir -p /opendevin/logs && chmod 777 /opendevin/logs\n'
    'RUN echo "" > /opendevin/bash.bashrc\n'
    'RUN if [ ! -d /opendevin/miniforge3 ]; then \\\n'
    ' wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" && \\\n'
    ' bash Miniforge3-$(uname)-$(uname -m).sh -b -p /opendevin/miniforge3 && \\\n'
    ' chmod -R g+w /opendevin/miniforge3 && \\\n'
    ' bash -c ". /opendevin/miniforge3/etc/profile.d/conda.sh && conda config --set changeps1 False && conda config --append channels conda-forge"; \\\n'
    ' fi\n'
    'RUN /opendevin/miniforge3/bin/pip install --upgrade pip\n'
    'RUN /opendevin/miniforge3/bin/pip install jupyterlab notebook jupyter_kernel_gateway flake8\n'
    'RUN /opendevin/miniforge3/bin/pip install python-docx PyPDF2 python-pptx pylatexenc openai\n'
).strip()
```
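
Putting the pieces together, the lookup-then-build flow looks roughly like the sketch below, written against the Docker Python SDK (`docker-py`). The helper name and the `od_sandbox:` naming scheme are illustrative assumptions; the authoritative logic lives in `image_agnostic_util.py`.

```python
# Illustrative sketch only -- not the actual OpenDevin implementation.
import tempfile
from pathlib import Path

import docker
from docker.errors import ImageNotFound


def get_sandbox_image(base_image: str, client: docker.DockerClient) -> str:
    # Derive the name of the OpenDevin-flavoured image (naming scheme is assumed).
    new_image_name = f"od_sandbox:{base_image.replace('/', '_').replace(':', '_')}"
    try:
        # Built on a previous run: reuse it.
        client.images.get(new_image_name)
    except ImageNotFound:
        # First run with this base image: build it from the generated Dockerfile.
        with tempfile.TemporaryDirectory() as tmpdir:
            dockerfile = f'FROM {base_image}\n'  # plus the RUN lines shown above
            (Path(tmpdir) / 'Dockerfile').write_text(dockerfile)
            client.images.build(path=tmpdir, tag=new_image_name, rm=True)
    return new_image_name


# Usage: get_sandbox_image('ubuntu:22.04', docker.from_env())
```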

> Note: the name of the image is modified via [_get_new_image_name()](https://github.com/OpenDevin/OpenDevin/blob/main/opendevin/runtime/docker/image_agnostic_util.py#L63) and it is the modified name that is searched for on subsequent runs.
Please refer to the [custom docker image section of the runtime documentation](https://docs.all-hands.dev/modules/usage/runtime#advanced-how-opendevin-builds-and-maintains-od-runtime-images) for more details.

## Troubleshooting / Errors

docs/modules/usage/evaluation_harness.md: 36 changes (30 additions, 6 deletions)
@@ -84,9 +84,35 @@ To integrate your own benchmark, we suggest starting with the one that most clos

## How to create an evaluation workflow


To create an evaluation workflow for your benchmark, follow these steps:

1. Create a configuration:
1. Import relevant OpenDevin utilities:
```python
import agenthub
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
run_evaluation,
)
from opendevin.controller.state.state import State
from opendevin.core.config import (
AppConfig,
SandboxConfig,
get_llm_config_arg,
parse_arguments,
)
from opendevin.core.logger import opendevin_logger as logger
from opendevin.core.main import create_runtime, run_controller
from opendevin.events.action import CmdRunAction
from opendevin.events.observation import CmdOutputObservation, ErrorObservation
from opendevin.runtime.runtime import Runtime
```

2. Create a configuration:
```python
def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
config = AppConfig(
@@ -103,15 +129,15 @@
return config
```
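
The elided body above is where the LLM, agent, and sandbox settings get filled in. A rough sketch of what it might look like follows; the field names are assumptions drawn from the existing benchmarks, so verify them against `opendevin.core.config` before copying.

```python
# Hypothetical sketch -- field names are assumptions; check opendevin.core.config.
def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
    config = AppConfig(
        default_agent=metadata.agent_class,      # agent under evaluation
        max_iterations=metadata.max_iterations,  # cap on agent steps per instance
        sandbox=SandboxConfig(
            container_image='python:3.11-bookworm',  # whatever image your benchmark needs
            enable_auto_lint=True,
        ),
    )
    config.set_llm_config(metadata.llm_config)   # LLM settings from the eval metadata
    return config
```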

2. Initialize the runtime and set up the evaluation environment:
3. Initialize the runtime and set up the evaluation environment:
```python
async def initialize_runtime(runtime: Runtime, instance: pd.Series):
# Set up your evaluation environment here
# For example, setting environment variables, preparing files, etc.
pass
```
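
For example, a benchmark that needs files or packages in place before the agent starts can issue shell commands through the runtime. A small sketch, assuming the `run_action` pattern used by the existing benchmarks (await it if your version exposes it as a coroutine) and an assumed `instance_id` column:

```python
# Sketch of a possible setup step; adapt the commands to your benchmark.
async def initialize_runtime(runtime: Runtime, instance: pd.Series):
    # Create a working directory for this instance.
    action = CmdRunAction(command='mkdir -p /workspace/data')
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    assert isinstance(obs, CmdOutputObservation) and obs.exit_code == 0

    # Expose task-specific information to the sandbox (instance_id is an assumed column).
    obs = runtime.run_action(CmdRunAction(command=f'export TASK_ID={instance.instance_id}'))
    assert obs.exit_code == 0
```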

3. Create a function to process each instance:
4. Create a function to process each instance:
```python
async def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput:
config = get_config(instance, metadata)
@@ -141,7 +167,7 @@
)
```
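
The elided middle of `process_instance` is where most of the work happens: build the runtime from the config, run your setup, hand the task instruction to the controller, and convert the final state into an `EvalOutput`. A sketch of that flow is below; keyword names such as `task_str` and the exact `EvalOutput` fields are assumptions, and `get_instruction` / `evaluate_final_state` are placeholders you supply, so mirror an existing benchmark for the current API.

```python
# Sketch only -- verify helper signatures against an existing benchmark before use.
async def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput:
    config = get_config(instance, metadata)
    reset_logger_for_multiprocessing(logger, instance.instance_id, metadata.eval_output_dir)

    runtime = create_runtime(config)            # may need `await` depending on the version
    await initialize_runtime(runtime, instance)

    instruction = get_instruction(instance, metadata)   # your prompt builder (placeholder)
    state: State | None = await run_controller(
        config=config,
        task_str=instruction,                   # assumed keyword; check run_controller
        runtime=runtime,
        fake_user_response_fn=your_user_response_function,
    )

    # Convert the final State into an EvalOutput; fields depend on your scoring needs.
    return EvalOutput(
        instance_id=str(instance.instance_id),
        test_result=evaluate_final_state(runtime, instance, state),  # placeholder scorer
        metadata=metadata,
    )
```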

4. Run the evaluation:
5. Run the evaluation:
```python
metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir)
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
@@ -162,8 +188,6 @@ Remember to customize the `get_instruction`, `your_user_response_function`, and

By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenDevin framework.



## Understanding the `user_response_fn`

opendevin/README.md: 5 changes (1 addition, 4 deletions)
@@ -48,8 +48,5 @@ flowchart LR
```

## Runtime
The `Runtime` class is abstract and has several implementations:

* `LocalRuntime` runs commands and edits files directly on the user's machine.
* `DockerRuntime` runs commands inside a Docker sandbox and edits files directly on the user's machine.
* `E2BRuntime` uses [e2b.dev containers](https://github.com/e2b-dev/e2b) to sandbox file and command operations.
Please refer to the [documentation](https://docs.all-hands.dev/modules/usage/runtime) to learn more about `Runtime`.
