1 change: 1 addition & 0 deletions .gitignore
@@ -172,6 +172,7 @@ notebooks/
*logs/
work/
appmap.log
tmp/appmap

# Solve
solve
61 changes: 61 additions & 0 deletions VERIFY.md
@@ -0,0 +1,61 @@
The purpose of this issue is to provide instructions on how to verify the open source status and benchmark results for AppMap Navie v2 on the Lite and Verified benchmarks.

## Navie is open source

You can find the benchmark code for Navie v2 here:

[https://github.com/getappmap/navie-benchmark](https://github.com/getappmap/navie-benchmark)

Within that project, there are two git submodules, which are also open source:

* [https://github.com/getappmap/appmap-js/](https://github.com/getappmap/appmap-js/)
* [https://github.com/getappmap/navie-editor](https://github.com/getappmap/navie-editor)

These three projects completely contain the code of Navie v2.

## Running the benchmark

### General instructions

You'll be using the GitHub Workflow `official.yml` to run the solver. It will generate test patches ("synthetic tests"), code patches ("solutions"), and then evaluate the results.

For best results, use `claude-3-5-sonnet-20241022` with GitHub Action environment variable `ANTHROPIC_API_KEY`.

Use the default branch of the repository, which is `swe-bench-2`.

### Instance set option

The primary input that you need to select is the instance set. The `instance_set` option names a `.txt` file located in `data/instance_sets`. For example, the instance set `verified_33_pct_1` includes one third of the instances from the Verified set (every third instance). Using instance sets lets you run the solver more quickly and cheaply than running the entire dataset.

To run a quick "smoke" test, use instance set `smoke`.

To run the entire Verified dataset, use instance set `verified`.
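
To make the idea of an instance set concrete, here is a minimal sketch of reading one from `data/instance_sets`. It assumes the file lists one instance ID per line, which these instructions do not state; `load_instance_set` is a hypothetical helper and not part of the benchmark code.

```python
from pathlib import Path


def load_instance_set(name: str, base_dir: Path = Path("data/instance_sets")) -> list[str]:
    """Return the instance IDs listed in <base_dir>/<name>.txt.

    Assumption: the file holds one instance ID per line; blank lines are skipped.
    """
    path = base_dir / f"{name}.txt"
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Calling `load_instance_set("smoke")` from the repository root would then return the IDs in the smoke set, provided the file follows the assumed layout.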

### Other options

- **llm** `claude-3-5-sonnet-20241022`
- **context_token_limit** `64000`. For economy, you can run with a smaller token limit (e.g. `16000`), but you'll lose a couple of percent in the solve rate.
- **context_token_limit_increase** `20` (default)
- **temperature_increase** `0.1` (default)
- **test_patch_solve_threshold** `1` (default)
- **max_test_solve_iterations** `3` (default)
- **num_runners** Size these according to the instance set that you use. We recommend using one runner for every 20-30 instances. With this many runners, you can expect the workflow to complete in 1-2 hours.
- **name** As desired
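
The runner guideline above (one runner for every 20-30 instances) can be turned into a quick calculation. The sketch below is only an illustration: `suggest_num_runners` is a hypothetical helper, and the default of 25 instances per runner is an assumed midpoint of the recommended range.

```python
import math


def suggest_num_runners(num_instances: int, instances_per_runner: int = 25) -> int:
    """Suggest a runner count from the guideline of one runner per 20-30 instances."""
    if num_instances < 0:
        raise ValueError("num_instances must be non-negative")
    if instances_per_runner <= 0:
        raise ValueError("instances_per_runner must be positive")
    # Always request at least one runner, even for tiny instance sets.
    return max(1, math.ceil(num_instances / instances_per_runner))
```

For a set of 500 instances this suggests 20 runners at the default, or 25 runners if you size for 20 instances each.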

## Notes

_Evaluation_

If you prefer to use your own evaluation, rather than the code in this fork of swe-bench, you can remove that section from the Workflow.

_Environments other than GitHub Actions_

Of course, you don’t have to use GitHub Actions to run Navie. It’s just easy because it’s all configured.

You can see from `official.yml` that, aside from building a conda environment and installing some dependencies, it’s necessary to build `submodules/appmap-js` using yarn.

---

Please let me know if you have any questions, or if you would like these instructions in a different format or for a different target system.


@@ -0,0 +1 @@
@diagram /noprojectinfo /include=\bsolver\b Create a class diagram for the feature "code checkout", using the provided documentation as a guide.
5 changes: 5 additions & 0 deletions architecture/code-checkout/.navie/dependencyFiles.json
@@ -0,0 +1,5 @@
[
"solver/checkout_code.py",
"solver/harness/build_extended_image.py",
"solver/harness/image_store.py"
]
14 changes: 14 additions & 0 deletions architecture/code-checkout/.navie/readme/prompt.md
@@ -0,0 +1,14 @@
## Task

Your task is to document a feature in the style of a software architecture document. Use the
available project information to document the feature.

Document the feature from a usage point of view, not from an implementation point of view. Focus
on the design and behavior of the feature as it is used by the end user or other parts of the
system. Avoid including implementation details in the documentation. Do not provide a code
breakdown, as the user is not interested in that information.

Do not emit anything before or after the documentation content. Just emit the documentation content.

Avoid using tentative language such as "may", "might", "could", "appears", "likely" etc. Describe
only what you see from the data.
1 change: 1 addition & 0 deletions architecture/code-checkout/.navie/readme/question.md
@@ -0,0 +1 @@
@explain /noprojectinfo /include=\bsolver\b Document the feature "code checkout".
24 changes: 24 additions & 0 deletions architecture/code-checkout/README.md
@@ -0,0 +1,24 @@
**Feature: Code Checkout**

The "Code Checkout" feature facilitates the creation of a local working copy of the code repository by leveraging a Docker container. This process involves exporting the current state of the version-controlled files from the container to the local file system, capturing the initial code baseline in a local git repository, and ensuring the code is set up for subsequent modifications and executions.

### Feature Overview

- **Container-Based Code Export**: The feature begins by executing a command in a Docker container to create a compressed `.tar.gz` archive of the current state of the code. The archive is generated with the `git archive` command, so the export reflects the version-controlled git repository within the container.

- **Local Directory Setup**: Prior to extraction, the feature verifies that the designated local directory (`source_dir`) for extracting the code does not already exist. If it does, a `ValueError` is raised to prevent unintentional overwrites. Otherwise, the directory is created.

- **Extraction Process**: The generated archive is copied from the container to the local file system. The content of the archive is then extracted into `source_dir`. This step ensures that the working directory is populated with the latest version-controlled code from the container environment.

- **Local Git Initialization**: After extraction, the feature performs a series of git commands within the local directory to initialize a new git repository. It adds all the extracted files to the staging area and performs an initial commit, labeling it as "Baseline commit". This establishes a baseline from which subsequent modifications can be tracked locally.
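
The directory check and extraction steps above can be sketched as follows. This is an illustrative sketch using Python's standard `tarfile` module, not the code in `solver/checkout_code.py`; the function name `extract_archive` and the error message are assumptions, and the archive is assumed to have already been copied out of the container.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: Path, source_dir: Path) -> None:
    """Extract a .tar.gz archive into a directory that must not exist yet."""
    # Refuse to touch an existing directory to avoid unintentional overwrites.
    if source_dir.exists():
        raise ValueError(f"Source directory {source_dir} already exists")
    source_dir.mkdir(parents=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(source_dir)
```

Raising before any file is written keeps a failed checkout from leaving a half-populated working copy behind.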

### Error Handling

The feature incorporates robust error handling to address potential issues during the checkout process:

- If a failure occurs during the git initialization and commit stages, the error is logged with details of the exception. This ensures transparency and ease of troubleshooting.
- Regardless of the operation outcome, the process guarantees that the system directory context is reset to its original state by using a `finally` block.
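
The error-handling behavior described above (log the failure, always restore the working directory) can be sketched like this. The function name `create_baseline_commit`, the injectable `run` parameter, and the log message are assumptions made for illustration; only the git commands and the "Baseline commit" message come from the description.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable


def create_baseline_commit(
    source_dir: Path,
    log: Callable[[str], None] = print,
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Initialize a git repository in source_dir and commit its contents.

    Returns True on success and False if any git step fails.
    """
    original_dir = Path.cwd()
    try:
        os.chdir(source_dir)
        for command in (
            ["git", "init"],
            ["git", "add", "."],
            ["git", "commit", "-m", "Baseline commit"],
        ):
            # check=True makes subprocess.run raise if git exits with an error.
            run(command, check=True)
        return True
    except Exception as error:
        log(f"Failed to create baseline commit: {error}")
        return False
    finally:
        # Restore the caller's working directory whether or not git succeeded.
        os.chdir(original_dir)
```

Note that `git commit` needs a configured `user.name` and `user.email`; without them the commit step fails and the sketch logs the error and returns `False`.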

### Usage Context

This feature is a fundamental part of setting up the development workflow by ensuring that the source code is properly initialized with version control in the local environment after being exported from a controlled Docker container. This setup is particularly useful when working with remote environments, allowing developers to have a synchronized and consistent starting point for code development and testing on their local systems.
34 changes: 34 additions & 0 deletions architecture/code-checkout/class-diagram.md
@@ -0,0 +1,34 @@

```mermaid
classDiagram
direction LR

class CodeCheckout {
-log: Callable
-container: docker.models.containers.Container
-source_dir: Path
-tmp_dir: Path
+checkout_code(): void
}

class GitOperations {
+initialize_git(source_dir: Path): void
+add_all_files(source_dir: Path): void
+commit_files(message: str): void
}

class DockerOperations {
+create_git_archive(container: docker.models.containers.Container, archive_path: Path): void
+copy_archive_to_local(container: docker.models.containers.Container, local_path: Path): void
}

class DirectoryManager {
+verify_directory_not_exists(path: Path): void
+create_directory(path: Path): void
+reset_directory_context(original_path: Path): void
}

CodeCheckout --> DockerOperations : uses
CodeCheckout --> GitOperations : uses
CodeCheckout --> DirectoryManager : uses
```
1 change: 1 addition & 0 deletions architecture/code-solving/.navie/class-diagram/question.md
@@ -0,0 +1 @@
@diagram /noprojectinfo /include=\bsolver\b Create a class diagram for the feature "code solving", using the provided documentation as a guide.
4 changes: 4 additions & 0 deletions architecture/code-solving/.navie/dependencyFiles.json
@@ -0,0 +1,4 @@
[
"solver/solve.py",
"solver/workflow/generate_code.py"
]
14 changes: 14 additions & 0 deletions architecture/code-solving/.navie/readme/prompt.md
@@ -0,0 +1,14 @@
## Task

Your task is to document a feature in the style of a software architecture document. Use the
available project information to document the feature.

Document the feature from a usage point of view, not from an implementation point of view. Focus
on the design and behavior of the feature as it is used by the end user or other parts of the
system. Avoid including implementation details in the documentation. Do not provide a code
breakdown, as the user is not interested in that information.

Do not emit anything before or after the documentation content. Just emit the documentation content.

Avoid using tentative language such as "may", "might", "could", "appears", "likely" etc. Describe
only what you see from the data.
1 change: 1 addition & 0 deletions architecture/code-solving/.navie/readme/question.md
@@ -0,0 +1 @@
@explain /noprojectinfo /include=\bsolver\b Document the feature "code solving".
38 changes: 38 additions & 0 deletions architecture/code-solving/README.md
@@ -0,0 +1,38 @@
## Feature: Code Solving

### Overview

The "Code Solving" feature is designed to automate the generation and optimization of code patches within a specified project. This feature systematically identifies test errors, formulates a plan to address them, and applies code modifications confined to specific files, without altering the testing infrastructure. The ultimate objective is to create an environment where code changes are seamless and optimized for the existing architecture, avoiding disruptions or errors.

### Functionality

1. **Error Analysis**:
- The code solver identifies and presents test errors that need addressing. A structured plan is generated that outlines these errors and is aimed at preventing test failures.

2. **Plan and Modify**:
- A detailed plan is created for the necessary code modifications, restricting changes to explicitly mentioned files. This ensures that only the targeted areas of the codebase are altered, preserving the integrity of other components.

3. **Patch Generation**:
- A new code patch is generated based on the specified plan. The code solver selects optimal code patches using mock functionalities to simulate and validate the generated code's effectiveness in resolving identified issues.

4. **Test Compatibility**:
- Ensures compatibility with the test framework utilized, allowing seamless integration and execution of generated code patches using pre-defined command lines for testing. This maintains consistency across project environments.

5. **Execution**:
- Utilizes Python's subprocess capabilities to execute commands related to instance sets and solve limits. This automation facilitates the smooth application of generated code patches across the codebase.

6. **Archiving Logs**:
- After the code-solving process, logs for the applied patches and predictions are archived. This is instrumental in maintaining records of all code changes and predictions made during the process.
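
One way to picture the "restrict changes to explicitly mentioned files" rule from step 2 is a simple partition of changed files against the files named in the plan. This is a hypothetical sketch; `split_changes_by_plan` is not a function in the solver, and how the solver actually enforces the restriction is not described here.

```python
def split_changes_by_plan(
    changed_files: list[str], planned_files: list[str]
) -> tuple[list[str], list[str]]:
    """Partition changed files into those named in the plan and those outside it."""
    allowed = set(planned_files)
    in_plan = [path for path in changed_files if path in allowed]
    out_of_plan = [path for path in changed_files if path not in allowed]
    return in_plan, out_of_plan
```

A caller could reject or trim a patch whenever the second list is non-empty, which keeps test files and unrelated modules untouched.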

### Design Considerations

- **Environment-Specific Code**:
- The feature is designed to respect the constraints of the specific Python version used by the project, ensuring no incompatibilities or unsupported features are introduced in the codebase.

- **No Direct Testing Suggestions**:
- The feature does not provide direct testing recommendations or alterations, as these considerations are handled in a separate step to maintain focus on code optimization.

- **User Interaction**:
- Minimal user intervention is required, other than initiating the code-solving process and inputting necessary parameters such as instance sets and context tokens.

Overall, the Code Solving feature offers a streamlined approach to code optimization within a project, facilitating automatic code patch creation and application while aligning with the existing architectural constraints and testing frameworks.
42 changes: 42 additions & 0 deletions architecture/code-solving/class-diagram.md
@@ -0,0 +1,42 @@

```mermaid
classDiagram

class CodeSolver {
+solve_errors(test_errors: List~str~): Plan
+apply_patch(patch: Patch): bool
+execute_cmds(cmds: List~str~)
+archive_logs(log_dir: Path)
}

class Plan {
+errors: List~str~
+generate_plan(): void
+modify_code(files: List~str~): bool
}

class Patch {
+content: str
+apply_to(file: str): bool
}

class Logger {
+log_process(step: str, message: str): void
}

class Environment {
+python_version: str
+validate_compatibility(): bool
}

class UserInteraction {
+init_code_solver()
+input_parameters(instance_set: str, context_tokens: int)
}

CodeSolver --> Plan
CodeSolver --> Patch
CodeSolver --> Logger
CodeSolver --> Environment
CodeSolver -- UserInteraction
```
@@ -0,0 +1 @@
@diagram /noprojectinfo /include=\bsolver\b Create a class diagram for the feature "collect appmap context", using the provided documentation as a guide.
@@ -0,0 +1,9 @@
[
"solver/appmap/appmap.py",
"solver/observe_test.py",
"solver/solve.py",
"solver/workflow/collect_appmap_context.py",
"solver/workflow/observe_test.py",
"solver/workflow/solution_listener.py",
"solver/workflow/solve_code.py"
]
14 changes: 14 additions & 0 deletions architecture/collect-appmap-context/.navie/readme/prompt.md
@@ -0,0 +1,14 @@
## Task

Your task is to document a feature in the style of a software architecture document. Use the
available project information to document the feature.

Document the feature from a usage point of view, not from an implementation point of view. Focus
on the design and behavior of the feature as it is used by the end user or other parts of the
system. Avoid including implementation details in the documentation. Do not provide a code
breakdown, as the user is not interested in that information.

Do not emit anything before or after the documentation content. Just emit the documentation content.

Avoid using tentative language such as "may", "might", "could", "appears", "likely" etc. Describe
only what you see from the data.
@@ -0,0 +1 @@
@explain /noprojectinfo /include=\bsolver\b Document the feature "collect appmap context".
35 changes: 35 additions & 0 deletions architecture/collect-appmap-context/README.md
@@ -0,0 +1,35 @@
## Feature: Collect AppMap Context

### Overview

The "Collect AppMap Context" feature is responsible for extracting context information from AppMap data files. This feature focuses on gathering code context details from `.appmap.json` files to support subsequent processing steps or analyses.

### Usage

The feature is primarily utilized in scenarios involving collecting and utilizing AppMap data for code analysis and validation. It serves as part of a broader workflow designed to enhance code understanding and processing. The collection process is typically triggered during the execution of synthetic tests, where AppMap data is generated.

#### Workflow Integration

1. **Test Execution and Observation**: The process begins with the observation and execution of synthetic tests within a controlled environment. This is achieved using a Docker container setup. During the test execution phase, AppMap data files are generated and stored in a specified directory. The `ObserveTest` class and its associated methods manage the test execution and data storage.

2. **AppMap Context Collection**: Once the tests are run and AppMap data is available, the `collect_appmap_context_from_directory` function is invoked. It iterates over the generated `.appmap.json` files and extracts relevant context using the `AppMap` class. The context primarily includes code locations (filename and line number) and the associated function code.

3. **Handling Data**: The collected AppMap context is maintained within a dictionary structure, where keys represent code locations and values contain the associated function code. This context data is then made available for downstream processes, such as improved code patch generation and validation.

4. **Logging and Error Handling**: The feature includes logging mechanisms to track the status and progress of the context collection process. Errors encountered during data extraction are logged for troubleshooting and resolution.

### Key Components

- **AppMap Class**: The central component responsible for parsing and extracting location data from `.appmap.json` files. It provides the `list_locations` method to enumerate code locations present within the class map of the AppMap data.

- **Collection Functions**:
- `collect_appmap_context_from_directory`: This function initiates the collection of AppMap context from a specified directory containing the AppMap files.
- `collect_appmap_context`: Called by the directory function to handle individual AppMap data and populate the result dictionary with location-to-code mappings.
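
The collection flow described above can be sketched roughly as follows. This is not the code in `solver/workflow/collect_appmap_context.py`: the function names and signatures are simplified, the `read_function_code` callback is a hypothetical stand-in for however the solver obtains function source, and the sketch assumes each class map node may carry a `location` string and a `children` list.

```python
import json
from pathlib import Path
from typing import Callable


def list_locations(appmap_data: dict) -> list[str]:
    """Collect every `location` value found while walking the class map tree."""
    locations: list[str] = []

    def walk(node: dict) -> None:
        location = node.get("location")
        if location:
            locations.append(location)
        for child in node.get("children", []):
            walk(child)

    for root in appmap_data.get("classMap", []):
        walk(root)
    return locations


def collect_context(
    appmap_dir: Path, read_function_code: Callable[[str], str]
) -> dict[str, str]:
    """Map each code location found under appmap_dir to its function code."""
    result: dict[str, str] = {}
    for appmap_file in sorted(appmap_dir.rglob("*.appmap.json")):
        try:
            data = json.loads(appmap_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as error:
            # Report unreadable files and keep going with the rest.
            print(f"Skipping {appmap_file}: {error}")
            continue
        for location in list_locations(data):
            if location not in result:
                result[location] = read_function_code(location)
    return result
```

Keying the dictionary by location means a function that appears in several AppMaps is only resolved once.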

### Benefits

- **Enhanced Code Understanding**: By collecting detailed location and function code information, developers gain better insights into the structure and behavior of the codebase.
- **Support for Code Analyses**: The available context facilitates various analyses and transformations, enabling more informed code generation and validation processes.
- **Improved Workflow Efficiency**: Automation of context extraction reduces manual overhead and streamlines the workflow, enhancing overall productivity.

In summary, the "Collect AppMap Context" feature provides a robust mechanism to extract and maintain code context information, supporting advanced code analyses and improvements. It plays a crucial role in understanding and validating test-generated code efficiently and effectively.
32 changes: 32 additions & 0 deletions architecture/collect-appmap-context/class-diagram.md
@@ -0,0 +1,32 @@

```mermaid
classDiagram
class AppMap {
+data: dict
+__init__(data: Union[str, dict])
+list_locations(): List[str]
}
class ObserveTest {
+log
+work_dir: Path
+test_spec: TestSpec
+run(docker_client: docker.DockerClient, test_patch: Patch): Optional[ObserveTestResult]
}
class Path {
}
class TestSpec {
}
class Patch {
}
class ObserveTestResult {
+test_status: TestStatus
+appmap_dir: Path
}
class AppMapContextCollector {
+collect_appmap_context_from_directory(log, appmap_dir: Path): dict[str, str]
+collect_appmap_context(log, appmap: AppMap, result: dict[str, str]): dict[str, str]
}
AppMap --> AppMapContextCollector
ObserveTest --> AppMapContextCollector
ObserveTestResult --> Path
```
@@ -0,0 +1 @@
@diagram /noprojectinfo /include=\bsolver\b Create a class diagram for the feature "filter solutions from instance set", using the provided documentation as a guide.
@@ -0,0 +1,12 @@
[
"solver/filter_solutions_from_instance_set.py",
"solver/prepare_predictions.py",
"solver/report.py",
"solver/solve.py",
"solver/solve_loop.py",
"solver/workflow/generate_code.py",
"solver/workflow/generate_test.py",
"solver/workflow/observe_test.py",
"solver/workflow/patch.py",
"solver/workflow/solve_listener.py"
]