8 changes: 7 additions & 1 deletion ms_agent/agent/llm_agent.py
@@ -405,14 +405,20 @@ def handle_new_response(self, messages: List[Message],
assert response_message is not None, 'No response message generated from LLM.'
if response_message.tool_calls:
self.log_output('[tool_calling]:')
for tool_call in response_message.tool_calls:
for idx, tool_call in enumerate(response_message.tool_calls):
tool_call = deepcopy(tool_call)
if isinstance(tool_call['arguments'], str):
try:
tool_call['arguments'] = json.loads(
tool_call['arguments'])
except json.decoder.JSONDecodeError:
pass
if tool_call['arguments'] is None:
response_message.tool_calls[idx]['arguments'] = {
'__error__':
'Original arguments were None, replaced by default.'
}
Comment on lines +416 to +420


high

The current implementation for handling None arguments replaces them with a dictionary {'__error__': '...'}. This is likely to cause a TypeError when the tool function is called with an unexpected __error__ keyword argument. A safer approach would be to use an empty dictionary. This would result in a more informative TypeError about missing required arguments if the tool expects any, which is better than an error about an unexpected keyword.

                if tool_call['arguments'] is None:
                    response_message.tool_calls[idx]['arguments'] = {}
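The difference the reviewer describes can be seen in a minimal sketch (`sample_tool` is a hypothetical stand-in, not part of the PR):

```python
def sample_tool(path: str) -> str:
    """Hypothetical tool that requires a `path` argument."""
    return path

# Unpacking the '__error__' marker dict produces a confusing TypeError
# about an unexpected keyword:
try:
    sample_tool(**{'__error__': 'Original arguments were None'})
except TypeError as e:
    print(e)  # TypeError mentions the unexpected '__error__' keyword

# Unpacking an empty dict produces the more informative
# missing-argument TypeError instead:
try:
    sample_tool(**{})
except TypeError as e:
    print(e)  # TypeError mentions the missing 'path' argument
```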


self.log_output(
json.dumps(tool_call, ensure_ascii=False, indent=4))

94 changes: 93 additions & 1 deletion ms_agent/tools/filesystem_tool.py
@@ -5,8 +5,9 @@

from ms_agent.llm.utils import Tool
from ms_agent.tools.base import ToolBase
from ms_agent.utils import get_logger
from ms_agent.utils import MAX_CONTINUE_RUNS, get_logger, retry
from ms_agent.utils.constants import DEFAULT_OUTPUT_DIR
from openai import OpenAI

logger = get_logger()

@@ -21,6 +22,12 @@ def __init__(self, config, **kwargs):
super(FileSystemTool, self).__init__(config)
self.exclude_func(getattr(config.tools, 'file_system', None))
self.output_dir = getattr(config, 'output_dir', DEFAULT_OUTPUT_DIR)
if 'edit_file' not in self.exclude_functions:
self.edit_file_config = getattr(config.tools.file_system,
'edit_file_config', None)
self.client = OpenAI(
api_key=self.edit_file_config.api_key,
base_url=self.edit_file_config.base_url)
Comment on lines +26 to +30


high

The code directly accesses config.tools.file_system which could raise an AttributeError if config.tools exists but file_system does not. Additionally, if edit_file_config is not found, self.edit_file_config will be None, causing a crash when accessing .api_key. It's better to handle this gracefully with safer access and clear error messages to ensure the configuration is valid when the tool is enabled.

            file_system_config = getattr(config.tools, 'file_system', None)
            self.edit_file_config = getattr(file_system_config, 'edit_file_config', None) if file_system_config else None
            if not self.edit_file_config:
                raise ValueError("'edit_file_config' is missing in the configuration for FileSystemTool.")
            self.client = OpenAI(
                api_key=self.edit_file_config.api_key,
                base_url=self.edit_file_config.base_url)

self.trust_remote_code = kwargs.get('trust_remote_code', False)
self.allow_read_all_files = getattr(
getattr(config.tools, 'file_system', {}), 'allow_read_all_files',
@@ -125,6 +132,65 @@ async def get_tools(self):
'required': ['path'],
'additionalProperties': False
}),
Tool(
tool_name='edit_file',
server_name='file_system',
description=
('Use this tool to make an edit to an existing file.\n\n'
'This will be read by a less intelligent model, which will quickly apply the edit. '
'You should make it clear what the edit is, while also minimizing the unchanged code you write.\n'
'When writing the edit, you should specify each edit in sequence, with the special comment '
'// ... existing code ... to represent unchanged code in between edited lines.\n\n'
'For example:\n\n// ... existing code ...\nFIRST_EDIT\n// ... existing code ...\n'
'SECOND_EDIT\n// ... existing code ...\nTHIRD_EDIT\n// ... existing code ...\n\n'
'You should still bias towards repeating as few lines of the original file '
'as possible to convey the change.\n'
'But, each edit should contain minimally sufficient context of unchanged lines '
"around the code you're editing to resolve ambiguity.\n"
'DO NOT omit spans of pre-existing code (or comments) without using the '
'// ... existing code ... comment to indicate its absence. '
'If you omit the existing code comment, the model may inadvertently delete these lines.\n'
'If you plan on deleting a section, you must provide context before and after to delete it. '
'If the initial code is ```code \\n Block 1 \\n Block 2 \\n Block 3 \\n code```, '
'and you want to remove Block 2, you would output '
'```// ... existing code ... \\n Block 1 \\n Block 3 \\n // ... existing code ...```.\n'
'Make sure it is clear what the edit should be, and where it should be applied.\n'
'Make edits to a file in a single edit_file call '
'instead of multiple edit_file calls to the same file. '
'The apply model can handle many distinct edits at once.'
),
parameters={
'type': 'object',
'properties': {
'path': {
'type': 'string',
'description':
'Path of the target file to modify.'
},
'instructions': {
'type':
'string',
'description':
('A single sentence instruction describing '
'what you are going to do for the sketched edit. '
'This is used to assist the less intelligent model in applying the edit. '
'Use the first person to describe what you are going to do. '
'Use it to disambiguate uncertainty in the edit.'
)
},
'code_edit': {
'type':
'string',
'description':
('Specify ONLY the precise lines of code that you wish to edit. '
'NEVER specify or write out unchanged code. '
'Instead, represent all unchanged code using the comment of the language '
"you're editing in - example: // ... existing code ..."
)
}
},
'required': ['path', 'instructions', 'code_edit']
}),
]
}
return {
@@ -267,3 +333,29 @@ async def list_files(self, path: str = None):
except Exception as e:
return f'List files of <{path or "root path"}> failed, error: ' + str(
e)

@retry(max_attempts=MAX_CONTINUE_RUNS, delay=1.0)
async def edit_file(self,
path: str = None,
instructions: str = None,
code_edit: str = None):
try:


medium

The edit_file function signature allows path to be None, but the function body does not handle this case. If path is None, os.path.join(self.output_dir, path) inside the try block will raise a TypeError. Although the tool definition marks path as required, it's good practice to add a defensive check for path at the beginning of the function to prevent this runtime error. For example: if not path: return "Error: 'path' argument is required.".
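The defensive check the reviewer suggests can be sketched as a small helper (`guarded_join` and its behavior are illustrative, not part of the PR):

```python
import os

def guarded_join(output_dir, path):
    """Illustrative guard: return a readable error for a missing path
    instead of letting os.path.join raise a TypeError on None."""
    if not path:
        return "Error: 'path' argument is required."
    return os.path.join(output_dir, path)
```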

with open(os.path.join(self.output_dir, path), 'r') as f:
initial_code = f.read()
response = self.client.chat.completions.create(
model=self.edit_file_config.diff_model,
messages=[{
'role':
'user',
'content':
(f'<instruction>{instructions}</instruction>\n'
f'<code>{initial_code}</code>\n'
f'<update>{code_edit}</update>')
}])
merged_code = response.choices[0].message.content

with open(os.path.join(self.output_dir, path), 'w') as f:
f.write(merged_code)
return f'Edit file <{path}> successfully.'
except Exception as e:
return f'Edit file <{path}> failed, error: ' + str(e)
6 changes: 5 additions & 1 deletion projects/code_scratch/architecture.yaml
@@ -1,6 +1,6 @@
llm:
service: openai
model: claude-sonnet-4-5-20250929
model: claude-haiku-4-5-20251001
openai_api_key:
openai_base_url: https://dashscope.aliyuncs.com/compatible-mode/v1

@@ -53,6 +53,10 @@ prompt:
callbacks:
- callbacks/artifact_callback

tools:
file_system:
mcp: false

max_chat_round: 1

tool_call_timeout: 30000
50 changes: 20 additions & 30 deletions projects/code_scratch/callbacks/eval_callback.py
@@ -4,7 +4,6 @@
from contextlib import contextmanager
from typing import List, Optional

from file_parser import extract_code_blocks
from ms_agent.agent.runtime import Runtime
from ms_agent.callbacks import Callback
from ms_agent.llm.utils import Message
@@ -26,6 +25,7 @@ def __init__(self, config: DictConfig):
self.compile_round = 300
self.cur_round = 0
self.last_issue_length = 0
self.devtool_prompt = getattr(config.prompt, 'devtool', None)

async def on_task_begin(self, runtime: Runtime, messages: List[Message]):
self.omit_intermediate_messages(messages)
@@ -87,25 +87,17 @@ def check_install():
@staticmethod
def check_runtime():
try:
os.system('pkill -f node')
if os.getcwd().endswith('backend'):
result = subprocess.run(['npm', 'run', 'dev'],
capture_output=True,
text=True,
timeout=5,
stdin=subprocess.DEVNULL)
else:
result = subprocess.run(['npm', 'run', 'build'],
capture_output=True,
text=True,
check=True)
result = subprocess.run(['npm', 'run', 'dev'],
capture_output=True,
text=True,
timeout=5,
stdin=subprocess.DEVNULL)
Comment on lines +90 to +94


high

The pkill -f node commands were removed from check_runtime. This is risky because npm run dev often starts a long-running development server. Without a mechanism to terminate it, the process might be orphaned and cause issues in subsequent runs, such as "address already in use" errors or resource leaks. Consider re-introducing a cleanup step to ensure the node process is terminated after the check.
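One way to re-introduce the cleanup the reviewer asks for is a `try/finally` around the subprocess call, sketched below with a generic long-running command (`sleep`) standing in for `npm run dev`:

```python
import subprocess

def run_with_cleanup(cmd, timeout):
    """Run a possibly long-lived command and always terminate it afterwards."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            stdin=subprocess.DEVNULL, text=True)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        out, err = '', f'timed out after {timeout}s'
    finally:
        proc.kill()   # the finally block plays the role of `pkill -f node`
        proc.wait()   # reap the process so it is not left as a zombie
    return out + '\n' + err
```

Unlike a bare `pkill -f node`, this only terminates the process the check itself started, which avoids killing unrelated node processes on the machine.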

except subprocess.CalledProcessError as e:
output = EvalCallback._parse_e_msg(e)
except subprocess.TimeoutExpired as e:
output = EvalCallback._parse_e_msg(e)
else:
output = result.stdout + '\n' + result.stderr
os.system('pkill -f node')
return output

def _run_compile(self):
@@ -139,12 +131,21 @@ async def on_generate_response(self, runtime: Runtime,
self.last_issue_length = len(messages) - 3 - self.last_issue_length
self.omit_intermediate_messages(messages)
query = self.get_compile_feedback('frontend').strip()

# compile -> devtools
if not query:
human_feedback = True
query = self.get_human_feedback().strip()
feedback_type = 'devtools'
query = self.devtool_prompt
self.devtool_prompt = 'Use chrome-devtools to thoroughly test again'
else:
human_feedback = False
feedback_type = 'compling'
logger.warn(f'[Compile Feedback]: {query}]')

# devtools -> human
if not query:
feedback_type = 'human'
query = self.get_human_feedback().strip()

if not query:
self.feedback_ended = True
feedback = (
@@ -153,22 +154,11 @@
else:
all_local_files = await self.file_system.list_files()
feedback = (
f'Feedback from {"human" if human_feedback else "compling"}: {query}\n'
f'Feedback from {feedback_type}: {query}\n'
f'The files on the local system of this project: {all_local_files}\n'
f'Now please analyze and fix this issue:\n')
messages.append(Message(role='user', content=feedback))

async def on_tool_call(self, runtime: Runtime, messages: List[Message]):
design, _ = extract_code_blocks(
messages[-1].content, target_filename='design.txt')
if len(design) > 0:
front, design = messages[-1].content.split(
'```text: design.txt', maxsplit=1)
design, end = design.rsplit('```', 1)
design = design.strip()
if design:
messages[2].content = await self.do_arch_update(
runtime=runtime, messages=messages, updated_arch=design)
logger.info(messages)

async def after_tool_call(self, runtime: Runtime, messages: List[Message]):
runtime.should_stop = runtime.should_stop and self.feedback_ended
2 changes: 1 addition & 1 deletion projects/code_scratch/coding.yaml
@@ -1,6 +1,6 @@
llm:
service: openai
model: claude-sonnet-4-5-20250929
model: claude-haiku-4-5-20251001
openai_api_key:
openai_base_url: https://dashscope.aliyuncs.com/compatible-mode/v1

49 changes: 32 additions & 17 deletions projects/code_scratch/refine.yaml
@@ -1,6 +1,6 @@
llm:
service: openai
model: claude-sonnet-4-5-20250929
model: claude-haiku-4-5-20251001
openai_api_key:
openai_base_url: https://dashscope.aliyuncs.com/compatible-mode/v1

@@ -40,21 +40,7 @@ prompt:
* Do a minimum change in case that the normal code is damaged, if you are doing a break change, change related files also
* Fix other issues you discover while reading the code files, and these issues need to be ones where you have identified the root cause

4. Express your thinking in concise and clear language. When you fix files, you should use the following format:

```type: filename
text
```

for example:
```javascript: frontend/index.js
your code here
```

`javascript: frontend/index.js` will be used as the filename. If you are fixing a file, you need to:
* Read the target file
* Follow the original data structures and file imports, do not break it(you may read more files depends on)
* Then output the complete fixed code of the file.
4. Express your thinking in concise and clear language. When you fix files, you should use the edit_file tool

If you only output code snippets to demonstrate your conclusions, you can use standard code blocks:

@@ -66,8 +52,27 @@

Let's begin:

devtool: |
Use chrome-devtools to thoroughly test the generated frontend and backend code:
* List all console messages using list_console_messages to identify JavaScript errors, warnings, or logs
* Get detailed error information using get_console_message for each error or warning found
* List network requests using list_network_requests to check if API calls are successful, verify HTTP status codes, and identify failed requests
* Get detailed network request/response information using get_network_request to analyze request headers, payloads, and response data
* Take a snapshot of the page to understand the current UI state and available interactive elements
* Test the implemented functionality by:
- Clicking on interactive elements (buttons, links, forms) using click tool
- Filling out forms using fill or fill_form tools to test user input workflows
- Navigating between pages to verify routing works correctly
- Testing keyboard interactions using press_key when necessary
* Take screenshots at critical steps to document the UI state and verify visual correctness
* Analyze the feedback from all these operations to identify:
- Console errors (e.g., undefined variables, import errors, runtime exceptions)
- Network failures (e.g., 404/500 errors, CORS issues, timeout problems)
- UI/UX issues (e.g., broken layouts, missing elements, non-functional buttons)
- Logic errors (e.g., incorrect data display, failed form submissions)
* Use this comprehensive feedback to help the refine model better understand and fix the issues

callbacks:
- callbacks/artifact_callback
- callbacks/eval_callback

tools:
@@ -77,6 +82,16 @@
- create_directory
- write_file
- list_files
edit_file_config:
diff_model: morph-v3-fast
api_key:
base_url: https://api.morphllm.com/v1

chrome-devtools:
mcp: true
command: "npx"
args: ["-y", "chrome-devtools-mcp@latest"]
transport: "stdio"

max_chat_round: 100
