CodeExecutorAgent executes code from complete context #4810

Leon0402 · 2024-12-25T14:32:11Z

What happened?

The CodeExecutorAgent executes code in the complete context.

What did you expect to happen?

It should only execute code in the most recent message and not everything that was ever written in the context.

How can we reproduce it (as minimally and precisely as possible)?

Run on_messages with multiple messages, observe how all the code is executed.

AutoGen version

0.4 (master)

Which package was this bug in

AgentChat

Model used

gpt4-mini

Python version

3.10

Operating system

Linux

Any additional info you think would be helpful for fixing this bug

More generally, the interface of on_messages may not be ideal. It seems problematic to me that there is no distinction between the messages the model should directly react to and the overall context. I could imagine use cases where I have multiple agents and want to execute code from several responses, but not from everything. I don't have a specific suggestion in mind at the moment; it is more of a general note.


ekzhu · 2024-12-26T01:12:34Z

Do you perhaps mean a different behavior for the code executor agent? If you can start from a custom agent that uses a code executor and implement your own logic, we can learn from your experience.

Leon0402 · 2024-12-26T06:37:40Z

Just something like this:

    async def on_messages(self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken) -> Response:
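        # Only the most recent message is considered; return an empty reply if there is none or it is not text.
        if not messages or not isinstance(messages[-1], TextMessage):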
            return Response(chat_message=TextMessage(content="", source=self.name))

        code_blocks = _extract_markdown_code_blocks(messages[-1].content)
        if code_blocks:
            result = await self._code_executor.execute_code_blocks(code_blocks, cancellation_token=cancellation_token)

            code_output = result.output
            if code_output.strip() == "":
                # No output
                code_output = f"The script ran but produced no output to console. The POSIX exit code was: {result.exit_code}. If you were expecting output, consider revising the script to ensure content is printed to stdout."
            elif result.exit_code != 0:
                # Error
                code_output = f"The script ran, then exited with an error (POSIX exit code: {result.exit_code})\nIts output was:\n{result.output}"

            return Response(chat_message=TextMessage(content=code_output, source=self.name))
        else:
            return Response(
                chat_message=TextMessage(
                    content="No code blocks found in the thread. Please provide at least one markdown-encoded code block to execute (i.e., quoting code in ```python or ```sh code blocks).",
                    source=self.name,
                )
            )

So instead of executing code from all messages, it executes only the code in the very last message. The current code works like this:

Agent 1: I need to write some python code ... <python block 1>
CodeExecutor: Here is the result of block 1
Agent 2: Ok, looks great, I will write more python code <python block 2>
CodeExecutor: Here is the result of block 1 and block 2
Agent 3: Let's now write even more python <python block 3>
CodeExecutor: Here is the result of block 1, block 2 and block 3

What you usually want (at least I want that, and I believe it is the more common case):

Agent 1: I need to write some python code ... <python block 1>
CodeExecutor: Here is the result of block 1
Agent 2: Ok, looks great, I will write more python code <python block 2>
CodeExecutor: Here is the result of block 2
Agent 3: Let's now write even more python <python block 3>
CodeExecutor: Here is the result of block 3

The results of each Python block will already be in the context, so there is no need to execute them over and over again. And in the case of a Jupyter notebook, the executor is stateful anyway. Or am I mistaken here?
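To make the two behaviors concrete, here is a minimal sketch that contrasts collecting code blocks from every message with collecting them from the last message only. It does not use AutoGen: `Msg`, `extract_blocks`, `blocks_from_all`, and `blocks_from_last` are hypothetical stand-ins, and the regex is a simplified assumption about how markdown code blocks are detected.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message (not the AutoGen TextMessage)."""
    source: str
    content: str


# The fence is built programmatically so this example can sit inside a fenced block.
FENCE = "`" * 3
# Simplified assumption: an opening fence, optional language tag, body, closing fence.
_BLOCK = re.compile(FENCE + r"\w*[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def extract_blocks(text: str) -> List[str]:
    """Return the bodies of all fenced code blocks found in text."""
    return [body.strip() for body in _BLOCK.findall(text)]


def blocks_from_all(messages: Sequence[Msg]) -> List[str]:
    """Collect code from every message that was passed in."""
    return [block for m in messages for block in extract_blocks(m.content)]


def blocks_from_last(messages: Sequence[Msg]) -> List[str]:
    """Collect code from the most recent message only."""
    return extract_blocks(messages[-1].content) if messages else []


def fenced(code: str) -> str:
    """Wrap code in a python-tagged markdown fence."""
    return f"{FENCE}python\n{code}\n{FENCE}"


history = [
    Msg("agent_1", "I need to write some code " + fenced("print(1)")),
    Msg("code_executor", "1"),
    Msg("agent_2", "Looks great, more code " + fenced("print(2)")),
]

print(blocks_from_all(history))
print(blocks_from_last(history))
```

With the full history as input, the first function would re-run block 1 alongside block 2, while the second touches only block 2.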

Leon0402 · 2024-12-26T07:09:40Z

Perhaps I jumped to conclusions a little too fast here. It seems that messages: Sequence[ChatMessage] is not the complete history, as I previously assumed, but only the new messages. If the agent needs access to the old messages, it has to store them itself, as AssistantAgent does, for instance.
I thought initially that messages would always be the complete messages, which is not the case. Sorry my bad!
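A rough sketch of that delta pattern, assuming a simplified synchronous interface: the caller passes only the new messages, and an agent that needs the earlier ones keeps its own history. `Msg` and `HistoryKeepingAgent` are hypothetical names for illustration, not AutoGen classes, and the real on_messages is async and takes a cancellation token.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message."""
    source: str
    content: str


@dataclass
class HistoryKeepingAgent:
    """Hypothetical agent that receives deltas and stores the full history itself."""
    name: str
    history: List[Msg] = field(default_factory=list)

    def on_messages(self, new_messages: Sequence[Msg]) -> Msg:
        # Only messages that arrived since the last call are passed in,
        # so they are appended to the agent's own store.
        self.history.extend(new_messages)
        reply = Msg(self.name, f"received {len(new_messages)} new message(s)")
        # The agent's own reply is part of the conversation as well.
        self.history.append(reply)
        return reply


agent = HistoryKeepingAgent("assistant")
agent.on_messages([Msg("user", "first task")])
agent.on_messages([Msg("coder", "draft"), Msg("reviewer", "feedback")])

# Two calls with 1 and 2 new messages, plus the agent's two replies.
print(len(agent.history))
```

A stateless agent, such as a code executor that only looks at what it was just handed, can skip the history list entirely.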

In that case, probably the only thing needed is again some field sources that filters messages from specific agents such as the initial prompt.

ekzhu · 2024-12-26T20:12:55Z

> In that case, probably the only thing needed is again some field sources that filters messages from specific agents such as the initial prompt.

That's a good idea too. It would be useful to filter by the agents that are actually meant to generate code blocks. A PR for this is welcome.
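One way such a filter could look, as a sketch only: code blocks are extracted solely from messages whose source is in an allow-list, and all sources are accepted when no list is given. `Msg`, `extract_blocks_from_sources`, and the `sources` parameter here are hypothetical and written without AutoGen; the regex is a simplified assumption about markdown fences.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message."""
    source: str
    content: str


# The fence is built programmatically so this example can sit inside a fenced block.
FENCE = "`" * 3
_BLOCK = re.compile(FENCE + r"\w*[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def fenced(code: str) -> str:
    """Wrap code in a python-tagged markdown fence."""
    return f"{FENCE}python\n{code}\n{FENCE}"


def extract_blocks_from_sources(
    messages: Sequence[Msg], sources: Optional[Sequence[str]] = None
) -> List[str]:
    """Return code blocks from messages sent by the allowed sources.

    sources=None means no filtering (every sender is accepted).
    """
    allowed = None if sources is None else set(sources)
    blocks: List[str] = []
    for message in messages:
        if allowed is not None and message.source not in allowed:
            continue  # skip senders that are not meant to produce code
        blocks.extend(body.strip() for body in _BLOCK.findall(message.content))
    return blocks


new_messages = [
    Msg("user", "Follow this style: " + fenced("print('from prompt')")),
    Msg("coder", "Here is the solution: " + fenced("print('from coder')")),
]

print(extract_blocks_from_sources(new_messages, sources=["coder"]))
print(extract_blocks_from_sources(new_messages))
```

With the allow-list set to the coder, the example snippet quoted in the user's prompt is never executed.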

ekzhu · 2024-12-26T20:14:03Z

> I thought initially that messages would always be the complete messages, which is not the case. Sorry my bad!

We should update the docs to make it obvious that on_messages is meant to receive the delta (the new messages), not the complete history.

github-actions bot added the needs-triage label Dec 25, 2024

ekzhu mentioned this issue Dec 26, 2024

Update doc to make it obvious that on_messages is meant for delta. #4817

Closed
