feat: A tool to run R code #126
@simonpcouch If you have time for a quick review, I'd love another set of eyes on the documentation and the tool description. Have I hit the right balance between "don't poke your eye out" and "this is pretty useful"?
simonpcouch
left a comment
Some reaaaaaally nice features here. This is dope!
One thing that wasn't immediately obvious to me was that the code that the model wrote was scrollable. At first, I assumed there was a bug in the tool output display and only when trying to debug it realized that the shown code is fixed-height and can be scrolled through:
run_r_code.mov
The above is a tool call resulting from "Could you load the forested data from the forested package and make 3 ggplots?"
Also, I'm seeing this ellmer error intermittently, but don't have good reflexes on what the root cause might be:
R/tool-run.R
Outdated
or perform dangerous or irreversible actions. Always consider the security
implications of the code that you write. If you have any doubts, consult the
user with a preview of the code you would like to write before executing it.
When using this tool, work incrementally by breaking your analysis into small, focused steps rather than writing large blocks of code at once. Importantly, create only one plot per call to this tool.
A few style guidelines to keep in mind when using this tool:
* Return results implicitly, like `x`, rather than with `print(x)` or `cat(x)`.
* Return plots implicitly rather than assigning them to intermediate variables and then displaying the variable.
* Do not communicate with the user via the `code` argument to this tool, instead explaining choices you've made and interpretations of output in a message to them directly.
* Do not decorate output with custom displays, e.g. with `cat()`.
* Respect the existing console width that the user has set.
Here, I'd discourage the model from writing lots of code in a single call to this tool, instead preferring to run code piece-by-piece. In the tool display as it's implemented (which I think is stellar), when the model does a whole bunch at once, it can be difficult to trace back which pieces of output come from which pieces of code without a lot of scrolling back and forth.
Also, prompting against some common anti-patterns in models' R code.
Thanks! I just pushed a change that shows source and output interleaved in the display, which solves the scrolling and over-long code block problems. I also added a "copy code" button with a cool feature: it copies the source code and output together in a reprex-style format (note that feature doesn't work in the Positron viewer, yet).
As you can see Haiku 4.5 really really likes those cat() calls, even though I've updated the instructions. 😆
Also I think I've squashed that intermittent error you saw (I also bumped into it)
don't strip all whitespace, just the ones around the edges that make things look weird
this is important because the tool card is re-rendered frequently when streaming in a result
these are wasted tokens if using coding assistants to edit the main js
So sharp. Interleaving that output is so nice. This looks great!
Models' "default" Amount Of Code Written when calling this tool still feels too long to me. This leads to issues like the one below, where the model hallucinates column names in data it hasn't seen yet because it didn't stop to examine the output of glimpse(forested). We've prompted PA/Databot/side::kick() in the same way in their analogous tools for this reason.
If you still disagree, that's okay with me; maybe this is a matter of preference best resolved in our btw.mds. :)
Is there any way we could suppress package startup messages by default in the tool UI?
@simonpcouch ah that's a great point; thanks for showing me the example. I didn't take those lines initially because there are some shinychat limitations we need to fix around how the tool UI works when you're streaming in results. So I wasn't sure if I wanted to create a situation where the model tries to run code in too small of a chunk and ends up making that problem worse. I think I'll go back and add those lines in though after seeing your example. I'll look into suppressing package startup messages too!
Closes #118
Summary
This PR adds btw_tool_run_r(), a tool that executes R code in the global environment and returns the results to the LLM. I've marked the tool Experimental for now.
What it captures
The tool captures and returns:
* print(), cat(), etc.
* message()
* warning()
* stop()
When an error occurs, all output up to the error is returned.
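As a rough illustration of the capture behaviour described above, here is a minimal base-R sketch. It is not the PR's implementation (the PR uses evaluate for this), and `capture_r()` is a hypothetical helper invented for the example. It also defaults to a fresh environment rather than the global one, so that running it has no side effects.

```r
# Hypothetical helper, NOT part of btw: run code one expression at a time and
# collect printed output, messages, warnings, and the first error.
capture_r <- function(code, envir = new.env(parent = globalenv())) {
  results <- list()
  add <- function(type, text) {
    results[[length(results) + 1L]] <<- list(type = type, text = text)
  }

  for (expr in parse(text = code)) {
    ok <- withCallingHandlers(
      tryCatch(
        {
          # Capture anything printed, auto-printing visible values
          out <- utils::capture.output({
            res <- withVisible(eval(expr, envir))
            if (res$visible) print(res$value)
          })
          if (length(out) > 0) add("output", paste(out, collapse = "\n"))
          TRUE
        },
        error = function(e) {
          add("error", conditionMessage(e))
          FALSE
        }
      ),
      # Record messages and warnings, then keep evaluating
      message = function(m) {
        add("message", conditionMessage(m))
        invokeRestart("muffleMessage")
      },
      warning = function(w) {
        add("warning", conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    )
    # Stop at the first error, keeping everything collected so far
    if (!ok) break
  }

  results
}

res <- capture_r('x <- 1 + 1
x
message("hi")
warning("careful")
stop("boom")
print("never reached")')

vapply(res, function(r) r$type, character(1))
```

This simplification only preserves ordering between expressions: within a single expression, messages and warnings are recorded before that expression's printed output, and output printed by an expression that later errors is dropped. Plots are not handled at all.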
The tool makes use of recent changes in ellmer v0.4.0 to allow tools to return lists of Content, including ContentImage types from plots. Each of the above output types is given a btw-local content type, which is also used to customize its display in shinychat.
Security
This tool is disabled by default. It executes arbitrary code in the global environment without sandboxing or review. We recommend:
To enable the tool
The tool can be enabled via an R option (in a session, an .Rprofile or in btw.md):
Or equivalently via environment variable:
When this option is set, btw_tools() will include the btw_tool_run_r() tool; otherwise it is excluded from btw_tools().
In btw_tools(), you can also explicitly include the "run", "run_r" or "btw_tool_run_r" tool in tools:
Or in btw.md:
Dependencies
This feature adds a few additional suggested dependencies.
* evaluate, for running and evaluating the LLM-written code
* If fansi is available, we use it to translate ANSI colors to HTML
* If ragg is available, we use it as the plot rendering device. The plot device can be customized by providing a function via the R option btw.run_r.graphics_device.