instructions
["Click on the Romanticism art", "Swipe up and learn more about Romanticism art", "Swipe up and learn more about Romanticism art", "Swipe up and learn more about Romanticism art", "Swipe up and learn more about Romanticism art"]
prompts
"""You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
Action: ...
## Action Space
{action_space}
## User Instruction
{instruction}
"""
action_space
"""
click(start_box='<|box_start|>(x1,y1)<|box_end|>')
long_press(start_box='<|box_start|>(x1,y1)<|box_end|>', time='')
type(content='')
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
press_back()
wait() #Sleep for 5s and take a screenshot to check for any changes.
"""
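The template above uses {action_space} and {instruction} as placeholders. Here is a minimal sketch of filling them with Python's str.format, assuming that is how the template is meant to be used. The constant and function names are illustrative, not taken from the thread.

```python
# Prompt template from the post; {action_space} and {instruction} are
# str.format placeholders (no other braces appear in the text).
PROMPT_TEMPLATE = """You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
Action: ...
## Action Space
{action_space}
## User Instruction
{instruction}
"""

# Action space from the post, copied verbatim.
ACTION_SPACE = """
click(start_box='<|box_start|>(x1,y1)<|box_end|>')
long_press(start_box='<|box_start|>(x1,y1)<|box_end|>', time='')
type(content='')
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
press_back()
wait() #Sleep for 5s and take a screenshot to check for any changes.
"""


def build_prompt(instruction: str, action_space: str = ACTION_SPACE) -> str:
    """Fill the template; strip() drops the blank lines around the action list."""
    return PROMPT_TEMPLATE.format(action_space=action_space.strip(), instruction=instruction)


prompt = build_prompt("Click on the Romanticism art")
print(prompt)
```

Substituted values are not re-parsed by str.format, so braces inside an instruction would pass through unchanged.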
We recommend trying the following prompt format. When providing the Thought, use the format shown in the prompt so that it guides the model toward predicting the Action (e.g., Thought: Click on the Romanticism art\nAction: ...). We also suggest running mobile-scenario experiments on the SFT version of the model.
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
Thought: ...
Action: ...
## Action Space
click(start_box='[x1, y1, x2, y2]')
long_press(start_box='[x1, y1, x2, y2]', time='')
type(content='')
scroll(direction='down or up or right or left')
open_app(app_name='')
press_back()
press_home()
wait()
finished() # Submit the task regardless of whether it succeeds or fails.
## Note
- Use English in Thought part.
- Summarize your next action (with its target element) in one sentence in Thought part.
## User Instruction
Make the Copy of Office Pic in the Drive app
Structuring the prompt this way helps the model understand the formatting requirements and predict actions more reliably. Let us know if you have any further questions! 🚀
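One way to read the advice above about providing the Thought is to seed the assistant turn with the step instruction as the Thought, let the model continue with the Action, and then split the response. Here is a minimal sketch that assumes the response follows the Thought/Action format in the prompt. thought_prefix and split_response are hypothetical helpers, and the box coordinates in the usage line are made up.

```python
import re
from typing import Optional, Tuple


def thought_prefix(step_instruction: str) -> str:
    """Hypothetical helper: seed the assistant turn with the step instruction
    as the Thought, leaving the model to continue with the Action."""
    return f"Thought: {step_instruction}\nAction: "


# "Thought: ...\nAction: ..." with an optional Thought; DOTALL lets the
# Thought span several lines.
_RESPONSE_RE = re.compile(
    r"(?:Thought:\s*(?P<thought>.*?)\s*)?Action:\s*(?P<action>.+)", re.DOTALL
)


def split_response(text: str) -> Tuple[Optional[str], str]:
    """Return (thought, action) from a model response."""
    m = _RESPONSE_RE.search(text)
    if m is None:
        raise ValueError(f"no 'Action:' found in {text!r}")
    thought = m.group("thought")
    return (thought.strip() if thought else None), m.group("action").strip()


# Usage: prefix plus a model continuation (coordinates are made up).
full = thought_prefix("Click on the Romanticism art") + "click(start_box='[120, 300, 400, 330]')"
print(split_response(full))
```

A response without a Thought, such as the outputs in the result list below, yields None for the thought.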
I have a question: during training, are the direction in the instruction and the direction in the action label opposite to each other?
That is, when the instruction says to swipe up, the model outputs the opposite direction, scroll(direction='down').
instructions
["Click on the Romanticism art", "Swipe up and learn more about Romanticism art", "Swipe up and learn more about Romanticism art", "Swipe up and learn more about Romanticism art", "Swipe up and learn more about Romanticism art"]
result
[["Action: click(start_box='(259,314)')"], ["Action: scroll(direction='down')"], ["Action: scroll(direction='down')"], ["Action: scroll(direction='down')"], ["Action: scroll(direction='down')"]]
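To inspect outputs like the result list above programmatically, each action string can be parsed as a Python call expression with the standard ast module without executing it. Here is a minimal sketch. parse_action is a hypothetical helper, and it ignores positional arguments because the action space uses only keyword arguments.

```python
import ast
from typing import Dict, Tuple


def parse_action(action: str) -> Tuple[str, Dict[str, object]]:
    """Split "scroll(direction='down')" into ("scroll", {"direction": "down"}).

    The string is parsed, never executed; only literal keyword values
    (strings, numbers, ...) are accepted.
    """
    action = action.strip()
    if action.startswith("Action:"):
        action = action[len("Action:"):].strip()
    node = ast.parse(action, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a function-call action: {action!r}")
    # Positional arguments (node.args) are ignored: the action space uses keywords only.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


# The "result" list from the post: one list of output strings per step.
result = [["Action: click(start_box='(259,314)')"], ["Action: scroll(direction='down')"],
          ["Action: scroll(direction='down')"], ["Action: scroll(direction='down')"],
          ["Action: scroll(direction='down')"]]

for outputs in result:
    name, kwargs = parse_action(outputs[0])
    print(name, kwargs)
```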
prompts
"""You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
## Action Space
{action_space}
## User Instruction
{instruction}
"""
action_space
"""
click(start_box='<|box_start|>(x1,y1)<|box_end|>')
long_press(start_box='<|box_start|>(x1,y1)<|box_end|>', time='')
type(content='')
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
press_back()
wait() #Sleep for 5s and take a screenshot to check for any changes.
"""
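The result above is consistent with a convention common in Android automation: scroll(direction=...) names the direction the view moves through the content, which is opposite to the finger's swipe. Swiping up reveals content further down, so it would be labeled scroll(direction='down'). This thread does not confirm whether the UI-TARS training labels follow this convention. The sketch below simply assumes they do. OPPOSITE, FINGER_STEP, swipe_to_scroll, and scroll_to_swipe are hypothetical names, and center could be the start_box point or the screen center.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]

# Assumed convention: scroll(direction=d) names where the view moves through
# the content, so the finger swipes the opposite way (scroll 'down' = swipe up).
OPPOSITE: Dict[str, str] = {"up": "down", "down": "up", "left": "right", "right": "left"}

# Unit vectors for finger movement in screen coordinates (y grows downward).
FINGER_STEP: Dict[str, Point] = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def swipe_to_scroll(swipe_direction: str) -> str:
    """Map a gesture in an instruction ("Swipe up") to a scroll label ('down')."""
    return OPPOSITE[swipe_direction]


def scroll_to_swipe(direction: str, center: Point, distance: int = 400) -> Tuple[Point, Point]:
    """Return (start, end) finger positions for scroll(direction=...) around center."""
    dx, dy = FINGER_STEP[OPPOSITE[direction]]
    cx, cy = center
    half = distance // 2
    return (cx - dx * half, cy - dy * half), (cx + dx * half, cy + dy * half)


print(swipe_to_scroll("up"))                  # 'Swipe up' instruction -> 'down' label
print(scroll_to_swipe("down", (540, 1200)))  # finger moves from y=1400 up to y=1000
```

The returned points could be passed to adb shell input swipe x1 y1 x2 y2 to replay the action on a device.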