
Commit e06720e: initial commit (0 parents)

15 files changed (+1096 / -0 lines)

.gitignore (+15 lines)

node_modules
.DS_Store
.env
.cache/*
.vscode/*
.ipynb_checkpoints/*
package-lock.json
node_modules/*
py39/*
venv/*
vectorstore/*
__pycache__/*
html/*
dom/*
OAI_CONFIG_LIST

README.md (+109 lines)

# AutoBrowse
AutoBrowse is an autonomous AI agent that can browse the web.
You simply give AutoBrowse a task and it will complete it by interacting with the web browser as if it were a human.

Some examples of tasks you can give it:

- Go to booking.com and find a hotel for 2 people in Madrid for 2 nights starting 2 November for under 200 EUR per night.

- Sign up to ryanair.com with email: <[email protected]> and password C0mplexPassword!.

- Go to Craigslist and search for Nintendo DS. Click on the first result.

## How to run

1. Create a Python 3.9 environment:

   ```bash
   conda create --name py39 python=3.9
   ```

2. Activate the environment:

   ```bash
   conda activate py39
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Start the browser environment. See the README under `browser-console/` for instructions on how to run it.

5. Create a file named `OAI_CONFIG_LIST` with the following content, replacing each `<your-api-key>` with your OpenAI API key:

   ```json
   [
     {
       "model": "gpt-4",
       "api_key": "<your-api-key>"
     },
     {
       "model": "gpt-3.5-turbo",
       "api_key": "<your-api-key>"
     },
     {
       "model": "gpt-3.5-turbo-16k",
       "api_key": "<your-api-key>"
     }
   ]
   ```

6. Run AutoBrowse:

   ```bash
   python autobrowse.py
   ```

You will then be prompted to give AutoBrowse a task.

You can modify the agent configurations by editing the `agent_config.py` file, for example to change the system prompts or the OpenAI models used.
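To illustrate how `OAI_CONFIG_LIST` and `agent_config.py` relate, the sketch below reads the JSON file and keeps only the entries whose `model` matches the model an agent is configured with. The helpers `load_config_list` and `llm_config_for` are hypothetical, not AutoBrowse code; autogen itself provides `autogen.config_list_from_json` (with a `filter_dict` argument) for loading and filtering such a file.

```python
import json
import os
import tempfile


def load_config_list(path):
    """Read an OAI_CONFIG_LIST-style file: a JSON array of {"model", "api_key"} objects."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("OAI_CONFIG_LIST must contain a JSON array")
    return entries


def llm_config_for(agent_name, agent_config, config_list):
    """Hypothetical helper: keep only the entries for the model this agent is configured with."""
    model = agent_config[agent_name]["model"]
    matching = [entry for entry in config_list if entry.get("model") == model]
    if not matching:
        raise KeyError(f"no entry for model {model!r} in OAI_CONFIG_LIST")
    return {"config_list": matching}


if __name__ == "__main__":
    # Write a throwaway OAI_CONFIG_LIST so the example runs without real keys.
    sample = [
        {"model": "gpt-4", "api_key": "<your-api-key>"},
        {"model": "gpt-3.5-turbo-16k", "api_key": "<your-api-key>"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "OAI_CONFIG_LIST")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sample, f)
        config_list = load_config_list(path)

    # A trimmed-down stand-in for the `config` dict in agent_config.py.
    agent_config = {"planner": {"model": "gpt-4"}, "html_assistant": {"model": "gpt-3.5-turbo-16k"}}
    print(llm_config_for("html_assistant", agent_config, config_list))
```

Failing loudly when no entry matches makes a typo in a model name in `agent_config.py` show up at startup rather than at the first API call.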

## How it works
AutoBrowse uses [autogen](https://github.com/microsoft/autogen) agents and a browser console to plan and execute the task.

The design consists of three agents:

1. An HTML assistant that answers questions about the HTML of the page currently open in the browser.

2. A code generator agent that generates puppeteer.js code to interact with the browser (e.g. navigate to a new page, click on a button, fill in form elements).

3. A planner agent that coordinates the use of the two agents above to fulfill the high-level task description provided by the user.

The agents interact with the browser through a websocket connection to a sandboxed browser environment, which exposes one endpoint that accepts puppeteer.js code to execute and another that returns the rendered HTML of the page currently open.
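The wire format of the browser environment is not documented here, so the following is only a sketch of the two operations under an assumed JSON protocol. The message fields (`action`, `code`), the reply fields (`success`, `error`, `html`), and the `BrowserEnvironment` class are all hypothetical; the `success` flag is suggested by the code generator's system prompt, which mentions `success:true`. The transport is injected as a plain callable so the example does not depend on a websocket library; in practice it would wrap one websocket send/receive round trip.

```python
import json
from typing import Callable

# Hypothetical transport: takes a JSON request string, returns a JSON reply string.
Transport = Callable[[str], str]


class BrowserEnvironment:
    """Sketch of a client for the sandboxed browser's two endpoints (assumed protocol)."""

    def __init__(self, transport: Transport):
        self._transport = transport

    def _request(self, payload: dict) -> dict:
        return json.loads(self._transport(json.dumps(payload)))

    def execute(self, code: str) -> dict:
        """Send puppeteer.js code to run against the already-open page."""
        return self._request({"action": "execute", "code": code})

    def get_html(self) -> str:
        """Fetch the rendered HTML of the page currently open."""
        return self._request({"action": "get_html"})["html"]


def fake_transport(raw: str) -> str:
    """In-memory stand-in for the websocket, used only to exercise the client."""
    request = json.loads(raw)
    if request["action"] == "execute":
        return json.dumps({"success": True})
    if request["action"] == "get_html":
        return json.dumps({"html": "<html><body><h1>Hello</h1></body></html>"})
    return json.dumps({"success": False, "error": "unknown action"})


browser = BrowserEnvironment(fake_transport)
print(browser.execute("await page.goto('https://example.com', { waitUntil: 'networkidle0' });"))
print(browser.get_html())
```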

### HTML Assistant
Since HTML documents can be long enough to exceed the token limits of OpenAI models, the following approach is taken to answer queries about the HTML:

- The HTML returned from the browser environment is stripped down and simplified to reduce its size. This is done by keeping only the most important attributes, such as `id`, `name`, `type`, and `class`. Moreover, `script`, `style`, `noscript`, `img`, `svg`, `link`, and `meta` tags are removed altogether.

- The processed HTML is split into chunks of 15,000 tokens (as counted by OpenAI's tokenizer) so that each chunk fits in the 16K context window of `gpt-3.5-turbo-16k`.

- Using RAG with OpenAI embeddings, the most relevant chunk is provided as context to the question, and `gpt-3.5-turbo-16k` can then answer the question about the HTML.
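The three steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not AutoBrowse's implementation: the simplifier uses Python's standard `html.parser`; token counting is passed in as a function (for OpenAI models you would plug in a tokenizer such as `tiktoken`, while the demo uses a crude whitespace count); and `toy_embed` is a keyword-count stand-in for OpenAI embeddings, with retrieval done by cosine similarity. The names `simplify_html`, `chunk_by_tokens`, and `most_relevant_chunk` are hypothetical.

```python
import math
from html import escape
from html.parser import HTMLParser

# Tags removed entirely, and the attributes kept on every other tag.
DROP_TAGS = {"script", "style", "noscript", "img", "svg", "link", "meta"}
KEEP_ATTRS = {"id", "name", "type", "class"}
# Void elements never get a closing tag, so they must not open a "skip" region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr"}


class _Simplifier(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # one simplified fragment per tag or text node
        self.skip_depth = 0  # > 0 while inside a dropped element such as <script> or <svg>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = "".join(f' {k}="{escape(v or "", quote=True)}"' for k, v in attrs if k in KEEP_ATTRS)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            if tag not in VOID_TAGS and self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.out.append(escape(data.strip(), quote=False))


def simplify_html(html):
    """Drop noisy tags and attributes; return one fragment per line."""
    parser = _Simplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.out)


def chunk_by_tokens(text, max_tokens, count_tokens):
    """Greedily pack lines into chunks of at most max_tokens (a single oversized line stays whole)."""
    chunks, current = [], []
    for line in text.splitlines():
        if current and count_tokens("\n".join(current + [line])) > max_tokens:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_relevant_chunk(question, chunks, embed):
    """Return the chunk whose embedding is most similar to the question's."""
    q = embed(question)
    return max(chunks, key=lambda chunk: cosine(q, embed(chunk)))


def toy_embed(text):
    """Toy stand-in for an embeddings API: counts of a few keywords."""
    lowered = text.lower()
    return [lowered.count(word) for word in ("sign", "email", "password", "search", "hotel")]


page = """<html><head><script>track()</script><style>p{}</style></head>
<body><div id="search" class="box" data-x="1"><input type="text" name="q"></div>
<form id="signup"><input type="email" name="email"><input type="password" name="password"></form>
<svg><path d="M0"/></svg></body></html>"""

chunks = chunk_by_tokens(simplify_html(page), max_tokens=8, count_tokens=lambda s: len(s.split()))
print(most_relevant_chunk("Which inputs take the email and password?", chunks, toy_embed))
```

Recounting the whole candidate chunk for every added line keeps the sketch simple but is quadratic in the page size; summing per-line counts would be cheaper, at the cost of small differences at tokenizer boundaries.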

### Code Generator
The code generator uses `gpt-4` to generate puppeteer.js code to interact with the browser. A user proxy agent attached to the code generator sends this code to the browser environment for execution and reports back the result, so that the code generator can amend the code if there are any errors. Because the generated code needs to be as accurate as possible, the more expensive `gpt-4` model is used instead of the cheaper `gpt-3.5-turbo`.
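The generate, execute, and repair cycle described above can be sketched as a small loop. In AutoBrowse this is handled by autogen agents; here they are replaced by two injected callables (`generate_code`, `execute`) and a hypothetical `run_code_task` function, and an execution result is assumed to be a dict with a `success` flag and an optional `error` message. `max_repairs` loosely mirrors the user proxy's `max_consecutive_auto_reply` setting in `agent_config.py`.

```python
from typing import Callable, Optional

# Hypothetical signatures: the LLM call and the browser call are injected.
GenerateFn = Callable[[str, Optional[str], Optional[str]], str]  # (task, previous_code, error) -> code
ExecuteFn = Callable[[str], dict]                                # code -> {"success": bool, "error": str}


def run_code_task(task: str, generate_code: GenerateFn, execute: ExecuteFn, max_repairs: int = 1) -> dict:
    """Generate code for `task`, run it, and feed errors back for up to `max_repairs` retries."""
    code = generate_code(task, None, None)
    result = execute(code)
    attempts = 0
    while not result.get("success") and attempts < max_repairs:
        attempts += 1
        # Report the failure back so the generator can amend its previous code.
        code = generate_code(task, code, result.get("error", "unknown error"))
        result = execute(code)
    status = "TERMINATE" if result.get("success") else "FAILED"
    return {"status": status, "code": code, "result": result}


def fake_generate(task, previous_code, error):
    # First attempt uses a wrong selector; the repair uses the right one.
    if error is None:
        return "await page.click('#sign-up');"
    return "await page.click('#signup');"


def fake_execute(code):
    if "#signup'" in code:
        return {"success": True}
    return {"success": False, "error": "Node is either not clickable or not an Element"}


print(run_code_task("Click the sign-up button", fake_generate, fake_execute))
```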

### Planner

The planner receives the task description from the user and tries to complete it by invoking the HTML Assistant and Code Generator as necessary. In addition to its own reasoning, the planner can invoke two functions:

1. `ask_html_assistant()` to ask the HTML assistant a question about the current HTML (e.g. extract the HTML for the sign-up form), and

2. `ask_code_generator()` to ask the code generator to produce puppeteer.js code to send to the browser. The planner may also pass HTML retrieved from the HTML assistant to give the code generator more context.
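With autogen, functions like these are described to the model as OpenAI function-calling schemas and mapped to Python callables. The sketch below shows one plausible shape: the `message` parameter name, the stub handlers, and the `dispatch` helper are assumptions, while the `context_html` parameter comes from the planner's system prompt in `agent_config.py`. A real setup would route the stubs to the HTML assistant and code generator.

```python
import json

PLANNER_FUNCTIONS = [
    {
        "name": "ask_html_assistant",
        "description": "Ask the HTML assistant a question about the HTML of the page currently open in the browser.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The question about the current HTML."},
            },
            "required": ["message"],
        },
    },
    {
        "name": "ask_code_generator",
        "description": "Ask the code generator to write and execute puppeteer.js code in the browser.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The browser action to perform."},
                "context_html": {"type": "string", "description": "Relevant HTML from the HTML assistant, or an empty string."},
            },
            "required": ["message", "context_html"],
        },
    },
]


def ask_html_assistant(message):
    return f"(stub) HTML answering: {message}"


def ask_code_generator(message, context_html):
    return "TERMINATE"  # stub: pretend the code was generated and executed successfully


FUNCTION_MAP = {"ask_html_assistant": ask_html_assistant, "ask_code_generator": ask_code_generator}


def dispatch(function_call):
    """Run a model-proposed call of the form {"name": ..., "arguments": "<JSON string>"}."""
    handler = FUNCTION_MAP[function_call["name"]]
    return handler(**json.loads(function_call["arguments"]))


call = {"name": "ask_code_generator", "arguments": json.dumps({"message": "Accept the cookie notice", "context_html": ""})}
print(dispatch(call))
```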

agent_config.py (+49 lines)

config = {
    "html_assistant": {
        "model": "gpt-3.5-turbo-16k",
        "system_message": """You are a helpful AI Assistant. You will answer questions about HTML code. Respond only with HTML code from the HTML that is provided to you.
(i.e. find the answer only in the HTML that you are given, don't make up imaginary HTML) """,
    },
    "code_generator": {
        "model": "gpt-4",
        "system_message": """You are a Javascript engineer. You generate puppeteer.js javascript code to fulfill
a given task that has to do with web browsing. The output of this agent should only be code (inside codeblocks). You may also be
asked to correct code. You should assume that the puppeteer environment has already been initialized with the following code:
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
Whenever you go to a new website you should use the { waitUntil: 'networkidle0' } option to make sure the page is fully loaded.
Before typing text into an input field you should first click on it to make sure it is focused.
If you get a 'Node is either not clickable or not an Element' error, you are probably trying to click on the wrong element, so if there is no other element
that you can click, you should reply with NOT_CLICKABLE <element_name> .
You may also be provided the execution result of the code. If you see success:true in the execution result, you should reply with TERMINATE .
"""
    },
    "code_generator_user_proxy": {
        "max_consecutive_auto_reply": 1,
    },

    "planner": {
        "model": "gpt-4",
        "system_message": """You are a planner. You generate a plan to fulfill a web browsing task. This is done through the use of 2 other AI assistant agents. You can propose the usage of two functions : 1. ask_html_assistant (to ask questions about the current
page in the browser - the result will be HTML code) Keep in mind that this agent is not able to make any modifications to the page, only respond to questions about it. 2. ask_code_generator (to generate and execute puppeteer.js code in the browser) .
The code_generator does not have the HTML context, so you may need to provide it with the HTML from the html_assistant.
PLEASE MAKE SURE TO ASK THE HTML ASSISTANT FOR RELEVANT CONTEXT AND PROVIDE THE RETRIEVED HTML AS CONTEXT TO THE CODE GENERATOR!!!! If the context is not needed then pass an empty string as context_html.
So you might want to first ask the html_assistant a question about the HTML content, and then use the result of that question as context input to the code_generator.
When the function call to ask_code_generator comes back with TERMINATE, that means the code has been generated and executed successfully.
IMPORTANT: You should not write out the entire plan right away and instead focus on the next step of the plan.
If any step of the plan does not work out (e.g. code execution fails), DO NOT proceed to the next step without first trying to fix the current step.
Also try to prompt the other agents to do things as granularly as possible (i.e. don't try to do too many things in one function call, like trying to extract 2 html selectors at the same time from the html, or trying to ask the code generator to do many actions in one function call).
Please ensure to remove any cookie notices and other popups.
When you suggest a function call do not produce any ```json code blocks. Make sure you suggest function calls correctly so that autogen can parse them. Explicitly say it when you are suggesting a function call.
Don't provide HTML selectors to the code generator but rather actual HTML fragments.
Please ensure to accept any cookie notices and remove any other popups.
When the plan has been successfully completed reply with FINISHED.
Take a deep breath and work on this problem step-by-step.
""",
    },

    "planner_user_proxy": {
        "max_consecutive_auto_reply": 35,
    },
}
