Web Browser Control

AgentScope supports web browser control with the agentscope.service.WebBrowser module. It allows agent to interact with web pages, and take actions like clicking, typing and scrolling.

Note the current web browser module requires a vision LLM to work properly. We will provide text-based vision in the future.

Note the web browser module is still in beta, which will be updated frequently.

Prerequisites

The WebBrowser module is implemented based on Playwright. You need to install the lasted AgentScope, as well as the playwright packages as follows:

# Install the latest AgentScope from source
git clone https://github.com/modelscope/agentscope.git
cd agentscope
pip install -e .

# Install playwright
pip install playwright
playwright install

Guidance

Initialize the WebBrowser module as follows

from agentscope.service import WebBrowser

browser = WebBrowser()

The WebBrowser module facilitates browser control and state retrieval. The name of the control functions are all prefixed by “action_”, e.g. action_visit_url, and action_click. To see the full list of functions, calling the get_action_functions method.

# To see full supported actions
print(browser.get_action_functions())

# Visit a new webpage
browser.action_visit_url("https://www.bing.com")

To monitor the current state of the browser, you can call the function prefixed by "page_", e.g. page_url, page_title, and page_html”.

# The url
print(browser.page_url)

# The page title
print(browser.page_title)

# The page in MarkDown format (parsed by markdownify)
print(browser.page_markdown)

# The page html (maybe too long)
print(browser.page_html)

Besides, to help vision models to understand the webpage better, we provide set_interactive_marks function, which will mark all the interactive elements on the current webpage with index labels. After calling set_interactive_marks function, more actions can be performed on the webpage. For example, clicking a button, typing in a text box, etc.

# Set interactive marks with index labels
browser.set_interactive_marks()

# Remove interactive marks
# browser.remove_interactive_marks()

Work with Agent

The above functions provide basic operations for interactive web browser control. You can use them to build your own web browsing agent.

In AgentScope, the web browser is also some kind of tool functions, so you can use it together with the service toolkit module to build your own agent. We also provide a web browser agent in our example. You can refer to it for more details.

[Back to the top]