agentscope.service.browser.web_browser module

The web browser module that lets an agent interact with web pages.

class agentscope.service.browser.web_browser.WebElementInfo(*, html: str, tag_name: str, node_name: str, node_value: None | str, type: None | str, aria_label: None | str, is_clickable: str | bool, meta_data: list[str], inner_text: str, origin_x: float, origin_y: float, width: float, height: float)[source]

Bases: BaseModel

The information of a web interactive element.

html: str

The HTML content of the element.

tag_name: str

The tag name of the element.

node_name: str

The node name of the element.

node_value: None | str

The node value of the element.

type: None | str

The type of the element.

aria_label: None | str

The aria label of the element.

is_clickable: str | bool

Whether the element is clickable. If it is, the value is the link of the element; otherwise, the value is False.

meta_data: list[str]

The metadata of the element, e.g. its attributes.

inner_text: str

The text content of the element.

origin_x: float

The x coordinate of the origin of the element.

origin_y: float

The y coordinate of the origin of the element.

width: float

The width of the element.

height: float

The height of the element.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'aria_label': FieldInfo(annotation=Union[NoneType, str], required=True), 'height': FieldInfo(annotation=float, required=True), 'html': FieldInfo(annotation=str, required=True), 'inner_text': FieldInfo(annotation=str, required=True), 'is_clickable': FieldInfo(annotation=Union[str, bool], required=True), 'meta_data': FieldInfo(annotation=list[str], required=True), 'node_name': FieldInfo(annotation=str, required=True), 'node_value': FieldInfo(annotation=Union[NoneType, str], required=True), 'origin_x': FieldInfo(annotation=float, required=True), 'origin_y': FieldInfo(annotation=float, required=True), 'tag_name': FieldInfo(annotation=str, required=True), 'type': FieldInfo(annotation=Union[NoneType, str], required=True), 'width': FieldInfo(annotation=float, required=True)}

Metadata about the fields defined on the model, a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.
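As a rough illustration of how these fields might be consumed, the sketch below uses a stdlib dataclass stand-in (`ElementBox`, a hypothetical name that is not part of agentscope) mirroring a subset of the fields. It assumes that `origin_x` and `origin_y` denote the top-left corner of the element's bounding box, which the field descriptions above do not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class ElementBox:
    """Hypothetical stand-in mirroring a subset of WebElementInfo's fields."""

    tag_name: str
    is_clickable: Union[str, bool]
    origin_x: float
    origin_y: float
    width: float
    height: float


def center_of(box: ElementBox) -> Tuple[float, float]:
    """Return the midpoint, assuming the origin is the top-left corner."""
    return (box.origin_x + box.width / 2, box.origin_y + box.height / 2)


def link_of(box: ElementBox) -> Optional[str]:
    """Return the link when is_clickable holds one, otherwise None."""
    # is_clickable is documented as either the element's link (str) or False.
    return box.is_clickable if isinstance(box.is_clickable, str) else None


box = ElementBox("a", "https://example.com", 10.0, 20.0, 100.0, 40.0)
print(center_of(box))
print(link_of(box))
```

With the real model, the same two helpers should work unchanged on a WebElementInfo instance, since they only read attributes that the class documents.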

class agentscope.service.browser.web_browser.WebBrowser(timeout: int = 30, browser_visible: bool = True, browser_width: int = 1280, browser_height: int = 1080)[source]

Bases: object

A web browser for agents, implemented with Playwright. This module allows an agent to interact with web pages, for example by visiting a page, clicking on elements, typing text, and scrolling.

Note

1. This module is still under development, and changes will be made in the future.
2. Because Playwright operations are asynchronous, it is essential to use if __name__ == "__main__": to designate the main entry point of the program. This ensures that asynchronous functions are executed within the appropriate context.

Install:

Run the following commands to install the required packages:

pip install playwright
playwright install
Details:

1. The actions that the agent can take in the web browser include: "action_click", "action_type", "action_scroll_up", "action_scroll_down", "action_press_key", and "action_visit_url".
2. You can extract the HTML content, title, URL, and screenshot of the current web page through the corresponding properties, e.g. url, page_html, page_title, page_screenshot.
3. You can set or remove the interactive marks on the web page by calling the set_interactive_marks and remove_interactive_marks methods.

Example

from agentscope.service import WebBrowser
import time

if __name__ == "__main__":
    browser = WebBrowser()
    # Visit the specific web page
    browser.action_visit_url("https://www.bing.com")
    # Set the interactive marks on the web page
    browser.set_interactive_marks()

    time.sleep(5)

    browser.close()
__init__(timeout: int = 30, browser_visible: bool = True, browser_width: int = 1280, browser_height: int = 1080) None[source]

Initialize the web browser module.

Parameters:
  • timeout (int, defaults to 30) – The timeout (in seconds) for the browser to wait for the page to load. Defaults to 30.

  • browser_visible (bool, defaults to True) – Whether the browser is visible.

  • browser_width (int, defaults to 1280) – The width of the browser. Defaults to 1280.

  • browser_height (int, defaults to 1080) – The height of the browser. Defaults to 1080.

property url: str

The URL of the current page.

property page_html: str

The HTML content of the current page.

property page_title: str

The title of the current page.

property page_markdown: str

The content of the current page in Markdown format.

property page_screenshot: bytes

The screenshot of the current page.
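Because the screenshot is returned as raw bytes, persisting it is a plain binary write. The helper below (`save_screenshot`, a hypothetical name) is a minimal sketch; using a `.png` extension assumes the bytes are PNG-encoded, which is Playwright's default screenshot format but is not stated on this page.

```python
from pathlib import Path


def save_screenshot(data: bytes, path: str) -> Path:
    """Write screenshot bytes to disk and return the resulting path."""
    target = Path(path)
    # Create missing parent directories so the write does not fail.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

A call might look like `save_screenshot(browser.page_screenshot, "shots/page.png")`, where `browser` is a WebBrowser instance.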

action_click(element_id: int) ServiceResponse[source]

Click on the element with the given id.

Parameters:

element_id (int) – The id of the element to click.

Returns:

The response of the click action.

Return type:

ServiceResponse

action_type(element_id: int, text: str, submit: bool) ServiceResponse[source]

Type text into the element with the given id.

Parameters:
  • element_id (int) – The id of the element to type text into.

  • text (str) – The text to type into the element.

  • submit (bool) – Whether to press "Enter" after typing the text.

Returns:

The response of the type action.

Return type:

ServiceResponse

action_scroll_up() ServiceResponse[source]

Scroll up the current web page.

action_scroll_down() ServiceResponse[source]

Scroll down the current web page.

action_press_key(key: str) ServiceResponse[source]

Press down a key in the current web page.

Parameters:

key (str) – Chosen from F1-F12, Digit0-Digit9, KeyA-KeyZ, Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp, etc.
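The named key ranges above can be expanded programmatically, for instance to check a key name before passing it on. The sketch below covers only the names spelled out in this description; because that list ends with "etc.", the set is not exhaustive, and `is_listed_key` (a hypothetical helper) should not be treated as a definitive validator.

```python
import string

# Ranges spelled out in the parameter description.
FUNCTION_KEYS = [f"F{i}" for i in range(1, 13)]            # F1 .. F12
DIGIT_KEYS = [f"Digit{i}" for i in range(10)]              # Digit0 .. Digit9
LETTER_KEYS = [f"Key{c}" for c in string.ascii_uppercase]  # KeyA .. KeyZ
NAMED_KEYS = [
    "Backquote", "Minus", "Equal", "Backslash", "Backspace", "Tab",
    "Delete", "Escape", "ArrowDown", "End", "Enter", "Home", "Insert",
    "PageDown", "PageUp", "ArrowRight", "ArrowUp",
]

LISTED_KEYS = frozenset(FUNCTION_KEYS + DIGIT_KEYS + LETTER_KEYS + NAMED_KEYS)


def is_listed_key(key: str) -> bool:
    """Return True if the key name appears in the documented list."""
    return key in LISTED_KEYS
```

Note that a name such as `ArrowLeft` is rejected by this helper simply because it is not spelled out above, which is one reason to use it as a hint rather than a hard check.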

action_visit_url(url: str) ServiceResponse[source]

Visit the given url.

Parameters:

url (str) – The URL to visit in the browser.

get_action_functions() dict[str, Callable][source]

Return a dictionary of the action functions, where the key is the action name and the value is the corresponding function.
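A mapping from action names to callables lends itself to name-based dispatch, for example when an agent emits an action name plus keyword arguments. The sketch below shows that pattern against a stand-in mapping; `dispatch` and `fake_actions` are hypothetical and only mimic the shape of the returned dictionary, not its real contents or the ServiceResponse return type.

```python
from typing import Any, Callable, Dict


def dispatch(
    actions: Dict[str, Callable[..., Any]],
    name: str,
    **kwargs: Any,
) -> Any:
    """Look up an action by name and call it with the keyword arguments."""
    try:
        func = actions[name]
    except KeyError:
        known = ", ".join(sorted(actions))
        raise ValueError(f"Unknown action {name!r}; known: {known}") from None
    return func(**kwargs)


# Stand-in for the dictionary returned by get_action_functions().
fake_actions: Dict[str, Callable[..., Any]] = {
    "action_visit_url": lambda url: f"visited {url}",
    "action_scroll_down": lambda: "scrolled down",
}

print(dispatch(fake_actions, "action_visit_url", url="https://www.bing.com"))
```

Raising on an unknown name, rather than silently ignoring it, makes a mistyped or hallucinated action name visible to the caller.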

set_interactive_marks() list[WebElementInfo][source]

Mark the interactive elements on the current web page.

remove_interactive_marks() None[source]

Remove the interactive marks from the current web page.

close() None[source]

Close the browser.
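Since the browser has to be closed explicitly, one option is to wrap it in `contextlib.closing`, which calls `close()` on exit even if an action raises. The sketch below uses a stand-in class (`StubBrowser`, hypothetical) so that it runs without Playwright; with the real class the same `with` statement should apply, assuming `close()` takes no arguments as documented above.

```python
from contextlib import closing
from typing import List


class StubBrowser:
    """Hypothetical stand-in exposing only the methods used below."""

    def __init__(self) -> None:
        self.closed = False
        self.visited: List[str] = []

    def action_visit_url(self, url: str) -> None:
        self.visited.append(url)

    def close(self) -> None:
        self.closed = True


browser = StubBrowser()
# closing() guarantees that close() runs when the block exits.
with closing(browser) as b:
    b.action_visit_url("https://www.bing.com")

print(browser.closed)
```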