agentscope.service.browser.web_browser module
The web browser module for agent to interact with web pages.
- class WebBrowser(timeout: int = 30, browser_visible: bool = True, browser_width: int = 1280, browser_height: int = 1080)
Bases:
object
The web browser for agent, which is implemented with playwright. This module allows agent to interact with web pages, such as visiting a web page, clicking on elements, typing text, scrolling web page, etc.
Note
1. This module is still under development, and its interface may change in the future.
2. Because Playwright operations are asynchronous, use if __name__ == "__main__": to designate the main entry point of the program. This ensures that asynchronous functions are executed within the appropriate context.
- Install:
Run the following commands to install the required packages:
pip install playwright
playwright install
- Details:
1. The actions that the agent can take in the web browser include: "action_click", "action_type", "action_scroll_up", "action_scroll_down", "action_press_key", and "action_visit_url".
2. You can extract the HTML content, title, URL, and screenshot of the current web page through the corresponding properties, e.g. url, page_html, page_title, page_screenshot.
3. You can set or remove the interactive marks on the web page by calling the set_interactive_marks and remove_interactive_marks methods.
Examples
from agentscope.service import WebBrowser
import time

if __name__ == "__main__":
    browser = WebBrowser()
    # Visit the specific web page
    browser.action_visit_url("https://www.bing.com")
    # Set the interactive marks on the web page
    browser.set_interactive_marks()
    time.sleep(5)
    browser.close()
- action_click(element_id: int) -> ServiceResponse
Click on the element with the given id.
- Parameters:
element_id (int) – The id of the element to click.
- Returns:
The response of the click action.
- Return type:
ServiceResponse
- action_press_key(key: str) -> ServiceResponse
Press down a key in the current web page.
- Parameters:
key (str) – The key to press, chosen from F1-F12, Digit0-Digit9, KeyA-KeyZ, Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp, etc.
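The key names above follow a few regular patterns, so a typo can be caught before the call reaches the browser. The following is a minimal sketch, not part of the module: is_documented_key is a hypothetical helper, and because the list above ends with "etc.", a False result only means the name is not among those listed here, not that the browser would reject it.

```python
import re

# Named keys copied from the parameter description above. The description
# ends with "etc.", so this set is illustrative rather than exhaustive.
NAMED_KEYS = {
    "Backquote", "Minus", "Equal", "Backslash", "Backspace", "Tab",
    "Delete", "Escape", "ArrowDown", "End", "Enter", "Home", "Insert",
    "PageDown", "PageUp", "ArrowRight", "ArrowUp",
}

# The three key families: F1-F12, Digit0-Digit9, KeyA-KeyZ.
_KEY_FAMILIES = re.compile(r"F(?:[1-9]|1[0-2])|Digit[0-9]|Key[A-Z]")


def is_documented_key(key: str) -> bool:
    """Return True if `key` matches one of the names listed above."""
    return key in NAMED_KEYS or _KEY_FAMILIES.fullmatch(key) is not None
```

Key names are case-sensitive in this sketch, so "enter" and "Keya" are treated as unlisted.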
- action_scroll_down() -> ServiceResponse
Scroll down the current web page.
- action_scroll_up() -> ServiceResponse
Scroll up the current web page.
- action_type(element_id: int, text: str, submit: bool) -> ServiceResponse
Type text into the element with the given id.
- Parameters:
element_id (int) – The id of the element to type text into.
text (str) – The text to type into the element.
submit (bool) – Whether to press "Enter" after typing the text.
- Returns:
The response of the type action.
- Return type:
ServiceResponse
- action_visit_url(url: str) -> ServiceResponse
Visit the given url.
- Parameters:
url (str) – The url to visit in browser.
- get_action_functions() -> dict[str, Callable]
Return a dictionary of the action functions, where the key is the action name and the value is the corresponding function.
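A name-to-function dictionary like this lends itself to dispatching an action chosen at runtime, for example one named in a model's reply. The sketch below is one possible way to do that and is not part of the module: run_action and the stand-in table are hypothetical, and the key names are assumed to match the action names listed in the class description. With a real browser instance, the table would come from browser.get_action_functions().

```python
from typing import Any, Callable


def run_action(
    actions: dict[str, Callable[..., Any]],
    name: str,
    **kwargs: Any,
) -> Any:
    """Look up `name` in the action table and call it with `kwargs`."""
    if name not in actions:
        raise ValueError(
            f"Unknown action {name!r}; expected one of {sorted(actions)}"
        )
    return actions[name](**kwargs)


# Stand-in table so the sketch runs without a browser. The key names are
# an assumption based on the action names listed in the class description.
calls = []
fake_actions = {
    "action_visit_url": lambda url: calls.append(("visit", url)) or "visited",
    "action_scroll_down": lambda: calls.append(("scroll_down",)) or "scrolled",
}

result = run_action(fake_actions, "action_visit_url", url="https://www.bing.com")
```

Rejecting unknown names up front keeps a malformed action name from surfacing as a bare KeyError deeper in the calling code.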
- set_interactive_marks() -> list[WebElementInfo]
Mark the interactive elements on the current web page.
- property page_html: str
The html content of current page.
- property page_markdown: str
The content of current page in Markdown format.
- property page_screenshot: bytes
The screenshot of the current page.
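Since the screenshot is returned as raw bytes, it can be written to disk or base64-encoded for transport. The sketch below shows both using only the standard library. The helper names are hypothetical, the placeholder bytes are arbitrary, and the image format is not stated in this reference, so the file extension to use is left to the caller.

```python
import base64
from pathlib import Path


def save_screenshot(data: bytes, path: str) -> int:
    """Write the raw screenshot bytes to `path` and return the byte count."""
    return Path(path).write_bytes(data)


def screenshot_to_base64(data: bytes) -> str:
    """Encode the screenshot bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


# Placeholder bytes so the sketch runs without a browser. With a real
# instance this would be `browser.page_screenshot`.
fake_screenshot = b"fake image bytes"
encoded = screenshot_to_base64(fake_screenshot)
```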
- property page_title: str
The title of current page.
- property url: str
The url of current page.
- class WebElementInfo(*, html: str, tag_name: str, node_name: str, node_value: None | str, type: None | str, aria_label: None | str, is_clickable: str | bool, meta_data: list[str], inner_text: str, origin_x: float, origin_y: float, width: float, height: float)
Bases:
BaseModel
The information of a web interactive element.
- aria_label: None | str
The aria label of the element.
- height: float
The height of the element.
- html: str
The html content of the element.
- inner_text: str
The text content of the element.
- is_clickable: str | bool
Whether the element is clickable. If clickable, the value is the link of the element, otherwise, the value is False.
- meta_data: list[str]
The metadata of the element, e.g. its attributes.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- node_name: str
The node name of the element.
- node_value: None | str
The node value of the element.
- origin_x: float
The x coordinate of the origin of the element.
- origin_y: float
The y coordinate of the origin of the element.
- tag_name: str
The tag name of the element.
- type: None | str
The type of the element.
- width: float
The width of the element.
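The geometry and is_clickable fields can be combined to derive values that are handy when reasoning about an element. The sketch below uses a hypothetical ElementBox dataclass as a stand-in that carries only a subset of the WebElementInfo fields, so it runs without pydantic or a browser. It assumes that origin_x and origin_y denote the top-left corner of the bounding box, which this reference does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ElementBox:
    """Hypothetical stand-in carrying a subset of the WebElementInfo fields."""

    is_clickable: Union[str, bool]
    origin_x: float
    origin_y: float
    width: float
    height: float


def center_of(el: ElementBox) -> tuple[float, float]:
    """Centre of the box, assuming the origin is its top-left corner."""
    return (el.origin_x + el.width / 2, el.origin_y + el.height / 2)


def link_of(el: ElementBox) -> Optional[str]:
    """Return the link when is_clickable holds one, otherwise None."""
    return el.is_clickable if isinstance(el.is_clickable, str) else None


button = ElementBox(
    is_clickable=False, origin_x=10.0, origin_y=20.0, width=100.0, height=40.0
)
anchor = ElementBox(
    is_clickable="https://www.bing.com",
    origin_x=0.0, origin_y=0.0, width=50.0, height=10.0,
)
```

Checking for a str rather than truthiness follows the field description: a clickable element carries its link as a string, and any bool value is treated here as having no link.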