agentscope.service.multi_modality.dashscope_services module

Services that use the DashScope API to generate images, convert text to audio, and convert images to text. For more details, refer to the official documentation: https://dashscope.aliyun.com/

dashscope_image_to_text(image_urls: str | Sequence[str], api_key: str, prompt: str = 'Describe the image', model: str = 'qwen-vl-plus') ServiceResponse[source]

Generate text based on the given images.

Parameters:
  • image_urls (Union[str, Sequence[str]]) – The URL of a single image, or the URLs of multiple images.

  • api_key (str) – The API key for the DashScope API.

  • prompt (str, defaults to ‘Describe the image’) – The text prompt.

  • model (str, defaults to ‘qwen-vl-plus’) – The model to use in DashScope MultiModal API.

Returns:

A dictionary with two fields: status and content. If status is ServiceExecStatus.SUCCESS, content is the generated text.

Return type:

ServiceResponse

Example

image_url = "image.jpg"
prompt = "Describe the image"
print(dashscope_image_to_text(image_url, "{api_key}", prompt))

> {'status': 'SUCCESS', 'content': 'A beautiful sunset in the mountains'}
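
Because image_urls accepts either one URL string or a sequence of them, caller code that gathers URLs from different places may want a single list form. The sketch below shows one way to do that. normalize_image_urls is a hypothetical helper for illustration only; it is not part of AgentScope and does not claim to mirror how the service handles the argument internally.

```python
from typing import List, Sequence, Union


def normalize_image_urls(image_urls: Union[str, Sequence[str]]) -> List[str]:
    """Return the URLs as a list, wrapping a single string if needed.

    Hypothetical helper, not an AgentScope function.
    """
    # A str is itself a Sequence[str], so it must be tested first;
    # otherwise list("a.jpg") would split the URL into characters.
    if isinstance(image_urls, str):
        return [image_urls]
    return list(image_urls)
```

Either form can then be handled the same way, for example when logging how many images were sent along with the prompt.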

dashscope_text_to_audio(text: str, api_key: str, save_dir: str, model: str = 'sambert-zhichu-v1', sample_rate: int = 48000) ServiceResponse[source]

Convert the given text to audio.

Parameters:
  • text (str) – The text to be converted into audio.

  • api_key (str) – The API key for the DashScope API.

  • save_dir (str) – The directory to save the generated audio.

  • model (str, defaults to ‘sambert-zhichu-v1’) – The model to use. Full model list can be found in https://help.aliyun.com/zh/dashscope/model-list

  • sample_rate (int, defaults to 48000) – Sample rate of the generated audio, in Hz.

Returns:

A dictionary with two fields: status and content. If status is ServiceExecStatus.SUCCESS, content is a dictionary whose key “audio_path” maps to the path of the generated audio.

Return type:

ServiceResponse

Example

text = "How is the weather today?"
print(dashscope_text_to_audio(text, "{api_key}", "{save_dir}"))

> {'status': 'SUCCESS', 'content': {'audio_path': 'AUDIO_PATH'}}
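
The return value described above pairs a status with a content payload, so callers usually check the status before reading audio_path. The following is a minimal sketch of that pattern. Status and Response are stand-ins defined here so the snippet runs without AgentScope or network access; they only imitate the documented shape of ServiceExecStatus and ServiceResponse, and the ERROR member and the audio_path_or_raise helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Status(Enum):
    """Stand-in for ServiceExecStatus (the ERROR member is an assumption)."""
    SUCCESS = 1
    ERROR = -1


@dataclass
class Response:
    """Stand-in for ServiceResponse: a status plus a content payload."""
    status: Status
    content: Any


def audio_path_or_raise(response: Response) -> str:
    """Return the saved audio path, or raise if the call did not succeed."""
    if response.status is not Status.SUCCESS:
        # On failure, content is assumed to carry the error details.
        raise RuntimeError(f"text-to-audio failed: {response.content}")
    return response.content["audio_path"]
```

With the real service, the same check would be applied to the object returned by dashscope_text_to_audio.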

dashscope_text_to_image(prompt: str, api_key: str, n: int = 1, size: Literal['1024*1024', '720*1280', '1280*720'] = '1024*1024', model: str = 'wanx-v1', save_dir: str | None = None) ServiceResponse[source]

Generate image(s) based on the given prompt, and return image url(s).

Parameters:
  • prompt (str) – The text prompt to generate image.

  • api_key (str) – The API key for the DashScope API.

  • n (int, defaults to 1) – The number of images to generate.

  • size (Literal["1024*1024", "720*1280", "1280*720"], defaults to "1024*1024") – Size of the image.

  • model (str, defaults to ‘wanx-v1’) – The model to use.

  • save_dir (Optional[str], defaults to None) – The directory to save the generated images. If not specified, the web URLs are returned.

Returns:

A dictionary with two fields: status and content. If status is ServiceExecStatus.SUCCESS, content is a dictionary whose key ‘fig_paths’ maps to a list of the paths to the generated images.

Return type:

ServiceResponse

Example

prompt = "A beautiful sunset in the mountains"
print(dashscope_text_to_image(prompt, "{api_key}"))

> {
>     'status': 'SUCCESS',
>     'content': {'image_urls': ['IMAGE_URL1', 'IMAGE_URL2']}
> }
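
The size parameter is typed as a Literal, which static type checkers enforce but Python does not check at runtime. A caller that takes the size from user input might validate it first, as in the sketch below. ImageSize and check_size are hypothetical names introduced here, not part of AgentScope; the allowed values are copied from the signature above. This page does not say whether the service performs its own validation.

```python
from typing import Literal, get_args

# The three sizes listed in the dashscope_text_to_image signature.
ImageSize = Literal["1024*1024", "720*1280", "1280*720"]


def check_size(size: str) -> str:
    """Return size unchanged if it is an allowed value, else raise ValueError."""
    allowed = get_args(ImageSize)  # tuple of the literal strings
    if size not in allowed:
        raise ValueError(f"size must be one of {allowed}, got {size!r}")
    return size
```

Note that the separator is an asterisk, so a value such as "1024x1024" would be rejected by this check.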