agentscope.service.multi_modality.openai_services module

Wrap OpenAI API calls as services. Refer the official OpenAI API documentation for more details. https://platform.openai.com/docs/overview

openai_audio_to_text(audio_file_url: str, api_key: str, language: str = 'en', temperature: float = 0.2) ServiceResponse[source]

Convert an audio file to text using OpenAI’s transcription service.

Parameters:
  • audio_file_url (str) – The file path or URL to the audio file that needs to be transcribed.

  • api_key (str) – The API key for the OpenAI API.

  • language (str, defaults to “en”) – The language of the input audio. Supplying the input language in [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) format will improve accuracy and latency.

  • temperature (float, defaults to 0.2) – The temperature for the transcription, which affects the randomness of the output.

Returns:

A dictionary with two variables: status and content. If status is ServiceExecStatus.SUCCESS, the content contains a dictionary with key ‘transcription’ and value as the transcribed text.

Return type:

ServiceResponse

Example

audio_file_url = "/path/to/audio.mp3"
api_key = "YOUR_API_KEY"
print(openai_audio_to_text(audio_file_url, api_key))

> { > ‘status’: ‘SUCCESS’, > ‘content’: {‘transcription’: ‘This is the transcribed text from the audio file.’} > }

openai_create_image_variation(image_url: str, api_key: str, n: int = 1, size: Literal['256x256', '512x512', '1024x1024'] = '256x256', save_dir: str | None = None) ServiceResponse[source]

Create variations of an image and return the image URL(s) or save them locally.

Parameters:
  • image_url (str) – The file path or URL to the image from which variations will be generated.

  • api_key (str) – The API key for the OpenAI API.

  • n (int, defaults to 1) – The number of image variations to generate.

  • (`Literal["256x256" (size) –

  • "512x512"

  • "1024x1024"]`

  • ` (defaults to)

  • "256x256"`)

    The size of the generated image variations.

  • save_dir (Optional[str], defaults to None) – The directory to save the generated image variations. If not specified, will return the web URLs.

Returns:

A dictionary with two variables: status and content. If status is ServiceExecStatus.SUCCESS, the content is a dict with key ‘image_urls’ and value is a list of the paths to the generated images or URLs.

Return type:

ServiceResponse

Example

image_url = "/path/to/image.png"
api_key = "YOUR_API_KEY"
print(openai_create_image_variation(image_url, api_key))

> { > ‘status’: ‘SUCCESS’, > ‘content’: {‘image_urls’: [‘VARIATION_URL1’, ‘VARIATION_URL2’]} > }

openai_edit_image(image_url: str, prompt: str, api_key: str, mask_url: str | None = None, n: int = 1, size: Literal['256x256', '512x512', '1024x1024'] = '256x256', save_dir: str | None = None) ServiceResponse[source]

Edit an image based on the provided mask and prompt, and return the edited image URL(s) or save them locally.

Parameters:
  • image_url (str) – The file path or URL to the image that needs editing.

  • prompt (str) – The text prompt describing the edits to be made to the image.

  • api_key (str) – The API key for the OpenAI API.

  • mask_url (Optional[str], defaults to None) – The file path or URL to the mask image that specifies the regions to be edited.

  • n (int, defaults to 1) – The number of edited images to generate.

  • (`Literal["256x256" (size) –

  • "512x512"

  • "1024x1024"]`

  • to (defaults)

  • "256x256") – The size of the edited images.

  • save_dir (Optional[str], defaults to None) – The directory to save the edited images. If not specified, will return the web URLs.

Returns:

A dictionary with two variables: status and content. If status is ServiceExecStatus.SUCCESS, the content is a dict with key ‘image_urls’ and value is a list of the paths to the edited images or URLs.

Return type:

ServiceResponse

Example

image_url = "/path/to/original_image.png"
mask_url = "/path/to/mask_image.png"
prompt = "Add a sun to the sky"
api_key = "YOUR_API_KEY"
print(openai_edit_image(image_url, prompt, api_key, mask_url))

> { > ‘status’: ‘SUCCESS’, > ‘content’: {‘image_urls’: [‘EDITED_IMAGE_URL1’, ‘EDITED_IMAGE_URL2’]} > }

openai_image_to_text(image_urls: str | list[str], api_key: str, prompt: str = 'Describe the image', model: Literal['gpt-4o', 'gpt-4-turbo'] = 'gpt-4o') ServiceResponse[source]

Generate descriptive text for given image(s) using a specified model, and return the generated text.

Parameters:
  • image_urls (Union[str, list[str]]) – The URL or list of URLs pointing to the images that need to be described.

  • api_key (str) – The API key for the OpenAI API.

  • prompt (str, defaults to “Describe the image”) – The prompt that instructs the model on how to describe the image(s).

  • model (Literal[“gpt-4o”, “gpt-4-turbo”], defaults to “gpt-4o”) – The model to use for generating the text descriptions.

Returns:

A dictionary with two variables: status and content. If status is ServiceExecStatus.SUCCESS, the content contains the generated text description(s).

Return type:

ServiceResponse

Example

image_url = "https://example.com/image.jpg"
api_key = "YOUR_API_KEY"
print(openai_image_to_text(image_url, api_key))

> { > ‘status’: ‘SUCCESS’, > ‘content’: “A detailed description of the image…” > }

openai_text_to_audio(text: str, api_key: str, save_dir: str = '', model: Literal['tts-1', 'tts-1-hd'] = 'tts-1', voice: Literal['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] = 'alloy', speed: float = 1.0, res_format: Literal['mp3', 'wav', 'opus', 'aac', 'flac', 'pcm'] = 'mp3') ServiceResponse[source]

Convert text to an audio file using a specified model and voice, and save the audio file locally.

Parameters:
  • text (str) – The text to convert to audio.

  • api_key (str) – The API key for the OpenAI API.

  • save_dir (str defaults to ‘’) – The directory where the generated audio file will be saved.

  • model (Literal[“tts-1”, “tts-1-hd”], defaults to “tts-1”) – The model to use for text-to-speech conversion.

  • (`Literal["alloy" (voice) –

  • "echo"

  • "fable"

  • "onyx"

  • "nova"

  • "shimmer"]`

:param : :param defaults to “alloy”): The voice to use for the audio output. :param speed: The speed of the audio playback. A value of 1.0 is normal speed. :type speed: float, defaults to 1.0 :param res_format (Literal[“mp3”: :param “wav”: :param “opus”: :param “aac”: :param “flac”: :param : :param “wav”: :param “pcm”]: :param : :param defaults to “mp3”): The format of the audio file.

Returns:

A dictionary with two variables: status and content. If status is ServiceExecStatus.SUCCESS, the content is a dict with key ‘audio_path’ and value is the path to the generated audio file.

Return type:

ServiceResponse

Example

text = "Hello, welcome to the text-to-speech service!"
api_key = "YOUR_API_KEY"
save_dir = "./audio_files"
print(openai_text_to_audio(text, api_key, save_dir))

> { > ‘status’: ‘SUCCESS’, > ‘content’: {‘audio_path’: ‘./audio_files/Hello,_welco.mp3’} > }

openai_text_to_image(prompt: str, api_key: str, n: int = 1, model: Literal['dall-e-2', 'dall-e-3'] = 'dall-e-2', size: Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] = '256x256', quality: Literal['standard', 'hd'] = 'standard', style: Literal['vivid', 'natural'] = 'vivid', save_dir: str | None = None) ServiceResponse[source]

Generate image(s) based on the given prompt, and return image URL(s) or save them locally.

Parameters:
  • prompt (str) – The text prompt to generate images.

  • api_key (str) – The API key for the OpenAI API.

  • n (int, defaults to 1) – The number of images to generate.

  • model (Literal[“dall-e-2”, “dall-e-3”], defaults to “dall-e-2”) – The model to use for image generation.

  • (`Literal["256x256" (size) –

  • "512x512"

  • "1024x1024"

  • "1792x1024"

:param : :param “1024x1792”]`: The size of the generated image(s). :param defaults to “256x256”): The size of the generated image(s). :param quality: The quality of the generated images. :type quality: Literal[“standard”, “hdr”], defaults to “standard :param style: The style of the generated images. :type style: Literal[“vivid”, “natural”]], defaults to “vivid :param save_dir: The directory to save the generated images. If not specified, will

return the web URLs.

Returns:

A dictionary with two variables: status and content. If status is ServiceExecStatus.SUCCESS, the content is a dict with key ‘image_urls’ and value is a list of the paths to the generated images or URLs.

Return type:

ServiceResponse

Example

prompt = "A futuristic city skyline at sunset"
print(openai_text_to_image(prompt, "{api_key}"))

> { > ‘status’: ‘SUCCESS’, > ‘content’: {‘image_urls’: [‘IMAGE_URL1’, ‘IMAGE_URL2’]} > }