agentscope.service.multi_modality.openai_services module
Wrap OpenAI API calls as services. Refer to the official OpenAI API documentation for more details: https://platform.openai.com/docs/overview
- openai_audio_to_text(audio_file_url: str, api_key: str, language: str = 'en', temperature: float = 0.2) → ServiceResponse
Convert an audio file to text using OpenAI’s transcription service.
- Parameters:
audio_file_url (str) – The file path or URL to the audio file that needs to be transcribed.
api_key (str) – The API key for the OpenAI API.
language (str, defaults to “en”) – The language of the input audio. Supplying the input language in [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) format will improve accuracy and latency.
temperature (float, defaults to 0.2) – The temperature for the transcription, which affects the randomness of the output.
- Returns:
A dictionary with two fields, status and content. If status is ServiceExecStatus.SUCCESS, content is a dictionary whose ‘transcription’ key holds the transcribed text.
- Return type:
ServiceResponse
Example
from agentscope.service.multi_modality.openai_services import openai_audio_to_text
audio_file_url = "/path/to/audio.mp3"
api_key = "YOUR_API_KEY"
print(openai_audio_to_text(audio_file_url, api_key))
> {
>     'status': 'SUCCESS',
>     'content': {'transcription': 'This is the transcribed text from the audio file.'}
> }
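A caller will usually want to branch on status before reading content. The following is a minimal sketch of that check, assuming the response can be indexed like the dictionary described under Returns and that a failed call carries an error description in content. `FakeStatus` and `transcription_or_raise` are hypothetical names, and the responses are hand-built so the sketch runs without a network call or an API key.

```python
from enum import Enum


class FakeStatus(Enum):
    """Hypothetical stand-in for ServiceExecStatus so the sketch runs offline."""
    SUCCESS = 1
    ERROR = -1


def transcription_or_raise(response, success_status):
    """Return the transcribed text, or raise if the service call failed."""
    if response["status"] != success_status:
        # Assumption: on failure, `content` holds an error description.
        raise RuntimeError(f"Transcription failed: {response['content']}")
    return response["content"]["transcription"]


# Hand-built responses shaped like the documented return value.
ok = {"status": FakeStatus.SUCCESS, "content": {"transcription": "hello world"}}
failed = {"status": FakeStatus.ERROR, "content": "invalid API key"}

print(transcription_or_raise(ok, FakeStatus.SUCCESS))
```

With the real service, the second argument would be ServiceExecStatus.SUCCESS and the first would be the value returned by openai_audio_to_text.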
- openai_create_image_variation(image_url: str, api_key: str, n: int = 1, size: Literal['256x256', '512x512', '1024x1024'] = '256x256', save_dir: str | None = None) → ServiceResponse
Create variations of an image and return the image URL(s) or save them locally.
- Parameters:
image_url (str) – The file path or URL to the image from which variations will be generated.
api_key (str) – The API key for the OpenAI API.
n (int, defaults to 1) – The number of image variations to generate.
size (Literal[“256x256”, “512x512”, “1024x1024”], defaults to “256x256”) – The size of the generated image variations.
save_dir (Optional[str], defaults to None) – The directory to save the generated image variations. If not specified, the web URLs are returned instead.
- Returns:
A dictionary with two fields, status and content. If status is ServiceExecStatus.SUCCESS, content is a dict whose ‘image_urls’ key holds a list of local paths or URLs of the generated images.
- Return type:
ServiceResponse
Example
from agentscope.service.multi_modality.openai_services import openai_create_image_variation
image_url = "/path/to/image.png"
api_key = "YOUR_API_KEY"
print(openai_create_image_variation(image_url, api_key))
> {
>     'status': 'SUCCESS',
>     'content': {'image_urls': ['VARIATION_URL1', 'VARIATION_URL2']}
> }
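Python does not enforce `Literal` annotations at runtime, so an unsupported size is not rejected by the type hint itself. The sketch below shows one way to check a value against the documented literals before calling the service. `VariationSize` and `check_variation_size` are hypothetical names, and whether the wrapper performs its own validation is not stated in this reference.

```python
from typing import Literal, get_args

# Mirrors the `size` annotation in the signature above.
VariationSize = Literal["256x256", "512x512", "1024x1024"]


def check_variation_size(size: str) -> str:
    """Return `size` unchanged if it is one of the documented literals."""
    allowed = get_args(VariationSize)  # tuple of the literal strings
    if size not in allowed:
        raise ValueError(f"size must be one of {allowed}, got {size!r}")
    return size


print(check_variation_size("512x512"))
```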
- openai_edit_image(image_url: str, prompt: str, api_key: str, mask_url: str | None = None, n: int = 1, size: Literal['256x256', '512x512', '1024x1024'] = '256x256', save_dir: str | None = None) → ServiceResponse
Edit an image based on the provided mask and prompt, and return the edited image URL(s) or save them locally.
- Parameters:
image_url (str) – The file path or URL to the image that needs editing.
prompt (str) – The text prompt describing the edits to be made to the image.
api_key (str) – The API key for the OpenAI API.
mask_url (Optional[str], defaults to None) – The file path or URL to the mask image that specifies the regions to be edited.
n (int, defaults to 1) – The number of edited images to generate.
size (Literal[“256x256”, “512x512”, “1024x1024”], defaults to “256x256”) – The size of the edited images.
save_dir (Optional[str], defaults to None) – The directory to save the edited images. If not specified, the web URLs are returned instead.
- Returns:
A dictionary with two fields, status and content. If status is ServiceExecStatus.SUCCESS, content is a dict whose ‘image_urls’ key holds a list of local paths or URLs of the edited images.
- Return type:
ServiceResponse
Example
from agentscope.service.multi_modality.openai_services import openai_edit_image
image_url = "/path/to/original_image.png"
mask_url = "/path/to/mask_image.png"
prompt = "Add a sun to the sky"
api_key = "YOUR_API_KEY"
print(openai_edit_image(image_url, prompt, api_key, mask_url))
> {
>     'status': 'SUCCESS',
>     'content': {'image_urls': ['EDITED_IMAGE_URL1', 'EDITED_IMAGE_URL2']}
> }
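Because image_url and mask_url accept either a file path or a URL, it can help to confirm that any local files exist before spending an API call. The sketch below is one possible pre-flight check. `looks_like_url` and `missing_local_files` are hypothetical helpers, and the rule used here (an http or https scheme plus a host means URL) is an assumption, not the wrapper's own detection logic.

```python
import os
from urllib.parse import urlparse


def looks_like_url(location: str) -> bool:
    """Treat a location as a URL only if it has an http(s) scheme and a host."""
    parsed = urlparse(location)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def missing_local_files(locations):
    """Return the non-URL locations that do not exist on disk."""
    return [
        loc for loc in locations
        if not looks_like_url(loc) and not os.path.exists(loc)
    ]


print(looks_like_url("https://example.com/image.jpg"))
print(looks_like_url("/path/to/original_image.png"))
```

Passing both the image and the mask through `missing_local_files` reports every missing file at once rather than failing on the first.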
- openai_image_to_text(image_urls: str | list[str], api_key: str, prompt: str = 'Describe the image', model: Literal['gpt-4o', 'gpt-4-turbo'] = 'gpt-4o') → ServiceResponse
Generate descriptive text for given image(s) using a specified model, and return the generated text.
- Parameters:
image_urls (Union[str, list[str]]) – The URL or list of URLs pointing to the images that need to be described.
api_key (str) – The API key for the OpenAI API.
prompt (str, defaults to “Describe the image”) – The prompt that instructs the model on how to describe the image(s).
model (Literal[“gpt-4o”, “gpt-4-turbo”], defaults to “gpt-4o”) – The model to use for generating the text descriptions.
- Returns:
A dictionary with two variables: status and content. If status is ServiceExecStatus.SUCCESS, the content contains the generated text description(s).
- Return type:
ServiceResponse
Example
from agentscope.service.multi_modality.openai_services import openai_image_to_text
image_url = "https://example.com/image.jpg"
api_key = "YOUR_API_KEY"
print(openai_image_to_text(image_url, api_key))
> {
>     'status': 'SUCCESS',
>     'content': "A detailed description of the image…"
> }
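Since image_urls accepts either a single string or a list of strings, calling code that builds the argument dynamically may find it convenient to normalize it first. The sketch below shows one way to do that. `as_url_list` is a hypothetical helper and is not part of this module.

```python
from typing import List, Union


def as_url_list(image_urls: Union[str, List[str]]) -> List[str]:
    """Wrap a single URL in a list; copy an existing list unchanged."""
    if isinstance(image_urls, str):
        return [image_urls]
    return list(image_urls)


print(as_url_list("https://example.com/image.jpg"))
print(as_url_list(["https://example.com/a.jpg", "https://example.com/b.jpg"]))
```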
- openai_text_to_audio(text: str, api_key: str, save_dir: str = '', model: Literal['tts-1', 'tts-1-hd'] = 'tts-1', voice: Literal['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] = 'alloy', speed: float = 1.0, res_format: Literal['mp3', 'wav', 'opus', 'aac', 'flac', 'pcm'] = 'mp3') → ServiceResponse
Convert text to an audio file using a specified model and voice, and save the audio file locally.
- Parameters:
text (str) – The text to convert to audio.
api_key (str) – The API key for the OpenAI API.
save_dir (str, defaults to ‘’) – The directory where the generated audio file will be saved.
model (Literal[“tts-1”, “tts-1-hd”], defaults to “tts-1”) – The model to use for text-to-speech conversion.
voice (Literal[“alloy”, “echo”, “fable”, “onyx”, “nova”, “shimmer”], defaults to “alloy”) – The voice to use for the audio output.
speed (float, defaults to 1.0) – The speed of the audio playback. A value of 1.0 is normal speed.
res_format (Literal[“mp3”, “wav”, “opus”, “aac”, “flac”, “pcm”], defaults to “mp3”) – The format of the audio file.
- Returns:
A dictionary with two fields, status and content. If status is ServiceExecStatus.SUCCESS, content is a dict whose ‘audio_path’ key holds the path to the generated audio file.
- Return type:
ServiceResponse
Example
from agentscope.service.multi_modality.openai_services import openai_text_to_audio
text = "Hello, welcome to the text-to-speech service!"
api_key = "YOUR_API_KEY"
save_dir = "./audio_files"
print(openai_text_to_audio(text, api_key, save_dir))
> {
>     'status': 'SUCCESS',
>     'content': {'audio_path': './audio_files/Hello,_welco.mp3'}
> }
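Downstream code often needs the saved file as a path object rather than a raw string. The sketch below reads audio_path from a content dict shaped like the example output and checks the file suffix against the requested format. `audio_path_from` is a hypothetical helper, and the idea that the extension mirrors res_format is inferred from the example output only.

```python
from pathlib import Path


def audio_path_from(content: dict, res_format: str = "mp3") -> Path:
    """Return the saved file's path, checking its suffix against `res_format`."""
    path = Path(content["audio_path"])
    # Assumption: the file extension mirrors `res_format`, as in the example output.
    if path.suffix != f".{res_format}":
        raise ValueError(f"expected a .{res_format} file, got {path.name!r}")
    return path


content = {"audio_path": "./audio_files/Hello,_welco.mp3"}
print(audio_path_from(content).name)
```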
- openai_text_to_image(prompt: str, api_key: str, n: int = 1, model: Literal['dall-e-2', 'dall-e-3'] = 'dall-e-2', size: Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] = '256x256', quality: Literal['standard', 'hd'] = 'standard', style: Literal['vivid', 'natural'] = 'vivid', save_dir: str | None = None) → ServiceResponse
Generate image(s) based on the given prompt, and return image URL(s) or save them locally.
- Parameters:
prompt (str) – The text prompt to generate images.
api_key (str) – The API key for the OpenAI API.
n (int, defaults to 1) – The number of images to generate.
model (Literal[“dall-e-2”, “dall-e-3”], defaults to “dall-e-2”) – The model to use for image generation.
size (Literal[“256x256”, “512x512”, “1024x1024”, “1792x1024”, “1024x1792”], defaults to “256x256”) – The size of the generated image(s).
quality (Literal[“standard”, “hd”], defaults to “standard”) – The quality of the generated images.
style (Literal[“vivid”, “natural”], defaults to “vivid”) – The style of the generated images.
save_dir (Optional[str], defaults to None) – The directory to save the generated images. If not specified, the web URLs are returned instead.
- Returns:
A dictionary with two fields, status and content. If status is ServiceExecStatus.SUCCESS, content is a dict whose ‘image_urls’ key holds a list of local paths or URLs of the generated images.
- Return type:
ServiceResponse
Example
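from agentscope.service.multi_modality.openai_services import openai_text_to_image
prompt = "A futuristic city skyline at sunset"
api_key = "YOUR_API_KEY"
print(openai_text_to_image(prompt, api_key))
> {
>     'status': 'SUCCESS',
>     'content': {'image_urls': ['IMAGE_URL1', 'IMAGE_URL2']}
> }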
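The signature accepts all five sizes for either model, but the OpenAI image API documents a different set of sizes for each model and limits dall-e-3 to one image per request. The sketch below checks a combination before calling the service. `SIZES_BY_MODEL` and `check_image_request` are hypothetical names, the constraints come from the OpenAI API reference rather than from this module, and whether the wrapper enforces them itself is not stated here.

```python
# Per-model sizes as documented in the OpenAI API reference (not in this module).
SIZES_BY_MODEL = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}


def check_image_request(model: str, size: str, n: int = 1) -> None:
    """Raise ValueError if the model/size/n combination is not supported."""
    if model not in SIZES_BY_MODEL:
        raise ValueError(f"unknown model {model!r}")
    if size not in SIZES_BY_MODEL[model]:
        raise ValueError(f"{model} does not support size {size!r}")
    if model == "dall-e-3" and n != 1:
        raise ValueError("dall-e-3 generates one image per request (n must be 1)")


# The defaults in the signature above form a valid combination.
check_image_request("dall-e-2", "256x256", n=1)
print("defaults are valid")
```

Note that switching to dall-e-3 while leaving size at its default of “256x256” would fail this check, so the size needs to be set explicitly in that case.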