Text to Audio

The Text to Audio Processor converts text input into audio output. It turns written text into spoken words, so applications can generate audio content dynamically.

Supported Input Port:

text: The processor accepts input through the “text” port. The input must be a string containing the text to convert to audio.

Supported Output Port:

filepath: The processor emits output through the “filepath” port. The output is the file path of the generated audio file.
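The port contract above amounts to a function from a string to a file path. The following is a minimal sketch of that shape only, not the processor's actual code: `run_text_to_audio`, the `synthesize` callback, and `fake_synthesize` are hypothetical names introduced for illustration, and the stand-in backend writes placeholder bytes rather than real speech.

```python
import os
import tempfile
from typing import Callable


def run_text_to_audio(
    text: str,
    synthesize: Callable[[str, str], None],
    out_path: str,
) -> str:
    """Hypothetical wrapper: takes the "text" port value, returns the "filepath" port value."""
    if not isinstance(text, str):
        raise TypeError('the "text" port expects a string')
    synthesize(text, out_path)  # the backend writes the audio file to out_path
    return out_path  # this value would be emitted on the "filepath" port


def fake_synthesize(text: str, path: str) -> None:
    # Stand-in backend (assumption): writes the text's bytes, not real audio.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))


out = run_text_to_audio(
    "Hello",
    fake_synthesize,
    os.path.join(tempfile.mkdtemp(), "output.mp3"),
)
```

Passing the backend as a callback keeps the port contract separate from any one implementation, so a real text-to-speech backend could be swapped in for the placeholder.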

List of Implementations:

Azure Implementation

The Azure implementation of the Text to Audio Processor uses Azure’s text-to-speech service to convert text to audio.






Parameters:

string (optional): The name of the voice to be used for the audio output. The default value is “en-US-AriaNeural”.

string (optional): The Azure region where the text-to-speech service is located. The default value is “eastus”.

string (required): The API key for accessing Azure’s text-to-speech service. This key is user-provided.

string (optional): The name of the output audio file. The default value is “output.mp3”.

string (optional): The folder where the generated audio file will be stored. The default value is “assets”.
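To show how the settings above could map onto Azure’s Speech SDK for Python, here is a hedged sketch. It is not the processor's actual implementation. The function names and the argument names (`api_key`, `region`, `voice`, `folder`, `filename`) are assumptions chosen for illustration, since this page does not name the configuration keys. Only the default values come from the descriptions above. The sketch assumes the `azure-cognitiveservices-speech` package is installed.

```python
import os


def resolve_output_path(folder: str = "assets", filename: str = "output.mp3") -> str:
    """Join the output folder and file name, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)


def synthesize_with_azure(
    text: str,
    api_key: str,  # required, user-provided
    region: str = "eastus",
    voice: str = "en-US-AriaNeural",
    folder: str = "assets",
    filename: str = "output.mp3",
) -> str:
    """Hypothetical helper: synthesize `text` to a file and return its path."""
    # Imported lazily so resolve_output_path works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    path = resolve_output_path(folder, filename)

    speech_config = speechsdk.SpeechConfig(subscription=api_key, region=region)
    speech_config.speech_synthesis_voice_name = voice
    # Request MP3 output so the file contents match the default ".mp3" name.
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3
    )
    audio_config = speechsdk.audio.AudioOutputConfig(filename=path)

    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"speech synthesis failed: {result.reason}")
    return path
```

With the defaults, a call such as `synthesize_with_azure("Hello", api_key="<your key>")` would write to `output.mp3` inside the `assets` folder and return that path, which corresponds to the value on the “filepath” port.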

Sample processor configuration:

NOTE: A processor is always added to a module (Input or Output). The module is then added to the pipeline.

    "processor_type": "text_to_audio",
    "processor_implementation_type": "text_to_audio_with_azure",
    "input_port": "text",
    "output_port": "filepath",
    "metadata": {},