Lyell Hintz

OpenAIAudio TOX [0.0.3 updated] (Patreon)

Published:

2023-11-14 20:48:30

Edited:

2024-04-21 02:17:29

Content

Updated TOX here - https://www.patreon.com/posts/102742032?pr=true

OpenAI speech-to-text (STT) and text-to-speech (TTS) API Integration for TouchDesigner

Uploaded 0.0.3: removes the timer that raised the par.Length error, plus a few menu cleanups.

🎙️ Speech-to-Text - Uses OpenAI's 'Whisper' API endpoint to transcribe speech audio into text. The operator also creates a tableDAT of segments with start and end times, similar to SRT.
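
To illustrate how a segment table with start and end times relates to SRT, here is a minimal sketch that turns rows of (start, end, text) into SRT text. The row layout and the function names `format_srt_time` and `segments_to_srt` are assumptions for illustration; the TOX's actual table columns may differ.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn rows of (start_seconds, end_seconds, text) into SRT text.

    Assumption: each row holds a start time, an end time and the spoken text,
    as a segment table might after transcription.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

Inside TouchDesigner, the rows could be read from the segment table and the result written to a Text DAT or a file, depending on how the project uses subtitles.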

🗣️ Text-to-Speech - Uses OpenAI's new text-to-speech API to convert text into an AI-generated speech audio file. It offers several voice options and lets users control the speed of the speech output.
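
For readers curious what a voice and speed selection looks like at the API level, the sketch below builds a request for OpenAI's `/v1/audio/speech` endpoint. It is not the TOX's internal code: the helper names `build_tts_request` and `send_tts_request` are hypothetical, and the voice list and the 0.25 to 4.0 speed range reflect OpenAI's documentation around the time of this post and may have changed since.

```python
import json
import urllib.request

TTS_URL = "https://api.openai.com/v1/audio/speech"
# Voices documented when the TTS API launched; the list may have grown since.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def build_tts_request(text, voice="alloy", speed=1.0, model="tts-1"):
    """Build the JSON payload for a text-to-speech request."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    # Clamp speed into the range the API accepts (assumed 0.25 to 4.0).
    speed = max(0.25, min(4.0, float(speed)))
    return {"model": model, "input": text, "voice": voice, "speed": speed}


def send_tts_request(payload, api_key, out_path="speech.mp3"):
    """POST the payload and save the returned audio bytes (needs a valid key)."""
    request = urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

Keeping payload construction separate from the network call makes it easy to validate voice and speed values before any API usage is billed.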

🔄 TD Project Integration:
- Extension Class: Most operator functions can be triggered via custom Python calls inside your network (middle-click the operator to see function details).
- Custom Callbacks: Customizable callbacks for Generate/Done, separate for both TTS/STT calls.
- History Tables: Separate history tables for TTS and STT calls, covering transcription activity, TTS outputs, recording file paths, timestamps, and other generation details.
- Whisper (STT) Compatibility Checks: Popup verification for file types and sizes in accordance with the API's requirements.
- Transparent Logging: Detailed logs provide clarity and support troubleshooting and optimization.
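
As a rough idea of what a Whisper compatibility check involves, here is a minimal sketch that validates a file's extension and size before upload. The function name `check_audio_file` is hypothetical, and the 25 MB limit and format list are taken from OpenAI's documentation around the time of this post; they are assumptions here and may not match what the TOX checks internally.

```python
import os

# Assumed limits from OpenAI's speech-to-text documentation; verify before relying on them.
MAX_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def check_audio_file(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable.

    size_bytes can be passed explicitly, otherwise it is read from disk.
    """
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the {MAX_BYTES} byte limit")
    return problems
```

A caller could show the returned messages in a popup and skip the API call whenever the list is not empty.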