
Content

Updated TOX here - https://www.patreon.com/posts/102742032?pr=true

OpenAI speech-to-text (STT) and text-to-speech (TTS) API Integration for TouchDesigner

Uploaded 0.0.3: removes the stray timer that threw the par.Length error, plus a few menu cleanups.

🎙️ Speech-to-Text - Uses OpenAI's Whisper API endpoint to transcribe speech audio into text. The operator also builds a segmented Table DAT with start and end times for each segment, similar to an SRT subtitle file.
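The segmented table works much like an SRT file. As a minimal sketch, assuming the transcription was requested with `response_format="verbose_json"` (whose `segments` entries carry `start`, `end` and `text`, with times in seconds), the segments can be flattened into table-style rows and rendered as SRT cues. The function names and the row layout are hypothetical, not the operator's actual internals.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_rows(segments):
    """Flatten Whisper verbose_json segments into [index, start, end, text] rows.

    The first row is a header, the way a Table DAT is usually laid out
    (this layout is an assumption for illustration).
    """
    rows = [["index", "start", "end", "text"]]
    for i, seg in enumerate(segments, start=1):
        rows.append([i, seg["start"], seg["end"], seg["text"].strip()])
    return rows


def rows_to_srt(rows):
    """Render the data rows (skipping the header) as SRT cues."""
    cues = []
    for index, start, end, text in rows[1:]:
        cues.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues, as SRT expects.
    return "\n".join(cues)


if __name__ == "__main__":
    example_segments = [
        {"start": 0.0, "end": 2.5, "text": " Hello from TouchDesigner."},
        {"start": 2.5, "end": 61.25, "text": " This is a second segment."},
    ]
    print(rows_to_srt(segments_to_rows(example_segments)))
```

Keeping the times as plain seconds in the rows and only formatting them when writing SRT makes the table easy to compare against a timeline position inside a network.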

🗣️ Text-to-Speech - Uses OpenAI's new text-to-speech API to convert text into an AI-generated speech audio file, with several voice options and control over the speed of the speech output.
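For reference, the underlying call is a single POST to OpenAI's `/v1/audio/speech` endpoint. The sketch below only builds the request with the standard library and does not send it; it assumes the `tts-1` model, the six voices available when this post was written (newer API versions may offer more), and the documented 0.25 to 4.0 speed range and 4,096-character input limit. The function name is hypothetical and this is not the operator's own code.

```python
import json
import urllib.request

# Voices offered by the TTS endpoint at the time of writing; check OpenAI's
# current docs, since the list may have grown.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_tts_request(api_key, text, voice="alloy", speed=1.0, model="tts-1"):
    """Build (but do not send) a POST request for OpenAI's speech endpoint."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not 0.25 <= speed <= 4.0:  # documented speed range
        raise ValueError("speed must be between 0.25 and 4.0")
    if len(text) > 4096:  # documented input length limit
        raise ValueError("input text is longer than 4096 characters")
    payload = {"model": model, "input": text, "voice": voice, "speed": speed}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` returns the audio bytes (MP3 by default), which could be written to disk and loaded into an Audio File In CHOP.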

🔄 TD Project Integration:
- Extension Class: Most operator functions can be triggered via custom Python calls from elsewhere in your network (middle-click the operator to see function details).
- Custom Callbacks: Customizable Generate/Done callbacks, with separate sets for TTS and STT calls.
- History Tables: Separate history tables for TTS and STT calls, recording all transcriptions, TTS outputs, recording file paths, timestamps, and other generation details.
- Whisper (STT) Compatibility Checks: Popup warnings that verify file type and size against the API's requirements.
- Transparent Logging: Detailed logs to make troubleshooting and optimization easier.
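The Whisper compatibility check can be approximated in plain Python. This is a minimal sketch, not the operator's implementation: it assumes the file types and 25 MB limit listed in OpenAI's Whisper docs at the time of writing, treats "25 MB" as 25 * 1024 * 1024 bytes (an assumption), and uses a hypothetical function name. Inside TouchDesigner, a non-empty result could be shown in a popup, for example with `ui.messageBox`.

```python
import os

# File types and size limit listed in OpenAI's Whisper docs at the time of writing.
ALLOWED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}
MAX_BYTES = 25 * 1024 * 1024  # "25 MB"; the exact byte interpretation is an assumption


def check_whisper_file(path):
    """Return a list of problems with `path`; an empty list means it looks OK."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds 25 MB")
    return problems
```

Returning a list of problems rather than a single boolean lets the popup (or the log) explain every reason a file was rejected at once.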

Comments

Luciano Ferrarezi

I'm having trouble with the Timer CHOP: "td.tdAttributeError: 'td.ParCollection' object has no attribute 'lenght' ..." Do you know what is happening?

Lyell Hintz

What the heck. How in the world has it been this long since I posted this without anyone saying anything? It seems there is a rogue timer inside that operator that isn't meant to be there. I'll update the TOX file now with a version that doesn't have it!

Luciano Ferrarezi

Thank you very much, it looks like the error has been resolved! But I still can't get it to work. Would I have to change a directory in a folder, or would I just need to enter my API key? I'm trying to use speech-to-text, but when I press it, it just starts initializing. Thanks in advance!

Lyell Hintz

It might be easier to troubleshoot via Discord if you can send a screen recording of what is happening, or a photo of the logs from the final output Null DAT. I'm less comfortable troubleshooting that operator blind.

Luciano Ferrarezi

Sure! I can send you a recording of what I did!