Voice Agent
The VoiceBot open-source software (OSS) module is a versatile tool that utilizes OpenAI’s powerful APIs to perform text-to-speech conversion, audio transcription, and text summarization into structured notes. Here’s how you can harness each of these functionalities in your OSS applications:
Features
- Text-to-Speech: Convert texts into spoken words using various voices provided by OpenAI TTS.
- Transcribe: Transcribe spoken words into text using Whisper API.
- Text-to-Notes: Turn conversations into organized bullet points, ensuring no detail is omitted.
VoiceBot Feature Comparison
VoiceBot offers two distinct editions with different sets of features to cater to a range of users, from those who need basic functionalities to organizations requiring advanced capabilities.
Below is a comparative table highlighting the differences between the Open Source Edition (OSE) and the Enterprise Edition (EE).
Feature | Open Source Edition (OSE) | Enterprise Edition (EE) |
---|---|---|
Text-to-Speech (TTS) | Online API | Online API with Streaming for longer texts, Bark offline TTS, ElevenLabs online TTS |
Transcription (Speech to Text) | Online Whisper API only | Online Whisper API, Offline Whisper Models, Offline Distil-Whisper Models, AssemblyAI, Speaker Diarization |
Text-to-Notes | Online API | Online API |
You can also find a google collab notebook here.
Usage
lets start by making an object of the module first
Text-to-Speech
Transcription (Speech to Text)
Text-to-Notes
These functions make it simple to integrate advanced linguistic and speech capabilities into your applications, allowing you to create new user experiences or enhance existing workflows. Use the VoiceBot module to effectively manage content generation, comprehension, and accessibility tasks.
Limitations
- The open-source version requires an internet connection to utilize the online API.
- It offers a smaller subset of features compared to enterprise versions, focusing primarily on cloud-based services.