I am working on extracting intents from audio conversations. One obvious approach is to convert the audio to text and then analyze the text. But often, how a person says something matters more than their exact words, and with this approach we lose the tone in which the user is speaking. So I am looking for a way to analyze the audio directly rather than converting it to text first. Is there a library or API for this purpose?
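To make "tone" a bit more concrete, here is a minimal sketch of the kind of information a transcript throws away: per-frame loudness (RMS energy) and a rough pitch estimate taken from the autocorrelation peak. The function names (`rms_energy`, `estimate_pitch_hz`, `prosody_features`), the 8 kHz sample rate, the 50 ms frame size, and the 75-400 Hz pitch range are all assumptions chosen for illustration, and the input is a synthetic sine wave rather than real speech. It is not an intent classifier, only an example of features such a system might start from.

```python
import math


def rms_energy(frame):
    """Root-mean-square amplitude of one frame (a rough loudness measure)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch_hz(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Rough pitch estimate: the lag with the largest autocorrelation.

    Only lags corresponding to fmin..fmax are searched (assumed speech range).
    Returns 0.0 if no positive autocorrelation peak is found.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def prosody_features(samples, sample_rate, frame_size):
    """Split samples into non-overlapping frames; return (energy, pitch) per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        features.append((rms_energy(frame), estimate_pitch_hz(frame, sample_rate)))
    return features


def sine(freq_hz, amplitude, n_samples, sample_rate):
    """Synthetic test tone standing in for recorded audio."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


SAMPLE_RATE = 8000   # assumed sample rate
FRAME_SIZE = 400     # 50 ms frames at 8 kHz

# The same 200 Hz "utterance", once quiet and once loud: a transcript would be
# identical for both, but the energy feature tells them apart.
quiet = prosody_features(sine(200.0, 0.1, 800, SAMPLE_RATE), SAMPLE_RATE, FRAME_SIZE)
loud = prosody_features(sine(200.0, 0.8, 800, SAMPLE_RATE), SAMPLE_RATE, FRAME_SIZE)

for label, feats in (("quiet", quiet), ("loud", loud)):
    for energy, pitch in feats:
        print(f"{label}: energy={energy:.3f} pitch={pitch:.1f} Hz")
```

Real speech would need more than this (voiced/unvoiced detection, overlapping windows, noise handling), which is why I am asking whether an existing library or API already covers it.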