Hi John, Happy 2025, hope you have an awesome year ahead :-)
Thanks for reading, and glad this was useful to you.
Twilio is not a SIP client; they provide a managed VoIP solution. Basically, you can buy telephone numbers from them and programmatically connect your code to a real-time call.
With Media Streams, they fork the audio from the real-time call and stream it to your WebSocket server.
Your WebSocket server receives the audio, and you can transcribe it or pass it on to a real-time model, then send the resulting audio back to the client.
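To make that flow concrete, here is a rough Python sketch of handling one incoming WebSocket text frame. The message shape (an "event" field, "start.streamSid", and a base64 "media.payload") is my reading of Twilio's Media Streams docs, so treat it as an assumption and check it against the current reference. The function name, the state dict, and the process_audio callback (a stand-in for your transcription or real-time model step) are hypothetical.

```python
import base64
import json
from typing import Callable, Optional


def handle_media_stream_message(
    raw: str,
    state: dict,
    process_audio: Callable[[bytes], Optional[bytes]],
) -> Optional[str]:
    """Handle one text frame from the stream; return a JSON reply or None."""
    msg = json.loads(raw)
    event = msg.get("event")

    if event == "start":
        # Remember the stream ID; outbound audio has to reference it
        # (assumed field names, see the note above).
        state["stream_sid"] = msg["start"]["streamSid"]
        return None

    if event == "media":
        # Inbound audio arrives base64 encoded inside the JSON message.
        audio_in = base64.b64decode(msg["media"]["payload"])
        audio_out = process_audio(audio_in)  # hypothetical model/transcriber hook
        if audio_out is None:
            return None
        # Wrap the response audio in the same envelope to send it back.
        return json.dumps({
            "event": "media",
            "streamSid": state.get("stream_sid"),
            "media": {"payload": base64.b64encode(audio_out).decode("ascii")},
        })

    # Other events ("connected", "stop", ...) need no reply in this sketch.
    return None
```

You would call this from whatever WebSocket server library you use, once per received frame, and send any non-None return value back on the same socket.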
You can achieve the same by setting up Asterisk, but that is a bit more complicated and a pain to manage.
One problem with Twilio is that the audio is streamed at 8 kHz and is base64 encoded, so there is a conversion step that might slow down your call. Here's some documentation on the Twilio approach:
twilio.com/docs/voice/tutorials/consume-real-time…
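For a sense of what that conversion step involves, here is a minimal Python sketch. It assumes the payload is 8-bit G.711 mu-law audio (my understanding of Twilio's stream format; please verify against their docs) and decodes it to 16-bit little-endian PCM, which is what most speech models expect before any resampling. The function names are hypothetical, and the decoder is written by hand because the stdlib audioop module was removed in Python 3.13.

```python
import base64
import struct


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 values once so the per-packet work is a table lookup.
_ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def payload_to_pcm16(payload_b64: str) -> bytes:
    """Base64 mu-law payload -> raw 16-bit little-endian PCM (still 8 kHz)."""
    ulaw = base64.b64decode(payload_b64)
    samples = [_ULAW_TABLE[b] for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

The lookup table keeps the per-packet cost small, so the decode itself is unlikely to be the bottleneck; resampling to the model's rate and the extra network hop usually cost more.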
A better approach would be WebRTC (I built a simple client here: github.com/kevincoder-co-za/zazu-voiceai), or just build a SIP WebSocket server and use a provider like 3CX.
I did play with a few libraries before going with Twilio (it was an MVP, so shipping fast was essential). Maybe they'll be of use:
github.com/sipsorcery-org/sipsorcery
mizu-voip.com/Software/SIPSDK/JavaSIPSDK.aspx
pjsip.org
github.com/emiago/sipgo
linphone.org/en/voip-unified-communications-softw…
Mizu worked nicely; I just didn't have much time to fully implement the SDK, and it's a commercial product, so there are licensing costs.
Hope this helps.
Amazing article! Thanks a lot for the insights.
I am currently contemplating building an in-house client for SIP virtual phone numbers.
We are building an AI conversational agent on top of OpenAI's realtime audio, and after diving into how VoIP works, I realized it was not difficult to cut out Twilio. Since it would represent most of our costs in the future, it makes sense to remove it.
My issue is that I am not sure if the client alone would be a good enough replacement. In your article, you mention audio quality, background noise, and silence being handled properly by Twilio.
I can't find relevant information online on the behind-the-scenes added value of using Twilio as a SIP client.
The worst outcome would be to build the client ourselves (which is not difficult) and underestimate what really goes on in the backend at Twilio.
Do you have any insights or resources on the matter?
Your help would be greatly appreciated!
Happy New Year!