Hi John, Happy 2025, hope you have an awesome year ahead :-) Thanks for reading, and glad this was useful to you.

Twilio is not a SIP client; they provide a managed VoIP service. Basically, you can buy telephone numbers from them and programmatically connect your code to a live call. With Media Streams, they fork the audio from the live call and stream it to your WebSocket server. Your server receives the audio, and you can transcribe it or pass it on to a real-time model, then send the response audio back to the client. You can achieve the same by setting up Asterisk, but that is more complicated and a pain to manage.

One problem with Twilio is that the audio is streamed at 8 kHz and is base64 encoded, so there is a conversion step that might add latency to your call. Here's some documentation on the Twilio approach: https://www.twilio.com/docs/voice/tutorials/consume-real-time-media-stream-using-websockets-python-and-flask#create-a-socket-decorator

A better approach would be WebRTC (I built a simple client here: https://github.com/kevincoder-co-za/zazu-voiceai), or you could build a SIP WebSocket server and use a provider like 3CX.

I did play with a few libraries before going with Twilio (it was an MVP, so shipping fast was essential). Maybe they'll be of use:

https://github.com/sipsorcery-org/sipsorcery
https://www.mizu-voip.com/Software/SIPSDK/JavaSIPSDK.aspx
https://www.pjsip.org/
https://github.com/emiago/sipgo
https://www.linphone.org/en/voip-unified-communications-software

Mizu worked nicely; I just didn't have much time to fully implement the SDK, and it's a commercial product, so there are licensing costs.

Hope this helps.
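
P.S. To make the Twilio conversion step a bit more concrete, here is a rough Python sketch of what your WebSocket handler might do with each incoming message. It assumes, based on my reading of the Twilio Media Streams docs, that each message is JSON with an `event` field and that `media.payload` holds base64 encoded 8 kHz mu-law audio. The function names (`ulaw_byte_to_pcm16`, `decode_media_message`) are made up for illustration and are not part of any Twilio SDK. The mu-law decoding is done by hand with the standard G.711 expansion, since the `audioop` module was removed in Python 3.13.

```python
import base64
import json
import struct
from typing import Optional


def ulaw_byte_to_pcm16(u: int) -> int:
    """Expand one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Precompute all 256 values once so the per-packet work is a table lookup.
ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def decode_media_message(raw: str) -> Optional[bytes]:
    """Return little-endian 16-bit PCM for a 'media' message, else None.

    Assumed message shape: {"event": "media", "media": {"payload": "<base64>"}}
    """
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None                    # e.g. "connected", "start", "stop"
    ulaw = base64.b64decode(msg["media"]["payload"])
    samples = [ULAW_TABLE[b] for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

The PCM bytes you get back are still 8 kHz mono, so if your transcription or real-time model expects 16 kHz or 24 kHz input you would need a resampling step on top of this, which is part of the overhead I mentioned.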