Question about Telephony with Conversational AI, parallel agent handling

suvid · March 22

Hi Daily team! I saw some great videos and code at DailyAI SDK on how to build conversational AI. I have a small client business that automates their support using Twilio and we have built a websocket for them to communicate. But Twilio's latency isn't meeting their requirements.

How can I connect Twilio/Vonage to Daily.co? The docs say that SIP can only handle 2 participants simultaneously, and PSTN requires Dial In. My client would be required to handle calls that can speak to the AI agent in parallel. Is this possible with Daily? On a different WebRTC website (LiveKit), I saw that they spin up workers in the same room, orchestrate parallel calls, and associate a room via SIP.

The requirement is audio only. My client just wants 3-4 phone numbers that he can associate with 3-4 different support agents. Any help would be super appreciated!

vr000m · March 25

We have a new feature in development that will be equivalent to handling this in TwiML. We are calling it PIN-less dial-in, for use cases where the room or PIN is not known ahead of time. With this feature, a user will dial in to a phone number, which will trigger a webhook or auto-create a Daily room that a user or daily-python can join.

Currently, if you are already using Twilio phone numbers, the best way to handle this is along the lines that you suggested, with a minor change to the workflow. When a call comes in to Twilio and the TwiML/webhook handles it, you can create a new Daily room for each incoming call and enable SIP on that room. The room's configured SIP URI should then be provided to Twilio in the TwiML, so that Twilio can forward the call to Daily. In parallel, after the new Daily room is created, you can assign one or more daily-python bots to it. In summary, Twilio will fork a new Daily room for each incoming user that dials the phone number, and within that room you should be able to assign as many bots as you need.
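
The workflow described above could be sketched roughly as follows. This is a minimal illustration, not Daily's or Twilio's reference code: the `sip` property names in the room request are assumptions that should be checked against the Daily REST API docs, and `create_room` is a hypothetical callable you would implement to POST the body to the rooms endpoint and return the room's SIP URI. The TwiML side uses the standard `<Dial><Sip>` verbs.

```python
import xml.etree.ElementTree as ET

# Daily REST endpoint for creating rooms (the POST itself is left to create_room).
DAILY_ROOMS_URL = "https://api.daily.co/v1/rooms"


def build_room_request(display_name: str) -> dict:
    """Build the JSON body for a SIP-enabled, audio-only room.

    The property names under "sip" are assumptions for illustration.
    """
    return {
        "properties": {
            "sip": {
                "display_name": display_name,
                "sip_mode": "dial-in",
                "video": False,
            }
        }
    }


def build_twiml(sip_uri: str) -> str:
    """Return TwiML that tells Twilio to forward the call to the SIP URI."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    sip = ET.SubElement(dial, "Sip")
    sip.text = sip_uri
    return ET.tostring(response, encoding="unicode")


def handle_incoming_call(caller: str, create_room) -> str:
    """Webhook logic: one new room per incoming call, then hand Twilio the SIP URI.

    create_room is a hypothetical helper: it takes the request body,
    creates the room, and returns that room's SIP URI as a string.
    """
    sip_uri = create_room(build_room_request(display_name=caller))
    # A daily-python bot would be started for this room here, in parallel.
    return build_twiml(sip_uri)
```

Because every call gets its own room, parallel callers never share a room, and each room can be given its own bot (or several).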

Once we have the PIN-less workflow in Q2, it would make the call handling easier, as we can automate the room creation process.

vr000m · March 22

@suvid the recommended way is to spin up a room with PSTN dial-in on Daily. This means that each Daily room will have a phone number and an associated PIN that is unique to that room. Multiple users can dial in to that room using the phone number and PIN.

In the above use case, when a person dials in, I don't think you need multiple SIP workers (or I am missing something), i.e., the two AI agents (using daily-ai) or the human agent can join the Daily room directly over WebRTC.

Thinking more about the multi-agent use-case: Are the agents working on the same conversation or disparate conversations, i.e., users are dialing-in and you have a queue that associates each user to a unique agent? I am happy to get on a call if a synchronous call helps iron out the architecture.

suvid · March 23

Thanks for the reply @vr000m

I have a use case similar to other companies you might have seen in the space (Vapi.ai, which I believe builds on Daily, and RetellAI), where we have conversational AI that acts as an agent for customer support in the travel industry. Right now, we have some customers who have 3-4 customer support agents, and our current workflow is:

Twilio streams via webhook: buy a Twilio number, set up a webhook to an endpoint, and communicate via WebSocket. This scales easily for inbound calling, so handling parallel callers for the same agent is easy.

What I understand about Daily is that there are two numbers available plus a dial-in code, so if users were to port Twilio phone numbers, we would need to call the PSTN number and then enter the dial-in code, which, if it cannot be done programmatically, defeats the purpose.

In the WebRTC space, we're looking for something similar to this: https://docs.livekit.io/agents/
There, agents are essentially treated as workers, and multiple workers can join a room to handle parallel calls via dial-in (i.e., the user calls the inbound number connected to the room via SIP).

I am not sure if this clarifies the problem, but I don't think PSTN sounds like the way to go 🤷‍♂️

suvid · March 23

Maybe I'm getting confused with the terminology of rooms and might not need SIP/PSTN at all. My requirement is pretty simple: transport audio from a phone call (Twilio) to my server, and do that over WebRTC. So I'm assuming I can have an endpoint on the server, built using Daily, that handles receiving and sending audio, and the Twilio webhook just points to that server. In that case I'm just using the DailyTransport service for the audio without thinking about multiple "users" being in the same room. Is my understanding correct?

suvid · March 31

Thanks, @vr000m that makes a lot of sense.

A couple of Qs: 1. The approach you mentioned is what I'm doing: the call comes in, I create a new room, enable SIP, and then use Twilio to dial in. This is causing echo on speakerphones, and I am curious whether this is on Twilio's side or Daily's, and whether enabling Krisp would help solve it. (It's kind of a deal breaker: on speaker, the echo causes reverberations, the bot thinks the user said something, and the conversation is thrown out of flow.) I tried both Twilio and Vonage and got the same echo; when I connected SIP (Twilio) to my own WebRTC setup, I didn't have this issue.

2. I love the approach for simplifying this. A suggestion: I thought something like this would be great:

Have a SIP trunk that is applied to each daily.co domain. We can use that with a PSTN (Twilio/Telnyx) and create a dispatch rule to create rooms automatically; if a SIP participant calls, they join a room, and if the last participant leaves, we destroy the room.

Instead of having PSTN numbers tied to rooms and disabling dial-in PINs, we have a user call a phone number connected to Daily via SIP trunking, and we figure out the rooms. This would be better, at least for contact center and customer service use cases, since customers would be able to have a phone number that others can call and be connected to an agent :)
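
To make the suggested dispatch rule concrete, here is a small in-memory sketch of the lifecycle: an incoming SIP call creates a room, and the room is destroyed when its last participant leaves. Everything here is hypothetical, including the `RoomDispatcher` name and the injected `create_room`/`delete_room` callables; it is not an existing Daily feature or API.

```python
class RoomDispatcher:
    """Hypothetical dispatch rule: one room per SIP call, torn down when empty."""

    def __init__(self, create_room, delete_room):
        # create_room() -> room name; delete_room(room name) -> None.
        # Both are injected so a real REST client could be plugged in.
        self._create_room = create_room
        self._delete_room = delete_room
        self._participants = {}  # room name -> set of participant ids

    def on_sip_call(self, caller_id: str) -> str:
        """A SIP participant called the trunk: create a room and place them in it."""
        room = self._create_room()
        self._participants[room] = {caller_id}
        return room

    def on_join(self, room: str, participant_id: str) -> None:
        """A bot or human agent joined an existing room."""
        self._participants[room].add(participant_id)

    def on_leave(self, room: str, participant_id: str) -> None:
        """Remove a participant; destroy the room once nobody is left."""
        members = self._participants.get(room)
        if members is None:
            return
        members.discard(participant_id)
        if not members:
            del self._participants[room]
            self._delete_room(room)

    def active_rooms(self) -> list:
        return sorted(self._participants)
```

Two simultaneous callers would get two separate rooms, each with its own agent, which is the parallel handling asked about at the top of the thread.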

vr000m · April 1

@suvid is the echo happening on Daily Prebuilt or daily-python? It depends on which devices were used for capture in the getUserMedia API call (inside `startCamera`, echo cancellation should be applied automatically). If you have a demo page, could you DM me?

On 2: yes, the plan for the PIN-less case is to offer a seamless way of handling incoming requests on SIP. It will either spin up a room for you or allow the developer to spin up the room. However, the determining factor is timing and what the originating SIP service needs to keep that SIP dialog alive while waiting for the third party or Daily. We will keep you posted as soon as we have this.

suvid · April 1

@vr000m Thanks for replying! I'm running it locally. It's just a modified version of the patient-intake demo from the DailyAI SDK: https://github.com/daily-co/daily-ai-sdk/blob/main/examples/starter-apps/patient-intake.py

The code is just an API call like this: https://codeshare.io/ApKk38

I'm not using daily prebuilt. It's just a Twilio webhook that hits my server, triggers a new room creation, uses Twilio SIP to dial into that room, and the conversation happens.

Everything works as expected, except, as I mentioned, when you put it on speakerphone, the bot's echo makes it seem like a human user is speaking, and then the conversation continues by itself.

I'm only narrowing it down to the Daily room because I spun up a SIP dial-in with Twilio/Telnyx on two other WebRTC hosted services and didn't experience the echo issue.

suvid · April 2

I seem to have found the relevant code in DailyTransport, but it still only works intermittently: https://github.com/daily-co/daily-ai-sdk/blob/main/src/dailyai/transports/daily_transport.py

I set noiseCancellation, echoCancellation and autoGainControl to True and Camera to false

vr000m · April 2

Is there a public URL or phone number that I can call to test your demo?

goodwords · April 24

hi @vr000m I'm having a similar issue. I have a very similar setup but I'm finding that the conversation is going out of flow.

In my project, my client is react-native and I'm using your react-native SDK:

Here is my configuration attempt:
Client:

    this.callObject = Daily.createCallObject({
      dailyConfig: { userMediaAudioConstraints: {
        echoCancellation: { exact: true },
        noiseSuppression: { exact: true },
        autoGainControl: { exact: true },
      }},
    });

And I've hooked it up with the server here: https://github.com/daily-demos/llm-talk, which has virtual speakers and microphones and uses the generated TTS audio frames to capture the reply.

It seems like this echo cancellation is make or break for me, because when I'm on speakerphone and using my react-native app (iOS), I get about one sentence from the server before it starts having a conversation with itself: the audio it plays through my phone speaker is automatically treated as a reply.
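
Independent of where the echo originates, one possible server-side workaround for the "bot talks to itself" loop is to ignore incoming user audio while the bot is speaking, plus a short tail afterwards. The sketch below is an assumption-laden illustration, not part of the Daily SDKs: `EchoGate` and its method names are hypothetical, and the 0.5 second tail is an arbitrary starting value. Note that this is a half-duplex gate, so it also prevents the caller from interrupting the bot.

```python
class EchoGate:
    """Hypothetical half-duplex gate: drop user audio while the bot is talking."""

    def __init__(self, tail_seconds: float = 0.5):
        # Extra time after the bot stops during which audio is still ignored,
        # to cover speaker echo that arrives late.
        self.tail_seconds = tail_seconds
        self._bot_speaking = False
        self._bot_stopped_at = None

    def bot_started(self) -> None:
        """Call when TTS playback begins."""
        self._bot_speaking = True

    def bot_stopped(self, now: float) -> None:
        """Call when TTS playback ends; now is a timestamp in seconds."""
        self._bot_speaking = False
        self._bot_stopped_at = now

    def accept_user_audio(self, now: float) -> bool:
        """Return True if audio arriving at time now should reach the STT/LLM."""
        if self._bot_speaking:
            return False
        if self._bot_stopped_at is not None:
            if now - self._bot_stopped_at < self.tail_seconds:
                return False
        return True
```

In use, the pipeline would call `bot_started()` and `bot_stopped(time.monotonic())` around TTS playback and check `accept_user_audio(time.monotonic())` before forwarding each audio frame or transcript.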
