How to add noise-cancellation/echo-cancellation?

suvid Member

Hi! I'm building a conversational AI bot and connecting it to telephony using Twilio/Vonage.

The Twilio webhook hits my server, where we create a SIP-enabled Daily room and use its SIP URI to dial in.
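
Roughly, the webhook handler looks like this (a simplified sketch, not my exact code; Flask and requests are stand-ins, and the Daily "sip" room properties and the sip_uri field shown are approximate, so check the Daily REST API docs):

```python
import os
import requests
from flask import Flask, Response

app = Flask(__name__)
DAILY_API_KEY = os.environ["DAILY_API_KEY"]

@app.route("/twilio-webhook", methods=["POST"])
def twilio_webhook():
    # Create a SIP-enabled Daily room for this call.
    resp = requests.post(
        "https://api.daily.co/v1/rooms",
        headers={"Authorization": f"Bearer {DAILY_API_KEY}"},
        json={
            "properties": {
                # Assumed property names for SIP dial-in; verify against the docs.
                "sip": {"sip_mode": "dial-in", "num_endpoints": 1, "video": False},
            }
        },
    )
    room = resp.json()
    # Assumed location of the dial-in URI in the response.
    sip_uri = room["config"]["sip_uri"]

    # Tell Twilio to bridge the caller into the room over SIP.
    twiml = f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial>
    <Sip>{sip_uri}</Sip>
  </Dial>
</Response>"""
    return Response(twiml, mimetype="text/xml")
```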

But somewhere along the way, echo/noise is introduced (I can tell because the transcripts update to show user messages eerily similar to the assistant's messages).

I'm not sure if this is a codec problem. My understanding is that Daily uses Opus, but I don't think there's a way to set that in Twilio. So my best guess is that I need noise suppression/echo cancellation in the Daily rooms (though I'm not sure whether it should come from the Twilio side instead).

So, in short, I'm wondering if anyone else has faced this problem: would enabling echo cancellation/noise cancellation in the Daily rooms fix it?

For some reason, it's not a problem when I talk in the Daily room on a laptop microphone, or on a Twilio call with the phone held to the ear; it only happens on speakerphone from Twilio.

Answers

  • vr000m Dailynista

    Based on your description, it seems that the phone mic is picking up output from the phone speaker. Is it possible to test using headphones on the phone?

  • suvid Member

    @vr000m It works even without headphones; the issue only shows up when using the phone's speaker.

    I turn on the speaker, make the call, and after it connects I play a TTS message; that message reverberates and comes back as the user's message. This doesn't happen with headphones, with the phone held to the ear, or in the Daily room directly, where after the message we wait for the user to speak.

    I did find the code here (it has something regarding noise/echo cancellation): https://github.com/daily-co/daily-ai-sdk/blob/main/src/dailyai/transports/daily_transport.py

    @aconchillo, could you please help here if possible? (I saw your code in the daily-ai SDK.)

  • suvid Member

    It seems like there could be a solution (I'm not sure how to implement it). In echo situations, the reverberated message gets added back to the transcript as an assistant message, but looking through the logs, the ASR (Deepgram) isn't the one printing that transcript. So I need something like:

    if Deepgram_transcript AND VAD_interrupt -> make the OpenAI call.
    if NOT Deepgram_transcript AND VAD_interrupt -> do not make the OpenAI call; wait.
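
    Something like this is what I mean (just a sketch; the function and variable names are placeholders for wherever this hooks into the agent pipeline):

    ```python
    from typing import Callable, Optional

    def handle_user_turn(
        deepgram_transcript: Optional[str],
        vad_interrupt: bool,
        make_openai_call: Callable[[str], None],  # placeholder for the agent's LLM step
    ) -> None:
        """Gate the LLM call: only respond when the ASR actually heard text."""
        has_transcript = bool(deepgram_transcript and deepgram_transcript.strip())

        if vad_interrupt and has_transcript:
            # Real user speech: run the OpenAI turn.
            make_openai_call(deepgram_transcript)
        elif vad_interrupt:
            # VAD fired but Deepgram produced no text: most likely the bot's own
            # TTS echoing back through the phone speaker, so ignore it and wait.
            return
    ```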

    Here are the logs on Pastebin (the interruption starts right after the intro_message without the user speaking; it seems like the echoed words get fed back to the agent, and the cycle continues):
    https://pastebin.com/RZ2pBy23

    The code for the Agent I'm using: https://pastebin.com/E39gr6CU

    This is just a band-aid fix. Ideally, we should handle the echo itself, but my knowledge here is limited. (Could it be Twilio, the phone microphone, sample rates, or a codec?)

    @chad Since I already mentioned you in the other post for the same issue 😅