How to get video frames from WebRTC and pass them to Python

Sam (Member)

Hello everyone,

I just want to ask: I'm trying to build a video call application (like Google Meet and Zoom) with object detection in it. With my limited knowledge of WebRTC I don't even know if this is possible, but for now I've managed to create a simple peer-to-peer connection between two users.

My questions are:

  1. How do I send the frames from the WebRTC local stream to a Python server so I can do object detection?
  2. Is it even possible to do object detection in a WebRTC video call application?

Answers

  • Sam (Member)

    Any help will be appreciated :)

  • Hi @Sam! Just a couple of questions: will this be running in a browser or in a mobile app? Where would you want the object detection to happen, and how do you want to use the result? For example, if user A is sending video to user B, do you want user A and/or B to know about the objects detected, or is the object detection something you just want to do on a server?

    I know I'm not answering any question yet :-), but it's important to understand your use case first.

    Thank you!

  • RaviAtDaily (Dailynista)
    edited October 2022

    Hi @Sam,

    Thank you for the question!

    Short answer (TL;DR): Absolutely! It is possible to run object detection (or any other video-processing model inference) on a frame-by-frame basis in WebRTC. In fact, if it makes you feel any better, any video background blur or background replacement feature you might have used is doing precisely this.

    Longer answer:

    Conceptually, you want to set up some sort of interval that checks your videoElement for new frames becoming available for processing. There are two ways to do this:

    1. Using requestAnimationFrame() (see https://developer.mozilla.org/en-US/docs/Web/API/window/requestAnimationFrame). With requestAnimationFrame() you can look for a new video frame roughly 60 times per second on a typical 60 Hz display.
    2. Using setInterval() or setTimeout() with explicit rates.

    Quick side note here: requestAnimationFrame() gets throttled when the browser tab is backgrounded. If that is not a concern, it is the best way to do this.

    You can look at the timestamp on the videoElement to tell whether a new video frame is available for processing; a sketch of this is below.
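    To make that polling concrete, here is a minimal browser-side sketch in TypeScript. It assumes a <video> element that is already playing the WebRTC local stream; processFrame is just a placeholder name for whatever inference or forwarding you want to do.

    ```typescript
    // Poll a <video> element (assumed to already be playing the WebRTC local
    // stream) and grab its pixels whenever a new frame is available.
    const video = document.querySelector("video") as HTMLVideoElement;
    const canvas = document.createElement("canvas");
    const ctx = canvas.getContext("2d")!;

    let lastTime = -1;

    function checkForNewFrame() {
      // currentTime only advances when a new frame has been rendered, so
      // comparing it to the last seen value tells us a fresh frame exists.
      if (video.currentTime !== lastTime && video.videoWidth > 0) {
        lastTime = video.currentTime;
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        ctx.drawImage(video, 0, 0);
        const pixels = ctx.getImageData(0, 0, canvas.width, canvas.height);
        processFrame(pixels); // placeholder: run or forward inference here
      }
      // requestAnimationFrame fires at roughly the display refresh rate and,
      // as noted above, is throttled when the tab is backgrounded.
      requestAnimationFrame(checkForNewFrame);
    }

    function processFrame(frame: ImageData) {
      // e.g. hand the ImageData to a client-side model, or encode and send
      // it to a server (see the sketch further down).
    }

    requestAnimationFrame(checkForNewFrame);
    ```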

    Here is a talk we gave on this topic a few months ago. It covers these concepts and includes code snippets.

    https://www.youtube.com/watch?v=R45TRS7p_ko

    Feel free to drop questions here if you are still stuck after watching the video. Happy to help!

    -------

    1. How do I send the frames from the WebRTC local stream to a Python server so I can do object detection?

    Once you detect a new video frame, you can pipe it to your model for inference. Ideally, you will want to run the model on the client side (not the server side) if you want low latency. If you do need the detections on a server, one way to ship the frames there is sketched below.
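    Here is one hedged sketch of the browser half of that server-side route, in TypeScript: encode each captured frame as a JPEG and push it over a WebSocket to a Python process. The ws://localhost:8765 endpoint, the binary-JPEG message format, and the JSON reply shape are assumptions, not a fixed API; on the Python side you could decode the bytes (e.g. with OpenCV or Pillow) and run your detector on them.

    ```typescript
    // Browser half of the server-side option: send each frame as a JPEG blob
    // over a WebSocket. The endpoint and message format are assumptions.
    const socket = new WebSocket("ws://localhost:8765");

    async function sendFrame(canvas: HTMLCanvasElement) {
      if (socket.readyState !== WebSocket.OPEN) return;
      const blob: Blob | null = await new Promise((resolve) =>
        canvas.toBlob(resolve, "image/jpeg", 0.7)
      );
      if (blob) {
        // Binary JPEG payload; the Python side decodes it and runs detection.
        socket.send(await blob.arrayBuffer());
      }
    }

    socket.onmessage = (event) => {
      // Assumed reply shape: a JSON array like [{label, score, box}, ...]
      const detections = JSON.parse(event.data);
      console.log("objects detected:", detections);
    };
    ```

    You could call sendFrame(canvas) from the processFrame placeholder in the earlier sketch; in practice you would also throttle it (e.g. a few frames per second) so you are not shipping 30–60 JPEGs per second over the network.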

    2. Is it even possible to do object detection in a WebRTC video call application?

    Yes, as detailed in the talk linked above.

  • aconchillo (Dailynista)

    Hi @Sam. This demo provides an example of how to do object detection on the server side and send the detection results back to the meeting:

    https://github.com/daily-co/daily-python/blob/main/demos/yolo/yolo.py