How to get video frames from WebRTC and pass them to Python
I just wanted to ask: I am trying to build a video call application (like Google Meet and Zoom) with object detection on it. With my limited knowledge of WebRTC I don't even know if this is possible, but for now I have managed to create a simple peer-to-peer connection between two users.
my questions are:
- How do I send the frames from the WebRTC local stream to the Python server side so I can do object detection?
- Is it even possible to do object detection in a WebRTC video call application?
Any help will be appreciated :)
Hi @Sam ! Just a couple of questions: will this be running in a browser or in a mobile app? Where would you want the object detection to happen and how do you want to use the result? For example, if user A is sending video to user B, do you want user A and/or B to know about the objects detected or is the object detection something you just want to do in a server?
I know I'm not answering any question yet :-), but it's important to understand your use case first.
Hi @Sam ,
Thank you for the question!
Short answer / TL;DR: absolutely! It is possible to run object detection (or any other video processing model inference) on a frame-by-frame basis in WebRTC. In fact, if it makes you feel any better, any video background blur or background replacement feature you might have used is doing precisely this.
Conceptually, you want to set up some sort of interval that will check your `videoElement` for new frames being available for processing. There are 2 ways to do this:
- `requestAnimationFrame()` (see https://developer.mozilla.org/en-US/docs/Web/API/window/requestAnimationFrame)
- `setTimeout()` with explicit rates, e.g. looking for a new video frame 60 times every second

Quick side note here: `requestAnimationFrame()` gets throttled when the browser tab is backgrounded. If that is not a concern, it is the best way to enable this.

Either way, one can look at the timestamp on the `videoElement` and tell if there is a new video frame available for processing.
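To make that concrete, here is a minimal sketch of such a polling loop in browser JavaScript. The element id `localVideo` and the inference hook are illustrative assumptions, not part of any particular API:

```javascript
// Pure helper: a frame is "new" when the video's presentation time has advanced.
function isNewFrame(currentTime, lastSeenTime) {
  return currentTime !== lastSeenTime;
}

// Browser-only part, guarded so the helper above can be reused anywhere.
if (typeof document !== "undefined") {
  const video = document.getElementById("localVideo"); // assumed <video> element
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  let lastSeenTime = -1;

  function poll() {
    if (isNewFrame(video.currentTime, lastSeenTime)) {
      lastSeenTime = video.currentTime;
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      ctx.drawImage(video, 0, 0); // grab the current frame as pixels
      // ...run model inference on the canvas, or ship the frame from here...
    }
    requestAnimationFrame(poll); // note: throttled in background tabs
  }
  requestAnimationFrame(poll);
}
```

The same structure works with `setTimeout(poll, 1000 / fps)` if you need an explicit rate that keeps running in background tabs.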
Here is a talk we gave on this topic a few months ago. It contains code snippets and the concepts.
Feel free to drop questions here if you are still stuck after watching the video. Happy to help!
1. How do I send the frames from the WebRTC local stream to the Python server side so I can do object detection?
Once you detect a new video frame, you can pipe it to your model for inference. Ideally, you will want to execute the model on the client side (not the server side) if you want low latency.
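If you do decide to ship frames to a Python server anyway (for example, when the model is too heavy for the browser), one common pattern is to encode each captured canvas to JPEG and POST it over HTTP. A rough sketch, where the `/detect` endpoint and the response shape are assumptions for whatever Python framework you pick on the other side:

```javascript
// Pure helper: build the fetch options for uploading one JPEG frame.
function buildFrameRequest(blob) {
  return {
    method: "POST",
    headers: { "Content-Type": "image/jpeg" },
    body: blob,
  };
}

// Browser-only part: encode the canvas and send it to the assumed endpoint.
if (typeof document !== "undefined") {
  async function sendFrame(canvas) {
    const blob = await new Promise((resolve) =>
      canvas.toBlob(resolve, "image/jpeg", 0.7) // compress before upload
    );
    const res = await fetch("/detect", buildFrameRequest(blob));
    return res.json(); // hypothetical shape: [{label, score, box}, ...]
  }
}
```

Be aware that the encode/upload/decode round trip adds latency per frame, which is exactly why running the model client-side is recommended when it fits.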
2. Is it even possible to do object detection in a video call application in WebRTC?
Yes, as detailed in the talk above.