Hackathon Project: Adding hand gesture detection to Daily Prebuilt

nienke Moderator, Dailynista admin
edited October 2023 in Code Share

It can be difficult to get back into the swing of things after the holiday season, so at Daily, we decided to spend the first week of the year hacking around on projects both personal and Daily-related ⚒️

On Monday morning, the start of our hackathon week, I hadn’t yet decided on a project. I knew I wanted to do something fun and a little outside of my comfort zone. One idea I toyed around with was automatic hand raising. We added hand raising to Prebuilt not too long ago, and wouldn’t it be neat if an app could respond to me raising my hand IRL?

While letting this and other ideas (automatic cat-in-video detection!) marinate, I turned to the website of one of my favorite creative technologists, Charlie Gerard, for inspiration. And lo and behold, she had written a blog post on controlling Figma with hand gestures. This was the sign I was looking for: I would build automatic hand raising in Daily Prebuilt. Here's the result:

Setting things up

The first thing to do was to get “the computer” to recognize my hand on screen. For this, I used TensorFlow.js (TFJS), a JavaScript library by Google that lets you add machine learning features to any web app. TFJS comes with pre-trained models, hosted on npm, so they’re pretty easy to add to a web app. For hand detection, I needed Handpose: a lightweight model capable of detecting a single hand. You feed it an image or a video, and it tells you whether it thinks there’s a hand; if so, it returns key points within the hand, outlining the location of each finger joint and the palm. I wasn’t too bothered that it can’t track more than one hand: a hand raise is usually done with one hand anyway. And since the model would have to monitor a live video track in real time, being lightweight was the most important quality.
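For the curious, here’s roughly what that looks like in code. The package names and API calls are the published TFJS ones; the function wrapped around them is just a sketch:

```js
// Load the pre-trained Handpose model and run it against a video element.
// The imports are the published TFJS packages; the logging is illustrative.
import '@tensorflow/tfjs-backend-webgl';
import * as handpose from '@tensorflow-models/handpose';

async function detectHand(video) {
  const model = await handpose.load();
  // estimateHands() accepts an image, canvas, or video element and returns
  // an array of predictions (at most one hand for this model).
  const predictions = await model.estimateHands(video);
  if (predictions.length > 0) {
    // Each prediction contains 21 3D landmarks covering the palm and
    // every finger joint.
    console.log(predictions[0].landmarks);
  }
}
```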

Handpose is capable of predicting the skeleton of a hand, but it can also derive gestures. To make it easier to work with gestures, I used Fingerpose on top of Handpose. To get the model to recognize gestures, you need to describe them in a way that a computer will understand, and Fingerpose makes that really straightforward!
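As an example, here’s how a raised, open hand might be described. The weights and direction choices below are illustrative, not the exact values from my hackathon code:

```js
// Describe a "raised hand" gesture: all five fingers extended and pointing
// roughly upward. The last argument is a weight expressing how strongly
// each clue should count toward a match.
import { GestureDescription, Finger, FingerCurl, FingerDirection } from 'fingerpose';

const raisedHand = new GestureDescription('raised_hand');

for (const finger of Finger.all) {
  raisedHand.addCurl(finger, FingerCurl.NoCurl, 1.0);
  raisedHand.addDirection(finger, FingerDirection.VerticalUp, 1.0);
  raisedHand.addDirection(finger, FingerDirection.DiagonalUpLeft, 0.9);
  raisedHand.addDirection(finger, FingerDirection.DiagonalUpRight, 0.9);
}
```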

I added the libraries to the Prebuilt codebase and created a new React component called FingerPose. This component is passed a reference to the local user’s `HTMLVideoElement`. After feeding that video element to the Handpose model, I could see that Handpose was capable of recognizing my hand and drawing it on a `<canvas>` overlay:
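For a sense of what such a component could look like, here’s a rough sketch. The component name comes from my actual code, but the props, the render loop, and the `drawLandmarks()` helper are hypothetical stand-ins:

```jsx
import { useEffect } from 'react';
import * as handpose from '@tensorflow-models/handpose';

// Sketch only: the videoRef/canvasRef props and the drawLandmarks() helper
// are hypothetical; the real component lives in the Prebuilt codebase.
function FingerPose({ videoRef, canvasRef }) {
  useEffect(() => {
    let rafId;
    let cancelled = false;

    handpose.load().then((model) => {
      const loop = async () => {
        if (cancelled) return;
        const [prediction] = await model.estimateHands(videoRef.current);
        if (prediction) {
          drawLandmarks(canvasRef.current, prediction.landmarks); // hypothetical helper
        }
        rafId = requestAnimationFrame(loop);
      };
      loop();
    });

    return () => {
      cancelled = true;
      cancelAnimationFrame(rafId);
    };
  }, [videoRef, canvasRef]);

  return null; // the component only drives the <canvas> overlay
}
```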

Detecting a gesture

Recognizing that there’s a hand on screen isn’t enough: I needed the model to tell me what gesture I was making. Enter Fingerpose. I described several gestures: 👌, ✋, 👍, 👎, 🖖, ✌️, 🤟, 🤘, and 🫶. Then I fed Handpose’s output into Fingerpose’s GestureEstimator, which tries to match Handpose’s data points against my pre-defined gestures. To give the user (i.e., me) some feedback that a gesture had been detected, I decided to send out an emoji reaction matching that gesture. This worked too:
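Under the hood, the matching step looks roughly like this. `raisedHand` stands in for a gesture description like the one sketched earlier, and `sendReaction()` is a hypothetical stand-in for Prebuilt’s emoji reaction:

```js
import { GestureEstimator } from 'fingerpose';

// Register every gesture description we want to match against.
const estimator = new GestureEstimator([raisedHand /*, thumbsUp, victory, ... */]);

async function checkForGesture(model, video) {
  const [prediction] = await model.estimateHands(video);
  if (!prediction) return;

  // estimate() scores the landmarks against every registered gesture;
  // the second argument is a minimum confidence on a 0–10 scale.
  const result = estimator.estimate(prediction.landmarks, 8.5);
  if (result.gestures.length > 0) {
    // Pick the highest-scoring match (older fingerpose releases call this
    // field `confidence` instead of `score`).
    const best = result.gestures.reduce((a, b) => (a.score > b.score ? a : b));
    sendReaction(best.name); // hypothetical stand-in for the emoji reaction
  }
}
```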


So now I could make Prebuilt aware of several things:

  • The local user is showing a hand
  • That hand is making a specific gesture
  • When gesture X is detected, do Y

Given this, I could make a UI with some debug information for the user. And most importantly: when the gesture “Hand raise” is detected, call `raiseHand()` and raise the user’s virtual hand in Prebuilt for everyone to see! ✋
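The final hop from detection to action is a small dispatch. Here’s one way it could look; the consecutive-frame debounce is my own embellishment to avoid acting on a single misdetected frame, and the threshold is arbitrary:

```js
// Sketch: require the same gesture for a few consecutive frames before
// acting, so a brief misdetection doesn't raise anyone's hand. The frame
// threshold and the action map are assumptions; raiseHand() is the
// Prebuilt function mentioned above.
const REQUIRED_FRAMES = 10;
const actions = {
  raised_hand: () => raiseHand(),
};

let lastGesture = null;
let streak = 0;

function onGesture(name) {
  streak = name === lastGesture ? streak + 1 : 1;
  lastGesture = name;
  if (streak === REQUIRED_FRAMES && actions[name]) {
    actions[name]();
  }
}
```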

Real-world applications

I started this on a lark, but when you think about it, this could form the basis of an app that helps you learn sign language! Other use cases I could think of:

  • Allow non-verbal users to participate in calls in different ways than just text chat
  • Allow users to create their own gesture command flows: e.g., if I make a peace sign, unmute myself

DIY 

Prebuilt is a Daily product, but under the hood it doesn’t use any special or secret Daily JS APIs: it’s as much a consumer of Daily as any other application would be. If you’re interested in adding gesture detection to your own Daily app, all you need is a video track. I’ve made a demo Daily React app that includes hand gesture recognition. You can find the code here on GitHub, open it directly in CodeSandbox, or see it in action here: all you’ll need is a Daily room URL ✨
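If you’re wiring this up yourself, the only Daily-specific piece is getting hold of the local camera track to feed into the model. With plain daily-js, that could look something like this (the room URL is a placeholder, and in a real app you’d wait for a 'participant-updated' event before assuming the track is available):

```js
import Daily from '@daily-co/daily-js';

// Join a call and feed the local camera track into a <video> element
// that Handpose can read from.
async function getLocalVideoElement(roomUrl) {
  const call = Daily.createCallObject();
  await call.join({ url: roomUrl });

  // persistentTrack survives track restarts (e.g. camera device changes).
  const track = call.participants().local.tracks.video.persistentTrack;
  const video = document.createElement('video');
  video.srcObject = new MediaStream([track]);
  await video.play();
  return video; // ready to pass to Handpose's estimateHands()
}
```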

