Update from the W3C TPAC 2022, Part 2
When there is a v1 API, there are always discussions of a next version (NV) API. WebRTC was born in 2010, and the world has changed since, especially in the way we build webapps.
Boldly, in 2022, we want multi-threaded applications that can deliver both low latency and large scale. To get there, we'd need low-level access to building blocks across the complete media pipeline, namely: capture, A/V processing (think machine learning), encode/decode, transport, and rendering.
This would require deconstructing the v1 media pipeline, which is an ongoing discussion. For historical context, the original webrtc-pc API was very opaque, more opaque than what we use today, and lacked several objects that we use and love now. Namely, the RtpSender/RtpReceiver objects were introduced in 2016, five years after the initial work, and gave us some level of low-level control: we can now attach tracks via code, as opposed to describing them via the Session Description Protocol (SDP). A good mental model of the pipeline's evolution is described by Jan-Ivar in the attached screenshot.
Now vs Past
In 2022, we have a lot more building blocks compared to 2011 or even 2016. For example, among the newer things:
- In addition to vanilla RTP, we now have new transport protocols: QUIC, HTTP/3, WebTransport, Media over QUIC (MoQ), and DASH.
- We have new codecs from the ITU-T and AOMedia.
- At the W3C/WHATWG, we have new paradigms: WHATWG Streams define a model somewhat like flow-based programming.
- We also have new media formats via WebCodecs, Media Source Extensions (MSE) v2, and Encrypted Media Extensions (EME) for DRM.
- We have the evolution of webrtc via RTCPeerConnection, RTCDataChannel, media-capture, screen-capture, mediacapture-transform, and encoded-transform.
- We have newer rendering via Canvas, WebGL, WebGPU, etc.
- Lastly, we now have WASM.
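To make the WHATWG Streams item above concrete, here is a minimal sketch of that flow-based model: a source piped through a processing stage into a sink. It uses plain numbers as stand-in "frames" so it runs anywhere Streams are available (browsers, Node 18+); in a real media pipeline the chunks would be VideoFrame or EncodedVideoChunk objects.

```javascript
// A minimal sketch of the WHATWG Streams "flow-based" model.
// Numbers stand in for media frames.
async function runPipeline() {
  // Source: emits three fake frames and closes.
  const source = new ReadableStream({
    start(controller) {
      for (const frame of [1, 2, 3]) controller.enqueue(frame);
      controller.close();
    },
  });

  // Processing stage: this is where, e.g., ML-based A/V processing
  // would plug into the pipeline.
  const double = new TransformStream({
    transform(frame, controller) {
      controller.enqueue(frame * 2);
    },
  });

  // Sink: collect the processed frames.
  const out = [];
  const sink = new WritableStream({
    write(frame) { out.push(frame); },
  });

  await source.pipeThrough(double).pipeTo(sink);
  return out; // [2, 4, 6]
}
```

The appeal of this model is that each stage is an independent, composable unit, which is exactly what a deconstructed media pipeline needs.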
Picking a few topics from the meeting that I think were interesting
Pluggable codecs -- there are several new codecs in the market. For video, Google's VP9 and AOMedia's AV1 update VP8, while HEVC/H.265 and VVC/H.266 update H.264. For audio, Google and Microsoft have been working on the Lyra and Satin codecs, which would essentially replace Opus in some low-bitrate (5-25 kbps) scenarios.
Opus, H.264, and VP8 are mandatory to implement, but to get these newer codecs, we'd have to get all the browser vendors to implement them, and that is a tad difficult. Thus, none of these codecs are currently available unless you're building a native application (where you can add any codec you want, though both users need to be on the native app for the codec to be selected). Hence, there is a strong desire to make these newer codecs available via wasm, so that webrtc could then use codecs shipped as wasm libraries.
The primary work here is to make webcodecs and webrtc work together, for example, aligning RTCEncodedAudioFrame with EncodedAudioChunk, and RTCEncodedVideoFrame with EncodedVideoChunk. In addition, data in webrtc is mutable, while in webcodecs it is not.
RtpTransport -- Another aspect to consider: does the webcodec provide an encoded stream, and does the application then need to packetize the webcodec output into RTP, or should the webcodec spew out RTP packets instead? If the webcodec does not encode the RTP payload format, the browser would need to do that; unfortunately, we do not have a good way of packetizing newer codecs unless they are backwards compatible with an existing payload format the browsers have already implemented. Ideally, we would have an RtpTransport: the webcodec could spew out the encoded bytestream, and the RtpTransport would packetize it and push it into the RtpSender, which would send it over the network/peerconnection.
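To make the packetization step concrete, here is a hypothetical sketch of the kind of work an RtpTransport would take off the application's hands: splitting one encoded frame into MTU-sized RTP packets with a minimal fixed 12-byte header. This is an illustration only; real payload formats (e.g. RFC 7741 for VP8) are codec-specific and considerably more involved.

```javascript
// Split an encoded frame (Uint8Array) into RTP packets with a minimal
// 12-byte fixed header (RFC 3550). Purely illustrative, not a proposed API.
function packetize(encodedFrame, { payloadType, ssrc, timestamp, seqStart = 0, mtu = 1200 }) {
  const payloadSize = mtu - 12; // leave room for the fixed RTP header
  const packets = [];
  for (let offset = 0, i = 0; offset < encodedFrame.length; offset += payloadSize, i++) {
    const payload = encodedFrame.subarray(offset, offset + payloadSize);
    const last = offset + payloadSize >= encodedFrame.length;
    const pkt = new Uint8Array(12 + payload.length);
    const view = new DataView(pkt.buffer);
    view.setUint8(0, 0x80);                            // V=2, no padding/extension/CSRC
    view.setUint8(1, (last ? 0x80 : 0) | payloadType); // marker bit set on the last packet
    view.setUint16(2, (seqStart + i) & 0xffff);        // sequence number
    view.setUint32(4, timestamp);                      // media timestamp
    view.setUint32(8, ssrc);                           // stream identifier
    pkt.set(payload, 12);
    packets.push(pkt);
  }
  return packets;
}
```

For example, a 3000-byte frame with a 1200-byte MTU yields three packets (1188 + 1188 + 624 bytes of payload), with the marker bit set only on the final packet. The point of an RtpTransport is that neither the application nor a wasm codec should have to reimplement this per codec.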
To summarise, there are two main focus areas for the next version of webrtc:
- New capabilities, especially with the advent of client-side machine learning, where developers envision building moderation features using gaze detection and face/object detection. The webrtc-nv-usecases spec covers a slew of ideas: internet of things (IoT), VR/AR, speech- and video-related machine learning, performance improvements by moving to workers, etc.
- Deconstruction of the webrtc-pc API, i.e., giving developers more control over the media pipeline; for example, how does one introduce new codecs? (VP8 and H.264 were mandated; how do newer codecs get incorporated?)
Happy to hear your thoughts on this evolution.