What is SDP?
SDP is the acronym for Session Description Protocol. I guess the question now is "what is Session Description Protocol?".
WebRTC is a way to send video/audio between peers. Initially, those peers have no idea how they are going to exchange those video/audio streams: what kind of video/audio the other peer is going to send, what encoding it is going to use, or whether the audio is stereo or mono. SDP is a way to exchange all that information between peers.
On this page you can take a look at what an SDP example looks like and interact with it a bit:
There's a more in-depth explanation here: https://webrtchacksstg.wpengine.com/webrtc-sdp-inaki-baz-castillo/
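To make the flat, line-oriented structure concrete, here is a minimal, illustrative SDP fragment and a few lines of JavaScript that split it into type/value pairs. The field values are made up for illustration, not taken from the linked page:

```javascript
// A minimal, illustrative SDP blob (not a complete real-world offer).
// Every line is "<type>=<value>", where <type> is a single character.
const sdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
].join("\r\n");

// Parse into { type, value } pairs to show the flat structure:
// there is no nesting, only ordering rules from the grammar.
function parseSdp(blob) {
  return blob.split("\r\n").map((line) => ({
    type: line[0],        // e.g. "v", "o", "m", "a"
    value: line.slice(2), // everything after "x="
  }));
}

console.log(parseSdp(sdp)[4]); // the m= (media) line
```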
Tasha, that's a great question about the evolution of the SDP and related APIs in the last few years.
The short answer is that exchanging an SDP at the beginning of a call is still required by current WebRTC implementations. But there are many things that in the past required "SDP munging," which you can do in a much simpler way now with the WebRTC 1.0 APIs.
A few things that used to require modifying the SDP by hand, but that you can now do with the transceiver, encoding APIs, and friends:
- setting a target bitrate cap for a media track
- sending stereo audio
- preferring H.264 over VP8, and vice versa
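As a sketch of how the first bullet looks in code today: `RTCRtpSender.getParameters()`/`setParameters()` with the `encodings` array replaces the old `b=AS:` munging. The `capBitrate` helper name below is my own; the parameter shaping is plain object code, and only the final `setParameters()` call needs a browser.

```javascript
// Sketch: capping a sender's bitrate via the encodings array instead of
// munging "b=AS:" lines in the SDP. capBitrate is a made-up helper name.
function capBitrate(params, maxBitrateBps) {
  // RTCRtpSendParameters.encodings holds one entry per (simulcast) layer.
  const encodings =
    params.encodings && params.encodings.length ? params.encodings : [{}];
  return {
    ...params,
    encodings: encodings.map((e) => ({ ...e, maxBitrate: maxBitrateBps })),
  };
}

// In a browser (assuming pc is an RTCPeerConnection) you would apply it as:
//   const sender = pc.getSenders().find((s) => s.track && s.track.kind === "video");
//   await sender.setParameters(capBitrate(sender.getParameters(), 500000));
console.log(capBitrate({ encodings: [{ rid: "hi" }, { rid: "lo" }] }, 500000));
```

The codec-preference bullet maps to `RTCRtpTransceiver.setCodecPreferences()` in the same spirit: you reorder a list of capability objects rather than reordering payload types on an `m=` line.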
We truly live in the future. ;-)
This is very helpful. I like the rollover graphic in that resource you linked, but I also enjoyed the in-depth article that compares the use of SDP to ORTC.
This part was particularly interesting:
ICE, DTLS and RTP parameters must be exchanged by peers in order to establish the multimedia session. However no specific format (such as SDP) is mandated by the API. It is up to the developer to determine the format and negotiation procedure.
The article was written in 2017 so I was wondering:
🤔 In 2022, is SDP the most common way to exchange this info before establishing a session?
What is SDP?
SDP is the Session Description Protocol, which does two things: a) describes the capabilities of an endpoint, and b) provides a methodology to exchange the described capabilities. An SDP description is text-based and kinda human-readable, but while it is text-based, it has no explicit hierarchy (for example, XML has tags and JSON has nested dictionaries; SDP has rules defined as an Augmented Backus–Naur Form grammar).
The protocol aspect of SDP follows two forms:
I) Declarative use: the sending endpoint defines a set of capabilities in the SDP. The endpoint that receives this SDP either sets up the call successfully, because it has the capabilities described in the SDP, or fails to set up the connection because there is a capability mismatch. This works for streaming services, where the streaming server describes a media session with several codec capabilities (e.g., streaming VP8 at 4K, 30 FPS or H.264 at Full HD, 30 FPS). The streaming client picks the stream that it can decode; there is no back-and-forth exchange of capabilities, and if the streaming client does not implement either of the two codecs, it fails to connect.
II) Offer/Answer (O/A) use: the sending endpoint expresses its capabilities in an SDP Offer; the receiving endpoint compares its capabilities with the ones received in the SDP Offer and responds with an SDP Answer. The SDP Answer represents the chosen features, which are a strict subset of the capabilities defined in the SDP Offer. Typically, the sending endpoint expresses as many of its capabilities as it can (it may want to hold back some secondary capabilities in case nothing in the primary set fits). The receiver, on receiving the sender's capabilities, decides what it can send and receive, then sends back the appropriate codecs in the SDP Answer. For example, the sender may choose VP8 and the H.264 Constrained Baseline profile as primary for the first O/A exchange, and try to upgrade to the H.264 Constrained High profile with a second O/A exchange if Baseline is picked. The main reason to delegate capabilities to a later exchange is to curtail the size of the resulting SDP text blob.
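The two modes can be sketched in a few lines of JavaScript. The codec lists, stream descriptions, and function names below are illustrative, not part of any real API:

```javascript
// Declarative use: the server declares its streams once; the client either
// picks one it can decode or fails to connect. No back-and-forth exchange.
const declared = [
  { codec: "VP8", resolution: "3840x2160", fps: 30 },
  { codec: "H264", resolution: "1920x1080", fps: 30 },
];

function pickStream(declaredStreams, decodableCodecs) {
  const match = declaredStreams.find((s) => decodableCodecs.includes(s.codec));
  if (!match) throw new Error("capability mismatch: cannot set up the session");
  return match;
}

// Offer/Answer use: the answer is a strict subset of the offered
// capabilities, chosen by the answerer.
function makeAnswer(offeredCodecs, localCodecs) {
  const chosen = localCodecs.filter((c) => offeredCodecs.includes(c));
  if (chosen.length === 0) throw new Error("no common capability");
  return chosen;
}

console.log(pickStream(declared, ["H264"]));
// First O/A round: the sender holds the High profile back for a later upgrade.
console.log(makeAnswer(["VP8", "H264-Baseline"], ["H264-Baseline", "H264-High"]));
```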
We erred on the side of caution
In 2010, when picking the existing protocol features for WebRTC, we picked the O/A mode for SDP, and that was not the best compromise that we could've made. The main reasons at the time to prefer this mode were flexibility and interoperability. There were several entrenched players in the market, with hardware deployed that would benefit from having that flexibility. For example, contact centres could roll out web-based call UIs without having to re-engineer their infrastructure, and there were similar motivations for video conferencing services (a lot of hardware in meeting rooms around the world would otherwise become obsolete).
In retrospect, we need an intermediary server when talking to these legacy systems anyway, partly because of features that are in WebRTC but not in these old legacy systems, namely transport-level encryption protocols and firewall/NAT traversal protocols.
The case for declarative mode over O/A
In my opinion, most services that use WebRTC do not federate with any other WebRTC/VoIP service (modulo PSTN), which means that they have greater control over what to expect from their install base and, as a consequence, more control over the experience that their service provides.
With that assumption in mind, web apps and native mobile apps can very easily become aware of their own device capabilities, which they can share with the Signalling Server (outside of the calling context). These capabilities can be shared asynchronously, when a user logs in or routinely as part of the presence data. When the Signalling Server receives intents from participants to join a particular call, given that it already has all the capabilities cached, it can automatically decide on the best set of common features and tell the participants which capabilities to use. On the off chance that the capabilities change, the Signalling Server sends out new capabilities.
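A minimal sketch of that server-side flow, assuming a hypothetical in-memory cache keyed by user id (none of these names come from a real API):

```javascript
// Endpoints report capabilities at login; the Signalling Server caches them
// and, at call time, intersects the cached sets to pick the common features.
const cache = new Map(); // userId -> capability list

function reportCapabilities(userId, caps) {
  cache.set(userId, caps); // called at login or with presence updates
}

function capabilitiesForCall(participantIds) {
  // Intersect all participants' cached capability lists.
  return participantIds
    .map((id) => cache.get(id) || [])
    .reduce((common, caps) => common.filter((c) => caps.includes(c)));
}

reportCapabilities("alice", ["opus", "VP8", "H264", "AV1"]);
reportCapabilities("bob", ["opus", "H264"]);
console.log(capabilitiesForCall(["alice", "bob"])); // [ 'opus', 'H264' ]
```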
SDP in the Signalling vs SDP in the APIs
In 2022 is SDP the most common way to exchange this info before establishing a session?
The main reason not to define a signalling protocol was to make it possible NOT to send SDPs over the wire between endpoints. The web app can discover the capabilities and exchange snippets of them (in any form; it does not need to be SDP, but quite often is). For example, an endpoint can do the following:
- The Signalling Server creates all the identifiers (stream, track, SSRC, msid, etc.) for the local and remote sides and shares this information with the participating endpoints.
- On receiving the identifiers, each participating endpoint calls createOffer(), which returns an SDP.
- The participating endpoint munges the SDP with the local identifiers that it received from the Signalling Server and calls setLocalDescription().
- It munges a cloned copy of the SDP with the remote identifiers and calls setRemoteDescription().
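A sketch of what the munge step looks like on the wire format, using a made-up three-line SDP fragment and made-up identifier values:

```javascript
// Rewriting identifiers in the SDP text before handing it to
// setLocalDescription()/setRemoteDescription(). The fragment and the
// identifier values are illustrative only.
const offerSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=msid:oldStream oldTrack",
  "a=ssrc:111111 cname:something",
].join("\n"); // real SDP uses \r\n line endings

function mungeIdentifiers(sdp, { streamId, trackId, ssrc }) {
  return sdp
    .replace(/^a=msid:\S+ \S+$/m, `a=msid:${streamId} ${trackId}`)
    .replace(/^a=ssrc:\d+/m, `a=ssrc:${ssrc}`);
}

const munged = mungeIdentifiers(offerSdp, {
  streamId: "stream-from-server",
  trackId: "track-from-server",
  ssrc: 424242,
});
console.log(munged);
```

Because this is raw string surgery on a format the browser also parses, a single typo silently breaks negotiation, which is exactly why the newer object-based APIs are preferable where they exist.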
What is ORTC, and what is its impact on WebRTC?
If we were to remove the SDP as input and output of the WebRTC API, we would need to replace it with something. Since SDP describes the pipeline, to replace it we could describe the sending and receiving pipelines as a series of objects that are connected to each other. The advantage would be that the code gives the developer flexibility in how to arrange the pipeline and to set/unset features (which we currently need to do by munging the SDP in the WebRTC API).
ORTC defined an object model for the pipeline, because the WebRTC API was initially very opaque and hid all the sending and receiving attributes of the pipeline, i.e., there were no objects corresponding to the pipeline exposed in the API. By 2016, the WebRTC API had introduced some of the objects defined in the ORTC model, giving developers more flexibility and control (without having to munge the SDP): for example, RTCRtpSender, RTCRtpReceiver, and the related family of objects.
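As an illustration of the object-model idea, here is a mock of an ORTC-style pipeline in a few classes. The class names only mirror ORTC's RTCIceTransport, RTCDtlsTransport, and RTCRtpSender; they implement no real networking:

```javascript
// Mock of the ORTC-style object model: the pipeline is built by connecting
// explicit objects instead of describing it in one opaque SDP blob.
class IceTransport {
  constructor() { this.kind = "ice"; }
}
class DtlsTransport {
  constructor(ice) { this.kind = "dtls"; this.transport = ice; }
}
class RtpSender {
  constructor(track, dtls) { this.track = track; this.transport = dtls; }
  // Features are set/unset directly on the object, no SDP munging needed.
  send(params) { this.params = params; return this.params; }
}

const ice = new IceTransport();
const dtls = new DtlsTransport(ice);
const sender = new RtpSender({ kind: "audio" }, dtls);
sender.send({ codec: "opus", maxBitrate: 64000 });
console.log(sender.transport.transport.kind); // "ice"
```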
ORTC and SDP?
In the case (declarative mode) where the app developer knows exactly how the media is sent and received, without any exception, there may be no need for signalling: the endpoint could just hard-code how the pipeline handles a new participant joining the call. This can work for several simple scenarios, like two-participant or small group calls.
However, in situations where the pipeline needs to adapt (O/A mode), it makes sense for the Signalling Server to describe or define the pipeline and send it to each endpoint. In this case, the ORTC objects need to be stringified in some way. A developer could do this in JSON, BSON, XML, or even in SDP. Personally, I would have loved a JSON hierarchy to represent the pipeline and a BSON conversion for sending it over the wire. But I sometimes wonder if building a JSON representation would be a fool's errand. The advantage of using SDP is that it defines a lot of rules on how to describe a pipeline; alas, some of them may be based on assumptions made in the '90s, and we need to weigh defining a new protocol in JSON against recreating the behemoth that we hate. :)
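For what it's worth, a hypothetical JSON hierarchy for a send pipeline might look like the following. This schema is invented for illustration; no standard defines it:

```javascript
// Invented JSON schema for a send pipeline. Unlike flat SDP lines, the
// nesting makes the transport/sender/encoding relationships explicit.
const pipeline = {
  transport: { ice: { ufrag: "abcd", pwd: "secret" }, dtls: { role: "auto" } },
  senders: [
    {
      kind: "audio",
      codec: { name: "opus", clockRate: 48000, channels: 2 },
      encodings: [{ ssrc: 111111, maxBitrate: 64000 }],
    },
  ],
};

// Sending it over the wire is a one-liner (BSON would be a binary
// equivalent of the same structure).
const wire = JSON.stringify(pipeline);
console.log(JSON.parse(wire).senders[0].codec.name); // "opus"
```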