Paper title: Unlocking WebRTC for End User Driven Innovation.
Speaker: Kundan Singh

“Smart endpoint and
a dumb network”

“Smart ~~endpoint~~end users and
a dumb ~~network~~service”

Hello everyone. My name is Kundan Singh.

This presentation is about how to apply a fundamental principle of the Internet, a smart endpoint and a dumb network, to modern multimedia communication and collaboration, recast as smart end users and a dumb service. In particular, I show how to unlock the innovation and flexibility of web audio and video from the walled gardens of cloud-hosted software-as-a-service systems, and move it to the endpoints controlled by the end users.

RTC Helper is a software tool that can intercept WebRTC and related APIs and change the behavior of web apps in real time. End users can do this on third-party web apps using a browser extension, and web developers can do it in their existing apps.

First, I argue that even though WebRTC, the web real-time communication standard, is open, and the various browsers implementing it interoperate, the modern web apps that use WebRTC are often locked.

The paper describes several motivating examples of things that end users should be able to do, but cannot, because of the limitations or restrictions of existing apps. For example, you should be able to use your mobile phone as a webcam in a video call on your desktop.

Or to send auto-transcribed captions and your webcam video overlaid on your shared-screen presentation in a call.

Or to allow an organization's IT department to restrict the use of data channels in any web app used from within its intranet. Or to help web developers quickly emulate failure or success conditions for testing and debugging, without having to rebuild and redeploy after every change.

Or to blur your face and body for privacy in a video call on a third-party web app that does not support such a feature.

The tight coupling between client apps and the service provided by a vendor hinders innovation. For example, if you sign up for a video conferencing service from a vendor, you must use that vendor's client app. Not only that, you must also use that vendor's note taking, text chat, meeting recording, and all other tools. Unlike traditional VoIP interoperability, web apps for communication and collaboration tend to gravitate towards a closed ecosystem model.

The end result is that a few popular video conferencing apps dictate the complete user experience of the vast majority of end users. Moreover, once enough users are locked into a vendor's ecosystem, there is little motivation to innovate beyond matching the feature lists of the top few products in the market.

This, in turn, opens up an opportunity for end-user-driven innovation, because vendor-driven innovation often lags behind on many fronts. This is especially applicable to the many WebRTC-related features that are largely available in the endpoint, or the browser, and not in the network or service.

The goal of my project is to allow end users to customize their multimedia communication experience in third-party web apps, and to allow web developers to quickly implement innovative ideas on top of WebRTC.

Web developers can include the software directly in their source code, and end users can use the browser extension that is part of this software.

The project is available on GitHub and includes extensive documentation. There is also a getting-started video guide on YouTube.

Project on GitHub
Watch the Getting Started video

Let me briefly describe how this works. First, in case you are not familiar with it, WebRTC, or web real-time communication, is a set of web standards, protocols, and APIs. It allows web developers to create audio/video communication apps directly in JavaScript, and it enables an end-to-end media path directly between the browsers of different users when needed. It also has APIs for accessing your local camera and microphone. This diagram shows the flow of media, both audio and video, from left to right, and only audio from right to left. It also shows the various JavaScript APIs that are used to create such an app.

These APIs can be intercepted in the browser. The example code snippet shown here intercepts the getUserMedia API. Many existing projects already do this successfully.
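The slide's snippet is not included in this transcript. The following TypeScript sketch illustrates the general technique of method wrapping, often called monkey patching. It replaces a `getUserMedia` method with a wrapper that lets a hook rewrite the requested constraints before the original method runs. The names `interceptGetUserMedia`, `Constraints`, `MediaDevicesLike`, `fakeDevices`, and `seen` are hypothetical and are not taken from RTC Helper. The stand-in object exists only so that the sketch runs outside a browser.

```typescript
// Hypothetical, simplified shape of the constraints passed to getUserMedia.
interface Constraints {
  audio?: boolean;
  video?: boolean;
}

// Hypothetical stand-in for the browser's MediaDevices object; the real
// method resolves to a MediaStream, simplified here to a string.
interface MediaDevicesLike {
  getUserMedia(constraints: Constraints): Promise<string>;
}

// Replaces target.getUserMedia with a wrapper that passes the requested
// constraints through `rewrite` before calling the original method.
// Returns a function that puts the original method back.
function interceptGetUserMedia(
  target: MediaDevicesLike,
  rewrite: (requested: Constraints) => Constraints,
): () => void {
  const saved = target.getUserMedia;
  const original = saved.bind(target);
  target.getUserMedia = async (constraints: Constraints): Promise<string> => {
    return original(rewrite(constraints));
  };
  return () => {
    target.getUserMedia = saved;
  };
}

// Stand-in object that records every constraints object it receives,
// so the sketch can run without a browser.
const seen: Constraints[] = [];
const fakeDevices: MediaDevicesLike = {
  async getUserMedia(constraints: Constraints): Promise<string> {
    seen.push(constraints);
    return "fake-stream";
  },
};

// Example hook (hypothetical): never let the app turn on the camera.
const restore = interceptGetUserMedia(fakeDevices, (requested) => ({
  ...requested,
  video: false,
}));
```

In a browser, the same wrapper could in principle target `navigator.mediaDevices`, using the real `MediaStreamConstraints` and `MediaStream` types instead of the simplified ones here. Returning a restore function keeps the intercept reversible, which suits customizations that are tried and removed at run time.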

In this software, instead of generically intercepting all APIs, I divide the intercepts into separate categories, shown in red here, based on the use cases and motivating examples. These intercepts can then be customized by injecting JavaScript functions that modify the behavior of those APIs.
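Per-category customization can be pictured as a separate chain of injected functions for each category. In this model, each function receives the value an intercept is about to use and returns a possibly modified value. The sketch below assumes that model. Apart from the Record category mentioned later in this talk, the category names, the payload types `CaptureRequest` and `RecordRequest`, and the class `CustomizationChain` are hypothetical. They do not describe how RTC Helper is actually implemented.

```typescript
// A chain of injected customization functions for one category.
// Each function takes the current value and returns a (possibly) modified
// one; functions run in the order they were injected.
class CustomizationChain<T> {
  private readonly fns: Array<(value: T) => T> = [];

  inject(fn: (value: T) => T): void {
    this.fns.push(fn);
  }

  apply(value: T): T {
    return this.fns.reduce<T>((current, fn) => fn(current), value);
  }
}

// Hypothetical payloads seen by the intercepts of two categories.
interface CaptureRequest {
  audio: boolean;
  video: boolean;
}

interface RecordRequest {
  enabled: boolean;
  mimeType: string;
}

// One chain per category, so each intercept runs only its own customizations.
const categories = {
  capture: new CustomizationChain<CaptureRequest>(),
  record: new CustomizationChain<RecordRequest>(),
};

// Example end-user customizations (hypothetical).
categories.capture.inject((req) => ({ ...req, video: false }));
categories.capture.inject((req) => ({ ...req, audio: true }));
categories.record.inject((req) => ({ ...req, enabled: true }));

// An intercept would consult its category before calling the real API.
const capture = categories.capture.apply({ audio: false, video: true });
// capture is { audio: true, video: false }
```

With one chain per category, a capture customization never runs inside, for example, the recording intercept. Applying functions in injection order is one simple rule for several functions in the same category. The paper addresses how different functions actually interact in RTC Helper.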

These customization categories are summarized in the paper and detailed in the documentation. The software provides a way for end users to inject a function into any of these categories.

For example, this slide shows a customization function in the Record category, running on a third-party video call app.

There are more than a hundred predefined customization functions included in the software, covering a wide range of use cases in various categories.

Most of the functions are very small, only a few tens of lines of code. But the goal is for the developers and end users to be able to edit, modify and create such functions.

There are many other details in the paper. I will highlight the questions here, and leave it to interested readers to find the answers in the paper.

What's more in the paper?

How do you install and use this?
How do different functions interact?

What are the motivational use cases?
How does our work differ from others?

What are the limitations of this approach?
What are the security implications?

This is the gist of what "Unlocking WebRTC for End User Driven Innovation" is about. With RTC Helper, I show that end users can implement and try many innovative ideas without depending on a vendor ecosystem for video conferencing and collaboration.

RTC Helper shows how to
unlock WebRTC
for innovation by the
end user

Why? How?

That concludes my presentation. Thank you.