Video Conference Part 1: These Things Suck

What I cannot create, I do not understand. – Richard Feynman

I do a lot of video chat for work. If it’s not a one on one, it’s pair programming. If it’s not pair programming, it’s a client meeting. I use a lot of Skype and Hangouts.

Sometimes they don’t work for unclear reasons. Sometimes file transfers fail. Sometimes screen sharing breaks, or when it’s active you don’t get webcam video, too. Or the connection lags or drops even though everything is running fast.

Every time I experience such a failure, I get truly angry and think, “I could do this better!” But I never quite got angry enough… until now. I guess the weight of years of frustration finally got to me.

I wrote my own (prototype) video conferencing app. It turned out pretty well. And that’s what these posts are about.

Conventions & Caveats

We will be referencing a 640×480, 24-bit color, 30fps video stream throughout this series of posts. We will spell out bits vs. bytes in most cases to avoid confusion, and when abbreviating will use Mb (lowercase) for bits and MB (uppercase) for bytes.

I am not a video codec professional and this is an experiment. MPEG-2, H.264, VP9 and other codecs represent the state of the art and are tremendously more sophisticated and capable than what I will describe. I believe there are some good tradeoffs to my system (which I will discuss later), but it is by no means exhaustively optimized or tuned. There are a slew of obvious improvements I simply didn’t have time to explore in this side project.

Basic Update Algorithm

I began by prototyping a basic algorithm with no network communication – just moving data around in-process. I used dear imgui for the UI, and videoinput for the webcam capture. (I really enjoyed working with both for the first time.) I maintain two buffers, one holding the current frame from the webcam, and the other holding a model of what I’ve “sent” over the simulated network. I also display the per-pixel error between the two.

I divide the image into 16px by 16px macroblocks, and calculate the error for each macroblock by taking the root mean square (RMS) of the difference between the client frame’s RGB values and the local frame’s RGB values for that region. I prioritize blocks with high error and transfer as many as I can in every packet. I went with 16px macroblocks out of laziness – there’s lots of research and sample code based on that size.
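
Here’s a minimal sketch of that error metric, assuming interleaved row-major RGB24 buffers. The names (macroblockError, frameWidth and friends) are illustrative, not the actual code from my prototype:

#include <cmath>
#include <cstdint>

const int BLOCK = 16; // 16px by 16px macroblocks

// RMS error between one macroblock of the current webcam frame and the
// same block in our model of the client's state.
float macroblockError(const uint8_t* current, const uint8_t* clientModel,
                      int frameWidth, int blockX, int blockY)
{
    double sumSquares = 0.0;
    for (int y = 0; y < BLOCK; y++)
    {
        // Byte offset of the first pixel in this row of the block.
        const int row = ((blockY * BLOCK + y) * frameWidth + blockX * BLOCK) * 3;
        for (int i = 0; i < BLOCK * 3; i++) // 16 pixels, 3 channels each
        {
            const double d = double(current[row + i]) - double(clientModel[row + i]);
            sumSquares += d * d;
        }
    }
    return float(std::sqrt(sumSquares / (BLOCK * BLOCK * 3)));
}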

As you can see, this is a self-correcting system. Macroblocks with large errors – indicated by white – are constantly being “transmitted” and reduced to lower error – indicated by black. That is, the system is always attempting to minimize the difference between what the client is seeing and the current state of the video feed. The rate at which the system converges on the new state is proportional to how fast we can transfer macroblocks. This allows us to scale to varying bandwidth situations, and it also strongly motivates us to have good compression.
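
In code, the per-packet update boils down to “sort by error, send what fits.” A sketch under the same caveats – the block encoding here is a stand-in, not my actual protocol code:

#include <algorithm>
#include <cstdint>
#include <vector>

struct BlockError { int blockX, blockY; float error; };

// Stand-in: serialize one macroblock (position + pixel data) for the wire.
std::vector<uint8_t> encodeBlock(int blockX, int blockY)
{
    return std::vector<uint8_t>(4 + 16 * 16 * 3); // tiny header + raw RGB24
}

// Fill one outgoing packet with the highest-error macroblocks that fit.
std::vector<uint8_t> fillPacket(std::vector<BlockError>& errors, size_t budgetBytes)
{
    // Largest error first, so the worst blocks converge soonest.
    std::sort(errors.begin(), errors.end(),
              [](const BlockError& a, const BlockError& b) { return a.error > b.error; });

    std::vector<uint8_t> packet;
    for (const BlockError& be : errors)
    {
        std::vector<uint8_t> enc = encodeBlock(be.blockX, be.blockY);
        if (packet.size() + enc.size() > budgetBytes)
            break; // out of room; lower-priority blocks wait for the next packet
        packet.insert(packet.end(), enc.begin(), enc.end());
        // The sender also copies this block into its model of the client
        // state here, which is what drives the error back toward zero.
    }
    return packet;
}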

As long as we have some feedback from the client, we can also handle data corrupted or dropped by the network. When we learn about lost data, we’ll update our model of the client state, and let the error-correcting behavior handle it. More on that in a later post.

The main flaws with the system at this point are a) we aren’t networking anything yet and b) even if we did, it would require 221 megabits/second for 480p 30Hz video sent as uncompressed RGB24. This means you’d need a well-tuned 802.11n network at minimum – 802.11g would not be even close to fast enough. Peak LTE performance also would not come close to handling this much traffic.
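
For reference, the arithmetic behind that figure: 640 × 480 pixels × 24 bits per pixel × 30 frames/second = 221,184,000 bits/second – roughly 221 Mb per second.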

Presently, macroblock updates cost us approximately 26 bits per pixel. We are a bit worse than just sending raw 8-bit RGB values because of overhead in the data protocol – we have to send macroblock positions, note the current compression settings, and so on.
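
To make that overhead concrete, a hypothetical per-block header might look like the following. The exact fields and sizes are illustrative – my prototype’s wire format differs – but it shows where bits beyond the raw 24 per pixel come from:

#include <cstdint>

// Hypothetical header preceding each macroblock payload. Five bytes on a
// 16x16 block is (5 * 8) / 256 ≈ 0.16 bits per pixel; packet framing and
// the rest of the protocol account for the remainder of the measured
// overhead.
#pragma pack(push, 1)
struct MacroblockHeader
{
    uint8_t  blockX;      // block column; 640 / 16 = 40 columns
    uint8_t  blockY;      // block row; 480 / 16 = 30 rows
    uint8_t  compression; // 0 = raw, 1 = zlib, 2 = lzo, ...
    uint16_t payloadSize; // payload length in bytes (a raw block is 768)
};
#pragma pack(pop)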

Raw, Zip & LZO

So we have an overall approach, but the bandwidth is way too high for any sort of real-world use. We need to reduce our bandwidth by a factor of 30 for this system to be remotely plausible for use on the average 7 megabit broadband internet connection!

As MPEG-2, H.264, HEVC, VP9 and other codecs demonstrate, compressing video is definitely possible. But those codecs are all complex and often difficult to integrate, modify or debug (said as someone maintaining a production system using ffmpeg). For example, x264 is 100k lines of code without a lot of comments. Some codecs have substantial licensing fees. They also tend to have problems when data is lost during transmission. And many introduce substantial latency due to sophisticated (but very efficient) encoding processes.

A good rule for prototyping is to do the simplest thing first, then iterate. Add complexity as needed.

So I grabbed miniz and minilzo, and set it up so I could choose which compression technique to use on the macroblock data.
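
Both libraries are straightforward to drop in. A minimal sketch of compressing a single raw macroblock both ways (error checking trimmed; buffer sizes follow each library’s documented worst case):

#include <cstdint>
#include <cstdio>
#include "miniz.h"
#include "minilzo.h"

int main()
{
    uint8_t block[16 * 16 * 3] = {}; // one raw RGB24 macroblock, 768 bytes
    // ... fill block with webcam pixel data ...

    // zlib-style deflate via miniz at level 9 (maximum compression).
    uint8_t zOut[1024];
    mz_ulong zLen = sizeof(zOut);
    mz_compress2(zOut, &zLen, block, sizeof(block), 9);

    // LZO1X-1 via minilzo; needs a one-time init and its own work memory.
    lzo_init();
    static uint8_t wrkmem[LZO1X_1_MEM_COMPRESS];
    uint8_t lOut[768 + 768 / 16 + 64 + 3]; // documented worst-case expansion
    lzo_uint lLen = 0;
    lzo1x_1_compress(block, sizeof(block), lOut, &lLen, wrkmem);

    // Bits per pixel for a 256-pixel block = bytes * 8 / 256.
    printf("zlib: %.1f bpp, lzo: %.1f bpp\n",
           zLen * 8.0 / 256.0, lLen * 8.0 / 256.0);
    return 0;
}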

Since these are lossless compression algorithms, there was no change in image quality. However, we do see changes in macroblock update size. Zlib at level 9 achieved 23.8 bits per pixel. LZO achieved 28.9 bits per pixel. Not so good! Why are we getting such terrible results?
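
To put those numbers in bytes: a raw 16×16 RGB24 block is 768 bytes (24 bits per pixel). 23.8 bits per pixel works out to 23.8 × 256 / 8 ≈ 762 bytes per block, and 28.9 bits per pixel to ≈ 925 bytes – in other words, LZO plus protocol overhead is actually inflating the data.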

The biggest reason is that neither algorithm is particularly good at short blocks of data. Both have a “startup” phase where they can’t compress efficiently until a history is built up. Since every packet in our data stream must be self-contained, we can’t rely on a shared history. This leads to a big efficiency loss. Even if we could use big blocks, noisy image data is hard to compress with this family of compressors – they are much better at repetitive, byte-aligned data such as text.

We found basic run-of-the-mill compressors to be a bust, but we did build the infrastructure for compressed macroblocks, which is a vital step forward.

Next Time

Enough prototyping for today! We built a basic algorithm, got some basic performance numbers, and took our first baby steps with compression. We also set up an application framework that can display a complex UI and capture video.

Join us next time in Part 2 as we get our bandwidth down to a plausible range with some lossy compression!
