
Monty : Announcing VP9 (And Opus and Ogg and Vorbis) support coming to Microsoft Edge

Oh look, an official announcement: http://blogs.windows.com/msedgedev/2015/09/08/announcing-vp9-support-coming-to-microsoft-edge/

Called it!

In any case, welcome to the party Microsoft. Feel free to help yourself to the open bar. :-)

(And, to be fair, this isn't as fantastically unlikely as some pundits have been saying. After all, MS does own an IP stake in Opus).

by Monty (monty@xiph.org) at September 08, 2015 11:23 PM

Monty : Comments on the Alliance for Open Media, or, "Oh Man, What a Day"

I assume folks who follow video codecs and digital media have already noticed the brand new Alliance for Open Media jointly announced by Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix. I expect the list of member companies to grow somewhat in the near future.

One thing that's come up several times today: People contacting Xiph to see if we're worried this detracts from the IETF's NETVC codec effort. The way the aomedia.org website reads right now, it might sound as if this is competing development. It's not; it's something quite different and complementary.

Open source codec developers need a place to collaborate on and share patent analysis in a forum protected by attorney-client privilege, something the IETF can't provide. AOMedia is to be that forum. I'm sure some development discussion will happen there, probably quite a bit in fact, but pooled IP review is the reason it exists.

It's also probably accurate to view the Alliance for Open Media (the Rebel Alliance?) as part of an industry pushback against the licensing lunacy made obvious by HEVCAdvance. Dan Rayburn at Streaming Media reports a third HEVC licensing pool is about to surface. To date, we still haven't seen licensing terms for more than half of the known HEVC patents out there.

In any case, HEVC is becoming rather expensive, and yet increasingly uncertain licensing-wise. Licensing uncertainty gives responsible companies the tummy troubles. Some of the largest companies in the world are seriously considering starting over rather than bet on the mess...

Is this, at long last, what a tipping point feels like?

Oh, and one more thing--

As of today, just after announcing its membership in the Alliance for Open Media, Microsoft also quietly changed the internal development status of Vorbis, Opus, WebM and VP9 to indicate it intends to ship all of the above in the new Windows Edge browser. Cue spooky X-Files theme music.

by Monty (monty@xiph.org) at September 02, 2015 06:37 AM

Monty : Cisco introduces the Thor video codec

I want to mention this on its own, because it's a big deal, and so that other news items don't steal its thunder: Cisco has begun blogging about their own FOSS video codec project, also being submitted as an initial input to the IETF NetVC video codec working group effort:

"World, Meet Thor – a Project to Hammer Out a Royalty Free Video Codec"

Heh. Even an appropriate almost-us-like nutty headline for the post. I approve :-)

In a bit, I'll write more about what's been going on in the video codec realm in general. Folks have especially been asking lots of questions about HEVCAdvance licensing recently. But it's not their moment right now. Bask in the glow of more open development goodness, we'll get to talking about the 'other side' in a bit.

by Monty (monty@xiph.org) at August 11, 2015 08:29 PM

Monty : libvpx 1.4.0 ("Indian Runner Duck") officially released

Johann Koenig just posted that the long awaited libvpx 1.4.0 is officially released.

"Much like the Indian Runner Duck, the theme of this release is less bugs, less bloat and more speed...."

by Monty (monty@xiph.org) at April 03, 2015 09:20 PM

Monty : Today's WTF Moment: A Competing HEVC Licensing Pool

Had this happened next week, I'd have thought it was an April Fools' joke.

Out of nowhere, a new patent licensing group just announced it has formed a second, competing patent pool for HEVC that is independent of MPEG LA. And they apparently haven't decided what their pricing will be... maybe they'll have a fee structure ready in a few months.

Video on the Net (and let's be clear-- video's future is the Net) already suffers endless technology licensing problems. And the industry's solution is apparently even more licensing.

In case you've been living in a cave, Google has been trying to establish VP9 as a royalty- and strings-free alternative (new version release candidate just out this week!), and NetVC, our own next-next-generation royalty-free video codec, was just conditionally approved as an IETF working group on Tuesday and we'll be submitting our Daala codec as an input to the standardization process. The biggest practical question surrounding both efforts is 'how can you possibly keep up with the MPEG behemoth'?

Apparently all we have to do is stand back and let the dominant players commit suicide while they dance around Schroedinger's Cash Box.

by Monty (monty@xiph.org) at March 27, 2015 06:35 PM

Monty : Daala Blog-Like Update: Bug or feature? [or, the law of Unintentionally Intentional Behaviors]

Codec development is often an exercise in tracking down examples of "that's funny... why is it doing that?" The usual hope is that unexpected behaviors spring from a simple bug, and finding bugs is like finding free performance. Fix the bug, and things usually work better.

Often, though, hunting down the 'bug' is a frustrating exercise in finding that the code is not misbehaving at all; it's functioning exactly as designed. Then the question becomes a thornier issue of determining if the design is broken, and if so, how to fix it. If it's fixable. And the fix is worth it.

[continue reading at Xiph.Org....]

by Monty (monty@xiph.org) at March 26, 2015 04:49 PM

Silvia Pfeiffer : WebVTT Audio Descriptions for Elephants Dream

When I set out to improve accessibility on the Web and we started developing WebSRT – later to be renamed to WebVTT – I needed an example video to demonstrate captions / subtitles, audio descriptions, transcripts, navigation markers and sign language.

I needed a freely available video with spoken text that either already had such data available or that I could create it for. Naturally I chose “Elephants Dream” by the Orange Open Movie Project, because it was created under the Creative Commons Attribution 2.5 license.

As it turned out, the Blender Foundation had already created a collection of SRT files that would represent the English original as well as the translated languages. I was able to reuse them by merely adding a WEBVTT header.

Then there was a need for a textual audio description. I read up on the plot online and finally wrote up a time-aligned audio description. I’m hereby making that file available under the Creative Commons Attribution 4.0 license. I’ve added a few lines to the metadata headers so it doesn’t confuse players. Feel free to reuse at will – I know there are others out there that have a similar need to demonstrate accessibility features.

by silvia at March 10, 2015 12:50 PM

Monty : A Fabulous Daala Holiday Update

Before we get into the update itself, yes, the level of magenta in that banner image got away from me just a bit. Then it was just begging for inappropriate abuse of a font...


Hey everyone! I just posted a Daala update that mostly has to do with still image performance improvements (yes, still image in a video codec. Go read it to find out why!). The update includes metric plots showing our improvement on objective metrics over the past year and relative to other codecs. Since objective metrics are only of limited use, there's also side-by-side interactive image comparisons against jpeg, vp8, vp9, x264 and x265.

The update text (and demo code) was originally written for a July update, as the still image work mostly happened in the beginning of the year. That update got held up and was never released officially, though it had been discovered and discussed at forums like doom9. I regenerated the metrics and image runs to use the latest versions of all the codecs involved (only Daala and x265 improved) for this official, better-late-than-never progress report!

by Monty (monty@xiph.org) at December 23, 2014 09:09 PM

Monty : Daala Demo 6: Perceptual Vector Quantization (by J.M. Valin)

Jean-Marc has finished the sixth Daala demo page, this one about PVQ, the foundation of our encoding scheme in both Daala and Opus.

(I suppose this also means we've finally settled on what the acronym 'PVQ' stands for: Perceptual Vector Quantization. It's based on, and expanded from, an older technique called Pyramid Vector Quantization, and we'd kept using 'PVQ' for years even though our encoding space was actually spherical. I'd suggested we call it 'Pspherical Vector Quantization' with a silent P so that we could keep the acronym, and that name appears in some of my slide decks. Don't get confused, it's all the same thing!)

by Monty (monty@xiph.org) at November 19, 2014 08:12 PM

Jean-Marc Valin : Daala: Perceptual Vector Quantization (PVQ)

Here's my new contribution to the Daala demo effort. Perceptual Vector Quantization has been one of the core ideas in Daala, so it was time for me to explain how it works. The details involve lots of maths, but hopefully this demo will make the general idea clear enough. I promise that the equations in the top banner are the only ones you will see!

Read more!

November 18, 2014 08:37 PM

Monty : Intra-Paint: A new Daala demo from Jean-Marc Valin

Intra paint is not a technique that's part of the original Daala plan and, as of right now, we're not currently using it in Daala. Jean-Marc envisioned it as a simpler, potentially more effective replacement for intra-prediction. That didn't quite work out-- but it has useful and visually pleasing qualities that, of all things, make it an interesting postprocessing filter, especially for deringing.

Several people have said 'that should be an Instagram filter!' I'm sure Facebook could shake a few million loose for us to make that happen ;-)

by Monty (monty@xiph.org) at September 24, 2014 07:36 PM

Jean-Marc Valin : Daala: Painting Images For Fun (and Profit?)

As a contribution to Monty's Daala demo effort, I decided to demonstrate a technique I've recently been developing for Daala: image painting. The idea is to represent images as directions and 1-D patterns.

Read more!

September 24, 2014 05:24 PM

Jean-Marc Valin : Opus beats all other codecs. Again.

Three years ago Opus got rated higher than HE-AAC and Vorbis in a 64 kb/s listening test. Now, the results of the recent 96 kb/s listening test are in and Opus got the best ratings, ahead of AAC-LC and Vorbis. Also interesting, Opus at 96 kb/s sounded better than MP3 at 128 kb/s.

Full plot

September 18, 2014 01:32 AM

Silvia Pfeiffer : Progress with rtc.io

At the end of July, I gave a presentation about WebRTC and rtc.io at the WDCNZ Web Dev Conference in beautiful Wellington, NZ.


Putting that talk together reminded me about how far we have come in the last year both with the progress of WebRTC, its standards and browser implementations, as well as with our own small team at NICTA and our rtc.io WebRTC toolbox.

WDCNZ presentation page5

One of the most exciting opportunities is still under-exploited: the data channel. When I talked about the above slide and pointed out Bananabread, PeerCDN, Copay, PubNub and also later WebTorrent, that’s where I really started to get Web Developers excited about WebRTC. They can totally see the shift in paradigm to peer-to-peer applications away from the Server-based architecture of the current Web.

Many were also excited to learn more about rtc.io, our own npm-module-based approach to a JavaScript API for WebRTC.


We believe that the World of JavaScript has reached a critical stage where we can no longer code by copy-and-pasting JavaScript snippets from all over the Web universe. We need a more structured approach to module reuse in JavaScript. Node, with JavaScript on the back end, really got this development going. However, we’ve needed it for a long time on the front end, too. One big library (jquery anyone?) that does everything that anyone could ever need on the front end isn’t going to work any longer with the amount of functionality that we now expect Web applications to support. Just look at the insane growth of npm compared to other module collections:

Packages per day across popular platforms (Shamelessly copied from: http://blog.nodejitsu.com/npm-innovation-through-modularity/)

For those that – like myself – found it difficult to understand how to tap into the sheer power of npm modules as a front-end developer: simply use browserify. npm modules are prepared following the CommonJS module definition spec. Browserify works natively with that and “compiles” all the dependencies of an npm module into a single bundle.js file that you can use on the front end through a script tag, just as you would in plain HTML. You can learn more about browserify, module definitions, and how to use browserify.
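As a minimal sketch of what that looks like in practice (the file and function names here are just placeholders, not anything from rtc.io):

    // hello.js – a plain CommonJS module
    module.exports = function(name) {
      return 'Hello, ' + name + '!';
    };

    // main.js – the entry point, pulling the module in via require()
    var hello = require('./hello.js');
    document.body.textContent = hello('WebRTC');

    // Bundle everything for the browser with:
    //   browserify main.js -o bundle.js
    // and include it in plain HTML with:
    //   <script src="bundle.js"></script>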

For those of you not quite ready to dive in with browserify, we have prepared the rtc module, which exposes the most commonly used packages of rtc.io through an “RTC” object from a browserified JavaScript file. You can also directly download the JavaScript file from GitHub.

Using rtc.io rtc JS library

So, I hope you enjoy rtc.io and I hope you enjoy my slides and the large collection of interesting links inside the deck, and of course: enjoy WebRTC! Thanks to Damon, Jeff, Cathy, Pete and Nathan – you're an awesome team!

On a side note, I was really excited to meet the author of browserify, James Halliday (@substack) at WDCNZ, whose talk on “building your own tools” seemed to take me back to the times where everything was done on the command-line. I think James is using Node and the Web in a way that would appeal to a Linux Kernel developer. Fascinating!!

by silvia at August 12, 2014 07:06 AM

Silvia Pfeiffer : AppRTC : Google’s WebRTC test app and its parameters

If you’ve been interested in WebRTC and haven’t lived under a rock, you will know about Google’s open source testing application for WebRTC: AppRTC.

When you go to the site, a new video conferencing room is automatically created for you and you can share the provided URL with somebody else and thus connect (make sure you’re using Google Chrome, Opera or Mozilla Firefox).

We’ve been using this application forever to check whether any issues with our own WebRTC applications are due to network connectivity issues, firewall issues, or browser bugs – in those cases AppRTC breaks down, too. Otherwise we can be pretty sure we have to dig deeper into our own code.

Now, AppRTC creates a pretty poor quality video conference, because the browsers use a 640×480 resolution by default. However, there are many query parameters that can be added to the AppRTC URL through which the connection can be manipulated.

Here are my favourite parameters:

  • hd=true : turns on high definition, ie. minWidth=1280,minHeight=720
  • stereo=true : turns on stereo audio
  • debug=loopback : connect to yourself (great to check your own firewalls)
  • tt=60 : by default, the channel is closed after 30min – this gives you 60 (max 1440)

For example, here’s what a stereo, HD loopback test would look like: https://apprtc.appspot.com/?r=82313387&hd=true&stereo=true&debug=loopback .
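If you script your test runs, the same URL is easy to assemble from a parameter object (a small sketch; the room number is just the one from the example above):

    // Build an AppRTC URL from a set of query parameters.
    var params = { r: '82313387', hd: 'true', stereo: 'true', debug: 'loopback' };
    var query = Object.keys(params).map(function(key) {
      return key + '=' + encodeURIComponent(params[key]);
    }).join('&');
    var url = 'https://apprtc.appspot.com/?' + query;
    // url is now the stereo, HD loopback test URL shown above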

This is not the limit of the available parameters, though. Here are some others that you may find interesting for some more in-depth geekery:

  • ss=[stunserver] : in case you want to test a different STUN server to the default Google ones
  • ts=[turnserver] : in case you want to test a different TURN server to the default Google ones
  • tp=[password] : password for the TURN server
  • audio=true&video=false : audio-only call
  • audio=false : video-only call
  • audio=googEchoCancellation=false,googAutoGainControl=true : disable echo cancellation and enable gain control
  • audio=googNoiseReduction=true : enable noise reduction (more Google-specific parameters)
  • asc=ISAC/16000 : preferred audio send codec is ISAC at 16kHz (use on Android)
  • arc=opus/48000 : preferred audio receive codec is opus at 48kHz
  • dtls=false : disable datagram transport layer security
  • dscp=true : enable DSCP
  • ipv6=true : enable IPv6

AppRTC’s source code is available here. And here is the file with the parameters (in case you want to check if they have changed).

Have fun playing with the main and always up-to-date WebRTC application: AppRTC.

UPDATE 12 May 2014

AppRTC now also supports the following bitrate controls:

  • arbr=[bitrate] : set audio receive bitrate
  • asbr=[bitrate] : set audio send bitrate
  • vsbr=[bitrate] : set video send bitrate
  • vrbr=[bitrate] : set video receive bitrate

Example usage: https://apprtc.appspot.com/?r=&asbr=128&vsbr=4096&hd=true

by silvia at July 23, 2014 05:02 AM

Monty : Presentations, presentations, presentations and slide decks

The spring 'conference tour' season is finally coming to a close. I'm a bit surprised by the number of people who have asked for my slide decks. Well, slide deck actually, since of course I mostly reused the same slides across the talks this spring.

In any case, I've posted the latest iteration (from my presentation at the just-finished Google VP9 Summit) for anyone who's interested:


by Monty (monty@xiph.org) at June 07, 2014 06:29 PM

Ben Schwartz : Gooseberry

Anyone who loves video software has probably caught more than one glimpse of the Blender Foundation’s short films: Elephants Dream, Big Buck Bunny, Sintel, and Tears of Steel. I’ve enjoyed them from the beginning, and never paid a dime, on account of their impeccable Creative Commons licensing.

I always hoped that the little open source project would one day grow up enough to make a full length feature film. Now they’ve decided to try, and they’ve raised more than half their funding target … with only two days to go. You can donate here. I think of it like buying a movie ticket, except that what you get is not just the right to watch the movie, but actually ownership of the movie itself.

by Ben at May 07, 2014 04:16 AM

Ben Schwartz : Stingy/stylish

My home for the next two nights is the Hotel 309, right by the office.  It’s the stingiest hotel I’ve ever stayed in.  Nothing is free: the wi-fi is $8/night and the “business center” is 25 cents/minute after the first 20.  There’s no soap by the bathroom sink, just the soap dispenser over the tub.  Even in the middle of winter, there is no box of tissues.  Its status as a 2-star hotel is well-deserved.

The rooms are also very stylish.  There’s a high-contrast color scheme that spans from the dark wood floors and rug to the boldly matted posters and high-concept lamps.  The furniture has high design value, or at least it did before it got all beat up.

These two themes come together beautifully for me in the (custom printed?) shower curtain, which features a repeating pattern of peacocks and crowns … with severe JPEG artifacts!  The luma blocks are almost two centimeters across.  Someone should tell the artist that bits are cheap these days.


by Ben at February 18, 2014 02:58 AM

Ben Schwartz : Efficiency

So you’re trying to build a DVD player using Debian Jessie and an Atom D2700 on a Poulsbo board, and you’ve even biked down to the used DVD warehouse and picked up a few $3 90’s classics for test materials.  Here’s what will happen next:

  1. Gnome 3 at 1920×1080.  The interface is sluggish even on static graphics.  Video is right out, since the graphics is unaccelerated, so every pixel has to be pushed around by the CPU.
  2. Reduce mode to 1280×720 (half the pixels to push), and try VLC in Gnome 3.  Playback is totally choppy.  Sigh.  Not really surprising, since Gnome is running in composited mode via OpenGL commands, which are then being faked on the low-power CPU using llvmpipe.  God only knows how many times each pixels is getting copied.  top shows half the CPU time is spent inside the gnome-shell process.
  3. Switch to XFCE.  Now VLC runs, and nothing else is stealing CPU time.  Still VLC runs out of CPU when expanded to full screen.  top shows it using 330% of CPU time, which is pretty impressive for a dual-core system.
  4. Switch to Gnome-mplayer, because someone says it’s faster.  Aspect is initially wrong; switch to “x11″ output mode to fix it.  Video playback finally runs smooth, even at full screen.  OK, there’s a little bit of tearing, but just pretend that it’s 1999.  top shows … wait for it … 67% CPU utilization, or about one fifth of VLC’s.  (Less, actually, since at that usage VLC was dropping frames.)  Too bad Gnome-mplayer is buggy as heck: buttons like “pause” and “stop” do nothing, and the rest of the user interface is a crapshoot at best.

On a system like this, efficiency makes a big difference.  Now if only we could get efficiency and functionality together…

by Ben at January 21, 2014 07:00 AM

Thomas Daede : Zynq DMA performance

I finally got my inverse transform up to snuff, complete with hardware matrix transposer. The transposer was trivial to implement – I realized that I could use the Xilinx data width converter IP to register entire 4×4 blocks at once, allowing my transposer to simply be a bunch of wires (assign statements in Verilog).


Unfortunately, I wasn’t getting the performance I was expecting. At a clock speed of 100MHz and a 64-bit width, I expected to be able to perform 25 million transforms per second. However, I was having trouble even getting 4 million. To debug the problem, I used the Xilinx debug cores in Vivado:


There are several problems. Here’s an explanation of what is happening in the above picture:

  1. The CPU configures the DMA registers and starts the transfer. This works for a few clock cycles.
  2. The Stream to Memory DMA (s2mm) starts a memory transfer, but its FIFOs fill up almost immediately and it has to stall (tready goes low).
  3. The transform stream pipeline also stalls, making its tready go low.
  4. The s2mm DMA is able to start its first burst transfer, and everything goes smoothly.
  5. The CPU sees that the DMA has completed, and schedules the second pass. The turnaround time for this is extremely large, and ends up taking the majority of the time.
  6. The same process happens again, but the latency is even larger due to writing to system memory.

Fortunately, the solution isn’t that complicated. I am going to switch to a scatter-gather DMA engine, which allows me to construct a request chain, and then the DMA will execute the operations without CPU intervention, avoiding the CPU latency. In addition, a FIFO can be used to reduce the impact of the initial write latency somewhat, though this costs FPGA area and it might be better just to strive for longer DMA requests.

There are other problems with my memory access at the moment – the most egregious being that my hardware expects a tiled buffer, but the Daala reference implementation uses linear buffers everywhere. This is the problem that I plan to tackle next.

by Thomas Daede at January 09, 2014 04:26 AM

Silvia Pfeiffer : Use deck.js as a remote presentation tool

deck.js is one of the new HTML5-based presentation tools. It’s simple to use, in particular for your basic, every-day presentation needs. You can also create more complex slides with animations etc. if you know your HTML and CSS.

Yesterday at linux.conf.au (LCA), I gave a presentation using deck.js. But I didn’t give it from the lectern in the room in Perth where LCA is being held – instead I gave it from the comfort of my home office at the other end of the country.

I used my laptop with in-built webcam and my Chrome browser to give this presentation. Beforehand, I had uploaded the presentation to a Web server and shared the link with the organiser of my speaker track, who was on site in Perth and had set up his laptop in the same fashion as myself. His screen was projecting the Chrome tab in which my slides were loaded and he had hooked up the audio output of his laptop to the room speaker system. His camera was pointed at the audience so I could see their reaction.

I loaded a slide master URL (which carries a query string identifying it as the master), and the room loaded the same URL without the query string.

Then I gave my talk exactly as I would if I was in the same room. Yes, it felt exactly as though I was there, including nervousness and audience feedback.

How did we do that? WebRTC (Web Real-time Communication) to the rescue, of course!

We used one of the modules of the rtc.io project called rtc-glue to add the video conferencing functionality and the slide navigation to deck.js. It was actually really really simple!

Here are the few things we added to deck.js to make it work:

  • Code added to index.html to make the video connection work:
    <meta name="rtc-signalhost" content="http://rtc.io/switchboard/">
    <meta name="rtc-room" content="lca2014">
    <video id="localV" rtc-capture="camera" muted></video>
    <video id="peerV" rtc-peer rtc-stream="localV"></video>
    <script src="glue.js"></script>
    <script>
      glue.config.iceServers = [{ url: 'stun:stun.l.google.com:19302' }];
    </script>

    The iceServers config is required to punch through firewalls – you may also need a TURN server. Note that you need a signalling server – in our case we used http://rtc.io/switchboard/, which runs the code from rtc-switchboard.

  • Added glue.js library to deck.js:

    Downloaded from https://raw.github.com/rtc-io/rtc-glue/master/dist/glue.js into the source directory of deck.js.

  • Code added to index.html to synchronize slide navigation:
    glue.events.once('connected', function(signaller) {
      if (location.search.slice(1) !== '') {
        $(document).bind('deck.change', function(evt, from, to) {
          signaller.send('/slide', {
            idx: to,
            sender: signaller.id
          });
        });
      }

      signaller.on('slide', function(data) {
        console.log('received notification to change to slide: ', data.idx);
        $.deck('go', data.idx);
      });
    });

    This simply registers a callback on the slide master end to send a slide position message to the room end, and a callback on the room end that initiates the slide navigation.

And that’s it!

You can find my slide deck on GitHub.

Feel free to write your own slides in this manner – I would love to have more users of this approach. It should also be fairly simple to extend this to share pointer positions, so you can actually use the mouse pointer to point to things on your slides remotely. Would love to hear your experiences!

Note that the slides are actually a talk about the rtc.io project, so if you want to find out more about these modules and what other things you can do, read the slide deck or watch the talk when it has been published by LCA.

Many thanks to Damon Oehlman for his help in getting this working.

BTW: somebody should really fix that print style sheet for deck.js – I’m only ever getting the one slide that is currently showing. 😉

by silvia at January 08, 2014 02:28 AM

Maik Merten : Tinkering with a H.261 encoder

On the rtcweb mailing list the struggle regarding what Mandatory To Implement (MTI) video codec should be chosen rages on. One camp favors H.264 ("We cannot have VP8"), the other VP8 ("We cannot have H.264"). Some propose that there should be a "safe" fallback codec anyone can implement, and H.261 is "as old and safe" as it can get. H.261 was specified in the late 1980s and is generally believed to have no non-expired patents left standing. Roughly speaking, this old gem of coding technology can transport CIF resolution (352x288) video at full framerate (>= 25 fps) with (depending on your definition) acceptable quality starting roughly in the 250 to 500 kbit/s range (basically, I've witnessed quite a few Skype calls with similar perceived effective resolution, mostly driven by mediocre webcams, and I can live with that as long as the audio part is okay). From today's perspective, H.261 is very, very light on computation, memory, and code footprint.

H.261 is, of course, outgunned by any semi-decent more modern video codec, which can, for instance, deliver video with higher resolution at similar bitrates. Those, however, don't have the luxury of having their patents expired with "as good as it can be" certainty.

People on the rtcweb list were quick to point out that having an encoder with modern encoding techniques may by itself be problematic regarding patents. Thankfully, for H.261, a public domain reference-style en- and decoder from 1993 can be found, e.g., at http://wftp3.itu.int/av-arch/video-site/h261/PVRG_Software/P64v1.2.2.tar - so that's a nice reference on what encoding technologies were state-of-the-art in the early 1990s.

With some initial patching done by Ron Lee this old code builds quite nicely on modern platforms - and as it turns out, the encoder also produces intact video, even on x86_64 machines or on a Raspberry Pi. Quite remarkably portable C code (although not the cleanest style-wise). The original code is slow, though: It barely does realtime encoding of 352x288 video on a 1.65 GHz netbook, and I can barely imagine having it encode video on machines from 1993! Some fun was had in making it faster (it's now about three times faster than before) and the program can now encode from and decode to YUV4MPEG2 files, which is quite a lot more handy than the old mode of operation (still supported), where each frame would consist of three files (.Y, .U, .V).

For those interested, the patched up code is available at https://github.com/maikmerten/p64 - however, be aware that the original coding style (yay, global variables) is still alive and well.

So, is it "useful" to resurrect such an old codebase? Depends on the definition of "useful". For me, as long as it is fun and teaches me something, it's reasonably useful.

So is it fun to toy around with this ancient coding technology? Yes, especially as most modern codecs still follow the same overall design, but H.261 is the most basic instance of "modern" video coding and thus the easiest to grasp. Who knows, with some helpful comments here and there that old codebase could be used for teaching basic principles of video coding.

January 07, 2014 07:35 PM

Thomas Daede : Off by one

My hardware had a bug that made about 50% of the outputs off by one. I compared my Verilog code to the original C and it was a one-for-one match, with the exception of a few OD_DCT_RSHIFT that I translated into arithmetic shifts. That turned out to break the transform. Looking at the definition of OD_DCT_RSHIFT:

/*This should translate directly to 3 or 4 instructions for a constant _b:
#define OD_UNBIASED_RSHIFT(_a,_b) ((_a)+(((1<<(_b))-1)&-((_a)<0))>>(_b))*/
/*This version relies on a smart compiler:*/
# define OD_UNBIASED_RSHIFT(_a, _b) ((_a)/(1<<(_b)))

I had always thought a divide by a power of two equals a shift, but this is wrong: an integer divide rounds towards zero, whereas an arithmetic shift rounds towards negative infinity. The solution is simple: if the value to be shifted is less than zero, add a mask before shifting. With a shift of 2, for example, -1 >> 2 gives -1 while -1/4 gives 0; adding the mask (1<<2)-1 = 3 first gives (-1+3) >> 2 = 0, matching the division. Rather than write this logic in Verilog, I simply switched my code to the / operator as in the C code above, and XST inferred the correct logic. After verifying operation with random inputs, I also wrote a small benchmark to test the performance of my hardware:

[root@alarm hwtests]# ./idct4_test 
Filling input buffer... Done.
Running software benchmark... Done.
Time: 0.030000 s
Running hardware benchmark... Done.
Time: 0.960000 s

Not too impressive, but the implementation is super basic, so it’s not surprising. Most of the time is spent shuffling data across the extremely slow MMIO interface.

At the same time I was trying to figure out why the 16-bit version of the intra predictor performed uniformly worse than the double precision version – I thought 16 bits ought to be enough. The conversion is done by expanding the range to fill a 16 bit integer and then rounding:

OD_PRED_WEIGHTS_4x4_FIXED[i] = (od_coeff)floor(OD_PRED_WEIGHTS_4x4[i] * 32768 + 0.5);

The 16 bit multiplies with the 16 bit coefficients are summed in a 32 bit accumulator. The result is then truncated to the original range. I did this with a right shift – after the previous ordeal, I tried swapping it with the “round towards zero” macro. Here are the results:


The new 16 bit version even manages to outperform the double version slightly. I believe the reason why “round to zero” does better than simply rounding down is that rounding down tends to create a slightly negative bias in the encoded coefficients, decreasing coding gain.

by Thomas Daede at January 03, 2014 12:39 AM

Thomas Daede : Daala 4-input idct (tentatively) working!

I’ve implemented Daala’s smallest inverse transform in Verilog code. It appears as an AXI-Lite slave, with two 32-bit registers for input and two for output. Right now it can do one transform per clock cycle, though at a pitiful 20 MHz. I also haven’t verified that all of its output values are identical to the C version yet, but it passes preliminary testing with peek and poke.


I also have yet to figure out why XST only uses one DSP block even though my design has 3 multipliers…

by Thomas Daede at January 02, 2014 05:25 AM

Silvia Pfeiffer : New Challenges

I finished up at Google last week and am now working at NICTA, an Australian ICT research institute.

My work with Google was exciting and I learned a lot. I like to think that Google also got a lot out of me – I coded and contributed to some YouTube caption features, I worked on Chrome captions and video controls, and above all I worked on video accessibility for HTML at the W3C.

I was one of the key authors of the W3C Media Accessibility Requirements document that we created in the Media Accessibility Task Force of the W3C HTML WG. I then went on to help make video accessibility a reality. We created WebVTT and the <track> element and applied it to captions, subtitles, chapters (navigation), video descriptions, and metadata. To satisfy the need for synchronisation of video with other media resources such as sign language video or audio descriptions, we got the MediaController object and the @mediagroup attribute.

I must say it was a most rewarding time. I learned a lot about being productive at Google, about collaborating successfully over distance, about how the WebKit community works, and about the new way of writing W3C standards (which is more like pseudo-code). As one consequence, I am now a co-editor of the W3C HTML spec and it seems I am also about to become the editor of the WebVTT spec.

At NICTA my new focus of work is WebRTC. There is both a bit of research and a whole bunch of application development involved. I may even get to do some WebKit development, if we identify any issues with the current implementation. I started a week ago and am already amazed by the amount of work going on in the WebRTC space and the amazing number of open source projects playing around with it. Video conferencing is a new challenge and I look forward to it.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : Video Conferencing in HTML5: WebRTC via Socket.io

Six months ago I experimented with Web sockets for WebRTC and the early implementations of PeerConnection in Chrome. Last week I gave a presentation about WebRTC at Linux.conf.au, so it was time to update that codebase.

I decided to use socket.io for the signalling following the idea of Luc, which made the server code even smaller and reduced it to a mere reflector:

 var app = require('http').createServer().listen(1337);
 var io = require('socket.io').listen(app);

 io.sockets.on('connection', function(socket) {
         socket.on('message', function(message) {
                 socket.broadcast.emit('message', message);
         });
 });

Then I turned to the client code. I was surprised to see the massive changes that PeerConnection has gone through. Check out my slide deck to see the different components that are now necessary to create a PeerConnection.

I was particularly surprised to see the SDP object now fully exposed to JavaScript and thus the ability to manipulate it directly rather than through some API. This allows Web developers to manipulate the type of session that they are asking the browsers to set up. I can imagine, e.g., that if they have support for a video codec in JavaScript that the browser does not provide built-in, they can add that codec to the set of choices to be offered to the peer. While it is flexible, I am concerned that this might create more problems than it solves. I guess we’ll have to wait and see.

I was also surprised by the need to use ICE, even though in my experiment I got away with an empty list of ICE servers – the ICE messages just got exchanged through the socket.io server. I am not sure whether this is a bug, but I was very happy about it because it meant I could run the whole demo on a completely separate network from the Internet.
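To make that flow concrete, here is a compressed sketch of how a page could talk to the little socket.io reflector above, written against today's unprefixed, promise-based WebRTC API rather than the prefixed, callback-style API the 2013 code used, and with the empty iceServers list just mentioned (treat it as an illustration, not the code from my talk):

    // Connect to the reflector from above (adjust host/port as needed).
    var socket = io.connect('http://localhost:1337');
    var pc = new RTCPeerConnection({ iceServers: [] });

    // Relay our ICE candidates to the peer through the reflector.
    pc.onicecandidate = function(event) {
      if (event.candidate) {
        socket.emit('message', JSON.stringify({ candidate: event.candidate }));
      }
    };

    // Apply whatever the peer sends back: an SDP description or an ICE candidate.
    socket.on('message', function(message) {
      var data = JSON.parse(message);
      if (data.sdp) {
        pc.setRemoteDescription(new RTCSessionDescription(data.sdp));
      } else if (data.candidate) {
        pc.addIceCandidate(new RTCIceCandidate(data.candidate));
      }
    });

    // The calling side creates an offer and sends it off; the answer comes back
    // through the same 'message' channel. (Add media tracks or a data channel
    // before creating the offer in a real application; the answering side would
    // call createAnswer() after setRemoteDescription().)
    pc.createOffer().then(function(offer) {
      pc.setLocalDescription(offer);
      socket.emit('message', JSON.stringify({ sdp: offer }));
    });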

The most exciting news since my talk is that Mozilla and Google have managed to get a PeerConnection working between Firefox and Chrome – this is the first cross-browser video conference call without a plugin! The code differences are minor.

Since the specification of the WebRTC API and of the MediaStream API are now official Working Drafts at the W3C, I expect other browsers will follow. I am also looking forward to the possibilities of:

The best places to learn about the latest possibilities of WebRTC are webrtc.org and the W3C WebRTC WG. code.google.com has open source code that continues to be updated to the latest released and interoperable features in browsers.

The video of my talk is in the process of being published. There is a MP4 version on the Linux Australia mirror server, but I expect it will be published properly soon. I will update the blog post when that happens.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : The use cases for a <main> element in HTML

The W3C HTML WG and the WHATWG are currently discussing the introduction of a <main> element into HTML.

The <main> element has been proposed by Steve Faulkner and is specified in a draft extension spec which is about to be accepted as a FPWD (first public working draft) by the W3C HTML WG. This implies that the W3C HTML WG will be looking for implementations and for feedback by implementers on this spec.

I am supportive of the introduction of a <main> element into HTML. However, I believe that the current spec and use case list don’t make a good enough case for its introduction. Here are my thoughts.

Main use case: accessibility

In my opinion, the main use case for the introduction of <main> is accessibility.

Like any other users, when blind users want to perceive a Web page/application, they need to have a quick means of grasping the content of a page. Since they cannot visually scan the layout and thus determine where the main content is, they use accessibility technology (AT) to find what is known as “landmarks”.

“Landmarks” tell the user what semantic content is on a page: a header (such as a banner), a search box, a navigation menu, some asides (also called complementary content), a footer, …. and the most important part: the main content of the page. It is this main content that a blind user most often wants to skip to directly.

In the days of HTML4, a hidden “skip to content” link at the beginning of the Web page was used as a means to help blind users access the main content.

In the days of ARIA, the aria @role=main enables authors to avoid a hidden link and instead mark the element where the main content begins to allow direct access to the main content. This attribute is supported by AT – in particular screen readers – by making it part of the landmarks that AT can directly skip to.

Both the hidden link and the ARIA @role=main approaches are, however, band aids: they are being used by those of us that make “finished” Web pages accessible by adding specific extra markup.

A world where ARIA is not necessary and where accessibility developers would be out of a job because the normal markup that everyone writes already creates accessible Web sites/applications would be much preferable over the current world of band-aids.

Therefore, to me, the primary use case for a <main> element is to achieve exactly this better world and not require specialized markup to tell a user (or a tool) where the main content on a page starts.

An immediate effect would be that pages that have a <main> element will expose a “main” landmark to blind and vision-impaired users that will enable them to directly access that main content on the page without having to wade through other text on the page. Without a <main> element, this functionality can currently only be provided using heuristics to skip other semantic and structural elements and is for this reason not typically implemented in AT.

Other use cases

The <main> element is a semantic element not unlike other new semantic elements such as <header>, <footer>, <aside>, <article>, <nav>, or <section>. Thus, it can also serve other uses where the main content on a Web page/Web application needs to be identified.

Data mining

For data mining of Web content, the identification of the main content is one of the key challenges. Many scholarly articles have been published on this topic. This stackoverflow article references and suggests a multitude of approaches, but the accepted answer says “there’s no way to do this that’s guaranteed to work”. This is because Web pages are inherently complex and many <div>, <p>, <iframe> and other elements are used to provide markup for styling, notifications, ads, analytics and other use cases that are necessary to make a Web page complete, but don’t contribute to what a user consumes as semantically rich content. A <main> element will allow authors to pro-actively direct data mining tools to the main content.

Search engines

One particularly important kind of “data mining” tool is the search engine. Search engines, too, have a hard time identifying which sections of a Web page are more important than others and employ many heuristics to do so, see e.g. this ACM article. Yet they still disappoint with poor results, pointing to keyword matches in barely relevant sections of a page rather than ranking Web pages higher where the keywords turn up in the main content area. A <main> element would be able to help search engines give text in main content areas a higher weight and prefer them over other areas of the Web page. It would be able to rank different Web pages depending on where on the page the search words are found. The <main> element will be an additional hint that search engines will digest.

Visual focus

On small devices, the display of Web pages designed for Desktop often causes confusion as to where the main content can be found and read, in particular when the text ends up being too small to be readable. It would be nice if browsers on small devices had a functionality (maybe a default setting) where Web pages would start being displayed as zoomed in on the main content. This could alleviate some of the headaches of responsive Web design, where the recommendation is to show high priority content as the first content. Right now this problem is addressed through stylesheets that re-layout the page differently depending on device, but again this is a band-aid solution. Explicit semantic markup of the main content can solve this problem more elegantly.


Finally, naturally, <main> would also be used to style the main content differently from others. You can e.g. replace a semantically meaningless <div id=”main”> with a semantically meaningful <main> where their position is identical. My analysis below shows, that this is not always the case, since oftentimes <div id=”main”> is used to group everything together that is not the header – in particular where there are multiple columns. Thus, the ease of styling a <main> element is only a positive side effect and not actually a real use case. It does make it easier, however, to adapt the style of the main content e.g. with media queries.

Proposed alternative solutions

It has been proposed that existing markup serves to satisfy the use cases that <main> has been proposed for. Let’s analyse these on some of the most popular Web sites. First, let’s list the proposed algorithms.

Proposed solution No 1: Scooby-Doo

On Sat, Nov 17, 2012 at 11:01 AM, Ian Hickson <ian@hixie.ch> wrote:
| The main content is whatever content isn't
| marked up as not being main content (anything not marked up with <header>,
| <aside>, <nav>, etc).

This implies that the first element that is not a <header>, <aside>, <nav>, or <footer> will be the element that we want to give to a blind user as the location where they should start reading. The algorithm is implemented in https://gist.github.com/4032962.
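As a rough illustration only (this is not the code from the gist above, just a sketch of the idea it describes):

    // A rough sketch of the “Scooby-Doo” idea: walk the children of <body> and
    // return the first element that is not marked up as "not main content".
    // (Real pages nest these elements deeper, so the gist linked above does more work.)
    function findMainScoobyDoo() {
      var notMain = ['HEADER', 'FOOTER', 'ASIDE', 'NAV'];
      var children = document.body.children;
      for (var i = 0; i < children.length; i++) {
        if (notMain.indexOf(children[i].tagName) === -1) {
          return children[i];
        }
      }
      return null;
    }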

Proposed solution No 2: First article element

On Sat, Nov 17, 2012 at 8:01 AM, Ian Hickson  wrote:
| On Thu, 15 Nov 2012, Ian Yang wrote:
| >
| > That's a good idea. We really need an element to wrap all the <p>s,
| > <ul>s, <ol>s, <figure>s, <table>s ... etc of a blog post.
| That's called <article>.

This approach identifies the first <article> element on the page as containing the main content. Here’s the algorithm for this approach.

Proposed solution No 3: An example heuristic approach

The readability plugin has been developed to make Web pages readable by essentially removing all the non-main content from a page. An early version of the readability source is available. It demonstrates what a heuristic approach can achieve.

Analysing alternative solutions


I’ve picked 4 typical Websites (top on Alexa) to analyse how these three different approaches fare. Ideally, I’d like to simply apply the above three scripts and compare pictures. However, since the semantic HTML5 elements <header>, <aside>, <nav>, and <footer> are not actually used by any of these Web sites, I don’t actually have this choice.

So, instead, I decided to make some assumptions of where these semantic elements would be used and what the outcome of applying the first two algorithms would be. I can then compare it to the third, which is a product so we can take screenshots.


http://google.com – search for “Scooby Doo”.

The search results page would likely be built with:

  • a <nav> menu for the Google bar
  • a <header> for the search bar
  • another <header> for the login section
  • another <nav> menu for the search types
  • a <div> to contain the rest of the page
  • a <div> for the app bar with the search number
  • a few <aside>s for the left and right column
  • a set of <article>s for the search results
“Scooby Doo” would find the first element after the headers as the “main content”. This is the element before the app bar in this case. Interestingly, there is a <div @id=main> already in the current Google results page, which “Scooby Doo” would likely also pick. However, there are a nav bar and two asides in this div, which clearly should not be part of the “main content”. Google actually placed a @role=main on a different element, namely the one that encapsulates all the search results.

“First Article” would find the first search result as the “main content”. While not quite the same as what Google intended – namely all search results – it is close enough to be useful.

The “readability” result is interesting, since it is not able to identify the main text on the page. It is actually aware of this problem and brings a warning before displaying this page:

Readability of google.com



http://facebook.com

A user page would likely be built with:

  • a <header> bar for the search and login bar
  • a <div> to contain the rest of the page
  • an <aside> for the left column
  • a <div> to contain the center and right column
  • an <aside> for the right column
  • a <header> to contain the center column “megaphone”
  • a <div> for the status posting
  • a set of <article>s for the home stream
“Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains all three columns. It’s actually a <div @id=content> already in the current Facebook user page, which “Scooby Doo” would likely also pick. However, Facebook selected a different element to place the @role=main : the center column.

“First Article” would find the first news item in the home stream. This is clearly not what Facebook intended, since they placed the @role=main on the center column, above the first blog post’s title. “First Article” would miss that title and the status posting.

The “readability” result again disappoints but warns that it failed:



http://youtube.com

A video page would likely be built with:

  • a <header> bar for the search and login bar
  • a <nav> for the menu
  • a <div> to contain the rest of the page
  • a <header> for the video title and channel links
  • a <div> to contain the video with controls
  • a <div> to contain the center and right column
  • an <aside> for the right column with an <article> per related video
  • an <aside> for the information below the video
  • an <article> per comment below the video
“Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains the rest of the page. It’s actually a <div @id=content> already in the current YouTube video page, which “Scooby Doo” would likely also pick. However, YouTube’s related videos and comments are unlikely to be what the user would regard as “main content” – it’s the video they are after, which generously has a <div id=watch-player>.

“First Article” would find the first related video or comment in the home stream. This is clearly not what YouTube intends.

The “readability” result is not quite as unusable, but still very bare:


http://wikipedia.com (“Overscan” page)

A Wikipedia page would likely be built with:

  • a <header> bar for the search, login and menu items
  • a <div> to contain the rest of the page
  • an <article> with title and lots of text
  • an <aside> with the table of contents
  • several <aside>s for the left column
Good news: “Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains the rest of the page. It’s actually a <div id=”content” role=”main”> element on Wikipedia, which “Scooby Doo” would likely also pick.

“First Article” would find the title and text of the main element on the page, but it would also include an <aside>.

The “readability” result is also in agreement.


In the following table we have summarised the results for the experiments:

Site           Scooby-Doo   First article   Readability
Facebook.com   FAIL         FAIL            FAIL

Clearly, Wikipedia is the prime example of a site where even the simple approaches find it easy to determine the main content on the page. WordPress blogs are similarly successful. Almost any other site, including news sites, social networks and search engine sites, is pretty hopeless with the proposed approaches, because there are too many elements that are used for layout or other purposes (notifications, hidden areas), such that the pre-determined list of available semantic elements simply doesn't suffice to mark up a Web page/application completely.


It seems that in general it is impossible to determine which element(s) on a Web page should be the “main” piece of content that accessibility tools jump to when requested, that a search engine should put their focus on, or that should be highlighted to a general user to read. It would be very useful if the author of the Web page would provide a hint through a <main> element where that main content is to be found.

I think that the <main> element becomes particularly useful when combined with a default keyboard shortcut in browsers as proposed by Steve: we may actually find that non-accessibility users will also start making use of this shortcut, e.g. to get to videos on YouTube pages directly without having to tab over search boxes and other interactive elements, etc. Worthwhile markup indeed.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : What is “interoperable TTML”?

I’ve just tried to come to terms with the latest state of TTML, the Timed Text Markup Language.

TTML has been specified by the W3C Timed Text Working Group and released as a RECommendation v1.0 in November 2010. Since then, several organisations have tried to adopt it as their caption file format. This includes the SMPTE, the EBU (European Broadcasting Union), and Microsoft.

Both, Microsoft and the EBU actually looked at TTML in detail and decided that in order to make it usable for their use cases, a restriction of its functionalities is needed.


The EBU released EBU-TT, which restricts the set of valid attributes and features. “The EBU-TT format is intended to constrain the features provided by TTML, especially to make EBU-TT more suitable for the use with broadcast video and web video applications.” (see EBU-TT).

In addition, EBU-specific namespaces were introduced to extend TTML with EBU-specific data types, e.g. ebuttdt:frameRateMultiplierType or ebuttdt:smpteTimingType. Similarly, a bunch of metadata elements were introduced, e.g. ebuttm:documentMetadata, ebuttm:documentEbuttVersion, or ebuttm:documentIdentifier.

The use of namespaces as an extensibility mechanism will ascertain that EBU-TT files continue to be valid TTML files. However, any vanilla TTML parser will not know what to do with these custom extensions and will drop them on the floor.

Simple Delivery Profile

With the intention to make TTML ready for “internet delivery of Captions originated in the United States”, Microsoft proposed a “Simple Delivery Profile for Closed Captions (US)” (see Simple Profile). The Simple Profile is also a restriction of TTML.

Unfortunately, the Microsoft profile is not the same as the EBU-TT profile: for example, it contains the “set” element, which is not conformant in EBU-TT. Similarly, the supported style features are different, e.g. Simple Profile supports “display-region”, while EBU-TT does not. On the other hand, EBU-TT supports monospace, sans-serif and serif fonts, while the Simple profile does not.

Thus files created for the Simple Delivery Profile will not work on players that expect EBU-TT and the reverse.

Fortunately, the Simple Delivery Profile does not introduce any new namespaces and new features, so at least it is an explicit subpart of TTML and not both a restriction and extension like EBU-TT.


SMPTE also created a version of the TTML standard called SMPTE-TT. SMPTE did not decide on a subset of TTML for their purposes – it was simply adopted as a complete set. “This Standard provides a framework for timed text to be supported for content delivered via broadband means,…” (see SMPTE-TT).

However, SMPTE extended TTML in SMPTE-TT with an ability to store a binary blob with captions in another format. This allows using SMPTE-TT as a transport format for any caption format and is deemed to help with “backwards compatibility”.

Now, instead of specifying a profile, SMPTE decided to define how to convert CEA-608 captions to SMPTE-TT. Even if it’s not called a “profile”, that’s actually what it is. It even has its own namespace: “m608:”.


With all these different versions of TTML, I ask myself what a video player that claims support for TTML will do to get something working. The only chance it has is to implement all the extensions defined in all the different profiles. I pity the player that has to deal with a SMPTE-TT file that has a binary blob in it and is expected to be able to decode this.

Now, what is a caption author supposed to do when creating TTML? They obviously cannot expect all players to be able to play back all TTML versions. Should they create different files depending on what platform they are targeting, i.e. an EBU-TT version, a SMPTE-TT version, a vanilla TTML version, and a Simple Delivery Profile version? Should they throw all the features of all the versions into one TTML file and hope that the players will pick out the right things that they require and drop the rest on the floor?

Maybe the best way to progress would be to make a list of the “safe” features: those features that every TTML profile supports. That may be the best way to get an “interoperable TTML” file. Here’s me hoping that this minimal set of features doesn’t just end up being the usual (starttime, endtime, text) triple.


I just found out that UltraViolet have their own profile of SMPTE-TT called CFF-TT (see UltraViolet FAQ and spec). They are making some SMPTE-TT fields optional, but introduce a new @forcedDisplayMode attribute under their own namespace “cff:”.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : Why I became a HTML5 co-editor

A few weeks ago, I had the honor to be appointed as part of the editorial team of the W3C HTML5 specification.

Since Ian Hickson had recently decided to focus solely on editing the WHATWG HTML living standard specification, the W3C started looking for other editors to take the existing HTML5 specification to REC level. REC level is what other standards organizations call a “ratified standard”.

But what does REC level really mean for HTML?

In my probably somewhat subjective view, recommendation level means that a snapshot is taken of the continuously evolving HTML spec: one that has a comprehensive feature set, is implemented in a cross-browser interoperable way, has a complete test suite for its features, and has received wide review. The latter implies that other groups in the W3C have had a chance to look at the specification and make sure it satisfies their basic requirements, which include e.g. applicability to all users (accessibility, internationalization), platforms, and devices (mobile, TV).

Basically it means that we stop for a “moment”, take a deep breath, polish the feature set that we’ve been working on this far, and make sure we all agree on it, before we get back to changing the world with cool new stuff. In a software project we would call it a release branch with feature freeze.

Now, as productive as that may sound for software – it’s not actually that exciting for a specification. Firstly, the most exciting things happen when writing new features. Secondly, development of browsers doesn’t just magically stop to get the release (REC) happening. And lastly, if we’ve done our specification work well, there should be only a little work left to do. Basically, it’s the thankless work of tidying up that we’re looking at here. :-)

So, why am I doing it? I am not doing this for money – I’m currently part-time contracting to Google’s accessibility team working on video accessibility and this editor work is not covered by my contract. It wasn’t possible to reconcile polishing work on a specification with the goals of my contract, which include pushing new accessibility features forward. Therefore, when invited, I decided to offer my spare time to the W3C.

I’m giving this time under the condition that I’d only be looking at accessibility and video related sections. This is where my interest and expertise lie, and where I’m passionate about getting things right. I want to make sure that we create accessibility features that will be implemented and that we polish existing video features. I want to make sure we don’t diverge from implementations, which continue to get updated and may follow the WHATWG spec or HTML.next or other needs.

I am not yet completely sure what the editorship will entail. Will we look at tests, too? Will we get involved in HTML.next? So far we’ve been preparing for our work by setting up adequate version control repositories, building a spec creation process, discussing how to bridge to the WHATWG commits, and analysing the long list of bugs to see how to cope with them. There’s plenty of actual text editing work ahead and the team is shaping up well! I look forward to the new experiences.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : Video Conferencing in HTML5: WebRTC via Web Sockets

A bit over a week ago I gave a presentation at Web Directions Code 2012 in Melbourne. Maxine and John asked me to speak about something related to HTML5 video, so I went for the new shiny: WebRTC – real-time communication in the browser.

Presentation slides

I only had 20 min, so I had to make it tight. I wanted to show off video conferencing without special plugins in Google Chrome in just a few lines of code, as is the promise of WebRTC. To a large extent, I achieved this. But I made some interesting discoveries along the way. Demos are in the slide deck.

UPDATE: Opera 12 has been released with WebRTC support.

Housekeeping: if you want to replicate what I have done, you need to install a Google Chrome Web Browser 19+. Then make sure you go to chrome://flags and activate the MediaStream and PeerConnection experiment(s). Restart your browser and now you can experiment with this feature. Big warning up-front: it’s not production-ready, since there are still changes happening to the spec and there is no compatible implementation by another browser yet.

Here is a brief summary of the steps involved to set up video conferencing in your browser:

  1. Set up a video element each for the local and the remote video stream.
  2. Grab the local camera and stream it to the first video element.
  3. (*) Establish a connection to another person running the same Web page.
  4. Send the local camera stream on that peer connection.
  5. Accept the remote camera stream into the second video element.

Now, the most difficult part of all of this – believe it or not – is the signalling part that is required to build the peer connection (marked with (*)). Initially I wanted to run completely without a server and just enter the remote’s IP address to establish the connection. This is, however, not a functionality that the PeerConnection object provides [might this be something to add to the spec?].

So, you need a server known to both parties that can provide for the handshake to set up the connection. All the examples that I have seen, such as https://apprtc.appspot.com/, use a channel management server on Google’s appengine. I wanted it all working with HTML5 technology, so I decided to use a Web Socket server instead.

I implemented my Web Socket server using node.js (code of websocket server). The video conferencing demo is in the slide deck in an iframe – you can also use the stand-alone html page. Works like a treat.

While it is still using Google’s STUN server to get through NAT, the messaging for setting up the connection is running completely through the Web Socket server. The messages that get exchanged are plain SDP message packets with a session ID. There are OFFER, ANSWER, and OK packets exchanged for each streaming direction. You can see some of it in the below image:

WebRTC demo
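For reference, here is a rough sketch of what the browser side of this flow looks like. Note that it uses today's standardized RTCPeerConnection API rather than the prefixed PeerConnection experiment from the demo, and that the WebSocket URL, element ids and JSON message shape below are assumptions of mine, not a fixed protocol:

  // Assumed signalling server and element ids; the JSON message shape is ad hoc.
  const signalling = new WebSocket('ws://localhost:8080');
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],  // Google's public STUN server
  });
  const localVideo = document.querySelector('#local') as HTMLVideoElement;
  const remoteVideo = document.querySelector('#remote') as HTMLVideoElement;

  // Steps 1, 2 and 4: grab the local camera, show it in the first video element,
  // and send its tracks on the peer connection.
  async function start() {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    localVideo.srcObject = stream;
    stream.getTracks().forEach(track => pc.addTrack(track, stream));
  }

  // Step 5: accept the remote camera stream into the second video element.
  pc.ontrack = (event) => { remoteVideo.srcObject = event.streams[0]; };

  // Step 3, the signalling: relay SDP offers/answers and ICE candidates over the WebSocket.
  pc.onicecandidate = (event) => {
    if (event.candidate) signalling.send(JSON.stringify({ candidate: event.candidate }));
  };

  async function call() {
    await pc.setLocalDescription(await pc.createOffer());
    signalling.send(JSON.stringify({ sdp: pc.localDescription }));
  }

  signalling.onmessage = async (msg) => {
    const data = JSON.parse(msg.data);
    if (data.sdp) {
      await pc.setRemoteDescription(data.sdp);
      if (data.sdp.type === 'offer') {            // we are the callee: answer back
        await pc.setLocalDescription(await pc.createAnswer());
        signalling.send(JSON.stringify({ sdp: pc.localDescription }));
      }
    } else if (data.candidate) {
      await pc.addIceCandidate(data.candidate);
    }
  };

  start();                                                            // both sides grab their camera
  document.querySelector('#call')?.addEventListener('click', call);  // one side initiates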

I’m not running a public WebSocket server, so you won’t be able to see this part of the presentation working. But the local loopback video should work.

At the conference, it all went without a hitch (while the wireless played along). I believe you have to host the WebSocket server on the same machine as the Web page, otherwise it won’t work for security reasons.

A whole new world of opportunities lies out there when we get the ability to set up video conferencing on every Web page – scary and exciting at the same time!

by silvia at January 01, 2014 08:53 AM

Thomas Daede : Quantized intra predictor

Daala’s intra predictor currently uses doubles. Floating point math units are really expensive in hardware, and so is loading 64 bit weights. Therefore, I modified Daala to see what would happen if the weights were rounded to signed 16 bit. The result is in the SSIM plot below: red is before quantization, green after. This is too much loss – I’ll have to figure out why this happened. Worst case I move to 32 bit weights, though maybe my floor(+0.5) method of rounding is also suspect? Maybe the intra weights should be trained taking quantization into account?
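For illustration, the rounding step looks roughly like this (a sketch only, not the actual Daala code; the Q13 scale factor is an assumption):

  const Q = 13;   // assumed fixed-point precision, not Daala's actual choice

  function quantizeWeight(w: number): number {
    // floor(x + 0.5) rounding of a double weight at Q13, as in the experiment above
    const q = Math.floor(w * (1 << Q) + 0.5);
    // clamp to the signed 16 bit range
    return Math.max(-32768, Math.min(32767, q));
  }

  // The predictor then becomes an integer multiply-accumulate:
  //   pred = (sum_i quantizeWeight(w[i]) * neighborCoeff[i]) >> Q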

by Thomas Daede at December 31, 2013 11:37 PM

Thomas Daede : First Zynq bitstream working!

I got my first custom PL hardware working! Following the Zedboard tutorials, it was relatively straightforward, though using Vivado 2013.3 required a bit of playing around – I ended up making my own clock sources and reset controller until I realized that the Zynq PS had them if you enabled them. Next up: ChipScope, or whatever it’s called in Vivado.

I crashed the chip numerous times until realizing that the bitstream file name had changed somewhere in the process, so I was uploading an old version of the bitstream….

by Thomas Daede at December 31, 2013 08:56 PM

Thomas Daede : Daala profiling on ARM

I reran the same decoding as yesterday, but this time on the Zynq Cortex-A9 instead of x86. Following is the histogram data, again with the functions I plan to accelerate highlighted:

 19.60%  lt-dump_video  [.] od_intra_pred16x16_mult
  6.66%  lt-dump_video  [.] od_intra_pred8x8_mult
  6.02%  lt-dump_video  [.] od_bin_idct16
  4.88%  lt-dump_video  [.] .divsi3_skip_div0_test
  4.54%  lt-dump_video  [.] od_bands_from_raster
  4.21%  lt-dump_video  [.] laplace_decode
  4.03%  lt-dump_video  [.] od_chroma_pred
  3.92%  lt-dump_video  [.] od_raster_from_bands
  3.66%  lt-dump_video  [.] od_post_filter16
  3.20%  lt-dump_video  [.] od_intra_pred4x4_mult
  3.09%  lt-dump_video  [.] od_apply_filter_cols
  3.08%  lt-dump_video  [.] od_bin_idct8
  2.60%  lt-dump_video  [.] od_post_filter8
  2.00%  lt-dump_video  [.] od_tf_down_hv
  1.69%  lt-dump_video  [.] od_intra_pred_cdf
  1.55%  lt-dump_video  [.] od_ec_decode_cdf_unscaled
  1.46%  lt-dump_video  [.] od_post_filter4
  1.45%  lt-dump_video  [.] od_convert_intra_coeffs
  1.44%  lt-dump_video  [.] od_convert_block_down
  1.28%  lt-dump_video  [.] generic_model_update
  1.24%  lt-dump_video  [.] pvq_decoder
  1.21%  lt-dump_video  [.] od_bin_idct4

The results are, as expected, very similar to x86; however, there are a few oddities. One is that the intra prediction is even slower than on x86, and another is that the software division routine shows up relatively high in the list. It turns out that the division comes from the inverse lapping filters – although division by a constant can be replaced by a fixed-point multiply, the compiler does not seem to have done this here, which hurts performance a lot.
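As a sketch of the trick the compiler is missing (the divisor 5 and the Q16 reciprocal below are made-up illustration values, not Daala's actual filter constants):

  // Illustration only: replacing x / C (C known at compile time) with a multiply
  // by a precomputed reciprocal and a shift. C = 5 is a made-up divisor.
  const C = 5;
  const SHIFT = 16;
  const RECIP = Math.ceil((1 << SHIFT) / C);   // 13108, i.e. ceil(2^16 / 5)

  function divByConst(x: number): number {
    // With these constants the result matches integer x / 5 exactly for
    // 0 <= x < 16384; a compiler (or hand-written NEON) would pick RECIP and
    // SHIFT so the identity holds over the whole required input range.
    return (x * RECIP) >> SHIFT;
  }

  // divByConst(100) === 20, divByConst(37) === 7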

For fun, let’s see what happens when we remove the costly transforms and force 4×4 block sizes only:

 26.21%  lt-dump_video  [.] od_intra_pred4x4_mult
  7.35%  lt-dump_video  [.] od_intra_pred_cdf
  6.28%  lt-dump_video  [.] od_post_filter4
  6.17%  lt-dump_video  [.] od_chroma_pred
  5.77%  lt-dump_video  [.] od_bin_idct4
  4.04%  lt-dump_video  [.] od_bands_from_raster
  3.94%  lt-dump_video  [.] generic_model_update
  3.86%  lt-dump_video  [.] od_apply_filter_cols
  3.64%  lt-dump_video  [.] od_raster_from_bands
  3.29%  lt-dump_video  [.] .divsi3_skip_div0_test
  2.47%  lt-dump_video  [.] od_convert_intra_coeffs
  2.07%  lt-dump_video  [.] od_intra_pred4x4_get
  1.95%  lt-dump_video  [.] od_apply_postfilter
  1.82%  lt-dump_video  [.] od_tf_up_hv_lp
  1.81%  lt-dump_video  [.] laplace_decode
  1.74%  lt-dump_video  [.] od_ec_decode_cdf
  1.67%  lt-dump_video  [.] pvq_decode_delta
  1.61%  lt-dump_video  [.] od_apply_filter_rows
  1.55%  lt-dump_video  [.] od_bin_idct4x4

The 4×4 intra prediction has now skyrocketed to the top, with the transforms and filters increasing as well. I was surprised by the intra prediction decoder (od_intra_pred_cdf) taking up so much time, but it can be explained by the much larger amount of prediction data coded relative to the image size with the smaller blocks. The transform still doesn’t take much time, which I suppose shouldn’t be surprising given how simple it is – my hardware can even do it in 1 cycle.

by Thomas Daede at December 30, 2013 12:23 AM

Thomas Daede : Daala profiling on x86

Given that the purpose of my hardware acceleration is to run Daala at realtime speeds, I decided to benchmark the Daala player on my Core 2 Duo laptop. I used a test video at 720p24, encoded with -v 16 and no reference frames (intra only). The following is the perf annotate output:

 19.49%  lt-player_examp  [.] od_state_upsample8
 11.64%  lt-player_examp  [.] od_intra_pred16x16_mult
  5.74%  lt-player_examp  [.] od_intra_pred8x8_mult

20 percent for od_state_upsample8? Turns out that the results of this aren’t even used in intra only mode, so commenting it out yields a more reasonable result:

 14.50%  lt-player_examp  [.] od_intra_pred16x16_mult
  7.17%  lt-player_examp  [.] od_intra_pred8x8_mult
  6.37%  lt-player_examp  [.] od_bin_idct16
  5.09%  lt-player_examp  [.] od_post_filter16
  4.63%  lt-player_examp  [.] laplace_decode
  4.41%  lt-player_examp  [.] od_bin_idct8
  4.10%  lt-player_examp  [.] od_post_filter8
  3.86%  lt-player_examp  [.] od_apply_filter_cols
  3.28%  lt-player_examp  [.] od_chroma_pred
  3.18%  lt-player_examp  [.] od_raster_from_bands
  3.14%  lt-player_examp  [.] od_intra_pred4x4_mult
  2.84%  lt-player_examp  [.] pvq_decoder
  2.76%  lt-player_examp  [.] od_ec_decode_cdf_unscaled
  2.71%  lt-player_examp  [.] od_tf_down_hv
  2.58%  lt-player_examp  [.] od_post_filter4
  2.45%  lt-player_examp  [.] od_bands_from_raster
  2.13%  lt-player_examp  [.] od_intra_pred_cdf
  1.98%  lt-player_examp  [.] od_intra_pred16x16_get
  1.89%  lt-player_examp  [.] pvq_decode_delta
  1.50%  lt-player_examp  [.] od_convert_intra_coeffs
  1.43%  lt-player_examp  [.] generic_model_update
  1.37%  lt-player_examp  [.] od_convert_block_down
  1.21%  lt-player_examp  [.] od_ec_decode_cdf
  1.18%  lt-player_examp  [.] od_ec_dec_normalize
  1.18%  lt-player_examp  [.] od_bin_idct4

I have bolded the functions that I plan to implement in hardware. As you can see, they sum to only about 23% of the total execution time – this means that accelerating these functions alone won’t bring me into realtime decoding performance. Obvious other targets include the intra prediction matrix multiplication, though this might be better handled by NEON acceleration for now – I’m not too familiar with that area of the code yet.
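A quick Amdahl's law sanity check makes the point (back-of-the-envelope only, no measurement of the planned hardware involved):

  // Amdahl's law: overall speedup when a fraction p of the runtime is sped up by s.
  const amdahl = (p: number, s: number) => 1 / ((1 - p) + p / s);

  // Even with infinitely fast hardware for the bolded ~23%, the decoder as a
  // whole only gets about 1.3x faster, so the rest needs attention too.
  console.log(amdahl(0.23, Infinity).toFixed(2));   // "1.30"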

by Thomas Daede at December 29, 2013 03:44 AM

Jean-Marc Valin : Opus 1.1 released

Opus 1.1
After more than two years of development, we have released Opus 1.1. This includes:

  • new analysis code and tuning that significantly improves encoding quality, especially for variable-bitrate (VBR),

  • automatic detection of speech or music to decide which encoding mode to use,

  • surround with good quality at 128 kbps for 5.1 and usable down to 48 kbps, and

  • speed improvements on all architectures, especially ARM, where decoding uses around 40% less CPU and encoding uses around 30% less CPU.

These improvements are explained in more detail in Monty's demo (updated from the 1.1 beta demo). Of course, this new version is still fully compliant with the Opus specification (RFC 6716).

December 06, 2013 03:08 AM

Jean-Marc Valin : Opus 1.1-rc is out

We just released Opus 1.1-rc, which should be the last step before the final 1.1 release. Compared to 1.1-beta, this new release further improves surround encoding quality. It also includes better tuning of surround and stereo for lower bitrates. The complexity has been reduced on all CPUs, but especially ARM, which now has Neon assembly for the encoder.

With the changes, stereo encoding now produces usable audio (of course, not high fidelity) down to about 40 kb/s, with surround 5.1 sounding usable down to 48-64 kb/s. Please give this release a try and report any issues on the mailing list or by joining the #opus channel on irc.freenode.net. The more testing we get, the faster we'll be able to release 1.1-final.

As usual, the code can be downloaded from: http://opus-codec.org/downloads/

November 27, 2013 06:08 AM

Thomas Daede : Senior Honors Thesis – Daala in Hardware

not actually the daala logo

For my honors thesis, I am implementing part of the Daala decoder in hardware. This is not only a way for me to learn more about video coding and hardware, but also a way to provide feedback to the Daala project and create a reference hardware implementation.

The Chip

Part of the reason for a hardware implementation of any video codec is to make it possible to decode on an otherwise underpowered chip, such as the mobile processors common in smart phones and tablets. A very good model of this sort of chip is the Xilinx Zynq processor, which has two midrange ARM Cortex cores, and a large FPGA fabric surrounding them. The custom video decoder will be implemented in the FPGA, with high speed direct-memory-access providing communication with the ARM cores.

The Board

Image from zedboard.org

I will be using the ZedBoard, a low cost prototyping board based on the Zynq 7020 system-on-chip. It includes 512MB of DDR, both HDMI and VGA video output, Ethernet, serial, and boots Linux out of the box. The only thing that could make it better would be a cute kitten on the silkscreen.

Choosing what to accelerate

For now, parts of the codec will still run in software. This is because many of them would be very complicated state machines in hardware, and more importantly, it allows me to incrementally add hardware acceleration while maintaining a functional decoder. To make a particular algorithm a good candidate for hardware acceleration, it needs to have these properties:

  • Stable – Daala is a rapidly changing codec, and while much of it is expected to change, it takes time to implement hardware, and it’s much easier if the reference isn’t changing under my feet.
  • Parallel – Compared to a CPU, hardware can excel at exploiting parallelism. CPUs can do it too with SIMD instructions, but hardware can be tailor made to the application.
  • Independent – The hardware accelerator can act much like a parallel thread, which means that locking and synchronization come into play. Ideally the hardware and CPU should rarely have to wait for each other.
  • Interesting – The hardware should be something unique to Daala.

The best fit that I have found for these is the transform stage of Daala. The transform stage is a combination of the discrete cosine transform (actually an integer approximation), and a lapping filter. While the DCT is an old concept, the 2D lapping filter is pretty unique to Daala, and implementing both in tightly coupled hardware can create a large performance benefit. More info on the transform stage can be found on Monty’s demo pages.

by Thomas Daede at November 26, 2013 03:04 AM

Thomas Daede : Inside a TTL crystal oscillator

Inside Crystal View 1

In case you ever wanted to know what is inside an oscillator can… I used a dremel so that now you can know. The big transparent disc on the right is the precisely cut quartz resonator, suspended on springs. On the left is a driver chip and pads for loading capacitors to complete the oscillator circuit. The heat from my dremel was enough to melt the solder and remove the components. Your average crystal can won’t have the driver chip or capacitors – most microcontrollers now have the driver circuitry built-in.


by Thomas Daede at November 25, 2013 10:49 PM

Jean-Marc Valin : Back from AES

I just got back from the 135th AES convention, where Koen Vos and I presented two papers on voice and music coding in Opus.

Also of interest at the convention was the Fraunhofer codec booth. It appears that Opus is now causing them some concerns, which is a good sign. And while we're on that topic, the answer is yes :-)


October 24, 2013 01:48 AM

Jean-Marc Valin : Opus 1.1-beta release and a demo page by Monty

Opus 1.1-beta
We just released Opus 1.1-beta, which includes many improvements over the 1.0.x branch. For this release, Monty made a nice demo page showing off most of the new features. In other news, the AES has accepted my paper on the CELT part of Opus, as well as a paper proposal from Koen Vos on the SILK part.

July 12, 2013 04:31 PM

Thomas B. Rücker : Icecast – How to prevent listeners from being disconnected

A set of hints for common questions and situations people encounter while setting up a radio station using Icecast.

  • The source client is on an unstable IP connection.
    • You really want to avoid such situations. If this is a professional setup you might even consider a dedicated internet connection for the source client. At the very least, use QoS with guaranteed bandwidth.
      I’ve seen it far too often over the years that people complain about ‘Icecast drops my stream all the time’ and after some digging we find that the same person or someone on their network runs BitTorrent or some other bandwidth-intensive application.
    • If the TCP connection just stalls from time to time for a few seconds, then you might improve the situation by increasing the source-timeout, but don’t increase it too much as that will lead to weird behaviour for your listeners too. Beyond 15-20s I’d highly recommend considering the next option instead.
    • If the TCP connection breaks, so the source client gets fully disconnected and has to reconnect to Icecast, then you really want to look into setting up fallbacks for your streams, as otherwise this also immediately disconnects all listeners.
      Setting up a fallback transfers all listeners to a backup stream (or static file) instead of disconnecting them. Preferably you’d have that fallback stream generated locally on the same machine as the Icecast server, e.g. some ‘elevator music’ with an announcement of ‘technical difficulties, regular programming will resume soon’.
      There is a special case: if you opt for ‘silence’ as your fallback and you stream in e.g. Ogg/Vorbis, then the encoder will compress digital silence to a few bits per second, as it’s incredibly compressible. This completely messes up most players. The workaround is to inject some -90dB AWGN, which is below hearing level, but keeps the encoder busy and the bit-rate up (see the noise-generation sketch after this list).
      Important note: you should match the parameters of both streams closely to avoid problems in playback, as many players won’t cope well if the codec, sample rate or other things change mid-stream…


  • You want to have several different live shows during the week and in between some automated playlist driven programming
    • use (several) fallbacks
      primary mountpoint: live shows
      secondary mountpoint: playlist
      tertiary mountpoint: -90dB AWGN (optional, e.g. as insurance in case the playlist fails)
    • If you want to have this all completely hidden from the listeners with one mount point, that is automatically also listed on http://dir.xiph.org, then you need to add one more trick:
      Set up a local relay of the primary mountpoint, set it to force YP submission.
      Set all the three other mounts to hidden and force disable YP submission.
      This gives you one visible mountpoint for your users and all the ‘magic’ is hidden.
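
Since the -90dB AWGN trick comes up in both scenarios above, here is a rough sketch of how such a file could be generated (the sample rate, length, raw 16-bit PCM output and file name are all arbitrary choices; feed the result to whatever encoder or source client you normally use):

  import { writeFileSync } from 'node:fs';

  // Sketch: a few seconds of white Gaussian noise at roughly -90 dBFS as raw
  // 16 bit PCM. Loop it (or regenerate it) as the fallback programme material.
  const SAMPLE_RATE = 44100;
  const SECONDS = 10;
  const AMPLITUDE = Math.pow(10, -90 / 20);   // ~3.2e-5 of full scale

  const pcm = new Int16Array(SAMPLE_RATE * SECONDS);
  for (let i = 0; i < pcm.length; i++) {
    // Box-Muller transform for a Gaussian sample
    const u1 = Math.random() || Number.MIN_VALUE;
    const u2 = Math.random();
    const gauss = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    // At 16 bits, -90 dBFS is on the order of one least significant bit:
    // inaudible, but never digital silence, so the encoder stays busy.
    pcm[i] = Math.round(gauss * AMPLITUDE * 32767);
  }

  writeFileSync('awgn-90db.raw', Buffer.from(pcm.buffer));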


I’ll expand this post with some simple configuration examples later today/tomorrow.

by Thomas B. Rücker at July 08, 2013 07:08 AM

Thomas B. Rücker : Icecast · analyzing your logs to create useful statistics

There are many ways to analyze Icecast usage and generate statistics: it has a special statistics TCP stream, it has hooks that trigger on source or listener connect/disconnect events, you can query it over custom XSLT or read the stats.xml, or you can analyze its log files. This post however will only be about the latter. I’ll be writing separate posts on the other topics soon.

  • Webalizer streaming version
    A fork of webalizer 2.10 extended to support icecast2 logs and produce nice listener statistics.
    I used this a couple of years ago. It could use some love (it expects a manual input of ‘average bitrate’ while that could be calculated), but otherwise does a very good job of producing nice statistics.
  • AWStats – supports ‘streaming’ logs.
    Haven’t used this one myself for Icecast analysis, but it’s a well established and good open source log analysis tool.
  • Icecast log parser – Log parser to feed a MySQL database
    I ran into this one recently; it seems to be a small script that helps you feed a MySQL database from your log files. Further analysis is then possible through SQL queries. A nice building block if you want to build a custom solution.

In case you want to parse the access log yourself you should be aware of several things. Largely the log is compatible with other httpd log formats, but there are slight differences. Please look at those two log entries:

 - - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" 200 3379662 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0" 88
 - - [02/Jun/2013:20:38:13 +0200] "PUT /test.webm HTTP/1.1" 200 19 "-" "-" 264

The first one is a listener (in this case Firefox playing a WebM stream). You can see it received 3379662 bytes, but the interesting part is the last entry on that line “88”. It’s the number of seconds it was connected. Something important for streaming, less important for a file serving httpd.

The second entry is a source client (in this case using the HTTP PUT method we’re adding for Icecast 2.4). Note that it only says “19” after the HTTP status code. That might seem very low at first, but keep in mind that we’re not logging the number of bytes sent by the client to the server, only those sent from the server to the client. If necessary you could extract the client-to-server count from the error.log though. The “264” at the end once again indicates it was streaming for 264 seconds.

Another thing, as we’ve heard this question pop up repeatedly: the log entry is only generated once the listener/source actually disconnects, as only then do we know all the details that should go into that line. If you are looking for real-time statistics, then the access.log is the wrong place, sorry.
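If you do want to roll your own parser, something along these lines is a reasonable starting point (a sketch only: the regular expression follows the layout of the two entries above, and the totals it prints are just examples):

  import { createReadStream } from 'node:fs';
  import { createInterface } from 'node:readline';

  // Rough parser for the access.log entries shown above: pull out the request,
  // the HTTP status, the bytes sent to the client, and the trailing
  // "seconds connected" field that Icecast appends to each line.
  const LINE = /"(\S+) (\S+)[^"]*" (\d{3}) (\d+) .* (\d+)$/;

  async function summarize(path: string): Promise<void> {
    let listenerSeconds = 0;
    let bytesServed = 0;
    const rl = createInterface({ input: createReadStream(path) });
    for await (const line of rl) {
      const m = LINE.exec(line);
      if (!m) continue;
      const [, method, , status, bytes, seconds] = m;
      // GET requests are listeners; PUT/SOURCE requests are source clients.
      if (method === 'GET' && status === '200') {
        listenerSeconds += Number(seconds);
        bytesServed += Number(bytes);
      }
    }
    console.log(`listener hours: ${(listenerSeconds / 3600).toFixed(1)}`);
    console.log(`bytes served to listeners: ${bytesServed}`);
  }

  summarize('access.log').catch(console.error);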

There are also some closed source solutions, but I’ve never used them and I don’t think they provide significant benefit over the available open source solutions mentioned above.

If you know a good Icecast tool, not only for log analysis, please write a comment or send a mail to one of the Icecast mailing lists.

by Thomas B. Rücker at June 02, 2013 06:52 PM

Jean-Marc Valin : The Myth of the $100,000 Listening Test

Ever since we started working on Opus at the IETF, it's been a recurring theme. "You guys don't know how to test codecs", "You can't be serious unless you spend $100,000 testing your codec with several independent labs", or even "designing codecs is easy, it's testing that's hard". OK, subjective testing is indeed important. After all, that's the main thing that differentiates serious signal processing work from idiots using $1000 directional, oxygen-free speaker cable. However, just like speaker cables, more expensive listening tests do not necessarily mean more useful results. In this post I'm going to explain why this kind of thinking is wrong. I will avoid naming anyone here because I want to attack the myth of the $100,000 listening test, not the people who believe in it.

In the Beginning

Back in the 70s and 80s, digital audio equipment was very expensive, complicated to deploy, and difficult to test at all. Not everyone could afford analog-to-digital converters (ADC) or digital-to-analog converters (DAC), so any testing required using expensive, specialized labs. When someone came up with a new piece of equipment or a codec, it could end up being deployed for several decades, so it made sense to give it to one of these labs to test the hell out of it. At the same time, it wasn't too hard to do a good job in testing because algorithms were generally simple and codecs only supported one or two modes of operation. For example, a codec like G.711 only has a single bit-rate and can be implemented in less than 10 lines of code. With something that simple, it's generally not too hard to have 100% code coverage and make sure all corner cases are handled correctly. Considering the investments involved, it just made sense to pay tens or hundreds of thousands of dollars to make sure nothing blows up. This was paid by large telcos and their suppliers, so they could afford it anyway.

Things remained pretty much the same through the 90s. When G.729 was standardized in 1995, it still only had a single bit-rate, and the computational complexity was still beyond what a PC could do in real-time. A few years later, we finally got codecs like AMR-NB that supported several bit-rates, though the number was still small enough that you could test each of them.

Enter Opus

When we first attempted to create a codec working group (WG) at the IETF, some folks were less than thrilled to have their "codec monopoly" challenged. The first objection we heard was "you're not competent enough to write a codec". After pointing out that we already had three candidate codecs on the table (SILK, CELT, BroadVoice), created by the authors of 3 already-deployed codecs (iSAC, Speex, G.728), the objection quickly switched to testing. After all, how was the IETF going to review this work and make sure it was any good?

The best answer came from an old-time ("gray beard") IETF participant and was along the lines of: "we at the IETF are used to reviewing things that are a lot harder to evaluate, like crypto standards. When it comes to audio, at least all of us have two ears". And it makes sense. Among all the things the IETF does (transport protocols, security, signalling, ...), codecs are among the easiest to test because at least you know the criteria and they're directly measurable. Audio quality is a hell of a lot easier to measure than "is this cipher breakable?", "is this signalling extensible enough?", or "Will this BGP update break the Internet?"

Of course, that was not the end of the testing story. For many months in 2011 we were again faced with never-ending complaints that Opus "had not been tested". There was this implicit assumption that testing the final codec improves the codec. Yeah right! Apparently, the Big-Test-At-The-End is meant to ensure that the codec is good and if it's not then you have to go back to the drawing board. Interestingly, I'm not aware of a single ITU-T codec for which that happened. On the other hand, I am aware of at least one case where the Big-Test-At-The-End revealed something wrong. Let's look at the listening test results from the AMR-WB (a.k.a. G.722.2) codec. AMR-WB has 9 bitrates, ranging from 6.6 kb/s to 23.85 kb/s. The interesting thing with the results is that when looking at the two highest rates (23.05 and 23.85) one notices that the 23.85 kb/s mode actually has lower quality than the lower 23.05 kb/s mode. That's a sign that something's gone wrong somewhere. I'm not aware of why that was the case or what exactly happened from there, but apparently it didn't bother people enough to actually fix the problem. That's the problem with final tests, they're final.

A Better Approach

What I've learned from Opus is that it's possible to have tests that are far more useful and much cheaper. First, final tests aren't that useful. Although we did conduct some of those, ultimately their main use ends up being for marketing and bragging rights. After all, if you still need these tests to convince yourself that your codec is any good, something's very wrong with your development process. Besides, when you look at a codec like Opus, you have about 1200 possible bitrates, using three different coding modes, four different frame sizes, and either mono or stereo input. That's far more than one can reliably test with traditional subjective listening tests. Even if you could, modern codecs are complex enough that some problems may only occur with very specific audio signals.

The single testing approach that gave us the most useful results was also the simplest: just put the code out there so people can use it. That's how we got reports like "it works well overall, but not on this rare piece of post-neo-modern folk metal" or "it worked for all our instruments except my bass". This is not something you can catch with ITU-style testing. It's one of the most fundamental principles of open-source development: "given enough eyeballs, all bugs are shallow". Another approach was simply to throw tons of audio at it and evaluate the quality using PEAQ-style objective measurement tools. While these tools are generally unreliable for precise evaluation of a codec quality, they're pretty good at flagging files the codec does badly on for further analysis.

We ended up using more than a dozen different approaches to testing, including various flavours of fuzzing. In the end, when it comes to the final testing, nothing beats having the thing out there. After all, as our Skype friends would put it:

Which codec do you trust more? The codec that's been tested by dozens of listeners in a highly controlled lab, or the codec that's been tested by hundreds of millions of listeners in just about all conditions imaginable?
It's not like we actually invented anything here either. Software testing has evolved quite a bit since the 80s and we've mainly attempted to follow the best practices rather than use antiquated methods "because that's what we've always done".

March 18, 2013 07:02 PM

Jean-Marc Valin : Defending Opus IPR Status

For those who had been wondering what we thought of the recent France Telecom IPR declaration against Opus, here's our response. It's nice to be working for a company that isn't afraid of speaking publicly about patents.

February 06, 2013 09:11 PM

Chris Pearce : HTML5 video playbackRate and Ogg chaining support landed in Firefox

Paul Adenot has recently landed patches in Firefox to enable the playbackRate attribute on HTML5 <audio> and <video> elements.

This is a cool feature that I've been looking forward to for a while; it means users can speed up playback of videos (and audio) so that you can for example watch presentations sped up and only slow down for the interesting bits. Currently Firefox's <video> controls don't have support for playbackRate, but it is accessible from JavaScript, and hopefully we'll get support added to our built-in controls soon.
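Using it from script is as simple as it sounds; a tiny sketch (the slow-down button is of course made up):

  const video = document.querySelector('video')!;

  // Watch the boring parts of a presentation at 1.5x...
  video.playbackRate = 1.5;

  // ...and drop back to normal speed for the interesting bits.
  document.querySelector('#slow-button')?.addEventListener('click', () => {
    video.playbackRate = 1.0;
  });

  video.addEventListener('ratechange', () => {
    console.log(`now playing at ${video.playbackRate}x`);
  });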

Paul has also finally landed support for Ogg chaining. This has been strongly desired for quite some time by community members, and the final patch had contributions from "oneman" (David Richards), who was also a strong advocate for this feature.

We decided to reduce the scope of our chaining implementation in order to make it easier and quicker to implement. We targeted the features most desired by internet radio providers, and so we only support chaining in Ogg Vorbis and Opus audio files and we disable seeking in chained files.

Thanks Paul and David for working on these features!

by Chris Pearce (noreply@blogger.com) at December 23, 2012 02:28 AM

Jean-Marc Valin : Releasing Opus 1.1-alpha

We just released Opus 1.1-alpha, which includes more than one year of development compared to the 1.0.x branch. There are quality improvements, optimizations, bug fixes, as well as an experimental speech/music detector for mode decisions. That being said, it's still an alpha release, which means it can also do stupid things sometimes. If you come across any of those, please let us know so we can fix it. You can send an email to the mailing list, or join us on IRC in #opus on irc.freenode.net. The main reason for releasing this alpha is to get feedback about what works and what does not.

Quality improvements

Most of the quality improvements come from the unconstrained variable bitrate (VBR). In the 1.0.x encoder VBR always attempts to meet its target bitrate. The new VBR code is free to deviate from its target depending on how difficult the file is to encode. In addition to boosting the rate on transients, as 1.0.x does, the new encoder also boosts the rate on tonal signals, which are harder for Opus to code. On the other hand, for signals with a narrow stereo image, Opus can reduce the bitrate. What this means in the end is that some files may significantly deviate from the target. For example, someone encoding his music collection at 64 kb/s (nominal) may find that some files end up using as low as 48 kb/s, while others may use up to about 96 kb/s. However, for a large enough collection, the average should be fairly close to the target.

There are a few more ways in which the alpha improves quality. The dynamic allocation code was improved and made more aggressive, the transient detector was once again rewritten, and so was the tf analysis code. A simple thing that improves the quality of some files is the new DC rejection (3-Hz high-pass) filter. DC is not supposed to be present in audio signals, but it sometimes is and harms quality. Lastly, there are many minor improvements for speech quality (both on the SILK side and on the CELT side), including changes to the pitch estimator.
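Conceptually, the DC rejection filter is just a first-order DC blocker like the sketch below (illustrative only: this is the textbook version, not the actual code in the encoder, whose exact coefficients may differ):

  // Textbook one-pole DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1].
  // For a ~3 Hz cutoff at 48 kHz, R = 1 - 2*pi*fc/fs is roughly 0.9996.
  function dcReject(x: Float32Array, sampleRate = 48000, cutoffHz = 3): Float32Array {
    const R = 1 - (2 * Math.PI * cutoffHz) / sampleRate;
    const y = new Float32Array(x.length);
    let prevX = 0;
    let prevY = 0;
    for (let i = 0; i < x.length; i++) {
      const out = x[i] - prevX + R * prevY;
      prevX = x[i];
      prevY = out;
      y[i] = out;
    }
    return y;
  }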

Speech/music detector

Another big feature is automatic detection of speech and music. This is useful for selecting the optimal encoding mode between SILK-only/hybrid and CELT-only. Unlike what some people think, it's not as simple as encoding all music with CELT and all speech with SILK. It also depends on the bitrate (at very low rate, we'll use SILK for music and at high rate, we'll use CELT for speech). Automatic detection isn't easy, but doing so in real-time (with no look-ahead) is even harder. Because of that the detector tends to take 1-2 seconds before reacting to transitions and will sometimes make bad decisions. We'd be interested in knowing about any screw ups of the algorithm.

Bandwidth detection

The new encoder can also detect the bandwidth of the input signal. This is useful to avoid wasting bits encoding frequencies that aren't present in the signal. While easier than speech/music detection, bandwidth detection isn't as easy as it sounds because of aliasing, quantization and dithering. The current algorithm should do a reasonable job, but again we'd be interested in knowing about any failure.

December 22, 2012 05:04 PM

Thomas Daede : Centaurus 3 Lights

Centaurus 3’s lights attracted some attention during FSGP 2012. Lights are a surprisingly hard thing to get right on a solar car – they need to be bright, efficient, aerodynamic, and pretty.

Previous solar cars have tried many things, with varying success. A simple plastic window and row of T1 LEDs works well, but if the window needs to be a strange shape, it can get difficult and ugly. Acrylic light pipes make very thin and aerodynamic lights, but it can be difficult to get high efficiency, or even acceptable brightness.

There are two general classes of LEDs. One is the low power, highly focused, signaling types of LEDs. These generally draw less than 75mA, and usually come with focused or diffuse lenses, such as a T1 3/4 case. This is also the type of LED in the ASC 2012 reference lights. The other type of LED is the high brightness lighting-class LED, such as those made by Philips, Cree, or Seoul Semiconductor. These usually draw 350mA or more, are rated for total lumen output rather than millicandelas+viewing angle, and require a heat sink.

We eventually chose the low-current type of LEDs, because the lumen count we needed was low, and the need for heatsinking would significantly complicate matters. We ended up using the 30-LED version of the ASC 2012 reference lighting. The outer casing is ugly, so the LED PCB was removed using a utility knife.

Now we need a new lens. I originally looked for acrylic casting, but settled on Alumilite Water-Clear urethane casting material. This material is much stronger than acrylic and supposedly easier to use.

This product is designed to be used with rubber molds. Instead, I used our team’s sponsorship from Stratasys to 3D print some molds:

The molds are made as two halves which bolt together with AN3 fasteners. They are sanded, painted, and sanded again, then coated with mold release wax and PVA. Then, each part of the alumilite urethane is placed in a vacuum pot to remove bubbles. Then the two parts are mixed, and poured into the mold. Next, the lights are dipped into the mold, suspended by two metal pins. The wires are bent and taped around the outside of the mold to hold the LED lights in place. Then, the entire mold is placed in the vacuum pot for about one minute to pull out more bubbles, then the mold is moved to an oven and allowed to cure for 4 more hours at 60C. Using an oven is not strictly necessary but makes the process happen faster and makes sure it reaches full hardness.

Once cured, the lights are removed, epoxied into place in the shell, and bondo’d. Then they are bondo’d and masked before being sent off to paint. The masked area is somewhat smaller than the size of the cast part, to make the lights look more seamless and allow room for bondo to fix any bumps.

The lights generally come out with a rather diffuse surface, however it becomes water clear after the clear coat of automotive paint. The finished product looks like this:


by Thomas Daede at October 27, 2012 03:19 PM

Ben Schwartz : It’s Google

I’m normally reticent to talk about the future; most of my posts are in the past tense. But now the plane tickets are purchased, apartment booked, and my room is gradually emptying itself of my furniture and belongings. The point of no return is long past.

A few days after Independence Day, I’ll be flying to Mountain View for a week at the Googleplex, and from there to Seattle (or Kirkland), to start work as a software engineer on Google’s WebRTC team, within the larger Chromium development effort. The exact project I’ll be working on initially isn’t yet decided, but a few very exciting ideas have floated by since I was offered the position in March.

Last summer I told a friend that I had no idea where I would be in a year’s time, and when I listed places I might be — Boston, Madrid, San Francisco, Schenectady — Seattle wasn’t even on the list. It still wasn’t in March, when I was offered this position in the Cambridge (MA) office. It was an unfortunate coincidence that the team I’d planned to join was relocated to Seattle shortly after I’d accepted the offer.

My recruiters and managers were helpful and gracious in two key ways. First, they arranged for me to meet with ~5 different leaders in the Cambridge office whose teams I might be able to join instead of moving. Second, they flew me out to Seattle (I’d never been to the city, nor the state, nor any of the states or provinces that it borders) and arranged for meetings with various managers and developers in the Kirkland office, just so I could learn more about the office and the city. I spent the afternoon wandering the city and (with help from a friend of a friend) looking at as many districts as I could squeeze in between lunch and sleep.

The visit made all the difference. It made the city real to me … and it seemed like a place that I could live. It also confirmed an impressive pattern: every single Google employee I met, at whichever office, seemed like someone I would be happy to work alongside.

When I returned there were yet more meetings scheduled, but I began to perceive that the move was essentially inevitable. The hiring committee had done their job well, and assigned me to the best fitting position. Everything else was second best at best.

It’s been an up and down experience, with the drudgery of packing and schlepping an unwelcome reminder of the feeling of loss that accompanies leaving history, family, and friends behind. I am learning in the process that, having never really moved, I have no idea how to move.

But there’s also sometimes a sense of joy in it. I am going to be an independent, free adult, in a way that cannot be achieved by even the happiest potted plant.

After signing the same lease on the same student apartment for the seventh time, I worried about getting stuck, in some metaphysical sense, about failure to launch from my too-comfortable cocoon. It was time for a grand adventure.

This is it.

by Ben at June 29, 2012 05:10 PM

David Schleef : GStreamer Streaming Server Library

Introducing the GStreamer Streaming Server Library, or GSS for short.

This post was originally intended to be a release announcement, but I started to wander off to work on other projects before the release was 100% complete.  So perhaps this is a pre-announcement.  Or it’s merely an informational piece with the bonus that the source code repository is in a pretty stable and bug-free state at the moment.  I tagged it with “gss-0.5.0”.

What it is

GSS is a standalone HTTP server implemented as a library.  Its special focus is to serve live video streams to thousands of clients, mainly for use inside an HTML5 video tag.  It’s based on GStreamer, libsoup, and json-glib, and uses Bootstrap and BrowserID in the user interface.

GSS comes with a streaming server application that is essentially a small wrapper around the library.  This application is referred to as the Entropy Wave Streaming Server (ew-stream-server); the code that is now GSS was originally split out of this application.  The app can be found in the tools/ directory in the source tree.


  • Streaming formats: WebM, Ogg, MPEG-TS.  (FLV support is waiting for a flvparse element in GStreamer.)
  • Streams in different formats/sizes/bitrates are bundled into a single “program”.
  • Streaming to Flash via HTTP.
  • Authentication using BrowserID.
  • Automatic conversion from properly formed MPEG-TS to HTTP Live Streaming.
  • Automatic conversion to RTP/RTSP (Experimental, works for Ogg/Theora/Vorbis only.)
  • Stream upload via HTTP PUT (3 different varieties), Icecast, raw TCP socket.
  • Stream pull from another HTTP streaming server.
  • Content protection via automatic one-time URLs.
  • (Experimental) Video-on-Demand stream types.
  • Per-stream, per-program, and server metrics.
  • HTTP configuration interface and REST API used to control the server, allowing standalone operation and easy integration with other web servers.

What’s not there?

  • Other types of authentication, LDAP or other authorization.
  • RTMP support.  (Maybe some day, but there are several good open-source Flash servers out there already.)
  • Support for upload using HTTP PUT with no 100-Continue header.  Several HTTP libraries do this.
  • Decent VOD support, with rate-controlled streaming, burst start, and seeking.

The details


by David Schleef at June 14, 2012 06:21 PM

David Schleef : GStreamer backend for video in Firefox

It’s good news to hear that the GStreamer backend for video playback in Firefox has landed, thanks to a flurry of work by Alessandro Decina over the last few months.  Of course, this isn’t part of the standard Firefox build (but maybe some day?), but it’s very useful for putting Firefox on mobile and embedded platforms, since GStreamer has a well-established ecosystem of vendor-provided plugins for hardware decoding.

by David Schleef at April 29, 2012 10:19 PM