Ben Schwartz : The Clock

I watched The Clock at the Boston Museum of Fine Arts from about 4:45 to 6:30 on Friday. My thoughts on The Clock:

The experiment works in part because scenes with clocks in them are usually frenetic. In a movie, the presence of a clock usually means someone is in a rush, and so most of the sequences convey urgency.

It’s often hard to spot the clock in each scene. In less exciting sequences, this serves as a game to pass the time. In many cases the clock in question is never in focus, or is moving too fast for the viewer to notice. The editors must have done careful freeze-frames and zoomed in on wristwatches to work out the indicated time.

The selected films are mostly in English, with a fair number in French and very few in any other languages. This feels fairly arbitrary to me.

Scenes from multiple films are often mixed within each segment. It seems like the editors adopted a relaxed rule, maybe something like: “if a clock appeared in an original, then a one minute window around the moment of appearance is fair game to include during that minute of The Clock, spliced together with other clips in any order”.
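The speculated rule could be sketched in code. This is purely a hypothetical illustration of my guess above — the clip annotations and the helper function are invented, not anything the editors actually used:

```javascript
// Sketch of the speculated editing rule: a clip is fair game for a given
// minute of The Clock if its on-screen clock shows that minute of the day.
// The data shape (film name plus the minute-of-day shown on its clock,
// 0..1439) is an assumption for illustration only.
function eligibleClips(clips, minuteOfDay) {
  return clips.filter(function (clip) {
    return Math.floor(clip.clockMinuteOfDay) === minuteOfDay;
  });
}

// Example: three annotated clips; two show 17:30 (minute 1050 of the day).
var clips = [
  { film: "Film A", clockMinuteOfDay: 1050 },
  { film: "Film B", clockMinuteOfDay: 1050 },
  { film: "Film C", clockMinuteOfDay: 1051 }
];
var atFiveThirty = eligibleClips(clips, 1050); // Film A and Film B
```

Within each such per-minute pool, the clips could then be spliced together in any order, which matches what I saw on screen.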

The editing makes heavy use of L cuts and audio crossfades to make the fairly random assortment of sources feel more cohesive.

I swear I saw a young Michael Caine at least twice in two different roles.

Some of the sources were distinctly low-fidelity, often due to framerate matching issues. I think this might be the first production I’ve seen that would really have benefited from a full Variable Frame Rate render and display pipeline.

I started to wonder about connections to deep learning. Could we train an image captioning network to identify images of clocks and watches, then run it on a massive video corpus to generate The Clock automatically?

Or, could we construct a spatial analogue to The Clock’s time-for-time conceit? How about a service that notifies you of a film clip shot at your current location? With a large GPS-tagged corpus (or a location-finder neural network) it might be possible to do this with pretty broad coverage.

by Ben at November 28, 2016 03:03 AM

Jean-Marc Valin : Opus 1.2-alpha is out!

We just released Opus 1.2-alpha. It's an alpha release for the upcoming Opus 1.2. It includes many quality improvements for low-bitrate speech and music. It also includes new features, as well as a large number of bug fixes. See the announcement for more details.

November 03, 2016 08:07 PM

Monty : Daala progress update 20160606: Revisiting Daala Technology Demos

I've not posted many entries in the past year-- practically none at all in fact. There's something entirely different about coding and writing prose that makes it difficult to shift between one and the other. Nor have I been particularly good (or even acceptably bad) about responding to questions and replies, especially those asking where Daala is, or where Daala is going in the era of AOM.

In the meantime, though, Jean-Marc has written another demo-style post that revisits many of the techniques we've tried out in Daala while applying the benefit of hindsight. I've got a number of my own comments I want to make about what he's written, but in the meantime it's an excellent technical summary.

by Monty (monty@xiph.org) at June 06, 2016 08:39 PM

Jean-Marc Valin : Revisiting Daala Technology Demos

Over the last three years, we have published a number of Daala technology demos. With pieces of Daala being contributed to the Alliance for Open Media's AV1 video codec, now seems like a good time to go back over the demos and see what worked, what didn't, and what changed compared to the description we made in the demos.

Read more!

June 06, 2016 07:43 PM

Jean-Marc Valin : A Deringing Filter for Daala... And Beyond

Here's the latest addition to the Daala demo series. This demo describes the new Daala deringing filter that replaces a previous attempt with a less complex algorithm that performs much better. Those who like to know all the math details can also check out the full paper.

Read more!

April 07, 2016 10:31 AM

Silvia Pfeiffer : WebRTC predictions for 2016

I wrote these predictions in the first week of January and meant to publish them as encouragement to think about where WebRTC still needs some work. I’d like to be able to compare the state of WebRTC in the browser a year from now. Therefore, without further ado, here are my thoughts.

WebRTC Browser support

I’m quite optimistic when it comes to browser support for WebRTC. We have seen Edge bring in initial support last year and Apple looking to hire engineers to implement WebRTC. My prediction is that we will see the following developments in 2016:

  • Edge will become interoperable with Chrome and Firefox, i.e. it will publish VP8/VP9 and H.264/H.265 support
  • Firefox of course continues to support both VP8/VP9 and H.264/H.265
  • Chrome will follow the spec and implement H.264/H.265 support (to add to their already existing VP8/VP9 support)
  • Safari will enter the WebRTC space but only with H.264/H.265 support

Codec Observations

With Edge and Safari entering the WebRTC space, there will be a larger focus on H.264/H.265. It will help with creating interoperability between the browsers.

However, since there are so many flavours of H.264/H.265, I expect that when different browsers are used at different endpoints, we will get poor quality video calls because of having to negotiate a common denominator. Certainly, baseline will work interoperably, but better encoding quality and lower bandwidth will only be achieved if all endpoints use the same browser.

Thus, we will get to the funny situation where we buy ourselves interoperability at the cost of video quality and bandwidth. I’d call that a “degree of interoperability” and not the best possible outcome.

I’m going to go out on a limb and say that at this stage, Google will push hard to strengthen the case for VP8/VP9 by improving its bandwidth adaptability: I think they will buy themselves some SVC capability and make VP9 the best-quality codec for live video conferencing. Thus, when Safari eventually follows the standard and also implements VP8/VP9 support, the interoperability win of H.264/H.265 will prove only temporary, overshadowed by the vastly better video quality of VP9.

The Enterprise Boundary

Like all video conferencing technology, WebRTC is having a hard time dealing with the corporate boundary: firewalls and proxies get in the way of setting up video connections from within an enterprise to people outside.

The telco world has come up with the concept of SBCs (session border controllers). SBCs come packed with functionality to deal with security, signalling protocol translation, Quality of Service policing, regulatory requirements, statistics, billing, and even media services like transcoding.

SBCs are total overkill for a world where a large number of Web applications simply want to add a WebRTC feature – probably mostly to provide a video or audio customer support service, but it could be a live training session with call-in, or an interest group conference call.

We cannot install a custom SBC solution for every WebRTC service provider in every enterprise. That’s like saying we need a custom Web proxy for every Web server. It doesn’t scale.

Cloud services thrive on their ability to sell directly to an individual in an organisation on their credit card without that individual having to ask their IT department to put special rules in place. WebRTC will not make progress in the corporate environment unless this is fixed.

We need a solution that allows all WebRTC services to get through an enterprise firewall and enterprise proxy. I think the WebRTC standards have done pretty well with firewalls and connecting to a TURN server on port 443 will do the trick most of the time. But enterprise proxies are the next frontier.

What it takes is some kind of media packet forwarding service that sits on the firewall or in a proxy and allows WebRTC media packets through – maybe with some configuration that is necessary in the browsers or the Web app to add this service as another type of TURN server.
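The idea of treating such a forwarding service as "another type of TURN server" could look roughly like this in a Web app. This is a minimal sketch with placeholder hostname and credentials, not a real service or a specific product:

```javascript
// Minimal sketch of pointing a WebRTC peer connection at a TURN server
// listening on port 443 over TCP, so media can traverse restrictive
// enterprise firewalls. The hostname and credentials are placeholders.
var config = {
  iceServers: [
    {
      urls: "turn:turn.example.com:443?transport=tcp",
      username: "user",
      credential: "secret"
    }
  ]
};

// In a browser, the configuration is passed straight to the peer connection:
//   var pc = new RTCPeerConnection(config);
```

A hypothetical proxy-traversal service would slot in at the same point: one more entry in `iceServers`, rather than per-enterprise custom infrastructure.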

I don’t have a full understanding of the problems involved, but I think such a solution is vital before WebRTC can go mainstream. I expect that this year we will see some clever people coming up with a solution for this and a new type of product will be born and rolled out to enterprises around the world.


So these are my predictions. In summary, they address the key areas where I think WebRTC still has to make progress: interoperability between browsers, video quality at low bitrates, and the enterprise boundary. I’m really curious to see where we stand with these a year from now.

It’s worth mentioning Philipp Hancke’s tweet reply to my post:

— we saw some clever people come up with a solution already. Now it needs to be implemented :-)

by silvia at February 17, 2016 09:48 PM

Silvia Pfeiffer : My journey to Coviu

My new startup just released our MVP – this is the story of what got me here.

I love creating new applications that let people do their work better or in a manner that wasn’t possible before.

My first such passion was as a student intern, when I built a system for a building and loan association’s monthly customer magazine. The group I worked with was managing their advertiser contacts through a set of paper cards, so I wrote a dBase-based system (yes, that long ago) that would manage their customer relationships. They loved it – until it got replaced by an SAP system that cost 100 times what I cost them, had really poor UX, and only gave them half the functionality. It was a corporate system with ongoing support, which made all the difference to them.

The story repeated itself with a CRM for my uncle’s construction company, and with a resume and quotation management system for Accenture right after Uni, both of which I left behind when I decided to go into research.

Even as a PhD student, I never lost sight of challenges that people were facing and wanted to develop technology to overcome problems. The aim of my PhD thesis was to prepare for the oncoming onslaught of audio and video on the Internet (yes, this was 1994!) by developing algorithms to automatically extract and locate information in such files, which would enable users to structure, index and search such content.

Many of the use cases that we explored are now part of products or continue to be challenges: finding music that matches your preferences, identifying music or video pieces e.g. to count ads on the radio or to mark copyright infringement, or the automated creation of video summaries such as trailers.


This continued when I joined the CSIRO in Australia – I was working on segmenting speech into words or talk spurts since that would simplify captioning & subtitling, and on MPEG-7 which was a (slightly over-engineered) standard to structure metadata about audio and video.

In 2001 I had the idea of replicating the Web for videos: i.e. creating hyperlinked and searchable video-only experiences. We called it “Annodex” for annotated and indexed video and it needed full-screen hyperlinked video in browsers – man were we ahead of our time! It was my first step into standards, got several IETF RFCs to my name, and started my involvement with open codecs through Xiph.

Around the time that YouTube was founded in 2006, I founded Vquence – originally a video search company for the Web, but pivoted to a video metadata mining company. Vquence still exists and continues to sell its data to channel partners, but it lacks the user impact that has always driven my work.

As the video element started being developed for HTML5, I had to get involved. I contributed many use cases to the W3C, became a co-editor of the HTML5 spec and focused on video captioning with WebVTT while contracting to Mozilla and later to Google. We made huge progress and today the technology exists to publish video on the Web with captions, making the Web more inclusive for everybody. I contributed code to YouTube and Google Chrome, but was keen to make a bigger impact again.

The opportunity came when a couple of former CSIRO colleagues who now worked for NICTA approached me to get me interested in addressing new use cases for video conferencing in the context of WebRTC. We worked on a kiosk-style solution to service delivery for large service organisations, particularly targeting government. The emerging WebRTC standard posed many technical challenges that we addressed by building rtc.io, by contributing to the standards, and by registering bugs on the browsers.

Fast-forward through the development of a few further custom solutions for customers in health and education and we are starting to see patterns of need emerge. The core learning that we’ve come away with is that to get things done, you have to go beyond “talking heads” in a video call. It’s not just about seeing the other person, but much more about having a shared view of the things that need to be worked on and a shared way of interacting with them. Also, we learnt that the things that are being worked on are quite varied and may include multiple input cameras, digital documents, Web pages, applications, device data, controls, forms.

So we set out to build a solution that would enable productive remote collaboration to take place. It would need to provide an excellent user experience, be simple to work with, and provide for the standard use cases out of the box, yet be architected to be extensible for the specialised data sharing needs that we knew some of our customers had. It would need to be usable directly on Coviu.com, but also able to integrate with specialised applications that some of our customers were already using, such as the applications that they spend most of their time in (CRMs, practice management systems, learning management systems, team chat systems). It would need to require our customers to sign up, yet allow their clients to join a call without signing up.

Collaboration is a big problem. People are continuing to get more comfortable with technology and are less and less inclined to travel distances just to get a service done. In a country as large as Australia, where 12% of the population lives in rural and remote areas, people may not even be able to travel distances, particularly to receive or provide recurring or specialised services, or to achieve work/life balance. To make the world a global village, we need to be able to work together better remotely.

The need for collaboration is being recognised by specialised Web applications already, such as the LiveShare feature of Invision for Designers, Codassium for pair programming, or the recently announced Dropbox Paper. Few go all the way to video – WebRTC is still regarded as a complicated feature to support.

Coviu in action

With Coviu, we’d like to offer a collaboration feature to every Web app. We now have a Web app that provides a modern and beautifully designed collaboration interface. To enable other Web apps to integrate it, we are now developing an API. Integration may entail customisation of the data sharing part of Coviu – something Coviu has been designed for. How to replicate the data and keep it consistent when people collaborate remotely – that is where Coviu makes a difference.

We have started our journey and have just launched free signup to the Coviu base product, which allows individuals to own their own “room” (i.e. a fixed URL) in which to collaborate with others. A huge shout out goes to everyone in the Coviu team – a pretty amazing group of people – who have turned the app from an idea to reality. You are all awesome!

With Coviu you can share and annotate:

  • images (show your mum photos of your last holidays, or get feedback on an architecture diagram from a customer),
  • pdf files (give a presentation remotely, or walk a customer through a contract),
  • whiteboards (brainstorm with a colleague), and
  • share an application window (watch a YouTube video together, or work through your task list with your colleagues).

All of these are regarded as “shared documents” in Coviu and thus have zooming and annotation features, and are listed in a document tray for ease of navigation.

This is just the beginning of how we want to make working together online more productive. Give it a go and let us know what you think.


by silvia at October 27, 2015 10:08 AM

Monty : Announcing VP9 (And Opus and Ogg and Vorbis) support coming to Microsoft Edge

Oh look, an official announcement: http://blogs.windows.com/msedgedev/2015/09/08/announcing-vp9-support-coming-to-microsoft-edge/

Called it!

In any case, welcome to the party Microsoft. Feel free to help yourself to the open bar. :-)

(And, to be fair, this isn't as fantastically unlikely as some pundits have been saying. After all, MS does own an IP stake in Opus).

by Monty (monty@xiph.org) at September 08, 2015 11:23 PM

Monty : Comments on the Alliance for Open Media, or, "Oh Man, What a Day"

I assume folks who follow video codecs and digital media have already noticed the brand new Alliance for Open Media jointly announced by Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix. I expect the list of member companies to grow somewhat in the near future.

One thing that's come up several times today: People contacting Xiph to see if we're worried this detracts from the IETF's NETVC codec effort. The way the aomedia.org website reads right now, it might sound as if this is competing development. It's not; it's something quite different and complementary.

Open source codec developers need a place to collaborate on and share patent analysis in a forum protected by client-attorney privilege, something the IETF can't provide. AOMedia is to be that forum. I'm sure some development discussion will happen there, probably quite a bit in fact, but pooled IP review is the reason it exists.

It's also probably accurate to view the Alliance for Open Media (the Rebel Alliance?) as part of an industry pushback against the licensing lunacy made obvious by HEVCAdvance. Dan Rayburn at Streaming Media reports a third HEVC licensing pool is about to surface. To-date, we've not yet seen licensing terms on more than half of the known HEVC patents out there.

In any case, HEVC is becoming rather expensive, and yet increasingly uncertain licensing-wise. Licensing uncertainty gives responsible companies the tummy troubles. Some of the largest companies in the world are seriously considering starting over rather than bet on the mess...

Is this, at long last, what a tipping point feels like?

Oh, and one more thing--

As of today, just after Microsoft announced its membership in the Alliance for Open Media, they also quietly changed the internal development status of Vorbis, Opus, WebM and VP9 to indicate they intend to ship all of the above in the new Windows Edge browser. Cue spooky X-Files theme music.

by Monty (monty@xiph.org) at September 02, 2015 06:37 AM

Monty : Cisco introduces the Thor video codec

I want to mention this on its own, because it's a big deal, and so that other news items don't steal its thunder: Cisco has begun blogging about their own FOSS video codec project, also being submitted as an initial input to the IETF NetVC video codec working group effort:

"World, Meet Thor – a Project to Hammer Out a Royalty Free Video Codec"

Heh. Even an appropriate almost-us-like nutty headline for the post. I approve :-)

In a bit, I'll write more about what's been going on in the video codec realm in general. Folks have especially been asking lots of questions about HEVCAdvance licensing recently. But it's not their moment right now. Bask in the glow of more open development goodness, we'll get to talking about the 'other side' in a bit.

by Monty (monty@xiph.org) at August 11, 2015 08:29 PM

Monty : libvpx 1.4.0 ("Indian Runner Duck") officially released

Johann Koenig just posted that the long awaited libvpx 1.4.0 is officially released.

"Much like the Indian Runner Duck, the theme of this release is less bugs, less bloat and more speed...."

by Monty (monty@xiph.org) at April 03, 2015 09:20 PM

Monty : Today's WTF Moment: A Competing HEVC Licensing Pool

Had this happened next week, I'd have thought it was an April Fools' joke.

Out of nowhere, a new patent licensing group just announced it has formed a second, competing patent pool for HEVC that is independent of MPEG LA. And they apparently haven't decided what their pricing will be... maybe they'll have a fee structure ready in a few months.

Video on the Net (and let's be clear-- video's future is the Net) already suffers endless technology licensing problems. And the industry's solution is apparently even more licensing.

In case you've been living in a cave, Google has been trying to establish VP9 as a royalty- and strings-free alternative (new version release candidate just out this week!), and NetVC, our own next-next-generation royalty-free video codec, was just conditionally approved as an IETF working group on Tuesday and we'll be submitting our Daala codec as an input to the standardization process. The biggest practical question surrounding both efforts is 'how can you possibly keep up with the MPEG behemoth'?

Apparently all we have to do is stand back and let the dominant players commit suicide while they dance around Schroedinger's Cash Box.

by Monty (monty@xiph.org) at March 27, 2015 06:35 PM

Monty : Daala Blog-Like Update: Bug or feature? [or, the law of Unintentionally Intentional Behaviors]

Codec development is often an exercise in tracking down examples of "that's funny... why is it doing that?" The usual hope is that unexpected behaviors spring from a simple bug, and finding bugs is like finding free performance. Fix the bug, and things usually work better.

Often, though, hunting down the 'bug' is a frustrating exercise in finding that the code is not misbehaving at all; it's functioning exactly as designed. Then the question becomes a thornier issue of determining if the design is broken, and if so, how to fix it. If it's fixable. And the fix is worth it.

[continue reading at Xiph.Org....]

by Monty (monty@xiph.org) at March 26, 2015 04:49 PM

Silvia Pfeiffer : WebVTT Audio Descriptions for Elephants Dream

When I set out to improve accessibility on the Web and we started developing WebSRT – later to be renamed to WebVTT – I needed an example video to demonstrate captions / subtitles, audio descriptions, transcripts, navigation markers and sign language.

I needed a freely available video with spoken text that either already had such data available or that I could create it for. Naturally I chose “Elephants Dream” by the Orange Open Movie Project , because it was created under the Creative Commons Attribution 2.5 license.

As it turned out, the Blender Foundation had already created a collection of SRT files that would represent the English original as well as the translated languages. I was able to reuse them by merely adding a WEBVTT header.

Then there was a need for a textual audio description. I read up on the plot online and finally wrote up a time-aligned audio description. I’m hereby making that file available under the Creative Commons Attribution 4.0 license. I’ve added a few lines to the metadata headers so it doesn’t confuse players. Feel free to reuse at will – I know there are others out there that have a similar need to demonstrate accessibility features.

by silvia at March 10, 2015 12:50 PM

Monty : A Fabulous Daala Holiday Update

Before we get into the update itself, yes, the level of magenta in that banner image got away from me just a bit. Then it was just begging for inappropriate abuse of a font...


Hey everyone! I just posted a Daala update that mostly has to do with still image performance improvements (yes, still image in a video codec. Go read it to find out why!). The update includes metric plots showing our improvement on objective metrics over the past year and relative to other codecs. Since objective metrics are only of limited use, there's also side-by-side interactive image comparisons against jpeg, vp8, vp9, x264 and x265.

The update text (and demo code) was originally for a July update, as still image work mostly happened in the beginning of the year. That update got held up and was never released officially, though it had been discovered and discussed at forums like doom9. I regenerated the metrics and image runs to use the latest versions of all the codecs involved (only Daala and x265 improved) for this official, better-late-than-never progress report!

by Monty (monty@xiph.org) at December 23, 2014 09:09 PM

Monty : Daala Demo 6: Perceptual Vector Quantization (by J.M. Valin)

Jean-Marc has finished the sixth Daala demo page, this one about PVQ, the foundation of our encoding scheme in both Daala and Opus.

(I suppose this also means we've finally settled on what the acronym 'PVQ' stands for: Perceptual Vector Quantization. It's based on, and expanded from, an older technique called Pyramid Vector Quantization, and we'd kept using 'PVQ' for years even though our encoding space was actually spherical. I'd suggested we call it 'Pspherical Vector Quantization' with a silent P so that we could keep the acronym, and that name appears in some of my slide decks. Don't get confused, it's all the same thing!)

by Monty (monty@xiph.org) at November 19, 2014 08:12 PM

Jean-Marc Valin : Daala: Perceptual Vector Quantization (PVQ)

Here's my new contribution to the Daala demo effort. Perceptual Vector Quantization has been one of the core ideas in Daala, so it was time for me to explain how it works. The details involve lots of maths, but hopefully this demo will make the general idea clear enough. I promise that the equations in the top banner are the only ones you will see!

Read more!

November 18, 2014 08:37 PM

Monty : Intra-Paint: A new Daala demo from Jean-Marc Valin

Intra paint is not a technique that's part of the original Daala plan and, as of right now, we're not currently using it in Daala. Jean-Marc envisioned it as a simpler, potentially more effective replacement for intra-prediction. That didn't quite work out-- but it has useful and visually pleasing qualities that, of all things, make it an interesting postprocessing filter, especially for deringing.

Several people have said 'that should be an Instagram filter!' I'm sure Facebook could shake a few million loose for us to make that happen ;-)

by Monty (monty@xiph.org) at September 24, 2014 07:36 PM

Jean-Marc Valin : Daala: Painting Images For Fun (and Profit?)

As a contribution to Monty's Daala demo effort, I decided to demonstrate a technique I've recently been developing for Daala: image painting. The idea is to represent images as directions and 1-D patterns.

Read more!

September 24, 2014 05:24 PM

Jean-Marc Valin : Opus beats all other codecs. Again.

Three years ago Opus got rated higher than HE-AAC and Vorbis in a 64 kb/s listening test. Now, the results of the recent 96 kb/s listening test are in and Opus got the best ratings, ahead of AAC-LC and Vorbis. Also interesting, Opus at 96 kb/s sounded better than MP3 at 128 kb/s.

Full plot

September 18, 2014 01:32 AM

Silvia Pfeiffer : Progress with rtc.io

At the end of July, I gave a presentation about WebRTC and rtc.io at the WDCNZ Web Dev Conference in beautiful Wellington, NZ.


Putting that talk together reminded me about how far we have come in the last year both with the progress of WebRTC, its standards and browser implementations, as well as with our own small team at NICTA and our rtc.io WebRTC toolbox.

WDCNZ presentation, page 5

One of the most exciting opportunities is still under-exploited: the data channel. When I talked about the above slide and pointed out Bananabread, PeerCDN, Copay, PubNub and also later WebTorrent, that’s where I really started to get Web Developers excited about WebRTC. They can totally see the shift in paradigm to peer-to-peer applications away from the Server-based architecture of the current Web.

Many were also excited to learn more about rtc.io, our own npm-modules-based approach to a JavaScript API for WebRTC.


We believe that the world of JavaScript has reached a critical stage where we can no longer code by copy-and-pasting JavaScript snippets from all over the Web universe. We need a more structured approach to JavaScript module reuse. Node, with JavaScript on the back end, merely gave this development its initial push; we’ve needed it for a long time on the front end, too. One big library (jquery anyone?) that does everything anyone could ever need on the front end isn’t going to work any longer with the amount of functionality that we now expect Web applications to support. Just look at the insane growth of npm compared to other module collections:

Packages per day across popular platforms (Shamelessly copied from: http://blog.nodejitsu.com/npm-innovation-through-modularity/)

For those who – like myself – found it difficult to understand how to tap into the sheer power of npm modules as a front-end developer, simply use browserify. npm modules are prepared following the CommonJS module definition spec. Browserify works natively with that and “compiles” all the dependencies of an npm module into a single bundle.js file that you can use on the front end through a script tag, just as you would in plain HTML. You can learn more about browserify, module definitions, and how to use them.

For those of you not quite ready to dive in with browserify, we have prepared the rtc module, which exposes the most commonly used packages of rtc.io through an “RTC” object from a browserified JavaScript file. You can also directly download the JavaScript file from GitHub.

Using rtc.io rtc JS library

So, I hope you enjoy rtc.io and I hope you enjoy my slides and the large collection of interesting links inside the deck, and of course: enjoy WebRTC! Thanks to Damon, Jeff, Cathy, Pete and Nathan – you’re an awesome team!

On a side note, I was really excited to meet the author of browserify, James Halliday (@substack) at WDCNZ, whose talk on “building your own tools” seemed to take me back to the times where everything was done on the command-line. I think James is using Node and the Web in a way that would appeal to a Linux Kernel developer. Fascinating!!

by silvia at August 12, 2014 07:06 AM

Silvia Pfeiffer : AppRTC : Google’s WebRTC test app and its parameters

If you’ve been interested in WebRTC and haven’t lived under a rock, you will know about Google’s open source testing application for WebRTC: AppRTC.

When you go to the site, a new video conferencing room is automatically created for you and you can share the provided URL with somebody else and thus connect (make sure you’re using Google Chrome, Opera or Mozilla Firefox).

We’ve been using this application forever to check whether any issues with our own WebRTC applications are due to network connectivity issues, firewall issues, or browser bugs, in which case AppRTC breaks down, too. Otherwise we’re pretty sure to have to dig deeper into our own code.

Now, AppRTC creates a pretty poor quality video conference, because the browsers use a 640×480 resolution by default. However, there are many query parameters that can be added to the AppRTC URL through which the connection can be manipulated.

Here are my favourite parameters:

  • hd=true : turns on high definition, ie. minWidth=1280,minHeight=720
  • stereo=true : turns on stereo audio
  • debug=loopback : connect to yourself (great to check your own firewalls)
  • tt=60 : by default, the channel is closed after 30min – this gives you 60 (max 1440)

For example, here’s what a stereo, HD loopback test looks like: https://apprtc.appspot.com/?r=82313387&hd=true&stereo=true&debug=loopback .
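If you find yourself trying out lots of parameter combinations, a tiny helper can assemble the URL for you. This helper is just a sketch of mine, not part of AppRTC itself; the parameter names follow the list above:

```javascript
// Sketch of building an AppRTC URL from a table of query parameters.
// Illustrative only -- not part of AppRTC.
function buildAppRtcUrl(room, params) {
  var query = ["r=" + encodeURIComponent(room)];
  for (var key in params) {
    query.push(key + "=" + encodeURIComponent(params[key]));
  }
  return "https://apprtc.appspot.com/?" + query.join("&");
}

var url = buildAppRtcUrl("82313387", {
  hd: "true",
  stereo: "true",
  debug: "loopback"
});
// => "https://apprtc.appspot.com/?r=82313387&hd=true&stereo=true&debug=loopback"
```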

These are not all the available parameters, though. Here are some others that you may find interesting for some more in-depth geekery:

  • ss=[stunserver] : in case you want to test a different STUN server from the default Google ones
  • ts=[turnserver] : in case you want to test a different TURN server from the default Google ones
  • tp=[password] : password for the TURN server
  • audio=true&video=false : audio-only call
  • audio=false : video-only call
  • audio=googEchoCancellation=false,googAutoGainControl=true : disable echo cancellation and enable gain control
  • audio=googNoiseReduction=true : enable noise reduction (more Google-specific parameters)
  • asc=ISAC/16000 : preferred audio send codec is ISAC at 16kHz (use on Android)
  • arc=opus/48000 : preferred audio receive codec is opus at 48kHz
  • dtls=false : disable datagram transport layer security
  • dscp=true : enable DSCP
  • ipv6=true : enable IPv6
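Assembling these URLs by hand gets tedious when scripting tests. A minimal sketch of a URL builder (the `appRtcUrl` helper is hypothetical and not part of AppRTC; the parameter names are the ones listed above):

```javascript
// Hypothetical helper (not part of AppRTC itself) that assembles an
// AppRTC URL from a room name and an options object. The parameter
// names are the ones listed in this post.
function appRtcUrl(room, params) {
  var base = 'https://apprtc.appspot.com/?r=' + encodeURIComponent(room);
  return Object.keys(params || {}).reduce(function (url, key) {
    return url + '&' + key + '=' + encodeURIComponent(params[key]);
  }, base);
}

// e.g. the stereo, HD loopback test from above:
var url = appRtcUrl('82313387', { hd: true, stereo: true, debug: 'loopback' });
```

This produces exactly the kind of URL shown in the loopback example earlier.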

AppRTC’s source code is available here. And here is the file with the parameters (in case you want to check if they have changed).

Have fun playing with the main and always up-to-date WebRTC application: AppRTC.

UPDATE 12 May 2014

AppRTC now also supports the following bitrate controls:

  • arbr=[bitrate] : set audio receive bitrate
  • asbr=[bitrate] : set audio send bitrate
  • vsbr=[bitrate] : set video send bitrate
  • vrbr=[bitrate] : set video receive bitrate

Example usage: https://apprtc.appspot.com/?r=&asbr=128&vsbr=4096&hd=true

by silvia at July 23, 2014 05:02 AM

Ben Schwartz : Gooseberry

Anyone who loves video software has probably caught more than one glimpse of the Blender Foundation’s short films: Elephants Dream, Big Buck Bunny, Sintel, and Tears of Steel. I’ve enjoyed them from the beginning, and never paid a dime, on account of their impeccable Creative Commons licensing.

I always hoped that the little open source project would one day grow up enough to make a full length feature film. Now they’ve decided to try, and they’ve raised more than half their funding target … with only two days to go. You can donate here. I think of it like buying a movie ticket, except that what you get is not just the right to watch the movie, but actually ownership of the movie itself.

by Ben at May 07, 2014 04:16 AM

Ben Schwartz : Stingy/stylish

My home for the next two nights is the Hotel 309, right by the office.  It’s the stingiest hotel I’ve ever stayed in.  Nothing is free: the wi-fi is $8/night and the “business center” is 25 cents/minute after the first 20.  There’s no soap by the bathroom sink, just the soap dispenser over the tub.  Even in the middle of winter, there is no box of tissues.  Its status as a 2-star hotel is well-deserved.

The rooms are also very stylish.  There’s a high-contrast color scheme that spans from the dark wood floors and rug to the boldly matted posters and high-concept lamps.  The furniture has high design value, or at least it did before it got all beat up.

These two themes come together beautifully for me in the (custom printed?) shower curtain, which features a repeating pattern of peacocks and crowns … with severe JPEG artifacts! The luma blocks are almost two centimeters across. Someone should tell the artist that bits are cheap these days.


by Ben at February 18, 2014 02:58 AM

Ben Schwartz : Efficiency

So you’re trying to build a DVD player using Debian Jessie and an Atom D2700 on a Poulsbo board, and you’ve even biked down to the used DVD warehouse and picked up a few $3 90’s classics for test materials.  Here’s what will happen next:

  1. Gnome 3 at 1920×1080.  The interface is sluggish even on static graphics.  Video is right out: the graphics are unaccelerated, so every pixel has to be pushed around by the CPU.
  2. Reduce the mode to 1280×720 (half the pixels to push), and try VLC in Gnome 3.  Playback is totally choppy.  Sigh.  Not really surprising, since Gnome is running in composited mode via OpenGL commands, which are then faked on the low-power CPU using llvmpipe.  God only knows how many times each pixel is getting copied.  top shows half the CPU time is spent inside the gnome-shell process.
  3. Switch to XFCE.  Now VLC runs, and nothing else is stealing CPU time.  Still, VLC runs out of CPU when expanded to full screen.  top shows it using 330% of CPU time, which is pretty impressive for a dual-core system.
  4. Switch to Gnome-mplayer, because someone says it’s faster.  The aspect ratio is initially wrong; switch to “x11” output mode to fix it.  Video playback finally runs smooth, even at full screen.  OK, there’s a little bit of tearing, but just pretend it’s 1999.  top shows … wait for it … 67% CPU utilization, or about one fifth of VLC’s.  (Less, actually, since at that usage VLC was dropping frames.)  Too bad Gnome-mplayer is buggy as heck: buttons like “pause” and “stop” do nothing, and the rest of the user interface is a crapshoot at best.

On a system like this, efficiency makes a big difference.  Now if only we could get efficiency and functionality together…

by Ben at January 21, 2014 07:00 AM

Thomas Daede : Zynq DMA performance

I finally got my inverse transform up to snuff, complete with hardware matrix transposer. The transposer was trivial to implement – I realized that I could use the Xilinx data width converter IP to register entire 4×4 blocks at once, allowing my transposer to simply be a bunch of wires (assign statements in Verilog).

[Screenshot from 2014-01-08 22:04:27]

Unfortunately, I wasn’t getting the performance I was expecting. At a clock speed of 100MHz and a 64-bit width, I expected to be able to perform 25 million transforms per second. However, I was having trouble even getting 4 million. To debug the problem, I used the Xilinx debug cores in Vivado:


There are several problems. Here’s an explanation of what is happening in the above picture:

  1. The CPU configures the DMA registers and starts the transfer. This works for a few clock cycles.
  2. The Stream to Memory DMA (s2mm) starts a memory transfer, but its FIFOs fill up almost immediately and it has to stall (tready goes low).
  3. The transform stream pipeline also stalls, making its tready go low.
  4. The s2mm DMA is able to start its first burst transfer, and everything goes smoothly.
  5. The CPU sees that the DMA has completed, and schedules the second pass. The turnaround time for this is extremely large, and ends up taking the majority of the time.
  6. The same process happens again, but the latency is even larger due to writing to system memory.

Fortunately, the solution isn’t that complicated. I am going to switch to a scatter-gather DMA engine, which allows me to construct a request chain, and then the DMA will execute the operations without CPU intervention, avoiding the CPU latency. In addition, a FIFO can be used to reduce the impact of the initial write latency somewhat, though this costs FPGA area and it might be better just to strive for longer DMA requests.

There are other problems with my memory access at the moment – the most egregious being that my hardware expects a tiled buffer, but the Daala reference implementation uses linear buffers everywhere. This is the problem that I plan to tackle next.

by Thomas Daede at January 09, 2014 04:26 AM

Silvia Pfeiffer : Use deck.js as a remote presentation tool

deck.js is one of the new HTML5-based presentation tools. It’s simple to use, in particular for your basic, every-day presentation needs. You can also create more complex slides with animations etc. if you know your HTML and CSS.

Yesterday at linux.conf.au (LCA), I gave a presentation using deck.js. But I didn’t give it from the lectern in the room in Perth where LCA is being held – instead I gave it from the comfort of my home office at the other end of the country.

I used my laptop with its in-built webcam and my Chrome browser to give this presentation. Beforehand, I had uploaded the presentation to a Web server and shared the link with the organiser of my speaker track, who was on site in Perth and had set up his laptop in the same fashion as I had. His screen was projecting the Chrome tab in which my slides were loaded, and he had hooked up the audio output of his laptop to the room’s speaker system. His camera was pointed at the audience so I could see their reaction.

I loaded a slide master URL:
and the room loaded the URL without query string:

Then I gave my talk exactly as I would if I was in the same room. Yes, it felt exactly as though I was there, including nervousness and audience feedback.

How did we do that? WebRTC (Web Real-time Communication) to the rescue, of course!

We used one of the modules of the rtc.io project called rtc-glue to add the video conferencing functionality and the slide navigation to deck.js. It was actually really really simple!

Here are the few things we added to deck.js to make it work:

  • Code added to index.html to make the video connection work:
    <meta name="rtc-signalhost" content="http://rtc.io/switchboard/">
    <meta name="rtc-room" content="lca2014">
    <video id="localV" rtc-capture="camera" muted></video>
    <video id="peerV" rtc-peer rtc-stream="localV"></video>
    <script src="glue.js"></script>
    <script>
      glue.config.iceServers = [{ url: 'stun:stun.l.google.com:19302' }];
    </script>

    The iceServers config is required to punch through firewalls – you may also need a TURN server. Note that you need a signalling server – in our case we used http://rtc.io/switchboard/, which runs the code from rtc-switchboard.

  • Added glue.js library to deck.js:

    Downloaded from https://raw.github.com/rtc-io/rtc-glue/master/dist/glue.js into the source directory of deck.js.

  • Code added to index.html to synchronize slide navigation:
    glue.events.once('connected', function(signaller) {
      if (location.search.slice(1) !== '') {
        $(document).bind('deck.change', function(evt, from, to) {
          signaller.send('/slide', {
            idx: to,
            sender: signaller.id
          });
        });
      }
      signaller.on('slide', function(data) {
        console.log('received notification to change to slide: ', data.idx);
        $.deck('go', data.idx);
      });
    });

    This simply registers a callback on the slide master end to send a slide position message to the room end, and a callback on the room end that initiates the slide navigation.

And that’s it!

You can find my slide deck on GitHub.

Feel free to write your own slides in this manner – I would love to have more users of this approach. It should also be fairly simple to extend this to share pointer positions, so you can actually use the mouse pointer to point to things on your slides remotely. Would love to hear your experiences!

Note that the slides are actually a talk about the rtc.io project, so if you want to find out more about these modules and what other things you can do, read the slide deck or watch the talk when it has been published by LCA.

Many thanks to Damon Oehlman for his help in getting this working.

BTW: somebody should really fix that print style sheet for deck.js – I’m only ever getting the one slide that is currently showing. 😉

by silvia at January 08, 2014 02:28 AM

Maik Merten : Tinkering with a H.261 encoder

On the rtcweb mailing list the struggle over which Mandatory To Implement (MTI) video codec should be chosen rages on. One camp favors H.264 ("We cannot have VP8"), the other VP8 ("We cannot have H.264"). Some propose that there should be a "safe" fallback codec anyone can implement, and H.261 is as old and safe as it gets. H.261 was specified in the final years of the 1980s and is generally believed to have no unexpired patents left standing. Roughly speaking, this old gem of coding technology can transport CIF resolution (352x288) video at full framerate (>= 25 fps) with (depending on your definition) acceptable quality starting roughly in the 250 to 500 kbit/s range. I've witnessed quite a few Skype calls with similar perceived effective resolution, mostly limited by mediocre webcams, and I can live with that as long as the audio part is okay. From today's perspective, H.261 is very light on computation, memory, and code footprint.

H.261 is, of course, outgunned by any semi-decent more modern video codec, which can, for instance, deliver video with higher resolution at similar bitrates. Those, however, don't have the luxury of having their patents expired with "as good as it can be" certainty.

People on the rtcweb list were quick to point out that having an encoder with modern encoding techniques may by itself be problematic regarding patents. Thankfully, for H.261, a public domain reference-style en- and decoder from 1993 can be found, e.g., at http://wftp3.itu.int/av-arch/video-site/h261/PVRG_Software/P64v1.2.2.tar – so that's a nice reference on what encoding technologies were state-of-the-art in the early 1990s.

With some initial patching done by Ron Lee this old code builds quite nicely on modern platforms - and as it turns out, the encoder also produces intact video, even on x86_64 machines or on a Raspberry Pi. Quite remarkably portable C code (although not the cleanest style-wise). The original code is slow, though: It barely does realtime encoding of 352x288 video on a 1.65 GHz netbook, and I can barely imagine having it encode video on machines from 1993! Some fun was had in making it faster (it's now about three times faster than before) and the program can now encode from and decode to YUV4MPEG2 files, which is quite a lot more handy than the old mode of operation (still supported), where each frame would consist of three files (.Y, .U, .V).

For those interested, the patched up code is available at https://github.com/maikmerten/p64 - however, be aware that the original coding style (yay, global variables) is still alive and well.

So, is it "useful" to resurrect such an old codebase? Depends on the definition of "useful". For me, as long as it is fun and teaches me something, it's reasonably useful.

So is it fun to toy around with this ancient coding technology? Yes, especially as most modern codecs still follow the same overall design; H.261 is the most basic instance of "modern" video coding and thus the easiest to grasp. Who knows – with some helpful comments here and there, that old codebase could be used for teaching basic principles of video coding.

January 07, 2014 07:35 PM

Thomas Daede : Off by one

My hardware had a bug that made about 50% of the outputs off by one. I compared my Verilog code to the original C and it was a one-for-one match, with the exception of a few OD_DCT_RSHIFT that I translated into arithmetic shifts. That turned out to break the transform. Looking at the definition of OD_DCT_RSHIFT:

/*This should translate directly to 3 or 4 instructions for a constant _b:
#define OD_UNBIASED_RSHIFT(_a,_b) ((_a)+(((1<<(_b))-1)&-((_a)<0))>>(_b))*/
/*This version relies on a smart compiler:*/
# define OD_UNBIASED_RSHIFT(_a, _b) ((_a)/(1<<(_b)))

I had always thought a divide by a power of two equals a shift, but this is wrong: an integer divide rounds towards zero, whereas a shift rounds towards negative infinity. The solution is simple: if the value to be shifted is less than zero, add a mask before shifting. Rather than write this logic in Verilog, I simply switched my code to the / operator as in the C code above, and XST inferred the correct logic. After verifying operation with random inputs, I also wrote a small benchmark to test the performance of my hardware:

[root@alarm hwtests]# ./idct4_test 
Filling input buffer... Done.
Running software benchmark... Done.
Time: 0.030000 s
Running hardware benchmark... Done.
Time: 0.960000 s

Not too impressive, but the implementation is super basic, so it’s not surprising. Most of the time is spent shuffling data across the extremely slow MMIO interface.
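The divide-versus-shift distinction above can be sketched in a few lines (JavaScript here for illustration; the macro itself is C, and `unbiasedRshift` is my name for it):

```javascript
// An arithmetic right shift rounds toward negative infinity, while
// integer division rounds toward zero:
var shifted = -3 >> 1;            // floor(-1.5) = -2
var divided = Math.trunc(-3 / 2); // -1

// A sketch of the OD_UNBIASED_RSHIFT trick: add a mask of
// ((1 << b) - 1) before shifting, but only when a is negative.
function unbiasedRshift(a, b) {
  return (a + (((1 << b) - 1) & -(a < 0))) >> b;
}
```

For non-negative inputs the mask is zero and the two behaviors coincide; they only diverge for negative values that are not exact multiples of the divisor.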

At the same time I was trying to figure out why the 16-bit version of the intra predictor performed uniformly worse than the double precision version – I thought 16 bits ought to be enough. The conversion is done by expanding the range to fill a 16 bit integer and then rounding:

OD_PRED_WEIGHTS_4x4_FIXED[i] = (od_coeff)floor(OD_PRED_WEIGHTS_4x4[i] * 32768 + 0.5);

The 16 bit multiplies with the 16 bit coefficients are summed in a 32 bit accumulator. The result is then truncated to the original range. I did this with a right shift – after the previous ordeal, I tried swapping it with the “round towards zero” macro. Here are the results:


The new 16 bit version even manages to outperform the double version slightly. I believe “round to zero” does better than simply rounding down because rounding down tends to create a slightly negative bias in the encoded coefficients, decreasing coding gain.
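The conversion and round-toward-zero truncation described above might look like this sketch (JavaScript for illustration; the original is C, and both function names here are made up):

```javascript
// Expand a double-precision weight into a Q15 fixed-point integer,
// rounding to nearest (as in the conversion line quoted above).
function toFixed15(w) {
  return Math.floor(w * 32768 + 0.5);
}

// Sum the fixed-point products in a wider accumulator, then truncate
// back to the original range, rounding toward zero. A plain >> 15
// would round toward negative infinity instead, introducing bias.
function predict(weights, pixels) {
  var acc = 0;
  for (var i = 0; i < weights.length; i++) {
    acc += toFixed15(weights[i]) * pixels[i];
  }
  return Math.trunc(acc / 32768);
}
```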

by Thomas Daede at January 03, 2014 12:39 AM

Thomas Daede : Daala 4-input idct (tentatively) working!

I’ve implemented Daala’s smallest inverse transform in Verilog code. It appears as an AXI-Lite slave, with two 32-bit registers for input and two for output. Right now it can do one transform per clock cycle, though at a pitiful 20MHz. I also haven’t verified that all of its output values are identical to the C version yet, but it passes preliminary testing with peek and poke.

[Screenshot from 2014-01-01 23:23:22]

I also have yet to figure out why XST only uses one DSP block even though my design has 3 multipliers…

by Thomas Daede at January 02, 2014 05:25 AM

Silvia Pfeiffer : New Challenges

I finished up at Google last week and am now working at NICTA, an Australian ICT research institute.

My work with Google was exciting and I learned a lot. I like to think that Google also got a lot out of me – I coded and contributed to some YouTube caption features, I worked on Chrome captions and video controls, and above all I worked on video accessibility for HTML at the W3C.

I was one of the key authors of the W3C Media Accessibility Requirements document that we created in the Media Accessibility Task Force of the W3C HTML WG. I then went on to help make video accessibility a reality. We created WebVTT and the <track> element and applied it to captions, subtitles, chapters (navigation), video descriptions, and metadata. To satisfy the need for synchronisation of video with other media resources such as sign language video or audio descriptions, we got the MediaController object and the @mediagroup attribute.

I must say it was a most rewarding time. I learned a lot about being productive at Google, about collaborating successfully across distances, about how the WebKit community works, and about the new way of writing W3C standards (which is more like pseudo-code). As one consequence, I am now a co-editor of the W3C HTML spec and it seems I am also about to become the editor of the WebVTT spec.

At NICTA my new focus of work is WebRTC. There is both a bit of research and a whole bunch of application development involved. I may even get to do some WebKit development, if we identify any issues with the current implementation. I started a week ago and am already amazed by the amount of work going on in the WebRTC space and the sheer number of open source projects playing around with it. Video conferencing is a new challenge and I look forward to it.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : Video Conferencing in HTML5: WebRTC via Socket.io

Six months ago I experimented with Web sockets for WebRTC and the early implementations of PeerConnection in Chrome. Last week I gave a presentation about WebRTC at Linux.conf.au, so it was time to update that codebase.

I decided to use socket.io for the signalling following the idea of Luc, which made the server code even smaller and reduced it to a mere reflector:

 var app = require('http').createServer().listen(1337);
 var io = require('socket.io').listen(app);

 io.sockets.on('connection', function(socket) {
         socket.on('message', function(message) {
                 socket.broadcast.emit('message', message);
         });
 });
Then I turned to the client code. I was surprised to see the massive changes that PeerConnection has gone through. Check out my slide deck to see the different components that are now necessary to create a PeerConnection.

I was particularly surprised to see the SDP object now fully exposed to JavaScript, and thus the ability to manipulate it directly rather than through some API. This allows Web developers to manipulate the type of session that they are asking the browsers to set up. I can imagine, e.g., that if they have support for a video codec in JavaScript that the browser does not provide built-in, they could add that codec to the set of choices to be offered to the peer. While it is flexible, I am concerned this might create more problems than it solves. I guess we’ll have to wait and see.
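Since the SDP is exposed as plain text, “manipulating it directly” boils down to string editing. A hedged illustration of the idea (the function name is invented, and real SDP munging needs far more care than this): to make the browser prefer a particular codec, you can move its payload type to the front of the relevant m= line.

```javascript
// Illustration only: promote one payload type to the front of the
// m=audio line of an SDP string, so that codec is preferred. The
// function name is made up; real SDP handling needs much more care.
function preferPayload(sdp, pt) {
  return sdp.split('\r\n').map(function (line) {
    if (line.indexOf('m=audio') !== 0) return line;
    var parts = line.split(' ');
    var head = parts.slice(0, 3); // "m=audio <port> <proto>"
    var rest = parts.slice(3).filter(function (p) { return p !== pt; });
    return head.concat([pt]).concat(rest).join(' ');
  }).join('\r\n');
}
```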

I was also surprised by the need to use ICE, even though in my experiment I got away with an empty list of ICE servers – the ICE messages just got exchanged through the socket.io server. I am not sure whether this is a bug, but I was very happy about it, because it meant I could run the whole demo on a network completely separate from the Internet.

The most exciting news since my talk is that Mozilla and Google have managed to get a PeerConnection working between Firefox and Chrome – this is the first cross-browser video conference call without a plugin! The code differences are minor.

Since the specification of the WebRTC API and of the MediaStream API are now official Working Drafts at the W3C, I expect other browsers will follow. I am also looking forward to the possibilities of:

The best places to learn about the latest possibilities of WebRTC are webrtc.org and the W3C WebRTC WG. code.google.com has open source code that continues to be updated to the latest released and interoperable features in browsers.

The video of my talk is in the process of being published. There is an MP4 version on the Linux Australia mirror server, but I expect it will be published properly soon. I will update the blog post when that happens.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : The use cases for a <main> element in HTML

The W3C HTML WG and the WHATWG are currently discussing the introduction of a <main> element into HTML.

The <main> element has been proposed by Steve Faulkner and is specified in a draft extension spec which is about to be accepted as a FPWD (first public working draft) by the W3C HTML WG. This implies that the W3C HTML WG will be looking for implementations and for feedback by implementers on this spec.

I am supportive of the introduction of a <main> element into HTML. However, I believe that the current spec and use case list don’t make a good enough case for its introduction. Here are my thoughts.

Main use case: accessibility

In my opinion, the main use case for the introduction of <main> is accessibility.

Like any other users, when blind users want to perceive a Web page/application, they need to have a quick means of grasping the content of a page. Since they cannot visually scan the layout and thus determine where the main content is, they use accessibility technology (AT) to find what is known as “landmarks”.

“Landmarks” tell the user what semantic content is on a page: a header (such as a banner), a search box, a navigation menu, some asides (also called complementary content), a footer … and, most importantly, the main content of the page. It is this main content that a blind user most often wants to skip to directly.

In the days of HTML4, a hidden “skip to content” link at the beginning of the Web page was used as a means to help blind users access the main content.

In the days of ARIA, the ARIA @role=main attribute enables authors to avoid a hidden link and instead mark the element where the main content begins, allowing direct access to it. This attribute is supported by AT – in particular screen readers – by making it part of the landmarks that AT can directly skip to.

Both the hidden link and the ARIA @role=main approaches are, however, band-aids: they are being used by those of us who make “finished” Web pages accessible by adding specific extra markup.

A world where ARIA is not necessary – where accessibility developers would be out of a job because the normal markup that everyone writes already creates accessible Web sites and applications – would be much preferable to the current world of band-aids.

Therefore, to me, the primary use case for a <main> element is to achieve exactly this better world and not require specialized markup to tell a user (or a tool) where the main content on a page starts.

An immediate effect would be that pages that have a <main> element will expose a “main” landmark to blind and vision-impaired users that will enable them to directly access that main content on the page without having to wade through other text on the page. Without a <main> element, this functionality can currently only be provided using heuristics to skip other semantic and structural elements and is for this reason not typically implemented in AT.

Other use cases

The <main> element is a semantic element not unlike other new semantic elements such as <header>, <footer>, <aside>, <article>, <nav>, or <section>. Thus, it can also serve other uses where the main content on a Web page/Web application needs to be identified.

Data mining

For data mining of Web content, the identification of the main content is one of the key challenges. Many scholarly articles have been published on this topic. This stackoverflow article references and suggests a multitude of approaches, but the accepted answer says “there’s no way to do this that’s guaranteed to work”. This is because Web pages are inherently complex and many <div>, <p>, <iframe> and other elements are used to provide markup for styling, notifications, ads, analytics and other use cases that are necessary to make a Web page complete, but don’t contribute to what a user consumes as semantically rich content. A <main> element will allow authors to pro-actively direct data mining tools to the main content.
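With an explicit element, a data-mining tool can prefer the author’s own markup and fall back to heuristics only when it is absent. A minimal sketch (the function name is mine; `doc` is anything exposing a `querySelector` method, such as a browser document or a stub):

```javascript
// Sketch: prefer an explicit <main> element (or an ARIA landmark),
// and only fall back to heuristics when neither is present. "doc"
// is anything with a querySelector method.
function mainContent(doc) {
  return doc.querySelector('main') ||
         doc.querySelector('[role=main]') ||
         null; // a readability-style heuristic would go here
}
```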

Search engines

One particularly important kind of “data mining” tool is the search engine. Search engines, too, have a hard time identifying which sections of a Web page are more important than others, and employ many heuristics to do so; see e.g. this ACM article. Yet they still disappoint, with poor results pointing to keyword matches in barely relevant sections of a page rather than ranking pages higher when the keywords appear in the main content. A <main> element would help search engines give text in the main content area a higher weight and prefer it over other areas of the Web page. It would allow them to rank Web pages differently depending on where on the page the search words are found. The <main> element will be an additional hint that search engines will digest.

Visual focus

On small devices, the display of Web pages designed for the desktop often causes confusion as to where the main content can be found and read, in particular when the text ends up being too small to be readable. It would be nice if browsers on small devices had a feature (maybe a default setting) where Web pages would initially be displayed zoomed in on the main content. This could alleviate some of the headaches of responsive Web design, where the recommendation is to show high-priority content first. Right now this problem is addressed through stylesheets that re-layout the page differently depending on the device, but again this is a band-aid solution. Explicit semantic markup of the main content can solve this problem more elegantly.


Finally, naturally, <main> would also be used to style the main content differently from the rest. You can e.g. replace a semantically meaningless <div id="main"> with a semantically meaningful <main> where their position is identical. My analysis below shows that this is not always the case, since oftentimes <div id="main"> is used to group together everything that is not the header – in particular where there are multiple columns. Thus, the ease of styling a <main> element is only a positive side effect and not actually a real use case. It does make it easier, however, to adapt the style of the main content, e.g. with media queries.

Proposed alternative solutions

It has been proposed that existing markup satisfies the use cases that <main> has been proposed for. Let’s analyse this on some of the most popular Web sites. First, let’s list the proposed algorithms.

Proposed solution No 1: Scooby-Doo

On Sat, Nov 17, 2012 at 11:01 AM, Ian Hickson <ian@hixie.ch> wrote:
| The main content is whatever content isn't
| marked up as not being main content (anything not marked up with <header>,
| <aside>, <nav>, etc).

This implies that the first element that is not a <header>, <aside>, <nav>, or <footer> will be the element that we want to give to a blind user as the location where they should start reading. The algorithm is implemented in https://gist.github.com/4032962.
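Re-sketched (this is my paraphrase, not the code in the gist), the rule amounts to scanning the children of <body> for the first element that isn’t one of the “not main content” elements:

```javascript
// My paraphrase of the "Scooby-Doo" rule (not the code in the gist):
// the main content starts at the first child of <body> that is not
// marked up as "not main content".
var NOT_MAIN = ['HEADER', 'ASIDE', 'NAV', 'FOOTER'];

function scoobyDoo(children) {
  // "children" stands in for body.children: a list of {tagName: ...}.
  for (var i = 0; i < children.length; i++) {
    if (NOT_MAIN.indexOf(children[i].tagName) === -1) return children[i];
  }
  return null;
}
```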

Proposed solution No 2: First article element

On Sat, Nov 17, 2012 at 8:01 AM, Ian Hickson  wrote:
| On Thu, 15 Nov 2012, Ian Yang wrote:
| >
| > That's a good idea. We really need an element to wrap all the <p>s,
| > <ul>s, <ol>s, <figure>s, <table>s ... etc of a blog post.
| That's called <article>.

This approach identifies the first <article> element on the page as containing the main content. Here’s the algorithm for this approach.

Proposed solution No 3: An example heuristic approach

The readability plugin has been developed to make Web pages readable by essentially removing all the non-main content from a page. An early source of readability is available. This demonstrates what a heuristic approach can achieve.

Analysing alternative solutions


I’ve picked 4 typical Websites (top on Alexa) to analyse how these three different approaches fare. Ideally, I’d like to simply apply the above three scripts and compare pictures. However, since the semantic HTML5 elements <header>, <aside>, <nav>, and <footer> are not actually used by any of these Web sites, I don’t actually have this choice.

So, instead, I decided to make some assumptions about where these semantic elements would be used and what the outcome of applying the first two algorithms would be. I can then compare them to the third, which is an actual product, so we can take screenshots.


http://google.com – search for “Scooby Doo”.

The search results page would likely be built with:

  • a <nav> menu for the Google bar
  • a <header> for the search bar
  • another <header> for the login section
  • another <nav> menu for the search types
  • a <div> to contain the rest of the page
  • a <div> for the app bar with the search number
  • a few <aside>s for the left and right column
  • a set of <article>s for the search results

“Scooby Doo” would find the first element after the headers as the “main content”. This is the element before the app bar in this case. Interestingly, there is a <div @id=main> already in the current Google results page, which “Scooby Doo” would likely also pick. However, there are a nav bar and two asides in this div, which clearly should not be part of the “main content”. Google actually placed a @role=main on a different element, namely the one that encapsulates all the search results.

“First Article” would find the first search result as the “main content”. While not quite the same as what Google intended – namely all search results – it is close enough to be useful.

The “readability” result is interesting, since it is not able to identify the main text on the page. It is actually aware of this problem and brings a warning before displaying this page:

Readability of google.com



http://facebook.com – a user page would likely be built with:

  • a <header> bar for the search and login bar
  • a <div> to contain the rest of the page
  • an <aside> for the left column
  • a <div> to contain the center and right column
  • an <aside> for the right column
  • a <header> to contain the center column “megaphone”
  • a <div> for the status posting
  • a set of <article>s for the home stream
“Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains all three columns. It’s actually a <div @id=content> already in the current Facebook user page, which “Scooby Doo” would likely also pick. However, Facebook selected a different element to place the @role=main : the center column.

“First Article” would find the first news item in the home stream. This is clearly not what Facebook intended, since they placed the @role=main on the center column, above the first blog post’s title. “First Article” would miss that title and the status posting.

The “readability” result again disappoints but warns that it failed:



http://youtube.com

A video page would likely be built with:

  • a <header> bar for the search and login bar
  • a <nav> for the menu
  • a <div> to contain the rest of the page
  • a <header> for the video title and channel links
  • a <div> to contain the video with controls
  • a <div> to contain the center and right column
  • an <aside> for the right column with an <article> per related video
  • an <aside> for the information below the video
  • an <article> per comment below the video
“Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains the rest of the page. It’s actually a <div @id=content> already in the current YouTube video page, which “Scooby Doo” would likely also pick. However, YouTube’s related videos and comments are unlikely to be what the user would regard as “main content” – it’s the video they are after, which conveniently has a <div id=watch-player>.

“First Article” would find the first related video or comment. This is clearly not what YouTube intends.

The “readability” result is not quite as unusable, but still very bare:


http://wikipedia.com (“Overscan” page)

A Wikipedia page would likely be built with:

  • a <header> bar for the search, login and menu items
  • a <div> to contain the rest of the page
  • an <article> with the title and lots of text
  • an <aside> with the table of contents inside the <article>
  • several <aside>s for the left column
Good news: “Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains the rest of the page. It’s actually a <div id="content" role="main"> element on Wikipedia, which “Scooby Doo” would likely also pick.

“First Article” would find the title and text of the main element on the page, but it would also include an <aside>.

The “readability” result is also in agreement.


In the following table we have summarised the results for the experiments:

Site           Scooby-Doo   First article   Readability
Facebook.com   FAIL         FAIL            FAIL

Clearly, Wikipedia is the prime example of a site where even the simple approaches find it easy to determine the main content on the page. WordPress blogs are similarly successful. Almost any other site, including news sites, social networks and search engines, is pretty hopeless with the proposed approaches: too many elements are used for layout or other purposes (notifications, hidden areas), so the pre-determined list of semantic elements simply doesn’t suffice to mark up a Web page/application completely.


It seems that in general it is impossible to determine which element(s) on a Web page should be the “main” piece of content that accessibility tools jump to when requested, that a search engine should put its focus on, or that should be highlighted to a general user to read. It would be very useful if the author of the Web page provided a hint through a <main> element where that main content is to be found.
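Marked up with such an element, the pages discussed above might look like this (a sketch only, following the element patterns used in the analyses):

```html
<body>
  <header>…search and login bar…</header>
  <nav>…menu…</nav>
  <main>
    <article>…the content the user is actually after…</article>
  </main>
  <aside>…related links…</aside>
  <footer>…imprint…</footer>
</body>
```

A tool could then jump straight to the single <main> element instead of guessing.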

I think that the <main> element becomes particularly useful when combined with a default keyboard shortcut in browsers as proposed by Steve: we may actually find that non-accessibility users will also start making use of this shortcut, e.g. to get to videos on YouTube pages directly without having to tab over search boxes and other interactive elements, etc. Worthwhile markup indeed.

by silvia at January 01, 2014 08:53 AM

Silvia Pfeiffer : What is “interoperable TTML”?

I’ve just tried to come to terms with the latest state of TTML, the Timed Text Markup Language.

TTML has been specified by the W3C Timed Text Working Group and released as a Recommendation (v1.0) in November 2010. Since then, several organisations have tried to adopt it as their caption file format, including SMPTE, the EBU (European Broadcasting Union), and Microsoft.

Both Microsoft and the EBU actually looked at TTML in detail and decided that, to make it usable for their use cases, its functionality needed to be restricted.


EBU-TT

The EBU released EBU-TT, which restricts the set of valid attributes and features. “The EBU-TT format is intended to constrain the features provided by TTML, especially to make EBU-TT more suitable for the use with broadcast video and web video applications.” (see EBU-TT).

In addition, EBU-specific namespaces were introduced to extend TTML with EBU-specific data types, e.g. ebuttdt:frameRateMultiplierType or ebuttdt:smpteTimingType. Similarly, several metadata elements were introduced, e.g. ebuttm:documentMetadata, ebuttm:documentEbuttVersion, or ebuttm:documentIdentifier.

The use of namespaces as an extensibility mechanism ensures that EBU-TT files remain valid TTML files. However, a vanilla TTML parser will not know what to do with these custom extensions and will drop them on the floor.

Simple Delivery Profile

With the intention of making TTML ready for “internet delivery of Captions originated in the United States”, Microsoft proposed a “Simple Delivery Profile for Closed Captions (US)” (see Simple Profile). The Simple Profile is also a restriction of TTML.

Unfortunately, the Microsoft profile is not the same as the EBU-TT profile: for example, it contains the “set” element, which is not conformant in EBU-TT. Similarly, the supported style features are different, e.g. Simple Profile supports “display-region”, while EBU-TT does not. On the other hand, EBU-TT supports monospace, sans-serif and serif fonts, while the Simple profile does not.

Thus files created for the Simple Delivery Profile will not work on players that expect EBU-TT, and vice versa.

Fortunately, the Simple Delivery Profile introduces no new namespaces or features, so at least it is a strict subset of TTML and not both a restriction and an extension like EBU-TT.


SMPTE-TT

SMPTE also created a version of the TTML standard called SMPTE-TT. SMPTE did not decide on a subset of TTML for its purposes – it simply adopted the complete set. “This Standard provides a framework for timed text to be supported for content delivered via broadband means,…” (see SMPTE-TT).

However, SMPTE extended TTML in SMPTE-TT with an ability to store a binary blob with captions in another format. This allows using SMPTE-TT as a transport format for any caption format and is deemed to help with “backwards compatibility”.

Now, instead of specifying a profile, SMPTE decided to define how to convert CEA-608 captions to SMPTE-TT. Even if it’s not called a “profile”, that’s actually what it is. It even has its own namespace: “m608:”.


With all these different versions of TTML, I ask myself what a video player that claims support for TTML will do to get something working. The only chance it has is to implement all the extensions defined in all the different profiles. I pity the player that has to deal with a SMPTE-TT file that has a binary blob in it and is expected to be able to decode this.

Now, what is a caption author supposed to do when creating TTML? They obviously cannot expect all players to be able to play back all TTML versions. Should they create different files depending on the target platform, i.e. an EBU-TT version, a SMPTE-TT version, a vanilla TTML version, and a Simple Delivery Profile version? Or should they throw all the features of all the versions into one TTML file and hope that players will pick out what they need and drop the rest on the floor?

Maybe the best way to progress would be to make a list of the “safe” features: those features that every TTML profile supports. That may be the best way to get an “interoperable TTML” file. Here’s me hoping that this minimal set of features doesn’t just end up being the usual (starttime, endtime, text) triple.
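For illustration, a file restricted to that lowest common denominator might look like this (a sketch only, not validated against any particular profile):

```xml
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <!-- just the (starttime, endtime, text) triple per caption -->
      <p begin="00:00:01.000" end="00:00:03.500">Hello, world.</p>
      <p begin="00:00:04.000" end="00:00:06.000">A second caption.</p>
    </div>
  </body>
</tt>
```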


I just found out that UltraViolet has its own profile of SMPTE-TT called CFF-TT (see UltraViolet FAQ and spec). It makes some SMPTE-TT fields optional, but introduces a new @forcedDisplayMode attribute under its own namespace “cff:”.

by silvia at January 01, 2014 08:53 AM

Thomas Daede : Quantized intra predictor

Daala’s intra predictor currently uses doubles. Floating point math units are really expensive in hardware, and so is loading 64-bit weights. Therefore, I modified Daala to see what would happen if the weights were rounded to signed 16 bit. The result is below (red is before quantization, green after):

This is too much loss – I’ll have to figure out why this happened. Worst case, I move to 32-bit weights, though maybe my floor(+0.5) method of rounding is also suspect? Maybe the intra weights should be trained taking quantization into account?
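For reference, the rounding in question looks roughly like this (the scale factor and saturation are my assumptions, not Daala’s actual code):

```python
import math

def quantize_weights(weights, shift=15):
    # Round double predictor weights to signed 16-bit fixed point
    # using the floor(x * 2**shift + 0.5) rule mentioned above,
    # saturating to the int16 range.
    out = []
    for w in weights:
        q = math.floor(w * (1 << shift) + 0.5)
        out.append(max(-32768, min(32767, q)))
    return out

print(quantize_weights([0.70710678, -0.5, 0.00001]))  # [23170, -16384, 0]
```

One thing to note: floor(x + 0.5) rounds exact halves upward rather than to even, which introduces a small systematic bias; round-half-to-even is the usual alternative if that turns out to matter.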

by Thomas Daede at December 31, 2013 11:37 PM

Thomas Daede : First Zynq bitstream working!

I got my first custom PL hardware working! Following the ZedBoard tutorials, it was relatively straightforward, though using Vivado 2013.3 required a bit of playing around – I ended up making my own clock sources and reset controller until I realized that the Zynq PS provides them if you enable them. Next up: ChipScope, or whatever it’s called in Vivado.

I crashed the chip numerous times until realizing that the bitstream file name had changed somewhere in the process, so I was uploading an old version of the bitstream….

by Thomas Daede at December 31, 2013 08:56 PM

Thomas Daede : Daala profiling on ARM

I reran the same decoding as yesterday, but this time on the Zynq Cortex-A9 instead of x86. Following is the histogram data, again with the functions I plan to accelerate highlighted:

 19.60%  lt-dump_video  [.] od_intra_pred16x16_mult
  6.66%  lt-dump_video  [.] od_intra_pred8x8_mult
  6.02%  lt-dump_video  [.] od_bin_idct16
  4.88%  lt-dump_video  [.] .divsi3_skip_div0_test
  4.54%  lt-dump_video  [.] od_bands_from_raster
  4.21%  lt-dump_video  [.] laplace_decode
  4.03%  lt-dump_video  [.] od_chroma_pred
  3.92%  lt-dump_video  [.] od_raster_from_bands
  3.66%  lt-dump_video  [.] od_post_filter16
  3.20%  lt-dump_video  [.] od_intra_pred4x4_mult
  3.09%  lt-dump_video  [.] od_apply_filter_cols
  3.08%  lt-dump_video  [.] od_bin_idct8
  2.60%  lt-dump_video  [.] od_post_filter8
  2.00%  lt-dump_video  [.] od_tf_down_hv
  1.69%  lt-dump_video  [.] od_intra_pred_cdf
  1.55%  lt-dump_video  [.] od_ec_decode_cdf_unscaled
  1.46%  lt-dump_video  [.] od_post_filter4
  1.45%  lt-dump_video  [.] od_convert_intra_coeffs
  1.44%  lt-dump_video  [.] od_convert_block_down
  1.28%  lt-dump_video  [.] generic_model_update
  1.24%  lt-dump_video  [.] pvq_decoder
  1.21%  lt-dump_video  [.] od_bin_idct4

As expected, the results are very similar to x86; however, there are a few oddities. One is that the intra prediction is even slower than on x86; another is that the software division routine shows up relatively high in the list. It turns out that the division comes from the inverse lapping filters – although division by a constant can be replaced by a fixed-point multiply, the compiler seems not to have done this, which hurts performance a lot.
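The transformation the compiler missed is easy to illustrate (the divisor and shift below are illustrative, not taken from Daala):

```python
# Division by the constant 3 replaced by a multiply and a shift:
# pick m = ceil(2**s / 3); then (x * m) >> s == x // 3 for all
# x in the input range, provided s is large enough.
s = 17
m = -((1 << s) // -3)  # ceil(2**17 / 3) = 43691

for x in range(1 << 16):  # exhaustive check over 16-bit inputs
    assert (x * m) >> s == x // 3

print(m)  # 43691
```

On a Cortex-A9 (which has no integer divide instruction, hence the .divsi3 software routine in the profile), a multiply and shift is far cheaper than the library call.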

For fun, let’s see what happens when we remove the costly transforms and force 4×4 block sizes only:

 26.21%  lt-dump_video  [.] od_intra_pred4x4_mult
  7.35%  lt-dump_video  [.] od_intra_pred_cdf
  6.28%  lt-dump_video  [.] od_post_filter4
  6.17%  lt-dump_video  [.] od_chroma_pred
  5.77%  lt-dump_video  [.] od_bin_idct4
  4.04%  lt-dump_video  [.] od_bands_from_raster
  3.94%  lt-dump_video  [.] generic_model_update
  3.86%  lt-dump_video  [.] od_apply_filter_cols
  3.64%  lt-dump_video  [.] od_raster_from_bands
  3.29%  lt-dump_video  [.] .divsi3_skip_div0_test
  2.47%  lt-dump_video  [.] od_convert_intra_coeffs
  2.07%  lt-dump_video  [.] od_intra_pred4x4_get
  1.95%  lt-dump_video  [.] od_apply_postfilter
  1.82%  lt-dump_video  [.] od_tf_up_hv_lp
  1.81%  lt-dump_video  [.] laplace_decode
  1.74%  lt-dump_video  [.] od_ec_decode_cdf
  1.67%  lt-dump_video  [.] pvq_decode_delta
  1.61%  lt-dump_video  [.] od_apply_filter_rows
  1.55%  lt-dump_video  [.] od_bin_idct4x4

The 4×4 intra prediction has now skyrocketed to the top, with the transforms and filters increasing as well. I was surprised by the intra prediction decoder (od_intra_pred_cdf) taking up so much time, but it can be explained by the much larger amount of prediction data coded relative to the image size with the smaller blocks. The transform still doesn’t take much time, which I suppose shouldn’t be surprising given how simple it is – my hardware can even do it in 1 cycle.

by Thomas Daede at December 30, 2013 12:23 AM

Thomas Daede : Daala profiling on x86

Given that the purpose of my hardware acceleration is to run Daala at realtime speeds, I decided to benchmark the Daala player on my Core 2 Duo laptop. I used a test video at 720p24, encoded with -v 16 and no reference frames (intra only). The following is the perf annotate output:

 19.49%  lt-player_examp  [.] od_state_upsample8
 11.64%  lt-player_examp  [.] od_intra_pred16x16_mult
  5.74%  lt-player_examp  [.] od_intra_pred8x8_mult

20 percent for od_state_upsample8? It turns out the results of this aren’t even used in intra-only mode, so commenting it out yields a more reasonable result:

 14.50%  lt-player_examp  [.] od_intra_pred16x16_mult
  7.17%  lt-player_examp  [.] od_intra_pred8x8_mult
  6.37%  lt-player_examp  [.] od_bin_idct16
  5.09%  lt-player_examp  [.] od_post_filter16
  4.63%  lt-player_examp  [.] laplace_decode
  4.41%  lt-player_examp  [.] od_bin_idct8
  4.10%  lt-player_examp  [.] od_post_filter8
  3.86%  lt-player_examp  [.] od_apply_filter_cols
  3.28%  lt-player_examp  [.] od_chroma_pred
  3.18%  lt-player_examp  [.] od_raster_from_bands
  3.14%  lt-player_examp  [.] od_intra_pred4x4_mult
  2.84%  lt-player_examp  [.] pvq_decoder
  2.76%  lt-player_examp  [.] od_ec_decode_cdf_unscaled
  2.71%  lt-player_examp  [.] od_tf_down_hv
  2.58%  lt-player_examp  [.] od_post_filter4
  2.45%  lt-player_examp  [.] od_bands_from_raster
  2.13%  lt-player_examp  [.] od_intra_pred_cdf
  1.98%  lt-player_examp  [.] od_intra_pred16x16_get
  1.89%  lt-player_examp  [.] pvq_decode_delta
  1.50%  lt-player_examp  [.] od_convert_intra_coeffs
  1.43%  lt-player_examp  [.] generic_model_update
  1.37%  lt-player_examp  [.] od_convert_block_down
  1.21%  lt-player_examp  [.] od_ec_decode_cdf
  1.18%  lt-player_examp  [.] od_ec_dec_normalize
  1.18%  lt-player_examp  [.] od_bin_idct4

I have bolded the functions that I plan to implement in hardware. As you can see, they sum to only about 23% of the total execution time – this means that accelerating these functions alone won’t bring me into realtime decoding performance. Obvious other targets include the intra prediction matrix multiplication, though this might be better handled by NEON acceleration for now – I’m not too familiar with that area of the code yet.
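The 23% figure can be put into perspective with Amdahl’s law: even infinitely fast hardware for those functions gives only a modest overall speedup.

```python
# Amdahl's law: overall speedup when a fraction p of the runtime
# is accelerated by a factor s.
def overall_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Hardware infinitely fast on 23% of the runtime:
print(round(overall_speedup(0.23, float("inf")), 2))  # 1.3
```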

by Thomas Daede at December 29, 2013 03:44 AM

Jean-Marc Valin : Opus 1.1 released

Opus 1.1
After more than two years of development, we have released Opus 1.1. This includes:

  • new analysis code and tuning that significantly improves encoding quality, especially for variable-bitrate (VBR),

  • automatic detection of speech or music to decide which encoding mode to use,

  • surround with good quality at 128 kbps for 5.1 and usable down to 48 kbps, and

  • speed improvements on all architectures, especially ARM, where decoding uses around 40% less CPU and encoding uses around 30% less CPU.

These improvements are explained in more detail in Monty's demo (updated from the 1.1 beta demo). Of course, this new version is still fully compliant with the Opus specification (RFC 6716).

December 06, 2013 03:08 AM

Jean-Marc Valin : Opus 1.1-rc is out

We just released Opus 1.1-rc, which should be the last step before the final 1.1 release. Compared to 1.1-beta, this new release further improves surround encoding quality. It also includes better tuning of surround and stereo for lower bitrates. The complexity has been reduced on all CPUs, but especially ARM, which now has Neon assembly for the encoder.

With the changes, stereo encoding now produces usable audio (of course, not high fidelity) down to about 40 kb/s, with surround 5.1 sounding usable down to 48-64 kb/s. Please give this release a try and report any issues on the mailing list or by joining the #opus channel on irc.freenode.net. The more testing we get, the faster we'll be able to release 1.1-final.

As usual, the code can be downloaded from: http://opus-codec.org/downloads/

November 27, 2013 06:08 AM

Thomas Daede : Senior Honors Thesis – Daala in Hardware

not actually the daala logo

For my honors thesis, I am implementing part of the Daala decoder in hardware. This is not only a way for me to learn more about video coding and hardware, but also a way to provide feedback to the Daala project and create a reference hardware implementation.

The Chip

Part of the reason for a hardware implementation of any video codec is to make it possible to decode on an otherwise underpowered chip, such as the mobile processors common in smart phones and tablets. A very good model of this sort of chip is the Xilinx Zynq processor, which has two midrange ARM Cortex cores, and a large FPGA fabric surrounding them. The custom video decoder will be implemented in the FPGA, with high speed direct-memory-access providing communication with the ARM cores.

The Board

Image from zedboard.org

I will be using the ZedBoard, a low cost prototyping board based on the Zynq 7020 system-on-chip. It includes 512MB of DDR, both HDMI and VGA video output, Ethernet, serial, and boots Linux out of the box. The only thing that could make it better would be a cute kitten on the silkscreen.

Choosing what to accelerate

For now, parts of the codec will still run in software. This is because many of them would be very complicated state machines in hardware, and more importantly, it allows me to incrementally add hardware acceleration while maintaining a functional decoder. To make a particular algorithm a good candidate for hardware acceleration, it needs to have these properties:

  • Stable – Daala is a rapidly changing codec, and while much of it is expected to change, it takes time to implement hardware, and it’s much easier if the reference isn’t changing under my feet.
  • Parallel – Compared to a CPU, hardware can excel at exploiting parallelism. CPUs can do it too with SIMD instructions, but hardware can be tailor made to the application.
  • Independent – The hardware accelerator can act much like a parallel thread, which means that locking and synchronization comes into play. Ideally the hardware and CPU should rarely have to wait for each other.
  • Interesting – The hardware should be something unique to Daala.

The best fit that I have found for these is the transform stage of Daala. The transform stage is a combination of the discrete cosine transform (actually an integer approximation), and a lapping filter. While the DCT is an old concept, the 2D lapping filter is pretty unique to Daala, and implementing both in tightly coupled hardware can create a large performance benefit. More info on the transform stage can be found on Monty’s demo pages.

by Thomas Daede at November 26, 2013 03:04 AM

Thomas Daede : Inside a TTL crystal oscillator

Inside Crystal View 1

In case you ever wanted to know what is inside an oscillator can… I used a dremel so that now you can know. The big transparent disc on the right is the precisely cut quartz resonator, suspended on springs. On the left is a driver chip and pads for loading capacitors to complete the oscillator circuit. The heat from my dremel was enough to melt the solder and remove the components. Your average crystal can won’t have the driver chip or capacitors – most microcontrollers now have the driver circuitry built-in.


by Thomas Daede at November 25, 2013 10:49 PM

Jean-Marc Valin : Back from AES

I just got back from the 135th AES convention, where Koen Vos and I presented two papers on voice and music coding in Opus.

Also of interest at the convention was the Fraunhofer codec booth. It appears that Opus is now causing them some concerns, which is a good sign. And while we're on that topic, the answer is yes :-)


October 24, 2013 01:48 AM

Jean-Marc Valin : Opus 1.1-beta release and a demo page by Monty

Opus 1.1-beta
We just released Opus 1.1-beta, which includes many improvements over the 1.0.x branch. For this release, Monty made a nice demo page showing off most of the new features. In other news, the AES has accepted my paper on the CELT part of Opus, as well as a paper proposal from Koen Vos on the SILK part.

July 12, 2013 04:31 PM

Thomas B. Rücker : Icecast – How to prevent listeners from being disconnected

A set of hints for common questions and situations people encounter while setting up a radio station using Icecast.

  • The source client is on an unstable IP connection.
    • You really want to avoid such situations. If this is a professional setup you might even consider a dedicated internet connection for the source client. The least would be QoS with guaranteed bandwidth.
      I’ve seen it far too often over the years: people complain that ‘Icecast drops my stream all the time’, and after some digging we find that the same person, or someone on their network, runs BitTorrent or some other bandwidth-intensive application.
    • If the TCP connection just stalls from time to time for a few seconds, you might improve the situation by increasing the source-timeout, but don’t increase it too much, as that will lead to weird behaviour for your listeners too. Beyond 15–20s I’d highly recommend considering the next option instead.
    • If the TCP connection breaks, so the source client gets fully disconnected and has to reconnect to Icecast, then you really want to look into setting up fallbacks for your streams, as otherwise this also immediately disconnects all listeners.
      A fallback transfers all listeners to a backup stream (or static file) instead of disconnecting them. Preferably you’d have that fallback stream generated locally on the same machine as the Icecast server, e.g. some ‘elevator music’ with an announcement of ‘technical difficulties, regular programming will resume soon’.
      There is a special case: if you opt for ‘silence’ as your fallback and you stream in e.g. Ogg/Vorbis, the encoder will compress digital silence to a few bits per second, as it’s incredibly compressible. This completely messes up most players. The workaround is to inject some -90dB AWGN, which is below hearing level but keeps the encoder busy and the bit-rate up.
      Important note: match the parameters of both streams closely, as many players won’t cope well if the codec, sample rate or other parameters change mid-stream…


  • You want to have several different live shows during the week and in between some automated playlist driven programming
    • use (several) fallbacks
      primary mountpoint: live shows
      secondary mountpoint: playlist
      tertiary mountpoint: -90dB AWGN (optional, e.g. as insurance if the playlist fails)
    • If you want to have this all completely hidden from the listeners with one mount point, that is automatically also listed on http://dir.xiph.org, then you need to add one more trick:
      Set up a local relay of the primary mountpoint, set it to force YP submission.
      Set all the three other mounts to hidden and force disable YP submission.
      This gives you one visible mountpoint for your users and all the ‘magic’ is hidden.
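As a sketch, the fallback chain above might look like this in icecast.xml (the mount names are examples; check the element names against your Icecast version’s documentation):

```xml
<mount>
  <mount-name>/live</mount-name>
  <fallback-mount>/playlist</fallback-mount>
  <fallback-override>1</fallback-override>
  <hidden>1</hidden>
</mount>
<mount>
  <mount-name>/playlist</mount-name>
  <fallback-mount>/silence</fallback-mount>
  <fallback-override>1</fallback-override>
  <hidden>1</hidden>
</mount>
<mount>
  <mount-name>/silence</mount-name>
  <hidden>1</hidden>
</mount>
```

With fallback-override enabled, listeners move back up the chain automatically when the higher-priority source reconnects; the visible relay mountpoint then points at /live as described above.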
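The -90dB AWGN trick mentioned above is easy to generate; a minimal sketch, with the level taken relative to a full scale of 1.0 (sample rate and encoder hookup omitted):

```python
import math
import random
import statistics

def awgn(n, level_db=-90.0):
    # Gaussian noise with an RMS of level_db relative to full scale.
    # Inaudible, but it keeps a Vorbis/Opus encoder producing real frames.
    amp = 10.0 ** (level_db / 20.0)  # -90 dB -> about 3.16e-5
    return [random.gauss(0.0, amp) for _ in range(n)]

random.seed(1)
noise = awgn(100_000)
rms = math.sqrt(statistics.fmean(x * x for x in noise))
print(round(20.0 * math.log10(rms)))  # about -90
```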


I’ll expand this post by some simple configuration examples later today/tomorrow.

by Thomas B. Rücker at July 08, 2013 07:08 AM

Thomas B. Rücker : Icecast · analyzing your logs to create useful statistics

There are many ways to analyze Icecast usage and generate statistics. It has a special statistics TCP stream, it has hooks that trigger on source or listener connect/disconnect events, you can query it over custom XSLT or read the stats.xml or you can analyze its log files. This post however will be only about the latter. I’ll be writing separate posts on the other topics soon.

  • Webalizer streaming version
    A fork of webalizer 2.10 extended to support icecast2 logs and produce nice listener statistics.
    I’ve used this a couple of years ago. It could use some love (it expects a manual input of ‘average bitrate’ while that could be calculated), but otherwise does a very good job of producing nice statistics.
  • AWStats – supports ‘streaming’ logs.
    Haven’t used this one myself for Icecast analysis, but it’s a well established and good open source log analysis tool.
  • Icecast log parser – a log parser that feeds a MySQL database
    I ran into this one recently; it seems to be a small script that helps you feed a MySQL database from your log files. Further analysis is then possible through SQL queries. A nice building block if you want to build a custom solution.

In case you want to parse the access log yourself you should be aware of several things. Largely the log is compatible with other httpd log formats, but there are slight differences. Please look at these two log entries:

- - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" 200 3379662 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0" 88

- - [02/Jun/2013:20:38:13 +0200] "PUT /test.webm HTTP/1.1" 200 19 "-" "-" 264

The first one is a listener (in this case Firefox playing a WebM stream). You can see it received 3379662 bytes, but the interesting part is the last field on that line, “88”: the number of seconds the listener was connected. That’s important for streaming, less so for a file-serving httpd.

The second entry is a source client (in this case using the HTTP PUT method we’re adding for Icecast 2.4). Note that it only says “19” after the HTTP status code. That might seem very low at first, but remember that we’re not logging the number of bytes sent by the client to the server, only from the server to the client. If necessary you could extract the former from the error.log, though. The “264” at the end once again indicates it was streaming for 264 seconds.
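Parsing these lines yourself is straightforward; a sketch (the field names are my own, and the regex assumes the combined-log-plus-duration shape shown above):

```python
import re

# httpd combined log format with Icecast's trailing duration field.
LOG_RE = re.compile(
    r'(?P<host>\S*) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<duration>\d+)'
)

line = ('- - - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" '
        '200 3379662 "-" "Mozilla/5.0" 88')
m = LOG_RE.match(line)
print(m.group("bytes"), m.group("duration"))  # 3379662 88
```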

Another thing, as we’ve heard this question pop up repeatedly: the log entry is only generated once the listener/source actually disconnects, as only then do we know all the details that go into that line. If you are looking for real-time statistics, the access.log is the wrong place, sorry.

There are also some closed source solutions, but I’ve never used them and I don’t think they provide significant benefit over the available open source solutions mentioned above.

If you know a good Icecast tool, not only for log analysis, please write a comment or send a mail to one of the Icecast mailing lists.

by Thomas B. Rücker at June 02, 2013 06:52 PM

Chris Pearce : HTML5 video playbackRate and Ogg chaining support landed in Firefox

Paul Adenot has recently landed patches in Firefox to enable the playbackRate attribute on HTML5 <audio> and <video> elements.

This is a cool feature that I've been looking forward to for a while; it means users can speed up playback of videos (and audio) so that you can for example watch presentations sped up and only slow down for the interesting bits. Currently Firefox's <video> controls don't have support for playbackRate, but it is accessible from JavaScript, and hopefully we'll get support added to our built-in controls soon.

Paul has also finally landed support for Ogg chaining. This has been strongly desired for quite some time by community members, and the final patch also had contributions from "oneman" (David Richards), who also was a strong advocate for this feature.

We decided to reduce the scope of our chaining implementation in order to make it easier and quicker to implement. We targeted the features most desired by internet radio providers, and so we only support chaining in Ogg Vorbis and Opus audio files and we disable seeking in chained files.

Thanks Paul and David for working on these features!

by Chris Pearce (noreply@blogger.com) at December 23, 2012 02:28 AM

Thomas Daede : Centaurus 3 Lights

Centaurus 3’s lights gathered some attention during FSGP 2012. Lights are a surprisingly hard thing to get right on a solar car – they need to be bright, efficient, aerodynamic, and pretty.

Previous solar cars have tried many things, with varying success. A simple plastic window and row of T1 LEDs works well, but if the window needs to be a strange shape, it can get difficult and ugly. Acrylic light pipes make very thin and aerodynamic lights, but it can be difficult to get high efficiency, or even acceptable brightness.

There are two general classes of LEDs. One is the low power, highly focused, signaling types of LEDs. These generally draw less than 75mA, and usually come with focused or diffuse lenses, such as a T1 3/4 case. This is also the type of LED in the ASC 2012 reference lights. The other type of LED is the high brightness lighting-class LED, such as those made by Philips, Cree, or Seoul Semiconductor. These usually draw 350mA or more, are rated for total lumen output rather than millicandelas+viewing angle, and require a heat sink.

We eventually chose the low-current type of LEDs, because the lumen count we needed was low, and the need for heatsinking would significantly complicate matters. We ended up using the 30-LED version of the ASC 2012 reference lighting. The outer casing is ugly, so the LED PCB was removed using a utility knife.

Now we need a new lens. I originally looked for acrylic casting, but settled on Alumilite Water-Clear urethane casting material. This material is much stronger than acrylic and supposedly easier to use.

This product is designed to be used with rubber molds. Instead, I used our team’s sponsorship from Stratasys to 3D print some molds:

The molds are made as two halves which bolt together with AN3 fasteners. They are sanded, painted, and sanded again, then coated with mold release wax and PVA. Each part of the Alumilite urethane is placed in a vacuum pot to remove bubbles, then the two parts are mixed and poured into the mold. Next, the lights are dipped into the mold, suspended by two metal pins; the wires are bent and taped around the outside of the mold to hold the LED lights in place. The entire mold then goes back into the vacuum pot for about one minute to pull out more bubbles, and finally into an oven to cure for 4 more hours at 60°C. The oven is not strictly necessary, but it speeds up the process and makes sure the material reaches full hardness.

Once cured, the lights are removed, epoxied into place in the shell, and bondo’d. They are then masked before being sent off to paint. The masked area is somewhat smaller than the size of the cast part, to make the lights look more seamless and allow room for bondo to fix any bumps.

The lights generally come out with a rather diffuse surface, however it becomes water clear after the clear coat of automotive paint. The finished product looks like this:


by Thomas Daede at October 27, 2012 03:19 PM

Ben Schwartz : It’s Google

I’m normally reticent to talk about the future; most of my posts are in the past tense. But now the plane tickets are purchased, the apartment is booked, and my room is gradually emptying itself of my furniture and belongings. The point of no return is long past.

A few days after Independence Day, I’ll be flying to Mountain View for a week at the Googleplex, and from there to Seattle (or Kirkland), to start work as a software engineer on Google’s WebRTC team, within the larger Chromium development effort. The exact project I’ll be working on initially isn’t yet decided, but a few very exciting ideas have floated by since I was offered the position in March.

Last summer I told a friend that I had no idea where I would be in a year’s time, and when I listed places I might be — Boston, Madrid, San Francisco, Schenectady — Seattle wasn’t even on the list. It still wasn’t in March, when I was offered this position in the Cambridge (MA) office. It was an unfortunate coincidence that the team I’d planned to join was relocated to Seattle shortly after I’d accepted the offer.

My recruiters and managers were helpful and gracious in two key ways. First, they arranged for me to meet with ~5 different leaders in the Cambridge office whose teams I might be able to join instead of moving. Second, they flew me out to Seattle (I’d never been to the city, nor the state, nor any of the states or provinces that it borders) and arranged for meetings with various managers and developers in the Kirkland office, just so I could learn more about the office and the city. I spent the afternoon wandering the city and (with help from a friend of a friend) looking at as many districts as I could squeeze in between lunch and sleep.

The visit made all the difference. It made the city real to me … and it seemed like a place that I could live. It also confirmed an impressive pattern: every single Google employee I met, at whichever office, seemed like someone I would be happy to work alongside.

When I returned there were yet more meetings scheduled, but I began to perceive that the move was essentially inevitable. The hiring committee had done their job well, and assigned me to the best fitting position. Everything else was second best at best.

It’s been an up and down experience, with the drudgery of packing and schlepping an unwelcome reminder of the feeling of loss that accompanies leaving history, family, and friends behind. I am learning in the process that, having never really moved, I have no idea how to move.

But there’s also sometimes a sense of joy in it. I am going to be an independent, free adult, in a way that cannot be achieved by even the happiest potted plant.

After signing the same lease on the same student apartment for the seventh time, I worried about getting stuck in some metaphysical sense, about failing to launch from my too-comfortable cocoon. It was time for a grand adventure.

This is it.

by Ben at June 29, 2012 05:10 PM

David Schleef : GStreamer Streaming Server Library

Introducing the GStreamer Streaming Server Library, or GSS for short.

This post was originally intended to be a release announcement, but I started to wander off to work on other projects before the release was 100% complete.  So perhaps this is a pre-announcement.  Or it’s merely an informational piece with the bonus that the source code repository is in a pretty stable and bug-free state at the moment.  I tagged it with “gss-0.5.0”.

What it is

GSS is a standalone HTTP server implemented as a library.  Its special focus is to serve live video streams to thousands of clients, mainly for use inside an HTML5 video tag.  It’s based on GStreamer, libsoup, and json-glib, and uses Bootstrap and BrowserID in the user interface.

GSS comes with a streaming server application that is essentially a small wrapper around the library.  This application is referred to as the Entropy Wave Streaming Server (ew-stream-server); the code that is now GSS was originally split out of this application.  The app can be found in the tools/ directory in the source tree.


  • Streaming formats: WebM, Ogg, MPEG-TS.  (FLV support is waiting for a flvparse element in GStreamer.)
  • Streams in different formats/sizes/bitrates are bundled into a single “program”.
  • Streaming to Flash via HTTP.
  • Authentication using BrowserID.
  • Automatic conversion from properly formed MPEG-TS to HTTP Live Streaming.
  • Automatic conversion to RTP/RTSP (experimental; works for Ogg/Theora/Vorbis only).
  • Stream upload via HTTP PUT (3 different varieties), Icecast, raw TCP socket.
  • Stream pull from another HTTP streaming server.
  • Content protection via automatic one-time URLs.
  • (Experimental) Video-on-Demand stream types.
  • Per-stream, per-program, and server metrics.
  • An HTTP configuration interface and REST API are used to control the server, allowing standalone operation and easy integration with other web servers.
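The one-time-URL content protection above can be illustrated with a generic HMAC-signed expiring URL. This is only a sketch of the general technique under assumed conventions: the `expires`/`token` parameter names, the token format, and the secret handling here are hypothetical, not GSS’s actual scheme.

```python
# Generic sketch of expiring, signed stream URLs. The parameter names,
# token format, and secret handling are hypothetical, not GSS's scheme.
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # placeholder; a real server would manage this

def sign_url(path, lifetime=60, now=None):
    """Append an expiry timestamp and an HMAC tag to a stream path."""
    expires = int(now if now is not None else time.time()) + lifetime
    msg = f"{path}:{expires}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&token={tag}"

def check_url(path, expires, token, now=None):
    """Verify the tag in constant time and reject expired requests."""
    current = int(now if now is not None else time.time())
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token) and current < expires
```

A truly one-time URL would additionally record each accepted token server-side and refuse it on reuse; the expiry window above is the stateless part of the idea.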

What’s not there?

  • Other types of authentication, LDAP or other authorization.
  • RTMP support.  (Maybe some day, but there are several good open-source Flash servers out there already.)
  • Support for upload using HTTP PUT with no 100-Continue header, which several HTTP libraries omit.
  • Decent VOD support, with rate-controlled streaming, burst start, and seeking.
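For context on the 100-Continue item above: HTTP/1.1’s `Expect: 100-continue` mechanism lets an uploader announce a large body and wait for the server’s interim “100 Continue” response before transmitting it. A minimal sketch of the headers a well-behaved PUT uploader would send — the content type and length values are illustrative, and this does not reflect GSS’s actual endpoints:

```python
# Sketch of the request headers a well-behaved stream uploader sends.
# "Expect: 100-continue" asks the server to confirm before the client
# transmits the body. Values here are illustrative only.

def upload_headers(content_type, content_length=None):
    """Build PUT headers for a stream upload.

    A live stream has no known length, so we fall back to chunked
    transfer encoding instead of Content-Length.
    """
    headers = {
        "Content-Type": content_type,
        "Expect": "100-continue",
    }
    if content_length is not None:
        headers["Content-Length"] = str(content_length)
    else:
        headers["Transfer-Encoding"] = "chunked"
    return headers

live = upload_headers("video/webm")          # live stream, unknown length
vod = upload_headers("video/webm", 1 << 20)  # a 1 MiB file
```

A client that skips the `Expect` header and immediately streams the body is exactly the case the bullet above says GSS does not yet accept.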

The details


by David Schleef at June 14, 2012 06:21 PM