
Silvia Pfeiffer : Annual Release of External-Videos plugin – we’ve hit v1.0

This is the annual release of my external-videos WordPress plugin, and with the help of Andrew Nimmolo I’m proud to announce we’ve reached version 1.0!

So yes, my external-videos WordPress plugin is now roughly 7 years old – who would have thought! During the year, I don’t get the luxury of spending time on maintaining this open source love child of mine, but at Christmas, my bad conscience catches up with me – every year! I then spend some time going through bug reports, upgrading the plugin to the latest WordPress version, upgrading to the latest video site APIs, testing functionality, and of course making a new release.

This year has been quite special. The power of open source has kicked in and a new developer took an interest in external-videos. Andrew Nimmolo submitted patches over all of 2016. He decided to bring the external-videos plugin into the new decade with a huge update to the layout of the settings pages, general improvements, and an all-round update of all the video site APIs which included removing their overly complex SDKs and going straight for the REST APIs.

Therefore, I’m very proud to be able to release version 1.0 today. Thanks, Andrew!

Enjoy – and I look forward to many more contributions – have a Happy 2017!

NOTE: If you’re upgrading from an older version, you might need to remove and re-add your social video sites because the API details have changed a bit. Also, we noticed that there were layout issues on WordPress 4.3.7, so try and make sure your WordPress version is up to date.

by silvia at January 13, 2017 10:55 PM

Ben Schwartz : The Clock

I watched The Clock at the Boston Museum of Fine Arts from about 4:45 to 6:30 on Friday. My thoughts on The Clock:

The experiment works in part because scenes with clocks in them are usually frenetic. In a movie, the presence of a clock usually means someone is in a rush, and so most of the sequences convey urgency.

It’s often hard to spot the clock in each scene. In less exciting sequences, this serves as a game to pass the time. In many cases the clock in question is never in focus, or is moving too fast for the viewer to notice. The editors must have done careful freeze-frames and zoomed in on wristwatches to work out the indicated time.

The selected films are mostly in English, with a fair number in French and very few in any other languages. This feels fairly arbitrary to me.

Scenes from multiple films are often mixed within each segment. It seems like the editors adopted a relaxed rule, maybe something like: “if a clock appeared in an original, then a one minute window around the moment of appearance is fair game to include during that minute of The Clock, spliced together with other clips in any order”.

The editing makes heavy use of L cuts and audio crossfades to make the fairly random assortment of sources feel more cohesive.

I swear I saw a young Michael Caine at least twice in two different roles.

Some of the sources were distinctly low-fidelity, often due to framerate matching issues. I think this might be the first production I’ve seen that would really have benefited from a full Variable Frame Rate render and display pipeline.

I started to wonder about connections to deep learning. Could we train an image captioning network to identify images of clocks and watches, then run it on a massive video corpus to generate The Clock automatically?

Or, could we construct a spatial analogue to The Clock’s time-for-time conceit? How about a service that notifies you of a film clip shot at your current location? With a large GPS-tagged corpus (or a location-finder neural network) it might be possible to do this with pretty broad coverage.

by Ben at November 28, 2016 03:03 AM

Jean-Marc Valin : Opus 1.2-alpha is out!

We just released Opus 1.2-alpha. It's an alpha release for the upcoming Opus 1.2. It includes many quality improvements for low-bitrate speech and music. It also includes new features, as well as a large number of bug fixes. See the announcement for more details.

November 03, 2016 08:07 PM

Monty : Daala progress update 20160606: Revisiting Daala Technology Demos

I've not posted many entries in the past year-- practically none at all in fact. There's something entirely different about coding and writing prose that makes it difficult to shift between one and the other. Nor have I been particularly good (or even acceptably bad) about responding to questions and replies, especially those asking where Daala is, or where Daala is going in the era of AOM.

In the meantime, though, Jean-Marc has written another demo-style post that revisits many of the techniques we've tried out in Daala while applying the benefit of hindsight. I've got a number of my own comments I want to make about what he's written, but in the meantime it's an excellent technical summary.

by Monty (monty@xiph.org) at June 06, 2016 08:39 PM

Jean-Marc Valin : Revisiting Daala Technology Demos

Over the last three years, we have published a number of Daala technology demos. With pieces of Daala being contributed to the Alliance for Open Media's AV1 video codec, now seems like a good time to go back over the demos and see what worked, what didn't, and what changed compared to the description we made in the demos.

Read more!

June 06, 2016 07:43 PM

Jean-Marc Valin : A Deringing Filter for Daala... And Beyond


Here's the latest addition to the Daala demo series. This demo describes the new Daala deringing filter that replaces a previous attempt with a less complex algorithm that performs much better. Those who like to know all the math details can also check out the full paper.

Read more!

April 07, 2016 10:31 AM

Silvia Pfeiffer : WebRTC predictions for 2016

I wrote these predictions in the first week of January and meant to publish them as encouragement to think about where WebRTC still needs some work. I’d like to be able to compare the state of WebRTC in the browser a year from now. Therefore, without further ado, here are my thoughts.

WebRTC Browser support

I’m quite optimistic when it comes to browser support for WebRTC. We have seen Edge bring in initial support last year and Apple looking to hire engineers to implement WebRTC. My prediction is that we will see the following developments in 2016:

  • Edge will become interoperable with Chrome and Firefox, i.e. it will publish VP8/VP9 and H.264/H.265 support
  • Firefox of course continues to support both VP8/VP9 and H.264/H.265
  • Chrome will follow the spec and implement H.264/H.265 support (to add to their already existing VP8/VP9 support)
  • Safari will enter the WebRTC space but only with H.264/H.265 support

Codec Observations

With Edge and Safari entering the WebRTC space, there will be a larger focus on H.264/H.265. It will help with creating interoperability between the browsers.

However, since there are so many flavours of H.264/H.265, I expect that when different browsers are used at different endpoints, we will get poor quality video calls because of having to negotiate a common denominator. Certainly, baseline will work interoperably, but better encoding quality and lower bandwidth will only be achieved if all endpoints use the same browser.

Thus, we will get to the funny situation where we buy ourselves interoperability at the cost of video quality and bandwidth. I’d call that a “degree of interoperability” and not the best possible outcome.

I’m going to go out on a limb and say that at this stage, Google will strongly consider improving the case for VP8/VP9 by improving its bandwidth adaptability: I think they will buy themselves some SVC capability and make VP9 the best quality codec for live video conferencing. Thus, when Safari eventually follows the standard and also implements VP8/VP9 support, the interoperability win of H.264/H.265 will turn out to be only temporary, overshadowed by vastly better video quality when using VP9.

The Enterprise Boundary

Like all video conferencing technology, WebRTC is having a hard time dealing with the corporate boundary: firewalls and proxies get in the way of setting up video connections from within an enterprise to people outside.

The telco world has come up with the concept of SBCs (session border controllers). SBCs come packed with functionality to deal with security, signalling protocol translation, Quality of Service policing, regulatory requirements, statistics, billing, and even media services like transcoding.

SBCs are total overkill for a world where a large number of Web applications simply want to add a WebRTC feature – probably mostly to provide a video or audio customer support service, but it could be a live training session with call-in, or an interest group conference call.

We cannot install a custom SBC solution for every WebRTC service provider in every enterprise. That’s like saying we need a custom Web proxy for every Web server. It doesn’t scale.

Cloud services thrive on their ability to sell directly to an individual in an organisation on their credit card without that individual having to ask their IT department to put special rules in place. WebRTC will not make progress in the corporate environment unless this is fixed.

We need a solution that allows all WebRTC services to get through an enterprise firewall and enterprise proxy. I think the WebRTC standards have done pretty well with firewalls and connecting to a TURN server on port 443 will do the trick most of the time. But enterprise proxies are the next frontier.

What it takes is some kind of media packet forwarding service that sits on the firewall or in a proxy and allows WebRTC media packets through – maybe with some configuration that is necessary in the browsers or the Web app to add this service as another type of TURN server.
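To make the idea concrete, here is a minimal sketch of how such a forwarding service could be surfaced to a Web app today: as one more entry in the standard iceServers configuration. The hostnames and credentials are made up for illustration.

```javascript
// A sketch of exposing a firewall-friendly relay to a Web app as
// another TURN server entry. Hostnames and credentials are hypothetical.
const iceConfig = {
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    // TURN over TLS on port 443 -- the path that usually survives
    // enterprise firewalls, since it looks like ordinary HTTPS traffic.
    {
      urls: 'turns:turn.example.com:443?transport=tcp',
      username: 'demo-user',
      credential: 'demo-secret'
    }
  ]
};
// In the browser: const pc = new RTCPeerConnection(iceConfig);
```

A proxy-aware media forwarding service could slot into the same configuration without any change to the application code, which is what makes the "another type of TURN server" framing attractive.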

I don’t have a full understanding of the problems involved, but I think such a solution is vital before WebRTC can go mainstream. I expect that this year we will see some clever people coming up with a solution for this and a new type of product will be born and rolled out to enterprises around the world.


So these are my predictions. In summary, they address the key areas where I think WebRTC still has to make progress: interoperability between browsers, video quality at low bitrates, and the enterprise boundary. I’m really curious to see where we stand with these a year from now.

It’s worth mentioning Philipp Hancke’s tweet reply to my post:

— we saw some clever people come up with a solution already. Now it needs to be implemented 🙂

by silvia at February 17, 2016 09:48 PM

Silvia Pfeiffer : SWAY at RFWS using Coviu

A SWAY session by Joanne of Royal Far West School. http://sway.org.au/ via https://coviu.com/ SWAY is an oral language and literacy program based on Aboriginal knowledge, culture and stories. It has been developed by Educators, Aboriginal Education Officers and Speech Pathologists at the Royal Far West School in Manly, NSW.

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at February 14, 2016 09:02 PM

Silvia Pfeiffer : My journey to Coviu

My new startup just released our MVP – this is the story of what got me here.

I love creating new applications that let people do their work better or in a manner that wasn’t possible before.

My first such passion was as a student intern when I built a system for a building and loan association’s monthly customer magazine. The group I worked with was managing their advertiser contacts through a set of paper cards and I wrote a dBase based system (yes, that long ago) that would manage their customer relationships. They loved it – until it got replaced by an SAP system that cost 100 times what I cost them, had really poor UX, and only gave them half the functionality. It was a corporate system with ongoing support, which made all the difference to them.

The story repeated itself with a CRM for my Uncle’s construction company, and with a resume and quotation management system for Accenture right after Uni, both of which I left behind when I decided to go into research.

Even as a PhD student, I never lost sight of challenges that people were facing and wanted to develop technology to overcome problems. The aim of my PhD thesis was to prepare for the oncoming onslaught of audio and video on the Internet (yes, this was 1994!) by developing algorithms to automatically extract and locate information in such files, which would enable users to structure, index and search such content.

Many of the use cases that we explored are now part of products or continue to be challenges: finding music that matches your preferences, identifying music or video pieces e.g. to count ads on the radio or to mark copyright infringement, or the automated creation of video summaries such as trailers.


This continued when I joined the CSIRO in Australia – I was working on segmenting speech into words or talk spurts since that would simplify captioning & subtitling, and on MPEG-7 which was a (slightly over-engineered) standard to structure metadata about audio and video.

In 2001 I had the idea of replicating the Web for videos: i.e. creating hyperlinked and searchable video-only experiences. We called it “Annodex” for annotated and indexed video and it needed full-screen hyperlinked video in browsers – man were we ahead of our time! It was my first step into standards, got several IETF RFCs to my name, and started my involvement with open codecs through Xiph.

Around the time that YouTube was founded in 2006, I founded Vquence – originally a video search company for the Web, but pivoted to a video metadata mining company. Vquence still exists and continues to sell its data to channel partners, but it lacks the user impact that has always driven my work.

As the video element started being developed for HTML5, I had to get involved. I contributed many use cases to the W3C, became a co-editor of the HTML5 spec and focused on video captioning with WebVTT while contracting to Mozilla and later to Google. We made huge progress and today the technology exists to publish video on the Web with captions, making the Web more inclusive for everybody. I contributed code to YouTube and Google Chrome, but was keen to make a bigger impact again.

The opportunity came when a couple of former CSIRO colleagues who now worked for NICTA approached me to get me interested in addressing new use cases for video conferencing in the context of WebRTC. We worked on a kiosk-style solution to service delivery for large service organisations, particularly targeting government. The emerging WebRTC standard posed many technical challenges that we addressed by building rtc.io, by contributing to the standards, and registering bugs on the browsers.

Fast-forward through the development of a few further custom solutions for customers in health and education and we are starting to see patterns of need emerge. The core learning that we’ve come away with is that to get things done, you have to go beyond “talking heads” in a video call. It’s not just about seeing the other person, but much more about having a shared view of the things that need to be worked on and a shared way of interacting with them. Also, we learnt that the things that are being worked on are quite varied and may include multiple input cameras, digital documents, Web pages, applications, device data, controls, forms.

So we set out to build a solution that would enable productive remote collaboration to take place. It would need to provide an excellent user experience, it would need to be simple to work with, provide for the standard use cases out of the box, yet be architected to be extensible for specialised data sharing needs that we knew some of our customers had. It would need to be usable directly on Coviu.com, but also able to integrate with specialised applications that some of our customers were already using, such as the applications that they spend most of their time in (CRMs, practice management systems, learning management systems, team chat systems). It would need our customers to sign up, yet let their clients join a call without signing up.

Collaboration is a big problem. People are continuing to get more comfortable with technology and are less and less inclined to travel distances just to get a service done. In a country as large as Australia, where 12% of the population lives in rural and remote areas, people may not even be able to travel distances, particularly to receive or provide recurring or specialised services, or to achieve work/life balance. To make the world a global village, we need to be able to work together better remotely.

The need for collaboration is being recognised by specialised Web applications already, such as the LiveShare feature of Invision for Designers, Codassium for pair programming, or the recently announced Dropbox Paper. Few go all the way to video – WebRTC is still regarded as a complicated feature to support.

Coviu in action

With Coviu, we’d like to offer a collaboration feature to every Web app. We now have a Web app that provides a modern and beautifully designed collaboration interface. To enable other Web apps to integrate it, we are now developing an API. Integration may entail customisation of the data sharing part of Coviu – something Coviu has been designed for. How to replicate the data and keep it consistent when people collaborate remotely – that is where Coviu makes a difference.

We have started our journey and have just launched free signup to the Coviu base product, which allows individuals to own their own “room” (i.e. a fixed URL) in which to collaborate with others. A huge shout out goes to everyone in the Coviu team – a pretty amazing group of people – who have turned the app from an idea to reality. You are all awesome!

With Coviu you can share and annotate:

  • images (show your mum photos of your last holidays, or get feedback on an architecture diagram from a customer),
  • pdf files (give a presentation remotely, or walk a customer through a contract),
  • whiteboards (brainstorm with a colleague), and
  • application windows (watch a YouTube video together, or work through your task list with your colleagues).

All of these are regarded as “shared documents” in Coviu and thus have zooming and annotations features and are listed in a document tray for ease of navigation.

This is just the beginning of how we want to make working together online more productive. Give it a go and let us know what you think.


by silvia at October 27, 2015 10:08 AM

Monty : Announcing VP9 (And Opus and Ogg and Vorbis) support coming to Microsoft Edge

Oh look, an official announcement: http://blogs.windows.com/msedgedev/2015/09/08/announcing-vp9-support-coming-to-microsoft-edge/

Called it!

In any case, welcome to the party Microsoft. Feel free to help yourself to the open bar. :-)

(And, to be fair, this isn't as fantastically unlikely as some pundits have been saying. After all, MS does own an IP stake in Opus).

by Monty (monty@xiph.org) at September 08, 2015 11:23 PM

Monty : Comments on the Alliance for Open Media, or, "Oh Man, What a Day"

I assume folks who follow video codecs and digital media have already noticed the brand new Alliance for Open Media jointly announced by Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix. I expect the list of member companies to grow somewhat in the near future.

One thing that's come up several times today: people contacting Xiph to see if we're worried this detracts from the IETF's NETVC codec effort. The way the aomedia.org website reads right now, it might sound as if this is a competing development. It's not; it's something quite different and complementary.

Open source codec developers need a place to collaborate on and share patent analysis in a forum protected by client-attorney privilege, something the IETF can't provide. AOMedia is to be that forum. I'm sure some development discussion will happen there, probably quite a bit in fact, but pooled IP review is the reason it exists.

It's also probably accurate to view the Alliance for Open Media (the Rebel Alliance?) as part of an industry pushback against the licensing lunacy made obvious by HEVCAdvance. Dan Rayburn at Streaming Media reports a third HEVC licensing pool is about to surface. To date, we've not yet seen licensing terms on more than half of the known HEVC patents out there.

In any case, HEVC is becoming rather expensive, and yet increasingly uncertain licensing-wise. Licensing uncertainty gives responsible companies the tummy troubles. Some of the largest companies in the world are seriously considering starting over rather than bet on the mess...

Is this, at long last, what a tipping point feels like?

Oh, and one more thing--

As of today, just after Microsoft announced its membership in the Alliance for Open Media, they also quietly changed the internal development status of Vorbis, Opus, WebM and VP9 to indicate they intend to ship all of the above in the new Windows Edge browser. Cue spooky X-Files theme music.

by Monty (monty@xiph.org) at September 02, 2015 06:37 AM

Monty : Cisco introduces the Thor video codec

I want to mention this on its own, because it's a big deal, and so that other news items don't steal its thunder: Cisco has begun blogging about their own FOSS video codec project, also being submitted as an initial input to the IETF NetVC video codec working group effort:

"World, Meet Thor – a Project to Hammer Out a Royalty Free Video Codec"

Heh. Even an appropriate almost-us-like nutty headline for the post. I approve :-)

In a bit, I'll write more about what's been going on in the video codec realm in general. Folks have especially been asking lots of questions about HEVCAdvance licensing recently. But it's not their moment right now. Bask in the glow of more open development goodness, we'll get to talking about the 'other side' in a bit.

by Monty (monty@xiph.org) at August 11, 2015 08:29 PM

Monty : libvpx 1.4.0 ("Indian Runner Duck") officially released

Johann Koenig just posted that the long awaited libvpx 1.4.0 is officially released.

"Much like the Indian Runner Duck, the theme of this release is less bugs, less bloat and more speed...."

by Monty (monty@xiph.org) at April 03, 2015 09:20 PM

Monty : Today's WTF Moment: A Competing HEVC Licensing Pool

Had this happened next week, I'd have thought it was an April Fools' joke.

Out of nowhere, a new patent licensing group just announced it has formed a second, competing patent pool for HEVC that is independent of MPEG LA. And they apparently haven't decided what their pricing will be... maybe they'll have a fee structure ready in a few months.

Video on the Net (and let's be clear-- video's future is the Net) already suffers endless technology licensing problems. And the industry's solution is apparently even more licensing.

In case you've been living in a cave, Google has been trying to establish VP9 as a royalty- and strings-free alternative (new version release candidate just out this week!), and NetVC, our own next-next-generation royalty-free video codec, was just conditionally approved as an IETF working group on Tuesday and we'll be submitting our Daala codec as an input to the standardization process. The biggest practical question surrounding both efforts is 'how can you possibly keep up with the MPEG behemoth'?

Apparently all we have to do is stand back and let the dominant players commit suicide while they dance around Schroedinger's Cash Box.

by Monty (monty@xiph.org) at March 27, 2015 06:35 PM

Monty : Daala Blog-Like Update: Bug or feature? [or, the law of Unintentionally Intentional Behaviors]

Codec development is often an exercise in tracking down examples of "that's funny... why is it doing that?" The usual hope is that unexpected behaviors spring from a simple bug, and finding bugs is like finding free performance. Fix the bug, and things usually work better.

Often, though, hunting down the 'bug' is a frustrating exercise in finding that the code is not misbehaving at all; it's functioning exactly as designed. Then the question becomes a thornier issue of determining if the design is broken, and if so, how to fix it. If it's fixable. And the fix is worth it.

[continue reading at Xiph.Org....]

by Monty (monty@xiph.org) at March 26, 2015 04:49 PM

Silvia Pfeiffer : WebVTT Audio Descriptions for Elephants Dream

When I set out to improve accessibility on the Web and we started developing WebSRT – later to be renamed to WebVTT – I needed an example video to demonstrate captions / subtitles, audio descriptions, transcripts, navigation markers and sign language.

I needed a freely available video with spoken text that either already had such data available or that I could create it for. Naturally I chose “Elephants Dream” by the Orange Open Movie Project, because it was created under the Creative Commons Attribution 2.5 license.

As it turned out, the Blender Foundation had already created a collection of SRT files that would represent the English original as well as the translated languages. I was able to reuse them by merely adding a WEBVTT header.
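That conversion is small enough to sketch. Here is a minimal, hypothetical converter (one detail beyond the header: WebVTT timestamps use a dot as the decimal separator where SRT uses a comma):

```javascript
// A minimal sketch of an SRT-to-WebVTT conversion (illustrative, not
// the exact script used): prepend the WEBVTT header and switch the
// decimal separator in cue timestamps from SRT's comma to WebVTT's dot.
function srtToVtt(srt) {
  const body = srt
    .replace(/\r\n/g, '\n')
    // 00:00:20,000 --> 00:00:24,400  becomes  00:00:20.000 --> 00:00:24.400
    .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, '$1.$2');
  return 'WEBVTT\n\n' + body;
}

const srt = '1\n00:00:20,000 --> 00:00:24,400\nAt the left we can see...\n';
console.log(srtToVtt(srt).startsWith('WEBVTT')); // true
```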

Then there was a need for a textual audio description. I read up on the plot online and finally wrote up a time-aligned audio description. I’m hereby making that file available under the Creative Commons Attribution 4.0 license. I’ve added a few lines to the metadata headers so it doesn’t confuse players. Feel free to reuse at will – I know there are others out there that have a similar need to demonstrate accessibility features.
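For illustration only, such a description file looks roughly like this – the extra header lines are the kind of metadata that tells players this track is descriptions rather than ordinary captions, and the cue text here is invented rather than quoted from the actual Elephants Dream file:

```
WEBVTT
Kind: descriptions
Language: en

1
00:00:15.000 --> 00:00:17.950
A giant mechanical structure looms in the dark.
```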

by silvia at March 10, 2015 12:50 PM

Monty : A Fabulous Daala Holiday Update

Before we get into the update itself, yes, the level of magenta in that banner image got away from me just a bit. Then it was just begging for inappropriate abuse of a font...


Hey everyone! I just posted a Daala update that mostly has to do with still image performance improvements (yes, still image in a video codec. Go read it to find out why!). The update includes metric plots showing our improvement on objective metrics over the past year and relative to other codecs. Since objective metrics are only of limited use, there's also side-by-side interactive image comparisons against jpeg, vp8, vp9, x264 and x265.

The update text (and demo code) was originally for a July update, as the still image work mostly happened in the beginning of the year. That update got held up and hadn't been released officially, though it had been discovered and discussed at forums like Doom9. I regenerated the metrics and image runs to use the latest versions of all the codecs involved (only Daala and x265 improved) for this official better-late-than-never progress report!

by Monty (monty@xiph.org) at December 23, 2014 09:09 PM

Monty : Daala Demo 6: Perceptual Vector Quantization (by J.M. Valin)

Jean-Marc has finished the sixth Daala demo page, this one about PVQ, the foundation of our encoding scheme in both Daala and Opus.

(I suppose this also means we've finally settled on what the acronym 'PVQ' stands for: Perceptual Vector Quantization. It's based on, and expanded from, an older technique called Pyramid Vector Quantization, and we'd kept using 'PVQ' for years even though our encoding space was actually spherical. I'd suggested we call it 'Pspherical Vector Quantization' with a silent P so that we could keep the acronym, and that name appears in some of my slide decks. Don't get confused, it's all the same thing!)

by Monty (monty@xiph.org) at November 19, 2014 08:12 PM

Jean-Marc Valin : Daala: Perceptual Vector Quantization (PVQ)

Here's my new contribution to the Daala demo effort. Perceptual Vector Quantization has been one of the core ideas in Daala, so it was time for me to explain how it works. The details involve lots of maths, but hopefully this demo will make the general idea clear enough. I promise that the equations in the top banner are the only ones you will see!

Read more!

November 18, 2014 08:37 PM

Monty : Intra-Paint: A new Daala demo from Jean-Marc Valin

Intra paint is not a technique that's part of the original Daala plan and, as of right now, we're not currently using it in Daala. Jean-Marc envisioned it as a simpler, potentially more effective replacement for intra-prediction. That didn't quite work out-- but it has useful and visually pleasing qualities that, of all things, make it an interesting postprocessing filter, especially for deringing.

Several people have said 'that should be an Instagram filter!' I'm sure Facebook could shake a few million loose for us to make that happen ;-)

by Monty (monty@xiph.org) at September 24, 2014 07:36 PM

Jean-Marc Valin : Daala: Painting Images For Fun (and Profit?)

As a contribution to Monty's Daala demo effort, I decided to demonstrate a technique I've recently been developing for Daala: image painting. The idea is to represent images as directions and 1-D patterns.

Read more!

September 24, 2014 05:24 PM

Jean-Marc Valin : Opus beats all other codecs. Again.

Three years ago Opus got rated higher than HE-AAC and Vorbis in a 64 kb/s listening test. Now, the results of the recent 96 kb/s listening test are in and Opus got the best ratings, ahead of AAC-LC and Vorbis. Also interesting, Opus at 96 kb/s sounded better than MP3 at 128 kb/s.

Full plot

September 18, 2014 01:32 AM

Silvia Pfeiffer : Progress with rtc.io

At the end of July, I gave a presentation about WebRTC and rtc.io at the WDCNZ Web Dev Conference in beautiful Wellington, NZ.


Putting that talk together reminded me about how far we have come in the last year both with the progress of WebRTC, its standards and browser implementations, as well as with our own small team at NICTA and our rtc.io WebRTC toolbox.

WDCNZ presentation, slide 5

One of the most exciting opportunities is still under-exploited: the data channel. When I talked about the above slide and pointed out Bananabread, PeerCDN, Copay, PubNub and also later WebTorrent, that’s where I really started to get Web Developers excited about WebRTC. They can totally see the shift in paradigm to peer-to-peer applications away from the Server-based architecture of the current Web.

Many were also excited to learn more about rtc.io, our own npm-modules-based approach to a JavaScript API for WebRTC.


We believe that the world of JavaScript has reached a critical stage where we can no longer code by copy-and-pasting JavaScript snippets from all over the Web universe. We need a more structured approach to JavaScript module reuse. Node, with JavaScript on the back end, really got this development going. However, we’ve needed it for a long time on the front end, too. One big library (jQuery, anyone?) that does everything that anyone could ever need on the front end isn’t going to work any longer with the amount of functionality that we now expect Web applications to support. Just look at the insane growth of npm compared to other module collections:

Packages per day across popular platforms (Shamelessly copied from: http://blog.nodejitsu.com/npm-innovation-through-modularity/)

For those that – like myself – found it difficult to understand how to tap into the sheer power of npm modules as a front-end developer, simply use browserify. npm modules are prepared following the CommonJS module definition spec. Browserify works natively with that and “compiles” all the dependencies of an npm module into a single bundle.js file that you can use on the front end through a script tag, as you would in plain HTML. You can learn more about browserify, module definitions, and how to use it.

For those of you not quite ready to dive in with browserify, we have prepared the rtc module, which exposes the most commonly used packages of rtc.io through an “RTC” object from a browserified JavaScript file. You can also directly download the JavaScript file from GitHub.

Using rtc.io rtc JS library

So, I hope you enjoy rtc.io, I hope you enjoy my slides and the large collection of interesting links inside the deck, and of course: enjoy WebRTC! Thanks to Damon, Jeff, Cathy, Pete and Nathan – you’re an awesome team!

On a side note, I was really excited to meet the author of browserify, James Halliday (@substack) at WDCNZ, whose talk on “building your own tools” seemed to take me back to the times where everything was done on the command-line. I think James is using Node and the Web in a way that would appeal to a Linux Kernel developer. Fascinating!!

by silvia at August 12, 2014 07:06 AM

Silvia Pfeiffer : Nodebots Day Sydney 2014 (take 2)

Nodebots Day Sydney 2014 (take 2)

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at July 25, 2014 11:00 PM

Silvia Pfeiffer : Nodebots Day 2014 Sydney

Nodebots and WebRTC Day in Sydney, NICTA http://www.meetup.com/WebRTC-Sydney/events/173234022/

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at July 25, 2014 05:26 PM

Silvia Pfeiffer : AppRTC : Google’s WebRTC test app and its parameters

If you’ve been interested in WebRTC and haven’t lived under a rock, you will know about Google’s open source testing application for WebRTC: AppRTC.

When you go to the site, a new video conferencing room is automatically created for you and you can share the provided URL with somebody else and thus connect (make sure you’re using Google Chrome, Opera or Mozilla Firefox).

We’ve been using this application forever to check whether any issues with our own WebRTC applications are due to network connectivity issues, firewall issues, or browser bugs, in which case AppRTC breaks down, too. Otherwise we’re pretty sure to have to dig deeper into our own code.

Now, AppRTC creates a pretty poor quality video conference, because the browsers use a 640×480 resolution by default. However, there are many query parameters that can be added to the AppRTC URL through which the connection can be manipulated.

Here are my favourite parameters:

  • hd=true : turns on high definition, i.e. minWidth=1280, minHeight=720
  • stereo=true : turns on stereo audio
  • debug=loopback : connect to yourself (great to check your own firewalls)
  • tt=60 : by default, the channel is closed after 30 min – this gives you 60 min (max 1440)

For example, here’s what a stereo HD loopback test looks like: https://apprtc.appspot.com/?r=82313387&hd=true&stereo=true&debug=loopback .

This is not the limit of the available parameters, though. Here are some others that you may find interesting for some more in-depth geekery:

  • ss=[stunserver] : in case you want to test a different STUN server to the default Google ones
  • ts=[turnserver] : in case you want to test a different TURN server to the default Google ones
  • tp=[password] : password for the TURN server
  • audio=true&video=false : audio-only call
  • audio=false : video-only call
  • audio=googEchoCancellation=false,googAutoGainControl=true : disable echo cancellation and enable gain control
  • audio=googNoiseReduction=true : enable noise reduction (more Google-specific parameters)
  • asc=ISAC/16000 : preferred audio send codec is ISAC at 16kHz (use on Android)
  • arc=opus/48000 : preferred audio receive codec is opus at 48kHz
  • dtls=false : disable datagram transport layer security
  • dscp=true : enable DSCP
  • ipv6=true : enable IPv6

AppRTC’s source code is available here. And here is the file with the parameters (in case you want to check if they have changed).

Have fun playing with the main and always up-to-date WebRTC application: AppRTC.

UPDATE 12 May 2014

AppRTC now also supports the following bitrate controls:

  • arbr=[bitrate] : set audio receive bitrate
  • asbr=[bitrate] : set audio send bitrate
  • vsbr=[bitrate] : set video send bitrate
  • vrbr=[bitrate] : set video receive bitrate

Example usage: https://apprtc.appspot.com/?r=&asbr=128&vsbr=4096&hd=true

by silvia at July 23, 2014 05:02 AM

Ben Schwartz : Gooseberry

Anyone who loves video software has probably caught more than one glimpse of the Blender Foundation’s short films: Elephants Dream, Big Buck Bunny, Sintel, and Tears of Steel. I’ve enjoyed them from the beginning, and never paid a dime, on account of their impeccable Creative Commons licensing.

I always hoped that the little open source project would one day grow up enough to make a full length feature film. Now they’ve decided to try, and they’ve raised more than half their funding target … with only two days to go. You can donate here. I think of it like buying a movie ticket, except that what you get is not just the right to watch the movie, but actually ownership of the movie itself.

by Ben at May 07, 2014 04:16 AM

Ben Schwartz : Stingy/stylish

My home for the next two nights is the Hotel 309, right by the office.  It’s the stingiest hotel I’ve ever stayed in.  Nothing is free: the wi-fi is $8/night and the “business center” is 25 cents/minute after the first 20.  There’s no soap by the bathroom sink, just the soap dispenser over the tub.  Even in the middle of winter, there is no box of tissues.  Its status as a 2-star hotel is well-deserved.

The rooms are also very stylish.  There’s a high-contrast color scheme that spans from the dark wood floors and rug to the boldly matted posters and high-concept lamps.  The furniture has high design value, or at least it did before it got all beat up.

These two themes come together beautifully for me in the (custom printed?) shower curtain, which features a repeating pattern of peacocks and crowns … with severe JPEG artifacts!  The luma blocks are almost two centimeters across.  Someone should tell the artist that bits are cheap these days.


by Ben at February 18, 2014 02:58 AM

Ben Schwartz : Efficiency

So you’re trying to build a DVD player using Debian Jessie and an Atom D2700 on a Poulsbo board, and you’ve even biked down to the used DVD warehouse and picked up a few $3 90’s classics for test materials.  Here’s what will happen next:

  1. Gnome 3 at 1920×1080.  The interface is sluggish even on static graphics.  Video is right out: the graphics are unaccelerated, so every pixel has to be pushed around by the CPU.
  2. Reduce the mode to 1280×720 (half the pixels to push), and try VLC in Gnome 3.  Playback is totally choppy.  Sigh.  Not really surprising, since Gnome is running in composited mode via OpenGL commands, which are then being faked on the low-power CPU using llvmpipe.  God only knows how many times each pixel is getting copied.  top shows half the CPU time is spent inside the gnome-shell process.
  3. Switch to XFCE.  Now VLC runs, and nothing else is stealing CPU time.  Still VLC runs out of CPU when expanded to full screen.  top shows it using 330% of CPU time, which is pretty impressive for a dual-core system.
  4. Switch to Gnome-mplayer, because someone says it’s faster.  Aspect is initially wrong; switch to “x11” output mode to fix it.  Video playback finally runs smooth, even at full screen.  OK, there’s a little bit of tearing, but just pretend that it’s 1999.  top shows … wait for it … 67% CPU utilization, or about one fifth of VLC’s.  (Less, actually, since at that usage VLC was dropping frames.)  Too bad Gnome-mplayer is buggy as heck: buttons like “pause” and “stop” do nothing, and the rest of the user interface is a crapshoot at best.

On a system like this, efficiency makes a big difference.  Now if only we could get efficiency and functionality together…

by Ben at January 21, 2014 07:00 AM

Thomas Daede : Zynq DMA performance

I finally got my inverse transform up to snuff, complete with hardware matrix transposer. The transposer was trivial to implement – I realized that I could use the Xilinx data width converter IP to register entire 4×4 blocks at once, allowing my transposer to simply be a bunch of wires (assign statements in Verilog).

Screenshot from 2014-01-08 22:04:27

Unfortunately, I wasn’t getting the performance I was expecting. At a clock speed of 100MHz and a 64-bit width, I expected to be able to perform 25 million transforms per second. However, I was having trouble even getting 4 million. To debug the problem, I used the Xilinx debug cores in Vivado:


There are several problems. Here’s an explanation of what is happening in the above picture:

  1. The CPU configures the DMA registers and starts the transfer. This works for a few clock cycles.
  2. The Stream to Memory DMA (s2mm) starts a memory transfer, but its FIFOs fill up almost immediately and it has to stall (tready goes low).
  3. The transform stream pipeline also stalls, making its tready go low.
  4. The s2mm DMA is able to start its first burst transfer, and everything goes smoothly.
  5. The CPU sees that the DMA has completed, and schedules the second pass. The turnaround time for this is extremely large, and ends up taking the majority of the time.
  6. The same process happens again, but the latency is even larger due to writing to system memory.

Fortunately, the solution isn’t that complicated. I am going to switch to a scatter-gather DMA engine, which allows me to construct a request chain, and then the DMA will execute the operations without CPU intervention, avoiding the CPU latency. In addition, a FIFO can be used to reduce the impact of the initial write latency somewhat, though this costs FPGA area and it might be better just to strive for longer DMA requests.

There are other problems with my memory access at the moment – the most egregious being that my hardware expects a tiled buffer, but the Daala reference implementation uses linear buffers everywhere. This is the problem that I plan to tackle next.

by Thomas Daede at January 09, 2014 04:26 AM

Silvia Pfeiffer : Use deck.js as a remote presentation tool

deck.js is one of the new HTML5-based presentation tools. It’s simple to use, in particular for your basic, every-day presentation needs. You can also create more complex slides with animations etc. if you know your HTML and CSS.

Yesterday at linux.conf.au (LCA), I gave a presentation using deck.js. But I didn’t give it from the lectern in the room in Perth where LCA is being held – instead I gave it from the comfort of my home office at the other end of the country.

I used my laptop with in-built webcam and my Chrome browser to give this presentation. Beforehand, I had uploaded the presentation to a Web server and shared the link with the organiser of my speaker track, who was on site in Perth and had set up his laptop in the same fashion as myself. His screen was projecting the Chrome tab in which my slides were loaded and he had hooked up the audio output of his laptop to the room speaker system. His camera was pointed at the audience so I could see their reaction.

I loaded the slides via a slide-master URL (with a query string identifying me as the master), and the room loaded the same URL without the query string.

Then I gave my talk exactly as I would if I was in the same room. Yes, it felt exactly as though I was there, including nervousness and audience feedback.

How did we do that? WebRTC (Web Real-time Communication) to the rescue, of course!

We used one of the modules of the rtc.io project called rtc-glue to add the video conferencing functionality and the slide navigation to deck.js. It was actually really really simple!

Here are the few things we added to deck.js to make it work:

  • Code added to index.html to make the video connection work:
    <meta name="rtc-signalhost" content="http://rtc.io/switchboard/">
    <meta name="rtc-room" content="lca2014">
    <video id="localV" rtc-capture="camera" muted></video>
    <video id="peerV" rtc-peer rtc-stream="localV"></video>
    <script src="glue.js"></script>
    <script>
      glue.config.iceServers = [{ url: 'stun:stun.l.google.com:19302' }];
    </script>

    The iceServers config is required to punch through firewalls – you may also need a TURN server. Note that you need a signalling server – in our case we used http://rtc.io/switchboard/, which runs the code from rtc-switchboard.

  • Added glue.js library to deck.js:

    Downloaded from https://raw.github.com/rtc-io/rtc-glue/master/dist/glue.js into the source directory of deck.js.

  • Code added to index.html to synchronize slide navigation:
    glue.events.once('connected', function(signaller) {
      if (location.search.slice(1) !== '') {
        $(document).bind('deck.change', function(evt, from, to) {
          signaller.send('/slide', {
            idx: to,
            sender: signaller.id
          });
        });
      }
      signaller.on('slide', function(data) {
        console.log('received notification to change to slide: ', data.idx);
        $.deck('go', data.idx);
      });
    });

    This simply registers a callback on the slide master end to send a slide position message to the room end, and a callback on the room end that initiates the slide navigation.

And that’s it!

You can find my slide deck on GitHub.

Feel free to write your own slides in this manner – I would love to have more users of this approach. It should also be fairly simple to extend this to share pointer positions, so you can actually use the mouse pointer to point to things on your slides remotely. Would love to hear your experiences!

Note that the slides are actually a talk about the rtc.io project, so if you want to find out more about these modules and what other things you can do, read the slide deck or watch the talk when it has been published by LCA.

Many thanks to Damon Oehlman for his help in getting this working.

BTW: somebody should really fix that print style sheet for deck.js – I’m only ever getting the one slide that is currently showing. 😉

by silvia at January 08, 2014 02:28 AM

Maik Merten : Tinkering with a H.261 encoder

On the rtcweb mailing list the struggle over which Mandatory To Implement (MTI) video codec should be chosen rages on. One camp favors H.264 ("We cannot have VP8"), the other VP8 ("We cannot have H.264"). Some propose that there should be a "safe" fallback codec anyone can implement, and H.261 is as old and safe as it gets. H.261 was specified in the final years of the 1980s and is generally believed to have no non-expired patents left standing. Roughly speaking, this old gem of coding technology can transport CIF-resolution (352x288) video at full frame rate (>= 25 fps) with (depending on your definition) acceptable quality, starting roughly in the 250 to 500 kbit/s range. (I've witnessed quite a few Skype calls with similar perceived effective resolution, mostly limited by mediocre webcams, and I can live with that as long as the audio part is okay.) From today's perspective, H.261 is very light on computation, memory, and code footprint.

H.261 is, of course, outgunned by any semi-decent more modern video codec, which can, for instance, deliver video with higher resolution at similar bitrates. Those, however, don't have the luxury of having their patents expired with "as good as it can be" certainty.

People on the rtcweb list were quick to point out that having an encoder with modern encoding techniques may by itself be problematic regarding patents. Thankfully, for H.261, a public domain reference-style encoder and decoder from 1993 can be found, e.g., at http://wftp3.itu.int/av-arch/video-site/h261/PVRG_Software/P64v1.2.2.tar - so that's a nice reference on what encoding technologies were state-of-the-art in the early 1990s.

With some initial patching done by Ron Lee, this old code builds quite nicely on modern platforms - and as it turns out, the encoder also produces intact video, even on x86_64 machines or on a Raspberry Pi. Quite remarkably portable C code (although not the cleanest style-wise). The original code is slow, though: it barely does realtime encoding of 352x288 video on a 1.65 GHz netbook, and I can barely imagine having it encode video on machines from 1993! Some fun was had in making it faster (it's now about three times faster than before), and the program can now encode from and decode to YUV4MPEG2 files, which is quite a lot more handy than the old mode of operation (still supported), where each frame would consist of three files (.Y, .U, .V).

For those interested, the patched up code is available at https://github.com/maikmerten/p64 - however, be aware that the original coding style (yay, global variables) is still alive and well.

So, is it "useful" to resurrect such an old codebase? Depends on the definition of "useful". For me, as long as it is fun and teaches me something, it's reasonably useful.

So is it fun to toy around with this ancient coding technology? Yes, especially as most modern codecs still follow the same overall design - H.261 is the most basic instance of "modern" video coding and thus the easiest to grasp. Who knows: with some helpful comments here and there, that old codebase could be used for teaching basic principles of video coding.

January 07, 2014 07:35 PM

Thomas Daede : Off by one

My hardware had a bug that made about 50% of the outputs off by one. I compared my Verilog code to the original C and it was a one-for-one match, with the exception of a few OD_DCT_RSHIFT that I translated into arithmetic shifts. That turned out to break the transform. Looking at the definition of OD_DCT_RSHIFT:

/*This should translate directly to 3 or 4 instructions for a constant _b:
#define OD_UNBIASED_RSHIFT(_a,_b) ((_a)+(((1<<(_b))-1)&-((_a)<0))>>(_b))*/
/*This version relies on a smart compiler:*/
# define OD_UNBIASED_RSHIFT(_a, _b) ((_a)/(1<<(_b)))

I had always thought a divide by a power of two equals a shift, but this is wrong: an integer divide rounds towards zero, whereas a shift rounds towards negative infinity. The solution is simple: if the value to be shifted is less than zero, add a mask before shifting. Rather than write this logic in Verilog, I simply switched my code to the / operator as in the C code above, and XST inferred the correct logic. After verifying operation with random inputs, I also wrote a small benchmark to test the performance of my hardware:

[root@alarm hwtests]# ./idct4_test 
Filling input buffer... Done.
Running software benchmark... Done.
Time: 0.030000 s
Running hardware benchmark... Done.
Time: 0.960000 s

Not too impressive, but the implementation is super basic, so it’s not surprising. Most of the time is spent shuffling data across the extremely slow MMIO interface.
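The shift-versus-division distinction is small enough to demonstrate in a few lines. This sketch is in JavaScript (whose >> is also an arithmetic shift on 32-bit integers) rather than C or Verilog, but the arithmetic is identical; odUnbiasedRshift mirrors the OD_UNBIASED_RSHIFT macro quoted above:

```javascript
// Unbiased right shift: divide by 2^b rounding toward zero (like integer
// division in C), instead of toward negative infinity (like a plain shift).
function odUnbiasedRshift(a, b) {
  // -(a < 0) is -1 for negative a, so the mask (2^b - 1) is added
  // only when the value is negative.
  return (a + (((1 << b) - 1) & -(a < 0))) >> b;
}

console.log(-5 >> 1);                 // -3: arithmetic shift rounds down
console.log(Math.trunc(-5 / 2));      // -2: division rounds toward zero
console.log(odUnbiasedRshift(-5, 1)); // -2: matches the division
console.log(odUnbiasedRshift(5, 1));  //  2: positive values are unaffected
```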

At the same time I was trying to figure out why the 16-bit version of the intra predictor performed uniformly worse than the double precision version – I thought 16 bits ought to be enough. The conversion is done by expanding the range to fill a 16 bit integer and then rounding:

OD_PRED_WEIGHTS_4x4_FIXED[i] = (od_coeff)floor(OD_PRED_WEIGHTS_4x4[i] * 32768 + 0.5);

The 16 bit multiplies with the 16 bit coefficients are summed in a 32 bit accumulator. The result is then truncated to the original range. I did this with a right shift – after the previous ordeal, I tried swapping it with the “round towards zero” macro. Here are the results:


The new 16 bit version even manages to outperform the double version slightly. I believe “round to zero” does better than simply rounding down because rounding down tends to create a slightly negative bias in the encoded coefficients, decreasing coding gain.
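As a rough sketch of the conversion being described (in JavaScript for convenience; this is not Daala's actual code, and the function names are mine), here is the floor(+0.5) quantization to Q15 and a dot product that truncates toward zero when scaling back down:

```javascript
// Quantize a floating-point weight to signed Q15 fixed point,
// mirroring: (od_coeff)floor(OD_PRED_WEIGHTS_4x4[i] * 32768 + 0.5)
function toQ15(w) {
  return Math.floor(w * 32768 + 0.5);
}

// Dot product of coefficients with Q15 weights, accumulated at full
// precision, then scaled back by 15 bits rounding toward zero
// (the "round towards zero" behaviour that performed best above).
function predict(coeffs, q15weights) {
  var acc = 0;
  for (var i = 0; i < coeffs.length; i++) {
    acc += coeffs[i] * q15weights[i];
  }
  return (acc + (((1 << 15) - 1) & -(acc < 0))) >> 15;
}

console.log(toQ15(0.5));                    // 16384
console.log(predict([100], [toQ15(0.5)]));  // 50
console.log(predict([-101], [toQ15(0.5)])); // -50 (a plain shift gives -51)
```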

by Thomas Daede at January 03, 2014 12:39 AM

Thomas Daede : Daala 4-input idct (tentatively) working!

I’ve implemented Daala’s smallest inverse transform in Verilog code. It appears as an AXI-Lite slave, with two 32-bit registers for input and two for output. Right now it can do one transform per clock cycle, though at a pitiful 20MHz. I also haven’t verified that all of its output values are identical to the C version yet, but it passes preliminary testing with peek and poke.

Screenshot from 2014-01-01 23:23:22

I also have yet to figure out why XST only uses one DSP block even though my design has 3 multipliers…

by Thomas Daede at January 02, 2014 05:25 AM

Thomas Daede : Quantized intra predictor

Daala’s intra predictor currently uses doubles. Floating point math units are really expensive in hardware, and so is loading 64 bit weights. Therefore, I modified Daala to see what would happen if the weights were rounded to signed 16 bit. The result is in the SSIM plot below: red is before quantization, green after. This is too much loss – I’ll have to figure out why this happened. Worst case I move to 32 bit weights, though maybe my floor(+0.5) method of rounding is also suspect? Maybe the intra weights should be trained taking quantization into account?

by Thomas Daede at December 31, 2013 11:37 PM

Thomas Daede : First Zynq bitstream working!

Screenshot from 2013-12-31 14:51:20

I got my first custom PL hardware working! Following the Zedboard tutorials, it was relatively straightforward, though using Vivado 2013.3 required a bit of playing around – I ended up making my own clock sources and reset controller until I realized that the Zynq PS had them if you enabled them. Next up: ChipScope, or whatever it’s called in Vivado.

I crashed the chip numerous times until realizing that the bitstream file name had changed somewhere in the process, so I was uploading an old version of the bitstream….

by Thomas Daede at December 31, 2013 08:56 PM

Thomas Daede : Daala profiling on ARM

I reran the same decoding as yesterday, but this time on the Zynq Cortex-A9 instead of x86. Following is the histogram data, again with the functions I plan to accelerate highlighted:

 19.60%  lt-dump_video  [.] od_intra_pred16x16_mult
  6.66%  lt-dump_video  [.] od_intra_pred8x8_mult
  6.02%  lt-dump_video  [.] od_bin_idct16
  4.88%  lt-dump_video  [.] .divsi3_skip_div0_test
  4.54%  lt-dump_video  [.] od_bands_from_raster
  4.21%  lt-dump_video  [.] laplace_decode
  4.03%  lt-dump_video  [.] od_chroma_pred
  3.92%  lt-dump_video  [.] od_raster_from_bands
  3.66%  lt-dump_video  [.] od_post_filter16
  3.20%  lt-dump_video  [.] od_intra_pred4x4_mult
  3.09%  lt-dump_video  [.] od_apply_filter_cols
  3.08%  lt-dump_video  [.] od_bin_idct8
  2.60%  lt-dump_video  [.] od_post_filter8
  2.00%  lt-dump_video  [.] od_tf_down_hv
  1.69%  lt-dump_video  [.] od_intra_pred_cdf
  1.55%  lt-dump_video  [.] od_ec_decode_cdf_unscaled
  1.46%  lt-dump_video  [.] od_post_filter4
  1.45%  lt-dump_video  [.] od_convert_intra_coeffs
  1.44%  lt-dump_video  [.] od_convert_block_down
  1.28%  lt-dump_video  [.] generic_model_update
  1.24%  lt-dump_video  [.] pvq_decoder
  1.21%  lt-dump_video  [.] od_bin_idct4

The results are, as expected, very similar to x86, though there are a few oddities. One is that the intra prediction is even slower than on x86; another is that the software division routine shows up relatively high in the list. It turns out that the division comes from the inverse lapping filters – although division by a constant can be replaced by a fixed-point multiply, the compiler seems not to have done this, which hurts performance a lot.
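For illustration, the transformation being described – division by a constant replaced by a multiply and a shift – looks like this. The divisor 3 and the magic constant ceil(2^17/3) are my own choices for this sketch, for unsigned 16-bit inputs:

```javascript
// floor(x / 3) for 0 <= x <= 65535, without a divide:
// 43691 === Math.ceil((1 << 17) / 3), and the accumulated rounding error
// stays below 1/3 over the whole 16-bit range, so the results match exactly.
function div3(x) {
  return (x * 43691) >>> 17; // >>> because x * 43691 can exceed 2^31
}

console.log(div3(9));     // 3
console.log(div3(65535)); // 21845
```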

For fun, let’s see what happens when we remove the costly transforms and force 4×4 block sizes only:

 26.21%  lt-dump_video  [.] od_intra_pred4x4_mult
  7.35%  lt-dump_video  [.] od_intra_pred_cdf
  6.28%  lt-dump_video  [.] od_post_filter4
  6.17%  lt-dump_video  [.] od_chroma_pred
  5.77%  lt-dump_video  [.] od_bin_idct4
  4.04%  lt-dump_video  [.] od_bands_from_raster
  3.94%  lt-dump_video  [.] generic_model_update
  3.86%  lt-dump_video  [.] od_apply_filter_cols
  3.64%  lt-dump_video  [.] od_raster_from_bands
  3.29%  lt-dump_video  [.] .divsi3_skip_div0_test
  2.47%  lt-dump_video  [.] od_convert_intra_coeffs
  2.07%  lt-dump_video  [.] od_intra_pred4x4_get
  1.95%  lt-dump_video  [.] od_apply_postfilter
  1.82%  lt-dump_video  [.] od_tf_up_hv_lp
  1.81%  lt-dump_video  [.] laplace_decode
  1.74%  lt-dump_video  [.] od_ec_decode_cdf
  1.67%  lt-dump_video  [.] pvq_decode_delta
  1.61%  lt-dump_video  [.] od_apply_filter_rows
  1.55%  lt-dump_video  [.] od_bin_idct4x4

The 4×4 intra prediction has now skyrocketed to the top, with the transforms and filters increasing as well. I was surprised by the intra prediction decoder (od_intra_pred_cdf) taking up so much time, but it can be explained by much more prediction data coded relative to the image size due to the smaller blocks. The transform still doesn’t take much time, which I suppose shouldn’t be surprising given how simple it is – my hardware can even do it in 1 cycle.

by Thomas Daede at December 30, 2013 12:23 AM

Thomas Daede : Daala profiling on x86

Given that the purpose of my hardware acceleration is to run Daala at realtime speeds, I decided to benchmark the Daala player on my Core 2 Duo laptop. I used a test video at 720p24, encoded with -v 16 and no reference frames (intra only). The following is the perf annotate output:

 19.49%  lt-player_examp  [.] od_state_upsample8
 11.64%  lt-player_examp  [.] od_intra_pred16x16_mult
  5.74%  lt-player_examp  [.] od_intra_pred8x8_mult

20 percent for od_state_upsample8? Turns out that the results of this aren’t even used in intra only mode, so commenting it out yields a more reasonable result:

 14.50%  lt-player_examp  [.] od_intra_pred16x16_mult
  7.17%  lt-player_examp  [.] od_intra_pred8x8_mult
  6.37%  lt-player_examp  [.] od_bin_idct16
  5.09%  lt-player_examp  [.] od_post_filter16
  4.63%  lt-player_examp  [.] laplace_decode
  4.41%  lt-player_examp  [.] od_bin_idct8
  4.10%  lt-player_examp  [.] od_post_filter8
  3.86%  lt-player_examp  [.] od_apply_filter_cols
  3.28%  lt-player_examp  [.] od_chroma_pred
  3.18%  lt-player_examp  [.] od_raster_from_bands
  3.14%  lt-player_examp  [.] od_intra_pred4x4_mult
  2.84%  lt-player_examp  [.] pvq_decoder
  2.76%  lt-player_examp  [.] od_ec_decode_cdf_unscaled
  2.71%  lt-player_examp  [.] od_tf_down_hv
  2.58%  lt-player_examp  [.] od_post_filter4
  2.45%  lt-player_examp  [.] od_bands_from_raster
  2.13%  lt-player_examp  [.] od_intra_pred_cdf
  1.98%  lt-player_examp  [.] od_intra_pred16x16_get
  1.89%  lt-player_examp  [.] pvq_decode_delta
  1.50%  lt-player_examp  [.] od_convert_intra_coeffs
  1.43%  lt-player_examp  [.] generic_model_update
  1.37%  lt-player_examp  [.] od_convert_block_down
  1.21%  lt-player_examp  [.] od_ec_decode_cdf
  1.18%  lt-player_examp  [.] od_ec_dec_normalize
  1.18%  lt-player_examp  [.] od_bin_idct4

I have bolded the functions that I plan to implement in hardware. As you can see, they sum to only about 23% of the total execution time – this means that accelerating these functions alone won’t bring me into realtime decoding performance. Obvious other targets include the intra prediction matrix multiplication, though this might be better handled by NEON acceleration for now – I’m not too familiar with that area of the code yet.

by Thomas Daede at December 29, 2013 03:44 AM

Jean-Marc Valin : Opus 1.1 released

Opus 1.1
After more than two years of development, we have released Opus 1.1. This includes:

  • new analysis code and tuning that significantly improves encoding quality, especially for variable bitrate (VBR) encoding,

  • automatic detection of speech or music to decide which encoding mode to use,

  • surround with good quality at 128 kbps for 5.1 and usable down to 48 kbps, and

  • speed improvements on all architectures, especially ARM, where decoding uses around 40% less CPU and encoding uses around 30% less CPU.

These improvements are explained in more detail in Monty's demo (updated from the 1.1 beta demo). Of course, this new version is still fully compliant with the Opus specification (RFC 6716).

December 06, 2013 03:08 AM

Jean-Marc Valin : Opus 1.1-rc is out

We just released Opus 1.1-rc, which should be the last step before the final 1.1 release. Compared to 1.1-beta, this new release further improves surround encoding quality. It also includes better tuning of surround and stereo for lower bitrates. The complexity has been reduced on all CPUs, but especially ARM, which now has Neon assembly for the encoder.

With the changes, stereo encoding now produces usable audio (of course, not high fidelity) down to about 40 kb/s, with surround 5.1 sounding usable down to 48-64 kb/s. Please give this release a try and report any issues on the mailing list or by joining the #opus channel on irc.freenode.net. The more testing we get, the faster we'll be able to release 1.1-final.

As usual, the code can be downloaded from: http://opus-codec.org/downloads/

November 27, 2013 06:08 AM

Thomas Daede : Senior Honors Thesis – Daala in Hardware

not actually the daala logo

For my honors thesis, I am implementing part of the Daala decoder in hardware. This is not only a way for me to learn more about video coding and hardware, but also a way to provide feedback to the Daala project and create a reference hardware implementation.

The Chip

Part of the reason for a hardware implementation of any video codec is to make it possible to decode on an otherwise underpowered chip, such as the mobile processors common in smart phones and tablets. A very good model of this sort of chip is the Xilinx Zynq processor, which has two midrange ARM Cortex cores, and a large FPGA fabric surrounding them. The custom video decoder will be implemented in the FPGA, with high speed direct-memory-access providing communication with the ARM cores.

The Board

Image from zedboard.org

I will be using the ZedBoard, a low cost prototyping board based on the Zynq 7020 system-on-chip. It includes 512MB of DDR, both HDMI and VGA video output, Ethernet, serial, and boots Linux out of the box. The only thing that could make it better would be a cute kitten on the silkscreen.

Choosing what to accelerate

For now, parts of the codec will still run in software. This is because many of them would be very complicated state machines in hardware, and more importantly, it allows me to incrementally add hardware acceleration while maintaining a functional decoder. To make a particular algorithm a good candidate for hardware acceleration, it needs to have these properties:

  • Stable – Daala is a rapidly changing codec, and while much of it is expected to change, it takes time to implement hardware, and it’s much easier if the reference isn’t changing under my feet.
  • Parallel – Compared to a CPU, hardware can excel at exploiting parallelism. CPUs can do it too with SIMD instructions, but hardware can be tailor made to the application.
  • Independent – The hardware accelerator can act much like a parallel thread, which means that locking and synchronization comes into play. Ideally the hardware and CPU should rarely have to wait for each other.
  • Interesting – The hardware should be something unique to Daala.

The best fit that I have found for these is the transform stage of Daala. The transform stage is a combination of the discrete cosine transform (actually an integer approximation), and a lapping filter. While the DCT is an old concept, the 2D lapping filter is pretty unique to Daala, and implementing both in tightly coupled hardware can create a large performance benefit. More info on the transform stage can be found on Monty’s demo pages.

by Thomas Daede at November 26, 2013 03:04 AM

Thomas Daede : Inside a TTL crystal oscillator

Inside Crystal View 1

In case you ever wanted to know what is inside an oscillator can… I used a dremel so that now you can know. The big transparent disc on the right is the precisely cut quartz resonator, suspended on springs. On the left is a driver chip and pads for loading capacitors to complete the oscillator circuit. The heat from my dremel was enough to melt the solder and remove the components. Your average crystal can won’t have the driver chip or capacitors – most microcontrollers now have the driver circuitry built-in.


by Thomas Daede at November 25, 2013 10:49 PM

Jean-Marc Valin : Back from AES

I just got back from the 135th AES convention, where Koen Vos and I presented two papers on voice and music coding in Opus.

Also of interest at the convention was the Fraunhofer codec booth. It appears that Opus is now causing them some concerns, which is a good sign. And while we're on that topic, the answer is yes :-)


October 24, 2013 01:48 AM

Jean-Marc Valin : Opus 1.1-beta release and a demo page by Monty

Opus 1.1-beta
We just released Opus 1.1-beta, which includes many improvements over the 1.0.x branch. For this release, Monty made a nice demo page showing off most of the new features. In other news, the AES has accepted my paper on the CELT part of Opus, as well as a paper proposal from Koen Vos on the SILK part.

July 12, 2013 04:31 PM

Thomas B. Rücker : Icecast – How to prevent listeners from being disconnected

A set of hints for common questions and situations people encounter while setting up a radio station using Icecast.

  • The source client is on an unstable IP connection.
    • You really want to avoid such situations. If this is a professional setup, you might even consider a dedicated internet connection for the source client. At the very least, use QoS with guaranteed bandwidth.
      Over the years I’ve seen it far too often: people complain that ‘Icecast drops my stream all the time’, and after some digging we find that the same person, or someone on their network, is running BitTorrent or some other bandwidth-intensive application.
    • If the TCP connection just stalls from time to time for a few seconds, then you might improve the situation by increasing the source-timeout, but don’t increase it too much, as that will lead to weird behaviour for your listeners too. Beyond 15-20s I’d highly recommend considering the next option instead.
    • If the TCP connection breaks, so the source client gets fully disconnected and has to reconnect to Icecast, then you really want to look into setting up fallbacks for your streams, as otherwise this also immediately disconnects all listeners.
      A fallback transfers all listeners to a backup stream (or static file) instead of disconnecting them. Preferably you’d have that fallback stream generated locally on the same machine as the Icecast server, e.g. some ‘elevator music’ with an announcement of ‘technical difficulties, regular programming will resume soon’.
      There is a special case: if you opt for ‘silence’ as your fallback and you stream in e.g. Ogg/Vorbis, the encoder will compress digital silence down to a few bits per second, as it’s incredibly compressible. This completely messes up most players. The workaround is to inject some -90dB AWGN, which is below hearing level but keeps the encoder busy and the bit-rate up.
      Important note: match the parameters of both streams closely, as many players won’t cope well if the codec, sample rate or other parameters change mid-stream…


  • You want to have several different live shows during the week and in between some automated playlist driven programming
    • use (several) fallbacks
      primary mountpoint: live shows
      secondary mountpoint: playlist
      tertiary mountpoint: -90dB AWGN (optional, e.g. as insurance in case the playlist fails)
    • If you want to have this all completely hidden from the listeners with one mount point, that is automatically also listed on http://dir.xiph.org, then you need to add one more trick:
      Set up a local relay of the primary mountpoint, set it to force YP submission.
      Set all the three other mounts to hidden and force disable YP submission.
      This gives you one visible mountpoint for your users and all the ‘magic’ is hidden.
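A rough sketch of such a setup in icecast.xml might look like the following. The mount names are made up for illustration, and the fragment is untested; adjust everything to your own setup:

```xml
<!-- Hypothetical fallback chain: live -> playlist -> noise -->
<mount>
  <mount-name>/live.ogg</mount-name>
  <fallback-mount>/playlist.ogg</fallback-mount>
  <fallback-override>1</fallback-override>
  <hidden>1</hidden>
  <public>0</public>
</mount>
<mount>
  <mount-name>/playlist.ogg</mount-name>
  <fallback-mount>/noise.ogg</fallback-mount>
  <fallback-override>1</fallback-override>
  <hidden>1</hidden>
  <public>0</public>
</mount>
<mount>
  <mount-name>/noise.ogg</mount-name>
  <hidden>1</hidden>
  <public>0</public>
</mount>

<!-- Local relay of the primary mount: the one visible, YP-listed mountpoint -->
<relay>
  <server>127.0.0.1</server>
  <port>8000</port>
  <mount>/live.ogg</mount>
  <local-mount>/stream.ogg</local-mount>
</relay>
<mount>
  <mount-name>/stream.ogg</mount-name>
  <public>1</public>
</mount>
```

With `fallback-override` set, listeners are moved back up the chain automatically when the higher-priority source reconnects.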


I’ll expand this post by some simple configuration examples later today/tomorrow.
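For the -90dB AWGN trick mentioned above, a throwaway noise file is easy to generate. A sketch in Python using only the standard library (file name, duration and the choice of treating dBFS as an RMS level are my own assumptions):

```python
import random
import struct
import wave

def write_awgn_wav(path, seconds=10, rate=44100, dbfs=-90.0):
    """Write a mono 16-bit WAV of Gaussian white noise whose RMS level is
    `dbfs` relative to full scale. At -90 dBFS the noise is inaudible but
    keeps a lossy encoder busy, so the bit-rate doesn't collapse."""
    sigma = 10.0 ** (dbfs / 20.0) * 32767.0  # RMS amplitude in sample units
    frames = bytearray()
    for _ in range(int(seconds * rate)):
        s = int(round(random.gauss(0.0, sigma)))
        s = max(-32768, min(32767, s))  # clamp to the 16-bit sample range
        frames += struct.pack('<h', s)
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(bytes(frames))

write_awgn_wav('noise.wav', seconds=1)
```

You would then loop the resulting file through your encoder into the fallback mountpoint.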

by Thomas B. Rücker at July 08, 2013 07:08 AM

Thomas B. Rücker : Icecast · analyzing your logs to create useful statistics

There are many ways to analyze Icecast usage and generate statistics: it has a special statistics TCP stream, it has hooks that trigger on source or listener connect/disconnect events, you can query it over custom XSLT or read the stats.xml, or you can analyze its log files. This post, however, is only about the latter. I’ll be writing separate posts on the other topics soon.

  • Webalizer streaming version
    A fork of webalizer 2.10 extended to support icecast2 logs and produce nice listener statistics.
    I used this a couple of years ago. It could use some love (it expects manual input of the ‘average bitrate’, which could be calculated instead), but otherwise it does a very good job of producing nice statistics.
  • AWStats – supports ‘streaming’ logs.
    Haven’t used this one myself for Icecast analysis, but it’s a well established and good open source log analysis tool.
  • Icecast log parser – Log parser to feed a MySQL database
    I ran into this one recently; it seems to be a small script that helps you feed a MySQL database from your log files. Further analysis is then possible through SQL queries. A nice building block if you want to build a custom solution.

In case you want to parse the access log yourself, you should be aware of several things. Largely the log is compatible with other httpd log formats, but there are slight differences. Please look at these two log entries:

 - - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" 200 3379662 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0" 88

 - - [02/Jun/2013:20:38:13 +0200] "PUT /test.webm HTTP/1.1" 200 19 "-" "-" 264

The first one is a listener (in this case Firefox playing a WebM stream). You can see it received 3379662 bytes, but the interesting part is the last field on that line, “88”: the number of seconds the listener was connected. That’s important for streaming, less so for a file-serving httpd.

The second entry is a source client (in this case using the HTTP PUT method we’re adding for Icecast 2.4). Note that it only says “19” after the HTTP status code. That might seem very low at first, but remember we’re logging not the number of bytes sent by the client to the server, but the bytes sent from the server to the client. If necessary you could extract the former from the error.log. The “264” at the end once again indicates it was streaming for 264 seconds.

One more thing, as we’ve heard this question pop up repeatedly: the log entry is only generated once the listener/source actually disconnects, as only then do we know all the details that go into that line. If you are looking for real-time statistics, then the access.log is the wrong place, sorry.
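If you do roll your own parser, the trailing duration field is the main deviation from the usual combined log format. A rough Python sketch (the regex, field names and the placeholder 127.0.0.1 host are my own, not an official Icecast parser):

```python
import re

# Combined Log Format plus Icecast's trailing "seconds connected" field.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d+) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<duration>\d+)')

def parse_line(line):
    """Parse one access.log line into a dict, or return None if it
    doesn't match the expected format."""
    m = LINE_RE.match(line)
    if not m:
        return None
    d = m.groupdict()
    d['status'] = int(d['status'])
    d['bytes'] = 0 if d['bytes'] == '-' else int(d['bytes'])
    d['duration'] = int(d['duration'])  # seconds the client was connected
    return d

entry = parse_line(
    '127.0.0.1 - - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" '
    '200 3379662 "-" "Mozilla/5.0" 88')
```

From such dicts you can aggregate listener-hours, per-mount byte counts, and so on with a few lines more.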

There are also some closed source solutions, but I’ve never used them and I don’t think they provide significant benefit over the available open source solutions mentioned above.

If you know a good Icecast tool, not only for log analysis, please write a comment or send a mail to one of the Icecast mailing lists.

by Thomas B. Rücker at June 02, 2013 06:52 PM

Chris Pearce : HTML5 video playbackRate and Ogg chaining support landed in Firefox

Paul Adenot has recently landed patches in Firefox to enable the playbackRate attribute on HTML5 <audio> and <video> elements.

This is a cool feature that I've been looking forward to for a while; it means users can speed up playback of videos (and audio) so that you can for example watch presentations sped up and only slow down for the interesting bits. Currently Firefox's <video> controls don't have support for playbackRate, but it is accessible from JavaScript, and hopefully we'll get support added to our built-in controls soon.

Paul has also finally landed support for Ogg chaining. This has been strongly desired for quite some time by community members, and the final patch also had contributions from "oneman" (David Richards), who also was a strong advocate for this feature.

We decided to reduce the scope of our chaining implementation in order to make it easier and quicker to implement. We targeted the features most desired by internet radio providers, and so we only support chaining in Ogg Vorbis and Opus audio files and we disable seeking in chained files.

Thanks Paul and David for working on these features!

by Chris Pearce (noreply@blogger.com) at December 23, 2012 02:28 AM

Thomas Daede : Centaurus 3 Lights

Centaurus 3’s lights gathered some attention during FSGP 2012. Lights are a surprisingly hard thing to get right on a solar car – they need to be bright, efficient, aerodynamic, and pretty.

Previous solar cars have tried many things, with varying success. A simple plastic window and row of T1 LEDs works well, but if the window needs to be a strange shape, it can get difficult and ugly. Acrylic light pipes make very thin and aerodynamic lights, but it can be difficult to get high efficiency, or even acceptable brightness.

There are two general classes of LEDs. One is the low power, highly focused, signaling types of LEDs. These generally draw less than 75mA, and usually come with focused or diffuse lenses, such as a T1 3/4 case. This is also the type of LED in the ASC 2012 reference lights. The other type of LED is the high brightness lighting-class LED, such as those made by Philips, Cree, or Seoul Semiconductor. These usually draw 350mA or more, are rated for total lumen output rather than millicandelas+viewing angle, and require a heat sink.

We eventually chose the low-current type of LEDs, because the lumen count we needed was low, and the need for heatsinking would significantly complicate matters. We ended up using the 30-LED version of the ASC 2012 reference lighting. The outer casing is ugly, so the LED PCB was removed using a utility knife.

Now we need a new lens. I originally looked for acrylic casting, but settled on Alumilite Water-Clear urethane casting material. This material is much stronger than acrylic and supposedly easier to use.

This product is designed to be used with rubber molds. Instead, I used our team’s sponsorship from Stratasys to 3D print some molds:

The molds are made as two halves which bolt together with AN3 fasteners. They are sanded, painted, and sanded again, then coated with mold release wax and PVA. Each part of the Alumilite urethane is placed in a vacuum pot to remove bubbles, then the two parts are mixed and poured into the mold. The lights are dipped into the mold, suspended by two metal pins; the wires are bent and taped around the outside of the mold to hold the LED lights in place. The entire mold then goes into the vacuum pot for about one minute to pull out more bubbles, and finally into an oven to cure for 4 more hours at 60C. The oven is not strictly necessary, but it speeds up the process and ensures the part reaches full hardness.

Once cured, the lights are removed, epoxied into place in the shell, and faired with Bondo. They are then masked before being sent off to paint. The masked area is somewhat smaller than the cast part, to make the lights look more seamless and to leave room for Bondo to fix any bumps.

The lights generally come out with a rather diffuse surface, but they become water-clear after the clear coat of automotive paint. The finished product looks like this:


by Thomas Daede at October 27, 2012 03:19 PM

Ben Schwartz : It’s Google

I’m normally reticent to talk about the future; most of my posts are in the past tense. But now the plane tickets are purchased, apartment booked, and my room is gradually emptying itself of my furniture and belongings. The point of no return is long past.

A few days after Independence Day, I’ll be flying to Mountain View for a week at the Googleplex, and from there to Seattle (or Kirkland), to start work as a software engineer on Google’s WebRTC team, within the larger Chromium development effort. The exact project I’ll be working on initially isn’t yet decided, but a few very exciting ideas have floated by since I was offered the position in March.

Last summer I told a friend that I had no idea where I would be in a year’s time, and when I listed places I might be — Boston, Madrid, San Francisco, Schenectady — Seattle wasn’t even on the list. It still wasn’t in March, when I was offered this position in the Cambridge (MA) office. It was an unfortunate coincidence that the team I’d planned to join was relocated to Seattle shortly after I’d accepted the offer.

My recruiters and managers were helpful and gracious in two key ways. First, they arranged for me to meet with ~5 different leaders in the Cambridge office whose teams I might be able to join instead of moving. Second, they flew me out to Seattle (I’d never been to the city, nor the state, nor any of the states or provinces that it borders) and arranged for meetings with various managers and developers in the Kirkland office, just so I could learn more about the office and the city. I spent the afternoon wandering the city and (with help from a friend of a friend) looking at as many districts as I could squeeze between lunch and sleep.

The visit made all the difference. It made the city real to me … and it seemed like a place that I could live. It also confirmed an impressive pattern: every single Google employee I met, at whichever office, seemed like someone I would be happy to work alongside.

When I returned there were yet more meetings scheduled, but I began to perceive that the move was essentially inevitable. The hiring committee had done their job well, and assigned me to the best fitting position. Everything else was second best at best.

It’s been an up and down experience, with the drudgery of packing and schlepping an unwelcome reminder of the feeling of loss that accompanies leaving history, family, and friends behind. I am learning in the process that, having never really moved, I have no idea how to move.

But there’s also sometimes a sense of joy in it. I am going to be an independent, free adult, in a way that cannot be achieved by even the happiest potted plant.

After signing the same lease on the same student apartment for the seventh time, I worried about getting stuck, in some metaphysical sense, about failure to launch from my too-comfortable cocoon. It was time for a grand adventure.

This is it.

by Ben at June 29, 2012 05:10 PM

David Schleef : GStreamer Streaming Server Library

Introducing the GStreamer Streaming Server Library, or GSS for short.

This post was originally intended to be a release announcement, but I started to wander off to work on other projects before the release was 100% complete.  So perhaps this is a pre-announcement.  Or it’s merely an informational piece with the bonus that the source code repository is in a pretty stable and bug-free state at the moment.  I tagged it with “gss-0.5.0”.

What it is

GSS is a standalone HTTP server implemented as a library.  Its special focus is to serve live video streams to thousands of clients, mainly for use inside an HTML5 video tag.  It’s based on GStreamer, libsoup, and json-glib, and uses Bootstrap and BrowserID in the user interface.

GSS comes with a streaming server application that is essentially a small wrapper around the library.  This application is referred to as the Entropy Wave Streaming Server (ew-stream-server); the code that is now GSS was originally split out of this application.  The app can be found in the tools/ directory in the source tree.


  • Streaming formats: WebM, Ogg, MPEG-TS.  (FLV support is waiting for a flvparse element in GStreamer.)
  • Streams in different formats/sizes/bitrates are bundled into a single “program”.
  • Streaming to Flash via HTTP.
  • Authentication using BrowserID.
  • Automatic conversion from properly formed MPEG-TS to HTTP Live Streaming.
  • Automatic conversion to RTP/RTSP (experimental; works for Ogg/Theora/Vorbis only).
  • Stream upload via HTTP PUT (3 different varieties), Icecast, raw TCP socket.
  • Stream pull from another HTTP streaming server.
  • Content protection via automatic one-time URLs.
  • (Experimental) Video-on-Demand stream types.
  • Per-stream, per-program, and server metrics.
  • HTTP configuration interface and REST API is used to control the server, allowing standalone operation and easy integration with other web servers.

What’s not there?

  • Other types of authentication, LDAP or other authorization.
  • RTMP support.  (Maybe some day, but there are several good open-source Flash servers out there already.)
  • Support for upload using HTTP PUT with no 100-Continue header.  Several HTTP libraries do this.
  • Decent VOD support, with rate-controlled streaming, burst start, and seeking.

The details


by David Schleef at June 14, 2012 06:21 PM