
Monty : I'll be speaking in Norway next week (Oslo and Bergen)!

Hello all! I'm traveling to Norway this week to give a few talks and a workshop on online video, all sponsored by the Norwegian Unix Users Group. The workshop is likely to cover technical details of the new AV1 codec (and free codecs in general), while the talks are intended more as a free codec tea party.

Talks and meetups: "Fighting for free video - technical tactics and war stories from the FOSS audio and video codec frontier"

The talks are free and open to the public.

Workshop: "Audio and video format workshop with Xiphmont"

  • Wednesday April 24 at 10:00am at Cisco

The workshop is open to interested professionals, and I expect it to be more technical. Please contact NUUG (information in the links above) if you wish to attend.

Do come!

April 15, 2019 06:01 PM

Jean-Marc Valin : How Opus Came To Be

Note: This is a first-person account of my involvement in Opus. Since I was not part of the early SILK efforts mentioned below, I cannot speak about its early development. That part is omitted from this account but by no means is that intended to diminish its importance to Opus.

Opus is an open-source, royalty-free, highly versatile audio codec standard. It is now deployed in billions of devices. This is how it came to be. Even before Opus, I had a strong interest in open standards, which led me to start the Speex project in 2002, with help from David Rowe. Speex was one of the first modern royalty-free speech codecs. It was shipped in many applications, especially games, but because it was slightly inferior to the standard codecs of the time, it never achieved a critical mass of deployment.

In 2007, when working on a high-quality videoconferencing project as part of my post-doc, I realized the need for a high-fidelity audio codec that also had very low delay suitable for interactive, real-time applications. At the time, audio codecs were mostly divided into two categories: there were high-delay, high-fidelity transform codecs (like MP3, AAC, and Vorbis) that were unsuitable for real-time operation, and there were low-delay speech codecs (like AMR, Speex, and G.729) with limited audio quality.

That is why I started the Opus ancestor called CELT, an effort to create a high-fidelity transform codec with an ultra low delay of around 4-8 ms, even lower than the 20 ms delay typical of VoIP and videoconferencing. My first step was to talk with Christopher "Monty" Montgomery, who had previously designed Ogg Vorbis, a high-delay, high-fidelity codec, and was then looking at designing a successor. Even though our sets of goals proved too different for us to merge the two efforts, the discussion was very helpful in that I was able to benefit from some of the experience Monty gained while designing Vorbis. The most important advice I got was "always make sure the shape of the energy spectrum is preserved". In Vorbis (and other codecs), that energy constraint was only partially achieved, through very careful tuning of the encoder, and sometimes at great cost in bitrate. For CELT, I attacked the problem from a different perspective: What if CELT could be designed so that the constraint was built into the format, and thus mathematically impossible to violate?

This is where the CELT name originated: Constrained Energy Lapped Transform. The format itself would constrain the energy so that no effort or bits would be wasted. Although simple in principle, that idea required completely new compression and math techniques that had never previously been used in transform codecs. One of them was algebraic vector quantization, which had been used for a long time in speech codecs, but never in transform codecs, which still used scalar quantization. Overall, it took about 2 years to figure out the core of the CELT technology, with the help of Tim Terriberry, Greg Maxwell, and other Xiph contributors.
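To make the "constraint built into the format" idea concrete, here is a toy gain-shape sketch in Python. It is not the CELT algorithm: the function names, the 1.5 dB gain step, and the crude scalar rounding of the shape are all assumptions made for illustration, standing in for the algebraic vector quantization mentioned above. The point it shows is only that when a band's energy is coded separately and the decoder renormalizes the shape, the decoded band always carries the coded energy, however coarsely the shape was quantized.

```python
import math

def encode_band(band, gain_step_db=1.5, shape_levels=4):
    """Split one band into a quantized gain and a coarsely quantized shape.

    gain_step_db and shape_levels are illustrative values, not CELT parameters.
    """
    gain = math.sqrt(sum(x * x for x in band))
    safe_gain = max(gain, 1e-9)  # avoid log(0) and division by zero
    # The band energy is coded on its own, in the log domain.
    gain_index = round(20.0 * math.log10(safe_gain) / gain_step_db)
    # The shape is the unit-norm direction of the band, rounded very coarsely.
    shape_index = [round(x / safe_gain * shape_levels) for x in band]
    return gain_index, shape_index

def decode_band(gain_index, shape_index, gain_step_db=1.5):
    """Rebuild the band; renormalizing the shape pins the energy to the coded gain."""
    gain = 10.0 ** (gain_index * gain_step_db / 20.0)
    norm = math.sqrt(sum(s * s for s in shape_index))
    if norm == 0.0:
        # Degenerate shape: put everything in the first coefficient.
        unit = [1.0] + [0.0] * (len(shape_index) - 1)
    else:
        unit = [s / norm for s in shape_index]
    return [gain * u for u in unit]

band = [0.9, -0.3, 0.2, 0.05]
gain_index, shape_index = encode_band(band)
decoded = decode_band(gain_index, shape_index)
```

In this sketch the shape error can be large, but the decoded energy can only differ from the original by the gain quantization step, which is the property the format is meant to guarantee.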

Because of the ultra low delay constraint, CELT was not trying to match or exceed the bitrate efficiency of MP3 and AAC, since these codecs benefited from a long delay (100-200 ms). It was thus a complete surprise when — only 6 months after the first commits — a listening test showed CELT already out-performing MP3 despite the difference in delay. That was attributed to the ancient technology behind MP3. CELT was still behind the more recent AAC, with no plan to compete on efficiency alone.

Despite still being in early development, some people started using CELT for their projects, mostly because it was the only codec that would suit their needs. These early users greatly helped CELT to improve by providing real-life use cases and raising issues that could not have been foreseen with just "lab" testing. For example, a developer who was using CELT for network music performances (musicians playing live together in different cities) once complained that "CELT works very well for everyone, except for me with my bass guitar". By getting an actual sample, it was easy to find the problem and address it. There were many similar stories and over a few years, many parts of CELT were changed or completely rewritten.

There has been no mention of the name Opus so far because there was still a missing piece. Around the same time CELT was getting started, another codec effort was quietly started by Koen Vos, Søren Skak Jensen, and Karsten Vandborg Sørensen at Skype under the name SILK. SILK was a more traditional speech codec, but with state-of-the-art efficiency, competing with or exceeding other speech codecs. We became aware of SILK in 2009 when Skype proposed it as a royalty-free codec to the Internet Engineering Task Force (IETF), the main standards body governing the Internet. We immediately joined the effort, proposing CELT to the emerging working group. It was a highly political effort, given the presence of organizations heavily invested in royalty-bearing codecs. There was thus strong pressure to restrict the working group’s effort to standardizing a single codec. That drove us to investigate ways to combine SILK and CELT. The two codecs were surprisingly complementary, SILK being more efficient at coding speech up to 8 kHz, and CELT being more efficient at coding music and achieving delays below 10 ms. The only thing neither codec did very efficiently was coding high-quality speech covering the full audio bandwidth (up to 20 kHz). This is where both SILK and CELT could be used simultaneously to achieve high-quality, fullband speech coding at just 32 kb/s, something no other codec could achieve. Opus was born and, thanks to the IETF collaboration, the result would be better than the sum of its SILK and CELT parts.

Integrating SILK and CELT required changes to both technologies. On the CELT side, it meant supporting and optimizing for frame sizes up to 20 ms, no longer ultra-low delay but low enough for videoconferencing. Through collaboration in the working group, CELT also gained a perceptual pitch post-filter contributed by Raymond Chen at Broadcom. The post-filter and the 20-ms frames also increased the efficiency to the point where some audio enthusiasts started comparing Opus to HE-AAC on music compression. Unsurprisingly, they found the higher-delay HE-AAC to have higher quality at the same bitrate, but they also started providing specific feedback that helped improve Opus. This went on for several months, until a listening test eventually showed Opus having higher quality than HE-AAC, despite HE-AAC being designed for much higher delays. At that point, Opus really became a universal audio codec. It was either on par with or better than all other audio codecs, regardless of the application, be it speech, music, real-time, storage, or streaming.

Opus officially became an IETF standard in 2012. At the time, the IETF was also defining the WebRTC standard for videoconferencing on the web. Thanks to its efficiency and its royalty-free nature, Opus became the mandatory-to-implement standard for WebRTC. In part thanks to WebRTC, Opus is now included in all major browsers and in both the Android and iOS mobile operating systems. It is also used alongside AV1 in YouTube. Most large technology companies now ship products using Opus. This ensures interoperability across different applications that can communicate with a common codec. Because there are no royalties, it also enables some products that would not otherwise be viable (e.g. because you can't afford to pay a $0.50 royalty for each freely-downloaded copy of client software).

As for many other codecs, only the Opus decoder is standardized, which means that the encoder can keep improving without breaking compatibility. This is how Opus keeps improving to this day, with the latest version, Opus 1.3, released in October 2018.

April 05, 2019 04:16 PM

Monty : Another new experimental codec from Xiph.Org!

Jean-Marc Valin has been applying deep learning frameworks to audio over the past few years. So far he's released RNNoise (a surprisingly good/fast denoising system) and LPCNet (a speech synthesis system along the lines of WaveNet, but fast enough to use realtime on commodity hardware).

Now he's built a codec out of LPCNet.

"A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet" presents a new wideband speech codec built out of the best parts of a brutally speed and space efficient vocoder paired with deep-learning analysis and excitation. It's alpha-grade research in a lot of ways, but decidedly not vapourware. You can download the source and play with it now, but first, go have a look at the demo page.

March 29, 2019 07:55 PM

Jean-Marc Valin : A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

This is a follow-up on the first LPCNet demo. In this new demo, we turn LPCNet into a very low-bitrate neural speech codec (see submitted paper) that's actually usable on current hardware and even on phones. It's the first time a neural vocoder is able to run in real-time using just one CPU core on a phone (as opposed to a high-end GPU). The resulting bitrate — just 1.6 kb/s — is about 10 times less than what wideband codecs typically use. The quality is much better than existing very low bitrate vocoders and comparable to that of more traditional codecs using a higher bitrate.

Read More

March 29, 2019 01:15 PM

Jean-Marc Valin : LPCNet: DSP-Boosted Neural Speech Synthesis

This new demo presents LPCNet, an architecture that combines signal processing and deep learning to improve the efficiency of neural speech synthesis. Neural speech synthesis models like WaveNet have recently demonstrated impressive speech synthesis quality. Unfortunately, their computational complexity has made them hard to use in real-time, especially on phones. As was the case in the RNNoise project, one solution is to use a combination of deep learning and digital signal processing (DSP) techniques. This demo explains the motivations for LPCNet, shows what it can achieve, and explores its possible applications.

Read More

November 20, 2018 02:11 PM

Monty : Opus 1.3 "The Everything + Ambisonics Release"!

Performance-wise, 1.3 is the biggest Opus update so far. Several commenters have suggested it should have been named 2.0, but full interoperability is important, and '2.0' could suggest we broke compatibility somehow (we haven't).

Opus 1.3 adds full Ambisonics surround support, improves the built-in speech/music detection, and greatly improves low bitrate performance. Wideband speech now goes down to 9kbps, narrowband to 5kbps, and stereo performance improves especially in the 24-32kbps range.

Go forth and deploy!

October 19, 2018 01:47 AM

Jean-Marc Valin : Opus 1.3 is out!

Opus gets another major update with the release of version 1.3. This release brings quality improvements to both speech and music, while remaining fully compatible with RFC 6716. This is also the first release with Ambisonics support. This Opus 1.3 demo describes a few of the upgrades that users and implementers will care about the most. You can download the new version from the Opus website.

October 18, 2018 04:01 PM

Monty : next generation video: Introducing AV1, part1: Chroma from Luma

My first technical writing regarding the new AV1 codec is up at Xiph.Org. We've been working on AV1, heads-down, for a long time and my writing took a hiatus for almost that entire period. Now that it's out, it's time to continue the technology pages that I started with Daala.

"AV1 is a new general-purpose video codec developed by the Alliance for Open Media. The alliance began development of this new codec using Google's VPX codecs, Cisco's Thor codec, and Mozilla's/Xiph.Org's Daala codec as starting point. AV1 leapfrogs the performance of VP9 and HEVC, making it a next-next-generation codec. The AV1 format is and will always be royalty-free with a permissive FOSS license."

April 09, 2018 06:46 PM

Jean-Marc Valin : RNNoise: Learning Noise Suppression


This demo presents the RNNoise project, showing how deep learning can be applied to noise suppression. The main idea is to combine classic signal processing with deep learning to create a real-time noise suppression algorithm that's small and fast. No expensive GPUs required — it runs easily on a Raspberry Pi. The result is much simpler (easier to tune) and sounds better than traditional noise suppression systems (been there!).

Read More

September 27, 2017 02:51 AM

Monty : Opus 1.2 released!

I'm about to get an official press release together, but in the meantime, I'm pleased to announce we've released Opus 1.2!

Quoting Jean-Marc Valin, the Opus lead developer:

Opus gets another major upgrade with the release of version 1.2. This release brings quality improvements to both speech and music, while remaining fully compatible with RFC 6716. There are also optimizations, new options, as well as many bug fixes. This Opus 1.2 demo describes a few of the upgrades that users and implementers will care about the most. You can download the code from the Opus website.

June 21, 2017 12:36 AM

Jean-Marc Valin : Opus 1.2 is out

Opus gets another major upgrade with the release of version 1.2. This release brings quality improvements to both speech and music, while remaining fully compatible with RFC 6716. There are also optimizations, new options, as well as many bug fixes. This Opus 1.2 demo describes a few of the upgrades that users and implementers will care about the most. You can download the code from the Opus website.

June 20, 2017 07:01 PM

Monty : MP3: It's Free, Not Dead.

Last week, Fraunhofer and Thomson suspended their MP3 patent licensing program because the patents expired. We can finally welcome MP3 into the family of truly Free codecs!

Then came a press push calling MP3 dead. That's dumb. Fraunhofer is only calling MP3 dead to push unwary customers into 'upgrading' to AAC for which they can still charge patent fees.

This is a bit like the family pediatrician telling you that your perfectly healthy child in college is dead-- and solemnly suggesting you have another child immediately. Just to keep making money off of you.

I would call that disingenuous at best.

No, MP3 isn't dead, and it's not pining for any fjords. The money that Thomson and Fraunhofer were previously collecting in patent royalties now stays in your (and everyone else's) bottom line. Don't license something new and unnecessary just to spend more money.

If you really do need something more advanced than MP3, the best alternatives are also open and royalty-free. Vorbis is the mature alternative with 20 years of wide deployment under its belt. Better yet, consider Opus, the world's most advanced officially standardized codec.

That said, the network effects that have kept MP3 dominant for so long just got stronger. Nothing beats its level of interoperability and support. There's no reason to jump off a thoroughbred that’s still increasing its lead.

May 22, 2017 07:00 PM

Monty : Gentlemen, we have a new naming scheme

Seriously, there are so many winners on the full list I don't even know where to start.

May 20, 2017 12:13 AM

Silvia Pfeiffer : Annual Release of External-Videos plugin – we’ve hit v1.0

This is the annual release of my external-videos wordpress plugin and with the help of Andrew Nimmolo I’m proud to announce we’ve reached version 1.0!

So yes, my external-videos wordpress plugin is now roughly 7 years old, who would have thought! During the year, I don’t get the luxury of spending time on maintaining this open source love child of mine, but at Christmas, my bad conscience catches up with me  – every year! I then spend some time going through bug reports, upgrading the plugin to the latest wordpress version, upgrading to the latest video site APIs, testing functionality and of course making a new release.

This year has been quite special. The power of open source has kicked in and a new developer took an interest in external-videos. Andrew Nimmolo submitted patches over all of 2016. He decided to bring the external-videos plugin into the new decade with a huge update to the layout of the settings pages, general improvements, and an all-round update of all the video site APIs which included removing their overly complex SDKs and going straight for the REST APIs.

Therefore, I’m very proud to be able to release version 1.0 today. Thanks, Andrew!

Enjoy – and I look forward to many more contributions – have a Happy 2017!

NOTE: If you’re upgrading from an older version, you might need to remove and re-add your social video sites because the API details have changed a bit. Also, we noticed that there were layout issues on WordPress 4.3.7, so try and make sure your WordPress version is up to date.

by silvia at January 13, 2017 10:55 PM

Ben Schwartz : The Clock

I watched The Clock at the Boston Museum of Fine Arts from about 4:45 to 6:30 on Friday. My thoughts on The Clock:

The experiment works in part because scenes with clocks in them are usually frenetic. In a movie, the presence of a clock usually means someone is in a rush, and so most of the sequences convey urgency.

It’s often hard to spot the clock in each scene. In less exciting sequences, this serves as a game to pass the time. In many cases the clock in question is never in focus, or is moving too fast for the viewer to notice. The editors must have done careful freeze-frames and zoomed in on wristwatches to work out the indicated time.

The selected films are mostly in English, with a fair number in French and very few in any other languages. This feels fairly arbitrary to me.

Scenes from multiple films are often mixed within each segment. It seems like the editors adopted a relaxed rule, maybe something like: “if a clock appeared in an original, then a one minute window around the moment of appearance is fair game to include during that minute of The Clock, spliced together with other clips in any order”.

The editing makes heavy use of L cuts and audio crossfades to make the fairly random assortment of sources feel more cohesive.

I swear I saw a young Michael Caine at least twice in two different roles.

Some of the sources were distinctly low-fidelity, often due to framerate matching issues. I think this might be the first production I’ve seen that would really have benefited from a full Variable Frame Rate render and display pipeline.

I started to wonder about connections to deep learning. Could we train an image captioning network to identify images of clocks and watches, then run it on a massive video corpus to generate The Clock automatically?

Or, could we construct a spatial analogue to The Clock’s time-for-time conceit? How about a service that notifies you of a film clip shot at your current location? With a large GPS-tagged corpus (or a location-finder neural network) it might be possible to do this with pretty broad coverage.

by Ben at November 28, 2016 03:03 AM

Silvia Pfeiffer : WebRTC predictions for 2016

I wrote these predictions in the first week of January and meant to publish them as encouragement to think about where WebRTC still needs some work. I’d like to be able to compare the state of WebRTC in the browser a year from now. Therefore, without further ado, here are my thoughts.

WebRTC Browser support

I’m quite optimistic when it comes to browser support for WebRTC. We have seen Edge bring in initial support last year and Apple looking to hire engineers to implement WebRTC. My prediction is that we will see the following developments in 2016:

  • Edge will become interoperable with Chrome and Firefox, i.e. it will publish VP8/VP9 and H.264/H.265 support
  • Firefox of course continues to support both VP8/VP9 and H.264/H.265
  • Chrome will follow the spec and implement H.264/H.265 support (to add to their already existing VP8/VP9 support)
  • Safari will enter the WebRTC space but only with H.264/H.265 support

Codec Observations

With Edge and Safari entering the WebRTC space, there will be a larger focus on H.264/H.265. It will help with creating interoperability between the browsers.

However, since there are so many flavours of H.264/H.265, I expect that when different browsers are used at different endpoints, we will get poor quality video calls because of having to negotiate a common denominator. Certainly, baseline will work interoperably, but better encoding quality and lower bandwidth will only be achieved if all endpoints use the same browser.

Thus, we will get to the funny situation where we buy ourselves interoperability at the cost of video quality and bandwidth. I’d call that a “degree of interoperability” and not the best possible outcome.
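The "common denominator" effect described above can be pictured as a set intersection. The sketch below is only an illustration: the support table restates the predictions from the list earlier in this post at the level of codec families, and the function name is made up. Negotiating H.264 profiles between endpoints would narrow things down in the same way, just one level deeper.

```python
# Codec families each browser is predicted to support (restating the list above).
PREDICTED_SUPPORT = {
    "Edge": {"VP8/VP9", "H.264/H.265"},
    "Firefox": {"VP8/VP9", "H.264/H.265"},
    "Chrome": {"VP8/VP9", "H.264/H.265"},
    "Safari": {"H.264/H.265"},
}

def common_codecs(*browsers):
    """Return the codec families that every named browser supports."""
    return set.intersection(*(PREDICTED_SUPPORT[b] for b in browsers))

# A call between Chrome and Safari can only settle on what both sides offer.
chrome_safari = common_codecs("Chrome", "Safari")
chrome_firefox = common_codecs("Chrome", "Firefox")
```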

I’m going to go out on a limb and say that at this stage, Google will strongly consider improving the case for VP8/VP9 by improving its bandwidth adaptability: I think they will buy themselves some SVC capability and make VP9 the best quality codec for live video conferencing. Thus, when Safari eventually follows the standard and also implements VP8/VP9 support, the interoperability win of H.264/H.265 will prove only temporary, overshadowed by a vastly better video quality when using VP9.

The Enterprise Boundary

Like all video conferencing technology, WebRTC is having a hard time dealing with the corporate boundary: firewalls and proxies get in the way of setting up video connections from within an enterprise to people outside.

The telco world has come up with the concept of SBCs (session border controllers). SBCs come packed with functionality to deal with security, signalling protocol translation, Quality of Service policing, regulatory requirements, statistics, billing, and even media services like transcoding.

SBCs are total overkill for a world where a large number of Web applications simply want to add a WebRTC feature – probably mostly to provide a video or audio customer support service, but it could be a live training session with call-in, or an interest group conference call.

We cannot install a custom SBC solution for every WebRTC service provider in every enterprise. That’s like saying we need a custom Web proxy for every Web server. It doesn’t scale.

Cloud services thrive on their ability to sell directly to an individual in an organisation on their credit card without that individual having to ask their IT department to put special rules in place. WebRTC will not make progress in the corporate environment unless this is fixed.

We need a solution that allows all WebRTC services to get through an enterprise firewall and enterprise proxy. I think the WebRTC standards have done pretty well with firewalls and connecting to a TURN server on port 443 will do the trick most of the time. But enterprise proxies are the next frontier.
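As a rough picture of the "TURN server on port 443" setup, here is a sketch that builds the configuration a Web app might hand to the browser's RTCPeerConnection. It is written in Python only to produce the JSON shape; the host name, user name and credential are placeholders, the helper name is made up, and restricting ICE to relayed candidates with `iceTransportPolicy` is an optional choice of this sketch rather than something the post prescribes.

```python
import json

def turn_443_ice_config(host, username, credential):
    """Build an RTCConfiguration-shaped dict that points at TURN over TLS on port 443."""
    return {
        "iceServers": [
            {
                # "turns:" means TURN over TLS; TCP on 443 looks like HTTPS to a firewall.
                "urls": [f"turns:{host}:443?transport=tcp"],
                "username": username,
                "credential": credential,
            }
        ],
        # Optional: only use relayed candidates, so all media goes via the TURN server.
        "iceTransportPolicy": "relay",
    }

# Placeholder values for illustration only.
config = turn_443_ice_config("turn.example.com", "demo-user", "demo-secret")
print(json.dumps(config, indent=2))
```

This covers the firewall case only; it does nothing for the enterprise proxy problem.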

What it takes is some kind of media packet forwarding service that sits on the firewall or in a proxy and allows WebRTC media packets through – maybe with some configuration that is necessary in the browsers or the Web app to add this service as another type of TURN server.

I don’t have a full understanding of the problems involved, but I think such a solution is vital before WebRTC can go mainstream. I expect that this year we will see some clever people coming up with a solution for this and a new type of product will be born and rolled out to enterprises around the world.


So these are my predictions. In summary, they address the key areas where I think WebRTC still has to make progress: interoperability between browsers, video quality at low bitrates, and the enterprise boundary. I’m really curious to see where we stand with these a year from now.

It’s worth mentioning Philipp Hancke’s tweet reply to my post:

we saw some clever people come up with a solution already. Now it needs to be implemented 🙂

by silvia at February 17, 2016 09:48 PM

Silvia Pfeiffer : SWAY at RFWS using Coviu

A SWAY session by Joanne of Royal Far West School. http://sway.org.au/ via https://coviu.com/ SWAY is an oral language and literacy program based on Aboriginal knowledge, culture and stories. It has been developed by Educators, Aboriginal Education Officers and Speech Pathologists at the Royal Far West School in Manly, NSW.

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at February 14, 2016 09:02 PM

Silvia Pfeiffer : My journey to Coviu

My new startup just released our MVP – this is the story of what got me here.

I love creating new applications that let people do their work better or in a manner that wasn’t possible before.

My first such passion was as a student intern when I built a system for a building and loan association’s monthly customer magazine. The group I worked with was managing their advertiser contacts through a set of paper cards and I wrote a dBase based system (yes, that long ago) that would manage their customer relationships. They loved it – until it got replaced by an SAP system that cost 100 times what I cost them, had really poor UX, and only gave them half the functionality. It was a corporate system with ongoing support, which made all the difference to them.

The story repeated itself with a CRM for my uncle’s construction company, and with a resume and quotation management system for Accenture right after Uni, both of which I left behind when I decided to go into research.

Even as a PhD student, I never lost sight of challenges that people were facing and wanted to develop technology to overcome problems. The aim of my PhD thesis was to prepare for the oncoming onslaught of audio and video on the Internet (yes, this was 1994!) by developing algorithms to automatically extract and locate information in such files, which would enable users to structure, index and search such content.

Many of the use cases that we explored are now part of products or continue to be challenges: finding music that matches your preferences, identifying music or video pieces e.g. to count ads on the radio or to mark copyright infringement, or the automated creation of video summaries such as trailers.


This continued when I joined the CSIRO in Australia – I was working on segmenting speech into words or talk spurts since that would simplify captioning & subtitling, and on MPEG-7 which was a (slightly over-engineered) standard to structure metadata about audio and video.

In 2001 I had the idea of replicating the Web for videos: i.e. creating hyperlinked and searchable video-only experiences. We called it “Annodex” for annotated and indexed video and it needed full-screen hyperlinked video in browsers – man were we ahead of our time! It was my first step into standards, got several IETF RFCs to my name, and started my involvement with open codecs through Xiph.

Around the time that YouTube was founded in 2006, I founded Vquence – originally a video search company for the Web, but pivoted to a video metadata mining company. Vquence still exists and continues to sell its data to channel partners, but it lacks the user impact that has always driven my work.

As the video element started being developed for HTML5, I had to get involved. I contributed many use cases to the W3C, became a co-editor of the HTML5 spec and focused on video captioning with WebVTT while contracting to Mozilla and later to Google. We made huge progress and today the technology exists to publish video on the Web with captions, making the Web more inclusive for everybody. I contributed code to YouTube and Google Chrome, but was keen to make a bigger impact again.

The opportunity came when a couple of former CSIRO colleagues who now worked for NICTA approached me to get me interested in addressing new use cases for video conferencing in the context of WebRTC. We worked on a kiosk-style solution to service delivery for large service organisations, particularly targeting government. The emerging WebRTC standard posed many technical challenges that we addressed by building rtc.io, by contributing to the standards, and registering bugs on the browsers.

Fast-forward through the development of a few further custom solutions for customers in health and education and we are starting to see patterns of need emerge. The core learning that we’ve come away with is that to get things done, you have to go beyond “talking heads” in a video call. It’s not just about seeing the other person, but much more about having a shared view of the things that need to be worked on and a shared way of interacting with them. Also, we learnt that the things that are being worked on are quite varied and may include multiple input cameras, digital documents, Web pages, applications, device data, controls, forms.

So we set out to build a solution that would enable productive remote collaboration to take place. It would need to provide an excellent user experience, it would need to be simple to work with, provide for the standard use cases out of the box, yet be architected to be extensible for specialised data sharing needs that we knew some of our customers had. It would need to be usable directly on Coviu.com, but also able to integrate with specialised applications that some of our customers were already using, such as the applications that they spend most of their time in (CRMs, practice management systems, learning management systems, team chat systems). It would need to require our customers to sign up, yet allow their clients to join a call without sign-up.

Collaboration is a big problem. People are continuing to get more comfortable with technology and are less and less inclined to travel distances just to get a service done. In a country as large as Australia, where 12% of the population lives in rural and remote areas, people may not even be able to travel distances, particularly to receive or provide recurring or specialised services, or to achieve work/life balance. To make the world a global village, we need to be able to work together better remotely.

The need for collaboration is being recognised by specialised Web applications already, such as the LiveShare feature of Invision for Designers, Codassium for pair programming, or the recently announced Dropbox Paper. Few go all the way to video – WebRTC is still regarded as a complicated feature to support.


With Coviu, we’d like to offer a collaboration feature to every Web app. We now have a Web app that provides a modern and beautifully designed collaboration interface. To enable other Web apps to integrate it, we are now developing an API. Integration may entail customisation of the data sharing part of Coviu – something Coviu has been designed for. How to replicate the data and keep it consistent when people collaborate remotely – that is where Coviu makes a difference.

We have started our journey and have just launched free signup to the Coviu base product, which allows individuals to own their own “room” (i.e. a fixed URL) in which to collaborate with others. A huge shout out goes to everyone in the Coviu team – a pretty amazing group of people – who have turned the app from an idea to reality. You are all awesome!

With Coviu you can share and annotate:

  • images (show your mum photos of your last holidays, or get feedback on an architecture diagram from a customer),
  • pdf files (give a presentation remotely, or walk a customer through a contract),
  • whiteboards (brainstorm with a colleague), and
  • application windows (watch a YouTube video together, or work through your task list with your colleagues).

All of these are regarded as “shared documents” in Coviu and thus have zoom and annotation features and are listed in a document tray for ease of navigation.

This is just the beginning of how we want to make working together online more productive. Give it a go and let us know what you think.


by silvia at October 27, 2015 10:08 AM

Silvia Pfeiffer : WebVTT Audio Descriptions for Elephants Dream

When I set out to improve accessibility on the Web and we started developing WebSRT – later to be renamed to WebVTT – I needed an example video to demonstrate captions / subtitles, audio descriptions, transcripts, navigation markers and sign language.

I needed a freely available video with spoken text that either already had such data available or that I could create it for. Naturally I chose “Elephants Dream” by the Orange Open Movie Project, because it was created under the Creative Commons Attribution 2.5 license.

As it turned out, the Blender Foundation had already created a collection of SRT files that would represent the English original as well as the translated languages. I was able to reuse them by merely adding a WEBVTT header.
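The conversion described above can be sketched roughly as follows. This is not the procedure actually used for the Elephants Dream files, just a minimal illustration: the function name `srt_to_webvtt` is hypothetical, and the timestamp rewrite is an addition of this sketch, since the WebVTT timestamp syntax uses a period where SRT uses a comma before the milliseconds.

```python
import re

# Matches an SRT timing line such as "00:00:15,000 --> 00:00:17,951".
_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})(.*)$"
)


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SRT subtitle text to WebVTT (hypothetical helper)."""
    out = ["WEBVTT", ""]  # the mandatory header, followed by a blank line
    for line in srt_text.replace("\r\n", "\n").split("\n"):
        m = _TIMING.match(line)
        if m:
            # WebVTT uses "." rather than "," before the milliseconds.
            line = (
                f"{m.group(1)}.{m.group(2)} --> "
                f"{m.group(3)}.{m.group(4)}{m.group(5)}"
            )
        out.append(line)
    return "\n".join(out)


# Placeholder cue text, not taken from the real subtitle files.
example = "1\n00:00:15,000 --> 00:00:17,951\nHello\n"
print(srt_to_webvtt(example))
```

Numeric SRT cue numbers can stay in place, because WebVTT accepts them as cue identifiers.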

Then there was a need for a textual audio description. I read up on the plot online and finally wrote up a time-aligned audio description. I’m hereby making that file available under the Creative Commons Attribution 4.0 license. I’ve added a few lines to the metadata headers so it doesn’t confuse players. Feel free to reuse at will – I know there are others out there that have a similar need to demonstrate accessibility features.

by silvia at March 10, 2015 12:50 PM

Silvia Pfeiffer : Progress with rtc.io

At the end of July, I gave a presentation about WebRTC and rtc.io at the WDCNZ Web Dev Conference in beautiful Wellington, NZ.


Putting that talk together reminded me of how far we have come in the last year, both with the progress of WebRTC, its standards and browser implementations, and with our own small team at NICTA and our rtc.io WebRTC toolbox.

WDCNZ presentation page5

One of the most exciting opportunities is still under-exploited: the data channel. When I talked about the above slide and pointed out Bananabread, PeerCDN, Copay, PubNub and also later WebTorrent, that’s where I really started to get Web Developers excited about WebRTC. They can totally see the shift in paradigm to peer-to-peer applications away from the Server-based architecture of the current Web.

Many were also excited to learn more about rtc.io, our own npm modules based approach to a JavaScript API for WebRTC.


We believe that the World of JavaScript has reached a critical stage where we can no longer code by copy-and-paste of JavaScript snippets from all over the Web universe. We need a more structured module reuse approach to JavaScript. Node with JavaScript on the back end really only motivated this development. However, we’ve needed it for a long time on the front end, too. One big library (jquery anyone?) that does everything that anyone could ever need on the front-end isn’t going to work any longer with the amount of functionality that we now expect Web applications to support. Just look at the insane growth of npm compared to other module collections:

Packages per day across popular platforms (Shamelessly copied from: http://blog.nodejitsu.com/npm-innovation-through-modularity/)

For those that – like myself – found it difficult to understand how to tap into the sheer power of npm modules as a front end developer, simply use browserify. npm modules are prepared following the CommonJS module definition spec. Browserify works natively with that and “compiles” all the dependencies of an npm module into a single bundle.js file that you can use on the front end through a script tag as you would in plain HTML. You can learn more about browserify and module definitions and how to use browserify.

For those of you not quite ready to dive in with browserify we have prepared the rtc module, which exposes the most commonly used packages of rtc.io through an “RTC” object from a browserified JavaScript file. You can also directly download the JavaScript file from GitHub.

Using rtc.io rtc JS library

So, I hope you enjoy rtc.io and I hope you enjoy my slides and large collection of interesting links inside the deck, and of course: enjoy WebRTC! Thanks to Damon, Jeff, Cathy, Pete and Nathan – you’re an awesome team!

On a side note, I was really excited to meet the author of browserify, James Halliday (@substack) at WDCNZ, whose talk on “building your own tools” seemed to take me back to the times where everything was done on the command-line. I think James is using Node and the Web in a way that would appeal to a Linux Kernel developer. Fascinating!!

by silvia at August 12, 2014 07:06 AM

Silvia Pfeiffer : Nodebots Day Sydney 2014 (take 2)

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at July 25, 2014 11:00 PM

Silvia Pfeiffer : Nodebots Day 2014 Sydney

Nodebots and WebRTC Day in Sydney, NICTA http://www.meetup.com/WebRTC-Sydney/events/173234022/

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at July 25, 2014 05:26 PM

Silvia Pfeiffer : AppRTC : Google’s WebRTC test app and its parameters

If you’ve been interested in WebRTC and haven’t lived under a rock, you will know about Google’s open source testing application for WebRTC: AppRTC.

When you go to the site, a new video conferencing room is automatically created for you and you can share the provided URL with somebody else and thus connect (make sure you’re using Google Chrome, Opera or Mozilla Firefox).

We’ve been using this application forever to check whether any issues with our own WebRTC applications are due to network connectivity issues, firewall issues, or browser bugs, in which case AppRTC breaks down, too. Otherwise we’re pretty sure to have to dig deeper into our own code.

Now, AppRTC creates a pretty poor quality video conference, because the browsers use a 640×480 resolution by default. However, there are many query parameters that can be added to the AppRTC URL through which the connection can be manipulated.

Here are my favourite parameters:

  • hd=true : turns on high definition, i.e. minWidth=1280,minHeight=720
  • stereo=true : turns on stereo audio
  • debug=loopback : connect to yourself (great to check your own firewalls)
  • tt=60 : by default, the channel is closed after 30min – this gives you 60 (max 1440)

For example, here’s what a stereo, HD loopback test would look like: https://apprtc.appspot.com/?r=82313387&hd=true&stereo=true&debug=loopback .
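If you test with many parameter combinations, it can be handy to assemble such URLs programmatically. The following is a minimal sketch using only the Python standard library; the helper name `apprtc_url` is hypothetical, and whether AppRTC still honours a given parameter should be checked against its source.

```python
from urllib.parse import urlencode

APPRTC_BASE = "https://apprtc.appspot.com/"


def apprtc_url(room: str, **params: str) -> str:
    """Build an AppRTC URL; `room` becomes the r= parameter."""
    query = {"r": room}
    query.update(params)  # keyword order is preserved
    return APPRTC_BASE + "?" + urlencode(query)


# Rebuilds the stereo, HD loopback example from the parameters listed above.
print(apprtc_url("82313387", hd="true", stereo="true", debug="loopback"))
```

Note that `urlencode` percent-encodes characters such as `/`, which matters for values like the codec names.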

This is not the limit of the available parameters, though. Here are some others that you may find interesting for some more in-depth geekery:

  • ss=[stunserver] : in case you want to test a different STUN server to the default Google ones
  • ts=[turnserver] : in case you want to test a different TURN server to the default Google ones
  • tp=[password] : password for the TURN server
  • audio=true&video=false : audio-only call
  • audio=false : video-only call
  • audio=googEchoCancellation=false,googAutoGainControl=true : disable echo cancellation and enable gain control
  • audio=googNoiseReduction=true : enable noise reduction (more Google-specific parameters)
  • asc=ISAC/16000 : preferred audio send codec is ISAC at 16kHz (use on Android)
  • arc=opus/48000 : preferred audio receive codec is opus at 48kHz
  • dtls=false : disable datagram transport layer security
  • dscp=true : enable DSCP
  • ipv6=true : enable IPv6

AppRTC’s source code is available here. And here is the file with the parameters (in case you want to check if they have changed).

Have fun playing with the main and always up-to-date WebRTC application: AppRTC.

UPDATE 12 May 2014

AppRTC now also supports the following bitrate controls:

  • arbr=[bitrate] : set audio receive bitrate
  • asbr=[bitrate] : set audio send bitrate
  • vsbr=[bitrate] : set video send bitrate
  • vrbr=[bitrate] : set video receive bitrate

Example usage: https://apprtc.appspot.com/?r=&asbr=128&vsbr=4096&hd=true

by silvia at July 23, 2014 05:02 AM

Ben Schwartz : Gooseberry

Anyone who loves video software has probably caught more than one glimpse of the Blender Foundation’s short films: Elephants Dream, Big Buck Bunny, Sintel, and Tears of Steel. I’ve enjoyed them from the beginning, and never paid a dime, on account of their impeccable Creative Commons licensing.

I always hoped that the little open source project would one day grow up enough to make a full length feature film. Now they’ve decided to try, and they’ve raised more than half their funding target … with only two days to go. You can donate here. I think of it like buying a movie ticket, except that what you get is not just the right to watch the movie, but actually ownership of the movie itself.

by Ben at May 07, 2014 04:16 AM

Ben Schwartz : Stingy/stylish

My home for the next two nights is the Hotel 309, right by the office.  It’s the stingiest hotel I’ve ever stayed in.  Nothing is free: the wi-fi is $8/night and the “business center” is 25 cents/minute after the first 20.  There’s no soap by the bathroom sink, just the soap dispenser over the tub.  Even in the middle of winter, there is no box of tissues.  Its status as a 2-star hotel is well-deserved.

The rooms are also very stylish.  There’s a high-contrast color scheme that spans from the dark wood floors and rug to the boldly matted posters and high-concept lamps.  The furniture has high design value, or at least it did before it got all beat up.

These two themes come together beautifully for me in the (custom printed?) shower curtain, which features a repeating pattern of peacocks and crowns … with severe JPEG artifacts!  The luma blocks are almost two centimeters across.  Someone should tell the artist that bits are cheap these days.


by Ben at February 18, 2014 02:58 AM

Ben Schwartz : Efficiency

So you’re trying to build a DVD player using Debian Jessie and an Atom D2700 on a Poulsbo board, and you’ve even biked down to the used DVD warehouse and picked up a few $3 90’s classics for test materials.  Here’s what will happen next:

  1. Gnome 3 at 1920×1080.  The interface is sluggish even on static graphics.  Video is right out, since the graphics is unaccelerated, so every pixel has to be pushed around by the CPU.
  2. Reduce mode to 1280×720 (half the pixels to push), and try VLC in Gnome 3.  Playback is totally choppy.  Sigh.  Not really surprising, since Gnome is running in composited mode via OpenGL commands, which are then being faked on the low-power CPU using llvmpipe.  God only knows how many times each pixel is getting copied.  top shows half the CPU time is spent inside the gnome-shell process.
  3. Switch to XFCE.  Now VLC runs, and nothing else is stealing CPU time.  Still VLC runs out of CPU when expanded to full screen.  top shows it using 330% of CPU time, which is pretty impressive for a dual-core system.
  4. Switch to Gnome-mplayer, because someone says it’s faster.  Aspect is initially wrong; switch to “x11” output mode to fix it.  Video playback finally runs smooth, even at full screen.  OK, there’s a little bit of tearing, but just pretend that it’s 1999.  top shows … wait for it … 67% CPU utilization, or about one fifth of VLC’s.  (Less, actually, since at that usage VLC was dropping frames.)  Too bad Gnome-mplayer is buggy as heck: buttons like “pause” and “stop” do nothing, and the rest of the user interface is a crapshoot at best.

On a system like this, efficiency makes a big difference.  Now if only we could get efficiency and functionality together…

by Ben at January 21, 2014 07:00 AM

Thomas Daede : Zynq DMA performance

I finally got my inverse transform up to snuff, complete with hardware matrix transposer. The transposer was trivial to implement – I realized that I could use the Xilinx data width converter IP to register entire 4×4 blocks at once, allowing my transposer to simply be a bunch of wires (assign statements in Verilog).
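To see why a transposer can be "just wires" once a whole block is registered at once, note that transposing a fixed-size block is a fixed permutation of positions with no arithmetic involved. The sketch below illustrates this in Python rather than Verilog; the names `PERM` and `transpose_block` are hypothetical, and the flat row-major layout is an assumption.

```python
N = 4  # block size used in the post

# Fixed wiring: output position i is connected to input position PERM[i].
PERM = [N * (i % N) + (i // N) for i in range(N * N)]


def transpose_block(block):
    """Transpose a row-major 4x4 block given as a flat list of 16 values."""
    return [block[src] for src in PERM]


block = list(range(16))
print(transpose_block(block))
```

Each entry of `PERM` corresponds to one assign statement in hardware, which is why the transposer costs routing but no logic.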

Screenshot from 2014-01-08 22:04:27

Unfortunately, I wasn’t getting the performance I was expecting. At a clock speed of 100MHz and a 64-bit width, I expected to be able to perform 25 million transforms per second. However, I was having trouble even getting 4 million. To debug the problem, I used the Xilinx debug cores in Vivado:


There are several problems. Here’s an explanation of what is happening in the above picture:

  1. The CPU configures the DMA registers and starts the transfer. This works for a few clock cycles.
  2. The Stream to Memory DMA (s2mm) starts a memory transfer, but its FIFOs fill up almost immediately and it has to stall (tready goes low).
  3. The transform stream pipeline also stalls, making its tready go low.
  4. The s2mm DMA is able to start its first burst transfer, and everything goes smoothly.
  5. The CPU sees that the DMA has completed, and schedules the second pass. The turnaround time for this is extremely large, and ends up taking the majority of the time.
  6. The same process happens again, but the latency is even larger due to writing to system memory.

Fortunately, the solution isn’t that complicated. I am going to switch to a scatter-gather DMA engine, which allows me to construct a request chain, and then the DMA will execute the operations without CPU intervention, avoiding the CPU latency. In addition, a FIFO can be used to reduce the impact of the initial write latency somewhat, though this costs FPGA area and it might be better just to strive for longer DMA requests.
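The throughput figures above can be reproduced with a back-of-the-envelope model. This is only a sketch: the 16-bit coefficient size is an assumption (it is what makes a 4×4 block take four 64-bit beats, matching the 25 million figure), and the per-request overhead used in the example call is purely illustrative, not a measurement.

```python
CLOCK_HZ = 100_000_000   # from the post
BUS_BITS = 64            # from the post
COEFFS_PER_BLOCK = 16    # a 4x4 block
COEFF_BITS = 16          # assumption: not stated in the post

BEATS_PER_BLOCK = COEFFS_PER_BLOCK * COEFF_BITS // BUS_BITS


def peak_transforms_per_second() -> float:
    """Throughput if the bus never stalls."""
    return CLOCK_HZ / BEATS_PER_BLOCK


def effective_transforms_per_second(blocks_per_request: int,
                                    overhead_cycles: int) -> float:
    """Throughput when each DMA request pays a fixed setup cost."""
    cycles = blocks_per_request * BEATS_PER_BLOCK + overhead_cycles
    return CLOCK_HZ * blocks_per_request / cycles


print(peak_transforms_per_second())
# Hypothetical case: 256 blocks per request, 1024 cycles of turnaround.
print(effective_transforms_per_second(256, 1024))
```

The model shows the two levers mentioned above: chaining requests shrinks the overhead term, and longer requests amortise whatever overhead remains.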

There are other problems with my memory access at the moment – the most egregious being that my hardware expects a tiled buffer, but the Daala reference implementation uses linear buffers everywhere. This is the problem that I plan to tackle next.

by Thomas Daede at January 09, 2014 04:26 AM

Silvia Pfeiffer : Use deck.js as a remote presentation tool

deck.js is one of the new HTML5-based presentation tools. It’s simple to use, in particular for your basic, every-day presentation needs. You can also create more complex slides with animations etc. if you know your HTML and CSS.

Yesterday at linux.conf.au (LCA), I gave a presentation using deck.js. But I didn’t give it from the lectern in the room in Perth where LCA is being held – instead I gave it from the comfort of my home office at the other end of the country.

I used my laptop with in-built webcam and my Chrome browser to give this presentation. Beforehand, I had uploaded the presentation to a Web server and shared the link with the organiser of my speaker track, who was on site in Perth and had set up his laptop in the same fashion as myself. His screen was projecting the Chrome tab in which my slides were loaded and he had hooked up the audio output of his laptop to the room speaker system. His camera was pointed at the audience so I could see their reaction.

I loaded a slide master URL:
and the room loaded the URL without query string:

Then I gave my talk exactly as I would if I was in the same room. Yes, it felt exactly as though I was there, including nervousness and audience feedback.

How did we do that? WebRTC (Web Real-time Communication) to the rescue, of course!

We used one of the modules of the rtc.io project called rtc-glue to add the video conferencing functionality and the slide navigation to deck.js. It was actually really really simple!

Here are the few things we added to deck.js to make it work:

  • Code added to index.html to make the video connection work:
    <meta name="rtc-signalhost" content="http://rtc.io/switchboard/">
    <meta name="rtc-room" content="lca2014">
    <video id="localV" rtc-capture="camera" muted></video>
    <video id="peerV" rtc-peer rtc-stream="localV"></video>
    <script src="glue.js"></script>
    <script>
      glue.config.iceServers = [{ url: 'stun:stun.l.google.com:19302' }];
    </script>

    The iceServers config is required to punch through firewalls – you may also need a TURN server. Note that you need a signalling server – in our case we used http://rtc.io/switchboard/, which runs the code from rtc-switchboard.

  • Added glue.js library to deck.js:

    Downloaded from https://raw.github.com/rtc-io/rtc-glue/master/dist/glue.js into the source directory of deck.js.

  • Code added to index.html to synchronize slide navigation:
    glue.events.once('connected', function(signaller) {
      // Slide master (URL has a query string): announce every slide change.
      if (location.search.slice(1) !== '') {
        $(document).bind('deck.change', function(evt, from, to) {
          signaller.send('/slide', {
            idx: to,
            sender: signaller.id
          });
        });
      }
      // Room end: follow the slide changes announced by the master.
      signaller.on('slide', function(data) {
        console.log('received notification to change to slide: ', data.idx);
        $.deck('go', data.idx);
      });
    });

    This simply registers a callback on the slide master end to send a slide position message to the room end, and a callback on the room end that initiates the slide navigation.

And that’s it!

You can find my slide deck on GitHub.

Feel free to write your own slides in this manner – I would love to have more users of this approach. It should also be fairly simple to extend this to share pointer positions, so you can actually use the mouse pointer to point to things on your slides remotely. Would love to hear your experiences!

Note that the slides are actually a talk about the rtc.io project, so if you want to find out more about these modules and what other things you can do, read the slide deck or watch the talk when it has been published by LCA.

Many thanks to Damon Oehlman for his help in getting this working.

BTW: somebody should really fix that print style sheet for deck.js – I’m only ever getting the one slide that is currently showing. 😉

by silvia at January 08, 2014 02:28 AM

Maik Merten : Tinkering with a H.261 encoder

On the rtcweb mailing list the struggle regarding what Mandatory To Implement (MTI) video codec should be chosen rages on. One camp favors H.264 ("We cannot have VP8"), the other VP8 ("We cannot have H.264"). Some propose that there should be a "safe" fallback codec anyone can implement, and H.261 is "as old and safe" as it can get. H.261 was specified in the final years of the 1980s and is generally believed to have no non-expired patents left standing. Roughly speaking, this old gem of coding technology can transport CIF resolution (352x288) video at full framerate (>= 25 fps) with (depending on your definition) acceptable quality starting roughly in the 250 to 500 kbit/s range (basically, I've witnessed quite some Skype calls with similar perceived effective resolution, mostly driven by mediocre webcams, and I can live with that as long as the audio part is okay). From today's perspective, H.261 is very very light on computation, memory, and code footprint.

H.261 is, of course, outgunned by any semi-decent more modern video codec, which can, for instance, deliver video with higher resolution at similar bitrates. Those, however, don't have the luxury of having their patents expired with "as good as it can be" certainty.
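To put the 250 to 500 kbit/s range in perspective, it helps to compare it with the uncompressed bitrate of the same video. The sketch below assumes 8-bit samples with 4:2:0 chroma subsampling, which is what H.261 codes; the function names are hypothetical.

```python
WIDTH, HEIGHT = 352, 288   # CIF
FPS = 25
BITS_PER_SAMPLE = 8
# 4:2:0 sampling: one full-resolution luma plane plus two chroma planes
# subsampled by 2 in each direction, i.e. 1.5 samples per pixel.
SAMPLES_PER_PIXEL = 1.5


def raw_bitrate_kbps() -> float:
    """Bitrate of the uncompressed video in kbit/s."""
    return WIDTH * HEIGHT * SAMPLES_PER_PIXEL * BITS_PER_SAMPLE * FPS / 1000


def compression_ratio(coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the raw video."""
    return raw_bitrate_kbps() / coded_kbps


print(f"raw: {raw_bitrate_kbps():.1f} kbit/s")
for coded in (250, 500):
    print(f"{coded} kbit/s -> about {compression_ratio(coded):.0f}:1")
```

Raw CIF at 25 fps comes to roughly 30.4 Mbit/s, so the quoted range corresponds to compression ratios of about 60:1 to 120:1.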

People on the rtcweb list were quick to point out that having an encoder with modern encoding techniques may by itself be problematic regarding patents. Thankfully, for H.261, a public domain reference-style en- and decoder from 1993 can be found, e.g., at http://wftp3.itu.int/av-arch/video-site/h261/PVRG_Software/P64v1.2.2.tar - so that's a nice reference on what encoding technologies were state-of-the-art in the early 1990s.

With some initial patching done by Ron Lee this old code builds quite nicely on modern platforms - and as it turns out, the encoder also produces intact video, even on x86_64 machines or on a Raspberry Pi. Quite remarkably portable C code (although not the cleanest style-wise). The original code is slow, though: It barely does realtime encoding of 352x288 video on a 1.65 GHz netbook, and I can barely imagine having it encode video on machines from 1993! Some fun was had in making it faster (it's now about three times faster than before) and the program can now encode from and decode to YUV4MPEG2 files, which is quite a lot more handy than the old mode of operation (still supported), where each frame would consist of three files (.Y, .U, .V).
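For readers unfamiliar with YUV4MPEG2, the container is simple enough to write by hand: a one-line text header, then each frame as a `FRAME` line followed by the Y, U and V planes. The following sketch is not taken from the p64 code; the function name `write_y4m` and the header tags chosen (progressive, 4:2:0 with JPEG-style chroma siting) are assumptions of this illustration.

```python
import io


def write_y4m(out, width, height, fps, frames):
    """Write an iterable of (y, u, v) byte planes as a YUV4MPEG2 stream."""
    header = f"YUV4MPEG2 W{width} H{height} F{fps}:1 Ip C420jpeg\n"
    out.write(header.encode("ascii"))
    luma_size = width * height
    chroma_size = (width // 2) * (height // 2)  # 4:2:0 subsampling
    for y, u, v in frames:
        if (len(y), len(u), len(v)) != (luma_size, chroma_size, chroma_size):
            raise ValueError("plane size does not match the frame dimensions")
        out.write(b"FRAME\n")
        out.write(y)
        out.write(u)
        out.write(v)


# One grey CIF frame, built in memory instead of read from .Y/.U/.V files.
grey = (
    bytes([128]) * (352 * 288),
    bytes([128]) * (176 * 144),
    bytes([128]) * (176 * 144),
)
buf = io.BytesIO()
write_y4m(buf, 352, 288, 25, [grey])
print(len(buf.getvalue()), "bytes written")
```

Having all three planes in one stream is what makes this format handier than the one-file-per-plane layout: a whole clip can be piped between tools as a single file.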

For those interested, the patched up code is available at https://github.com/maikmerten/p64 - however, be aware that the original coding style (yay, global variables) is still alive and well.

So, is it "useful" to resurrect such an old codebase? Depends on the definition of "useful". For me, as long as it is fun and teaches me something, it's reasonably useful.

So is it fun to toy around with this ancient coding technology? Yes, especially as most modern codecs still follow the same overall design, but H.261 is the most basic instance of "modern" video coding and thus easiest to grasp. Who knows, with some helpful comments here and there that old codebase could be used for teaching basic principles of video coding.

by maikmerten at January 07, 2014 07:35 PM

Thomas Daede : Off by one

My hardware had a bug that made about 50% of the outputs off by one. I compared my Verilog code to the original C and it was a one-for-one match, with the exception of a few OD_DCT_RSHIFT that I translated into arithmetic shifts. That turned out to break the transform. Looking at the definition of OD_DCT_RSHIFT:

/*This should translate directly to 3 or 4 instructions for a constant _b:
#define OD_UNBIASED_RSHIFT(_a,_b) ((_a)+(((1<<(_b))-1)&-((_a)<0))>>(_b))*/
/*This version relies on a smart compiler:*/
# define OD_UNBIASED_RSHIFT(_a, _b) ((_a)/(1<<(_b)))

I had always thought a divide by a power of two equals a shift, but this is wrong: an integer divide rounds towards zero, whereas a shift rounds towards negative infinity. The solution is simple: if the value to be shifted is less than zero, add a mask before shifting. Rather than write this logic in Verilog, I simply switched my code to the / operator as in the C code above, and XST inferred the correct logic. After verifying operation with random inputs, I also wrote a small benchmark to test the performance of my hardware:

[root@alarm hwtests]# ./idct4_test 
Filling input buffer... Done.
Running software benchmark... Done.
Time: 0.030000 s
Running hardware benchmark... Done.
Time: 0.960000 s

Not too impressive, but the implementation is super basic, so it’s not surprising. Most of the time is spent shuffling data across the extremely slow MMIO interface.
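The rounding difference behind the off-by-one bug is easy to reproduce in software. The sketch below mirrors the two C macro variants in Python, whose `>>` on negative integers also rounds towards negative infinity; the function names are hypothetical.

```python
def floor_rshift(a: int, b: int) -> int:
    """Plain arithmetic right shift: rounds towards negative infinity."""
    return a >> b


def unbiased_rshift(a: int, b: int) -> int:
    """Shift that rounds towards zero, like the commented-out C macro."""
    mask = ((1 << b) - 1) & -(a < 0)  # low bits set only when a is negative
    return (a + mask) >> b


def c_divide(a: int, b: int) -> int:
    """C-style integer division by 2**b (truncates towards zero)."""
    q = abs(a) // (1 << b)
    return -q if a < 0 else q


for a in (-7, -4, -1, 0, 1, 7):
    print(a, floor_rshift(a, 1), unbiased_rshift(a, 1), c_divide(a, 1))
```

The plain shift and the division disagree only for negative values that are not exact multiples of the divisor, which is exactly where the mask adds its correction.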

At the same time I was trying to figure out why the 16-bit version of the intra predictor performed uniformly worse than the double precision version – I thought 16 bits ought to be enough. The conversion is done by expanding the range to fill a 16 bit integer and then rounding:

OD_PRED_WEIGHTS_4x4_FIXED[i] = (od_coeff)floor(OD_PRED_WEIGHTS_4x4[i] * 32768 + 0.5);

The 16 bit multiplies with the 16 bit coefficients are summed in a 32 bit accumulator. The result is then truncated to the original range. I did this with a right shift – after the previous ordeal, I tried swapping it with the “round towards zero” macro. Here are the results:


The new 16 bit version even manages to outperform the double version slightly. I believe the reason why “round to zero” does better than simply rounding down is that rounding down tends to create a slightly negative bias in the encoded coefficients, decreasing coding gain.
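The fixed-point scheme described in this post can be sketched as follows. It is an illustration only, not the Daala code: the function names are hypothetical, the clamp to the signed 16-bit range is an assumption (the post does not say how a weight that scales to 32768 is handled), and Python integers stand in for the 32 bit accumulator.

```python
import math

SCALE = 32768  # 2**15, the factor used in the post


def quantize_weight(w: float) -> int:
    """Round a floating-point weight to 16-bit fixed point."""
    q = math.floor(w * SCALE + 0.5)
    # Assumption: clamp to the signed 16-bit range.
    return max(-32768, min(32767, q))


def predict(weights, coeffs, round_to_zero: bool) -> int:
    """Multiply-accumulate, then scale the sum back down by 2**15."""
    acc = sum(quantize_weight(w) * c for w, c in zip(weights, coeffs))
    if round_to_zero and acc < 0:
        acc += SCALE - 1  # bias negative sums so the shift truncates
    return acc >> 15


# A true result of -1.5 becomes -2 when rounding down, -1 towards zero.
print(predict([-0.5], [3], round_to_zero=False))
print(predict([-0.5], [3], round_to_zero=True))
```

Rounding down pushes every negative result further from zero, which is one way to picture the negative bias discussed above.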

by Thomas Daede at January 03, 2014 12:39 AM

Thomas Daede : Daala 4-input idct (tentatively) working!

I’ve implemented Daala’s smallest inverse transform in Verilog code. It appears as an AXI-Lite slave, with two 32-bit registers for input and two for output. Right now it can do one transform per clock cycle, though at a pitiful 20MHz. I also haven’t verified that all of its output values are identical to the C version yet, but it passes preliminary testing with peek and poke.

Screenshot from 2014-01-01 23:23:22

I also have yet to figure out why XST only uses one DSP block even though my design has 3 multipliers…

by Thomas Daede at January 02, 2014 05:25 AM

Thomas Daede : Quantized intra predictor

Daala’s intra predictor currently uses doubles. Floating point math units are really expensive in hardware, and so is loading 64 bit weights. Therefore, I modified Daala to see what would happen if the weights were rounded to signed 16 bit. The result is below:

ssim

Red is before quantization, green after. This is too much loss – I’ll have to figure out why this happened. Worst case I move to 32 bit weights, though maybe my floor(+0.5) method of rounding is also suspect? Maybe the intra weights should be trained taking quantization into account?

by Thomas Daede at December 31, 2013 11:37 PM

Thomas Daede : First Zynq bitstream working!

Screenshot from 2013-12-31 14:51:20

I got my first custom PL hardware working! Following the Zedboard tutorials, it was relatively straightforward, though using Vivado 2013.3 required a bit of playing around – I ended up making my own clock sources and reset controller until I realized that the Zynq PS had them if you enabled them. Next up: ChipScope or whatever it’s called in Vivado.

I crashed the chip numerous times until realizing that the bitstream file name had changed somewhere in the process, so I was uploading an old version of the bitstream….

by Thomas Daede at December 31, 2013 08:56 PM

Thomas Daede : Daala profiling on ARM

I reran the same decoding as yesterday, but this time on the Zynq Cortex-A9 instead of x86. Following is the histogram data, again with the functions I plan to accelerate highlighted:

 19.60%  lt-dump_video  [.] od_intra_pred16x16_mult
  6.66%  lt-dump_video  [.] od_intra_pred8x8_mult
  6.02%  lt-dump_video  [.] od_bin_idct16
  4.88%  lt-dump_video  [.] .divsi3_skip_div0_test
  4.54%  lt-dump_video  [.] od_bands_from_raster
  4.21%  lt-dump_video  [.] laplace_decode
  4.03%  lt-dump_video  [.] od_chroma_pred
  3.92%  lt-dump_video  [.] od_raster_from_bands
  3.66%  lt-dump_video  [.] od_post_filter16
  3.20%  lt-dump_video  [.] od_intra_pred4x4_mult
  3.09%  lt-dump_video  [.] od_apply_filter_cols
  3.08%  lt-dump_video  [.] od_bin_idct8
  2.60%  lt-dump_video  [.] od_post_filter8
  2.00%  lt-dump_video  [.] od_tf_down_hv
  1.69%  lt-dump_video  [.] od_intra_pred_cdf
  1.55%  lt-dump_video  [.] od_ec_decode_cdf_unscaled
  1.46%  lt-dump_video  [.] od_post_filter4
  1.45%  lt-dump_video  [.] od_convert_intra_coeffs
  1.44%  lt-dump_video  [.] od_convert_block_down
  1.28%  lt-dump_video  [.] generic_model_update
  1.24%  lt-dump_video  [.] pvq_decoder
  1.21%  lt-dump_video  [.] od_bin_idct4

As expected, the results are very similar to x86; however, there are a few oddities. One is that the intra prediction is even slower than on x86, and another is that the software division routine shows up relatively high in the list. It turns out that the division comes from the inverse lapping filters – although division by a constant can be replaced by a fixed point multiply, the compiler seems not to have done this, which hurts performance a lot.
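For readers who have not seen the trick, here is a minimal sketch of replacing division by a constant with a fixed point multiply and a shift. It covers unsigned inputs only (signed values, as in the lapping filters, need extra sign handling that is not shown), the function names are hypothetical, and it is not how Daala or the compiler implements it.

```python
def magic_for(divisor: int, bits: int = 16):
    """Return (multiplier, shift) such that (x * multiplier) >> shift
    equals x // divisor for every unsigned x below 2**bits."""
    if divisor < 1:
        raise ValueError("divisor must be positive")
    shift = bits + (divisor - 1).bit_length()   # bits + ceil(log2(divisor))
    multiplier = -(-(1 << shift) // divisor)    # ceil(2**shift / divisor)
    return multiplier, shift


def divide(x: int, multiplier: int, shift: int) -> int:
    """Divide by the constant using only a multiply and a shift."""
    return (x * multiplier) >> shift


m, s = magic_for(3)
print(m, s, divide(1000, m, s))
```

The multiplier is computed once per constant, so each division at run time costs one multiply and one shift instead of a call into a software division routine.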

For fun, let’s see what happens when we remove the costly transforms and force 4×4 block sizes only:

 26.21%  lt-dump_video  [.] od_intra_pred4x4_mult
  7.35%  lt-dump_video  [.] od_intra_pred_cdf
  6.28%  lt-dump_video  [.] od_post_filter4
  6.17%  lt-dump_video  [.] od_chroma_pred
  5.77%  lt-dump_video  [.] od_bin_idct4
  4.04%  lt-dump_video  [.] od_bands_from_raster
  3.94%  lt-dump_video  [.] generic_model_update
  3.86%  lt-dump_video  [.] od_apply_filter_cols
  3.64%  lt-dump_video  [.] od_raster_from_bands
  3.29%  lt-dump_video  [.] .divsi3_skip_div0_test
  2.47%  lt-dump_video  [.] od_convert_intra_coeffs
  2.07%  lt-dump_video  [.] od_intra_pred4x4_get
  1.95%  lt-dump_video  [.] od_apply_postfilter
  1.82%  lt-dump_video  [.] od_tf_up_hv_lp
  1.81%  lt-dump_video  [.] laplace_decode
  1.74%  lt-dump_video  [.] od_ec_decode_cdf
  1.67%  lt-dump_video  [.] pvq_decode_delta
  1.61%  lt-dump_video  [.] od_apply_filter_rows
  1.55%  lt-dump_video  [.] od_bin_idct4x4

The 4×4 intra prediction has now skyrocketed to the top, with the transforms and filters increasing as well. I was surprised by the intra prediction decoder (od_intra_pred_cdf) taking up so much time, but it can be explained by much more prediction data coded relative to the image size due to the smaller blocks. The transform still doesn’t take much time, which I suppose shouldn’t be surprising given how simple it is – my hardware can even do it in 1 cycle.

by Thomas Daede at December 30, 2013 12:23 AM

Thomas Daede : Daala profiling on x86

Given that the purpose of my hardware acceleration is to run Daala at realtime speeds, I decided to benchmark the Daala player on my Core 2 Duo laptop. I used a test video at 720p24, encoded with -v 16 and no reference frames (intra only). The following is the perf annotate output:

 19.49%  lt-player_examp  [.] od_state_upsample8
 11.64%  lt-player_examp  [.] od_intra_pred16x16_mult
  5.74%  lt-player_examp  [.] od_intra_pred8x8_mult

20 percent for od_state_upsample8? Turns out that the results of this aren’t even used in intra only mode, so commenting it out yields a more reasonable result:

 14.50%  lt-player_examp  [.] od_intra_pred16x16_mult
  7.17%  lt-player_examp  [.] od_intra_pred8x8_mult
  6.37%  lt-player_examp  [.] od_bin_idct16
  5.09%  lt-player_examp  [.] od_post_filter16
  4.63%  lt-player_examp  [.] laplace_decode
  4.41%  lt-player_examp  [.] od_bin_idct8
  4.10%  lt-player_examp  [.] od_post_filter8
  3.86%  lt-player_examp  [.] od_apply_filter_cols
  3.28%  lt-player_examp  [.] od_chroma_pred
  3.18%  lt-player_examp  [.] od_raster_from_bands
  3.14%  lt-player_examp  [.] od_intra_pred4x4_mult
  2.84%  lt-player_examp  [.] pvq_decoder
  2.76%  lt-player_examp  [.] od_ec_decode_cdf_unscaled
  2.71%  lt-player_examp  [.] od_tf_down_hv
  2.58%  lt-player_examp  [.] od_post_filter4
  2.45%  lt-player_examp  [.] od_bands_from_raster
  2.13%  lt-player_examp  [.] od_intra_pred_cdf
  1.98%  lt-player_examp  [.] od_intra_pred16x16_get
  1.89%  lt-player_examp  [.] pvq_decode_delta
  1.50%  lt-player_examp  [.] od_convert_intra_coeffs
  1.43%  lt-player_examp  [.] generic_model_update
  1.37%  lt-player_examp  [.] od_convert_block_down
  1.21%  lt-player_examp  [.] od_ec_decode_cdf
  1.18%  lt-player_examp  [.] od_ec_dec_normalize
  1.18%  lt-player_examp  [.] od_bin_idct4

I have bolded the functions that I plan to implement in hardware. As you can see, they sum to only about 23% of the total execution time – this means that accelerating these functions alone won’t bring me into realtime decoding performance. Obvious other targets include the intra prediction matrix multiplication, though this might be better handled by NEON acceleration for now – I’m not too familiar with that area of the code yet.
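The limit implied by the 23% figure follows from Amdahl's law. The sketch below just evaluates that formula; the function name is hypothetical and the hardware speedup factors in the loop are illustrative, not measurements.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: speedup of the whole when only a part gets faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


P = 0.23  # share of decode time in the functions slated for hardware

for factor in (2, 10, 1000):
    print(f"{factor:>5}x faster hardware -> "
          f"{overall_speedup(P, factor):.2f}x overall")
```

Even with infinitely fast hardware for those functions, the whole decoder could speed up by at most 1/(1 - 0.23), or about 1.3 times, which is why the intra prediction has to be tackled as well.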

by Thomas Daede at December 29, 2013 03:44 AM

Thomas Daede : Senior Honors Thesis – Daala in Hardware

not actually the daala logo

For my honors thesis, I am implementing part of the Daala decoder in hardware. This is not only a way for me to learn more about video coding and hardware, but also a way to provide feedback to the Daala project and create a reference hardware implementation.

The Chip

Part of the reason for a hardware implementation of any video codec is to make it possible to decode on an otherwise underpowered chip, such as the mobile processors common in smart phones and tablets. A very good model of this sort of chip is the Xilinx Zynq processor, which has two midrange ARM Cortex cores, and a large FPGA fabric surrounding them. The custom video decoder will be implemented in the FPGA, with high speed direct-memory-access providing communication with the ARM cores.

The Board

Image from zedboard.org

I will be using the ZedBoard, a low cost prototyping board based on the Zynq 7020 system-on-chip. It includes 512MB of DDR, both HDMI and VGA video output, Ethernet, serial, and boots Linux out of the box. The only thing that could make it better would be a cute kitten on the silkscreen.

Choosing what to accelerate

For now, parts of the codec will still run in software. This is because many of them would be very complicated state machines in hardware, and more importantly, it allows me to incrementally add hardware acceleration while maintaining a functional decoder. To make a particular algorithm a good candidate for hardware acceleration, it needs to have these properties:

  • Stable – Daala is a rapidly changing codec, and while much of it is expected to change, it takes time to implement hardware, and it’s much easier if the reference isn’t changing under my feet.
  • Parallel – Compared to a CPU, hardware can excel at exploiting parallelism. CPUs can do it too with SIMD instructions, but hardware can be tailor made to the application.
  • Independent – The hardware accelerator can act much like a parallel thread, which means that locking and synchronization comes into play. Ideally the hardware and CPU should rarely have to wait for each other.
  • Interesting – The hardware should be something unique to Daala.

The best fit that I have found for these is the transform stage of Daala. The transform stage is a combination of the discrete cosine transform (actually an integer approximation), and a lapping filter. While the DCT is an old concept, the 2D lapping filter is pretty unique to Daala, and implementing both in tightly coupled hardware can create a large performance benefit. More info on the transform stage can be found on Monty’s demo pages.

by Thomas Daede at November 26, 2013 03:04 AM

Thomas Daede : Inside a TTL crystal oscillator

Inside Crystal View 1

In case you ever wanted to know what is inside an oscillator can… I used a dremel so that now you can know. The big transparent disc on the right is the precisely cut quartz resonator, suspended on springs. On the left is a driver chip and pads for loading capacitors to complete the oscillator circuit. The heat from my dremel was enough to melt the solder and remove the components. Your average crystal can won’t have the driver chip or capacitors – most microcontrollers now have the driver circuitry built-in.


by Thomas Daede at November 25, 2013 10:49 PM

Thomas B. Rücker : Icecast – How to prevent listeners from being disconnected

A set of hints for common questions and situations people encounter while setting up a radio station using Icecast.

  • The source client is on an unstable IP connection.
    • You really want to avoid such situations. If this is a professional setup you might even consider a dedicated internet connection for the source client. The least would be QoS with guaranteed bandwidth.
      I’ve seen far too often over the years that people complain ‘Icecast drops my stream all the time’, and after some digging we find that the same person or someone on their network is running BitTorrent or some other bandwidth-intensive application.
    • If the TCP connection just stalls from time to time for a few seconds, then you might improve the situation by increasing the source-timeout, but don’t increase it too much as that will lead to weird behaviour for your listeners too. Beyond 15-20s I’d strongly recommend considering the next option instead.
    • If the TCP connection breaks, so the source client gets fully disconnected and has to reconnect to Icecast, then you really want to look into setting up fallbacks for your streams, as otherwise this also immediately disconnects all listeners.
      What this does is it transfers all listeners to a backup stream (or static file) instead of disconnecting them. Preferably you’d have that fallback stream generated locally on the same machine as the Icecast server, e.g. some ‘elevator music’ with an announcement of ‘technical difficulties, regular programming will resume soon’.
      There is a special case: If you opt for ‘silence’ as your fallback and you stream in e.g. Ogg/Vorbis, then it will compress digital silence to a few bits per second, as it’s incredibly compressible. This completely messes up most players. The workaround is to inject some -90dB AWGN, which is below hearing level, but keeps the encoder busy and the bit-rate up.
      Important note: You should match the parameters of both streams closely to avoid playback problems, as many players won’t cope well if the codec, sample rate or other things change mid-stream.


  • You want to have several different live shows during the week and in between some automated playlist driven programming
    • use (several) fallbacks
      primary mountpoint: live shows
      secondary mountpoint: playlist
      tertiary mountpoint: -90dB AWGN (optional, e.g. as insurance in case the playlist fails)
    • If you want to have this all completely hidden from the listeners with one mount point, that is automatically also listed on http://dir.xiph.org, then you need to add one more trick:
      Set up a local relay of the primary mountpoint, set it to force YP submission.
      Set all the three other mounts to hidden and force disable YP submission.
      This gives you one visible mountpoint for your users and all the ‘magic’ is hidden.
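
The -90dB AWGN fallback mentioned above can be sketched in a few lines. This is only an illustration: it assumes the level is meant relative to digital full scale (dBFS), which the post does not spell out, and the function names are made up for this example. Feeding the samples to an encoder or source client is left out.

```python
import random


def awgn_samples(count, level_dbfs=-90.0, seed=None):
    """Return `count` Gaussian noise samples (floats, full scale = 1.0)
    whose RMS level is approximately `level_dbfs`."""
    rng = random.Random(seed)
    rms = 10.0 ** (level_dbfs / 20.0)  # convert dBFS to linear amplitude
    return [rng.gauss(0.0, rms) for _ in range(count)]


def to_pcm16(samples):
    """Quantize float samples to signed 16-bit integers, with clipping."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


noise = awgn_samples(44100, seed=1)  # one second at 44.1 kHz
pcm = to_pcm16(noise)
print("linear RMS target:", 10.0 ** (-90.0 / 20.0))
print("distinct 16-bit values:", sorted(set(pcm)))
```

At 16 bits, -90 dBFS works out to roughly one least-significant bit, so the noise only just survives quantization. That is still enough to keep the stream from being pure digital silence.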


I’ll expand this post by some simple configuration examples later today/tomorrow.

by Thomas B. Rücker at July 08, 2013 07:08 AM

Thomas B. Rücker : Icecast · analyzing your logs to create useful statistics

There are many ways to analyze Icecast usage and generate statistics. It has a special statistics TCP stream, it has hooks that trigger on source or listener connect/disconnect events, you can query it over custom XSLT or read the stats.xml or you can analyze its log files. This post however will be only about the latter. I’ll be writing separate posts on the other topics soon.

  • Webalizer streaming version
    A fork of webalizer 2.10 extended to support icecast2 logs and produce nice listener statistics.
    I’ve used this a couple of years ago. It could use some love (it expects a manual input of ‘average bitrate’ while that could be calculated), but otherwise does a very good job of producing nice statistics.
  • AWStats – supports ‘streaming’ logs.
    Haven’t used this one myself for Icecast analysis, but it’s a well established and good open source log analysis tool.
  • Icecast log parser  – Log parser to feed a mySQL database
    I ran into this one recently, it seems to be a small script that helps you feed a mySQL database from your log files. Further analysis is then possible through SQL queries. A nice building block if you want to build a custom solution.

In case you want to parse the access log yourself you should be aware of several things. Largely the log is compatible with other httpd log formats, but there are slight differences. Please look at these two log entries:

    - - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" 200 3379662 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0" 88
    - - [02/Jun/2013:20:38:13 +0200] "PUT /test.webm HTTP/1.1" 200 19 "-" "-" 264

The first one is a listener (in this case Firefox playing a WebM stream). You can see it received 3379662 bytes, but the interesting part is the last entry on that line “88”. It’s the number of seconds it was connected. Something important for streaming, less important for a file serving httpd.

The second entry is a source client (in this case using the HTTP PUT method we’re adding for Icecast 2.4). Note that it only says “19” after the HTTP status code. That might seem very low at first, but it’s important as we’re not logging the number of bytes sent by the client to the server, but from the server to the client. If necessary you could extract this from the error.log though. The “264” at the end once again indicates it was streaming for 264 seconds.
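
As a starting point for a custom parser, here is a minimal sketch that reads such a line and derives the average bitrate from the byte count and the trailing duration field. The regular expression is my own assumption about the layout, based only on the entries quoted above. The client address seems to be missing from those entries, so the sample line uses the documentation address 192.0.2.1 as a stand-in. Escaped quotes inside quoted fields are not handled.

```python
import re

# Common-log-style fields plus Icecast's trailing "seconds connected" field.
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<seconds>\d+)$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    for key in ("status", "bytes", "seconds"):
        entry[key] = int(entry[key])
    return entry


def average_bitrate(entry):
    """Average bits per second sent to the client, or None for 0-second connections."""
    if entry["seconds"] == 0:
        return None
    return entry["bytes"] * 8 / entry["seconds"]


sample = (
    '192.0.2.1 - - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" '
    '200 3379662 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) '
    'Gecko/20100101 Firefox/21.0" 88'
)
entry = parse_line(sample)
print(entry["request"], entry["seconds"], "s,", round(average_bitrate(entry)), "bit/s")
```

Remember that for source clients the byte count covers only what the server sent back, so this bitrate calculation is meaningful for listener entries only.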

Another thing, as we’ve heard that question pop up repeatedly. The log entry is only generated once the listener/source actually disconnects, as only then we know all the details that should go into that line. If you are looking for real time statistics, then the access.log is the wrong place, sorry.

There are also some closed source solutions, but I’ve never used them and I don’t think they provide significant benefit over the available open source solutions mentioned above.

If you know a good Icecast tool, not only for log analysis, please write a comment or send a mail to one of the Icecast mailing lists.

by Thomas B. Rücker at June 02, 2013 06:52 PM

Chris Pearce : HTML5 video playbackRate and Ogg chaining support landed in Firefox

Paul Adenot has recently landed patches in Firefox to enable the playbackRate attribute on HTML5 <audio> and <video> elements.

This is a cool feature that I've been looking forward to for a while; it means users can speed up playback of video and audio, so you can, for example, watch presentations sped up and only slow down for the interesting bits. Currently Firefox's <video> controls don't have support for playbackRate, but it is accessible from JavaScript, and hopefully we'll get support added to our built-in controls soon.

Paul has also finally landed support for Ogg chaining. This has been strongly desired for quite some time by community members, and the final patch also had contributions from "oneman" (David Richards), who also was a strong advocate for this feature.

We decided to reduce the scope of our chaining implementation in order to make it easier and quicker to implement. We targeted the features most desired by internet radio providers, and so we only support chaining in Ogg Vorbis and Opus audio files and we disable seeking in chained files.

Thanks Paul and David for working on these features!

by Chris Pearce (noreply@blogger.com) at December 23, 2012 02:28 AM

Thomas Daede : Centaurus 3 Lights

Centaurus 3’s lights gathered some attention during FSGP 2012. Lights are a surprisingly hard thing to get right on a solar car – they need to be bright, efficient, aerodynamic, and pretty.

Previous solar cars have tried many things, with varying success. A simple plastic window and row of T1 LEDs works well, but if the window needs to be a strange shape, it can get difficult and ugly. Acrylic light pipes make very thin and aerodynamic lights, but it can be difficult to get high efficiency, or even acceptable brightness.

There are two general classes of LEDs. One is the low power, highly focused, signaling types of LEDs. These generally draw less than 75mA, and usually come with focused or diffuse lenses, such as a T1 3/4 case. This is also the type of LED in the ASC 2012 reference lights. The other type of LED is the high brightness lighting-class LED, such as those made by Philips, Cree, or Seoul Semiconductor. These usually draw 350mA or more, are rated for total lumen output rather than millicandelas+viewing angle, and require a heat sink.

We eventually chose the low-current type of LEDs, because the lumen count we needed was low, and the need for heatsinking would significantly complicate matters. We ended up using the 30-LED version of the ASC 2012 reference lighting. The outer casing is ugly, so the LED PCB was removed using a utility knife.

Now we need a new lens. I originally looked for acrylic casting, but settled on Alumilite Water-Clear urethane casting material. This material is much stronger than acrylic and supposedly easier to use.

This product is designed to be used with rubber molds. Instead, I used our team’s sponsorship from Stratasys to 3D print some molds:

The molds are made as two halves which bolt together with AN3 fasteners. They are sanded, painted, and sanded again, then coated with mold release wax and PVA. Each part of the Alumilite urethane is placed in a vacuum pot to remove bubbles; the two parts are then mixed and poured into the mold. Next, the lights are dipped into the mold, suspended by two metal pins, and the wires are bent and taped around the outside of the mold to hold the LED lights in place. The entire mold then goes into the vacuum pot for about one minute to pull out more bubbles, after which it is moved to an oven and allowed to cure for 4 more hours at 60 °C. Using an oven is not strictly necessary, but it speeds up the process and makes sure the urethane reaches full hardness.

Once cured, the lights are removed from the mold, epoxied into place in the shell, and bondo’d. They are then masked before being sent off to paint. The masked area is somewhat smaller than the size of the cast part, to make the lights look more seamless and allow room for bondo to fix any bumps.

The lights generally come out with a rather diffuse surface; however, it becomes water clear after the clear coat of automotive paint. The finished product looks like this:


by Thomas Daede at October 27, 2012 03:19 PM

Ben Schwartz : It’s Google

I’m normally reticent to talk about the future; most of my posts are in the past tense. But now the plane tickets are purchased, apartment booked, and my room is gradually emptying itself of my furniture and belongings. The point of no return is long past.

A few days after Independence Day, I’ll be flying to Mountain View for a week at the Googleplex, and from there to Seattle (or Kirkland), to start work as a software engineer on Google’s WebRTC team, within the larger Chromium development effort. The exact project I’ll be working on initially isn’t yet decided, but a few very exciting ideas have floated by since I was offered the position in March.

Last summer I told a friend that I had no idea where I would be in a year’s time, and when I listed places I might be — Boston, Madrid, San Francisco, Schenectady — Seattle wasn’t even on the list. It still wasn’t in March, when I was offered this position in the Cambridge (MA) office. It was an unfortunate coincidence that the team I’d planned to join was relocated to Seattle shortly after I’d accepted the offer.

My recruiters and managers were helpful and gracious in two key ways. First, they arranged for me to meet with ~5 different leaders in the Cambridge office whose teams I might be able to join instead of moving. Second, they flew me out to Seattle (I’d never been to the city, nor the state, nor any of the states or provinces that it borders) and arranged for meetings with various managers and developers in the Kirkland office, just so I could learn more about the office and the city. I spent the afternoon wandering the city and (with help from a friend of a friend), looking at as many districts as I could squeeze between lunch and sleep.

The visit made all the difference. It made the city real to me … and it seemed like a place that I could live. It also confirmed an impressive pattern: every single Google employee I met, at whichever office, seemed like someone I would be happy to work alongside.

When I returned there were yet more meetings scheduled, but I began to perceive that the move was essentially inevitable. The hiring committee had done their job well, and assigned me to the best fitting position. Everything else was second best at best.

It’s been an up and down experience, with the drudgery of packing and schlepping an unwelcome reminder of the feeling of loss that accompanies leaving history, family, and friends behind. I am learning in the process that, having never really moved, I have no idea how to move.

But there’s also sometimes a sense of joy in it. I am going to be an independent, free adult, in a way that cannot be achieved by even the happiest potted plant.

After signing the same lease on the same student apartment for the seventh time, I worried about getting stuck, in some metaphysical sense, about failure to launch from my too-comfortable cocoon. It was time for a grand adventure.

This is it.

by Ben at June 29, 2012 05:10 PM

David Schleef : GStreamer Streaming Server Library

Introducing the GStreamer Streaming Server Library, or GSS for short.

This post was originally intended to be a release announcement, but I started to wander off to work on other projects before the release was 100% complete.  So perhaps this is a pre-announcement.  Or it’s merely an informational piece with the bonus that the source code repository is in a pretty stable and bug-free state at the moment.  I tagged it with “gss-0.5.0”.

What it is

GSS is a standalone HTTP server implemented as a library.  Its special focus is to serve live video streams to thousands of clients, mainly for use inside an HTML5 video tag.  It’s based on GStreamer, libsoup, and json-glib, and uses Bootstrap and BrowserID in the user interface.

GSS comes with a streaming server application that is essentially a small wrapper around the library.  This application is referred to as the Entropy Wave Streaming Server (ew-stream-server); the code that is now GSS was originally split out of this application.  The app can be found in the tools/ directory in the source tree.


  • Streaming formats: WebM, Ogg, MPEG-TS.  (FLV support is waiting for a flvparse element in GStreamer.)
  • Streams in different formats/sizes/bitrates are bundled into a single “program”.
  • Streaming to Flash via HTTP.
  • Authentication using BrowserID.
  • Automatic conversion from properly formed MPEG-TS to HTTP Live Streaming.
  • Automatic conversion to RTP/RTSP (Experimental, works for Ogg/Theora/Vorbis only.)
  • Stream upload via HTTP PUT (3 different varieties), Icecast, raw TCP socket.
  • Stream pull from another HTTP streaming server.
  • Content protection via automatic one-time URLs.
  • (Experimental) Video-on-Demand stream types.
  • Per-stream, per-program, and server metrics.
  • An HTTP configuration interface and REST API are used to control the server, allowing standalone operation and easy integration with other web servers.

What’s not there?

  • Other types of authentication, LDAP or other authorization.
  • RTMP support.  (Maybe some day, but there are several good open-source Flash servers out there already.)
  • Support for upload using HTTP PUT with no 100-Continue header.  Several HTTP libraries do this.
  • Decent VOD support, with rate-controlled streaming, burst start, and seeking.

The details


by David Schleef at June 14, 2012 06:21 PM

David Schleef : GStreamer backend for video in Firefox

Good news: the GStreamer backend for video playback in Firefox has landed, thanks to a flurry of work by Alessandro Decina in the last few months.  Of course, this isn’t part of the standard Firefox build (but maybe some day?), but it’s very useful for putting Firefox on mobile and embedded platforms, since GStreamer has a well-established ecosystem of vendor-provided plugins for hardware decoding.

by David Schleef at April 29, 2012 10:19 PM

David Schleef : OggStreamer: audio capture and streaming device

Recently learned about a cool new open hardware project called OggStreamer.  They’re designing and making a small device that records an analog audio signal and streams it using Ogg/Vorbis.  It’s an open hardware project, so all the schematics and the PCB layout are provided.

by David Schleef at April 06, 2012 12:30 AM

David Schleef : Update on the GStreamer DeckLink Elements

A little more than a year ago, I posted about GStreamer support for SDI and HD-SDI using DeckLink hardware from BlackMagic Design.  In the meantime, the decklinksrc and decklinksink elements have grown up a bit, and work with most devices in the DeckLink and Intensity line of hardware.  A laundry list of features:

  • Multiple device support
  • Multiple input and output support on a single device
  • HDMI, component analog, and composite input and output with Intensity Pro
  • Analog, AES/EBU, and embedded (HDMI/SDI) audio input
  • SDI, HD-SDI, and Optical SDI input and output with DeckLink
  • Works on Linux, OS X (new), and Windows
  • 8-bit and 10-bit support for SDI/HD-SDI
  • Supports most video modes in the DeckLink SDK
  • Implements GstPropertyProbe interface for proper detection as a source element
  • Lots of bug fixes from previous releases

Kudos to Blake Tregre and Joshua Doe for submitting several of the patches implementing the above list.  There are still a bunch of outstanding bug reports (some with patches) that need to be fixed.  Several of these relate to output, which is currently rather clumsy and broken.

People have asked me about automatically detecting the video mode for input.  Some DeckLink hardware has this capability, but not any of the hardware I have to test with.  However, I’ve had some success with cycling through the video modes at the application level, with a 200 ms timeout between modes, stopping when it finds a mode that generates output.  This works OK, except that it tends to confuse 60i and 30p modes (and 50i with 25p), which can be differentiated with a bit of processing on the images.  At some point I’d like to integrate this functionality into decklinksrc, but wouldn’t be upset if someone else did it first.
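
The mode-cycling approach can be sketched roughly as follows. This shows only the control flow, not code from any application: `start_capture` and `got_frame` are hypothetical callbacks that the application would have to supply (for example, wrappers around a capture pipeline), and the mode names are placeholders.

```python
import time


def detect_video_mode(modes, start_capture, got_frame, timeout=0.2, poll=0.01):
    """Try each mode in turn; return the first one that yields a frame, else None.

    `start_capture(mode)` reconfigures the (hypothetical) capture source and
    `got_frame()` reports whether any frame has arrived since then.
    """
    for mode in modes:
        start_capture(mode)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if got_frame():
                return mode
            time.sleep(poll)
    return None


# Stand-in "hardware" for demonstration: only one mode ever produces frames.
state = {"mode": None}
found = detect_video_mode(
    ["ntsc", "pal", "1080i60", "720p60"],
    start_capture=lambda mode: state.update(mode=mode),
    got_frame=lambda: state["mode"] == "1080i60",
)
print("detected:", found)
```

This sketch does nothing about the 60i/30p ambiguity; separating those would still need the image processing step.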

by David Schleef at April 03, 2012 04:56 AM

David Schleef : HDTV Color Matrix

Digital video is a time series of pictures, and each picture is comprised of an array of pixels, and each pixel is comprised of three numbers representing how brightly the red, green, and blue LCD dots (or CRT phosphors, if you’re old school) glow.  The representation in memory, however, is not of RGB values, but of YCbCr values, which one calculates by multiplying a 3×3 matrix with the RGB values, and then adding/subtracting some offsets.  This converts the components into a gray value (Y, or luma) and Cb and Cr (chroma blue and chroma red). The reason for doing this is because the human visual system is more sensitive to variations in luma compared to variations in chroma (er, actually luminance and chrominance, see below).  Furthermore, for this reason, typically half or 3/4 of the chroma values are dropped and not stored — the missing ones are interpolated when converting back to RGB for display.

There are various theoretical reasons for choosing a particular matrix, and I’ve recently become interested in whether these reasons are actually valid.  For historical reasons, early digital video copied analog precedent and used a matrix that is theoretically suboptimal.  This matrix is used in standard definition (SD) video, but was changed to the theoretically correct matrix for high-definition (HD) video.  There are other technical differences between SD and HD video, but this is the most significant for color accuracy.

For some time, I’ve been curious how much of a visual difference there is between the two matrices.  Here are two stills from Big Buck Bunny, the first is the original, correct image, and the second is the same picture converted to YCbCr with the HDTV matrix and then back to RGB with the SDTV matrix.  (To best see the differences, open the images in separate browser tabs and flip between them.)

Big Buck Bunny frame 660, original

Big Buck Bunny frame 660, wrong matrix

If you are like me, you probably have trouble seeing the difference side by side, but flipping between them makes it fairly obvious.  I chose this image because it has relatively saturated green and greenish-yellow, which shows off some of the largest differences.
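
For readers who want to reproduce the mismatch numerically, here is a minimal sketch. It assumes the SD and HD matrices are the ones built from the BT.601 and BT.709 luma coefficients, works on normalized floating-point values, and leaves out the offsets and integer quantization mentioned earlier. The test color is an arbitrary saturated green, not a pixel taken from the film.

```python
# Luma coefficients (Kr, Kb) for the SD (BT.601) and HD (BT.709) matrices.
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)


def rgb_to_ycbcr(rgb, coeffs):
    """Gamma-corrected R'G'B' in [0, 1] -> (Y', Cb, Cr), Cb/Cr in [-0.5, 0.5]."""
    kr, kb = coeffs
    r, g, b = rgb
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    return (y, (b - y) / (2.0 * (1.0 - kb)), (r - y) / (2.0 * (1.0 - kr)))


def ycbcr_to_rgb(ycbcr, coeffs):
    """Inverse of rgb_to_ycbcr for the same coefficients."""
    kr, kb = coeffs
    y, cb, cr = ycbcr
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / (1.0 - kr - kb)
    return (r, g, b)


green = (0.2, 0.8, 0.1)  # arbitrary saturated green
encoded = rgb_to_ycbcr(green, BT709)
print("decoded with HD matrix:", ycbcr_to_rgb(encoded, BT709))
print("decoded with SD matrix:", ycbcr_to_rgb(encoded, BT601))
```

Decoding with the matching matrix returns the original color, while decoding with the SD matrix pushes the green component up to roughly 0.91, the same kind of shift that is visible in the leaves.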

The RGB values for the pixels that are used in computation are not proportional to the actual amount of power output by a monitor.  This is known as gamma correction, and is a clever byproduct of the fact that the response curve of television phosphors (the amount of light output for a given voltage) is approximately similar to the response curve of the eye (the perceived brightness based on the amount of light).  Thus voltage became synonymous with perceived brightness, televisions had fewer vacuum tubes, and we’re left with that legacy.  But it’s not a bad legacy, because just like dropping chroma values, it makes it easier to compress images.

However, color comes along and messes with that simplicity a bit.  Luminance in color theory is used to describe how the brain interprets the brightness of a particular pixel, which is proportional to the RGB values in linear light space, i.e., the amount of light emanating from a display.  Luma is proportional to the RGB values in gamma-corrected (actually, gamma-compressed) space.  This means that luma doesn’t simply depend on luminance, and contains some variation due to color.  This messes with our idea that matrixing RGB values will separate variations in brightness from variations in color.  How visible is it?  I took the above picture and squashed the luma to one value, leaving chroma values the same (HD matrix):

Big Buck Bunny, frame 660, luma squashed

What you see here is that saturated areas appear brighter than the grey areas.  This is chroma (i.e., the color values we use in calculations) feeding into luminance (i.e., the perception of brightness).
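
The effect can also be shown with two numbers. The sketch below compares a grey and a pure green that share the same luma and computes an approximate linear-light luminance for each. It assumes the HD (BT.709) luma coefficients and a pure power-law gamma of 2.2, which simplifies the real transfer functions.

```python
KR, KB = 0.2126, 0.0722  # HD (BT.709) luma coefficients
KG = 1.0 - KR - KB
GAMMA = 2.2              # assumed pure power law, a simplification


def luma(rgb):
    """Weighted sum of gamma-corrected components (what the matrix computes)."""
    r, g, b = rgb
    return KR * r + KG * g + KB * b


def luminance(rgb):
    """Weighted sum of linear-light components (closer to perceived brightness)."""
    r, g, b = rgb
    return KR * r ** GAMMA + KG * g ** GAMMA + KB * b ** GAMMA


grey = (0.5, 0.5, 0.5)
green = (0.0, 0.5 / KG, 0.0)  # pure green chosen to have the same luma as the grey
for name, rgb in (("grey", grey), ("green", green)):
    print(f"{name}: luma={luma(rgb):.4f} luminance={luminance(rgb):.4f}")
```

Both colors have identical luma, yet the saturated green ends up with the higher luminance, matching what the luma-squashed image shows.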

How much does this matter for image and video compression efficiency?  It’s a minor inefficiency of a subtle visual difference.  In other words, not very much.

Earlier I mentioned that the HD matrix was theoretically more correct than the SD matrix.  What about in practice?  Here’s the same luma-squashed image with the SD matrix.  Notice that there’s a lot more leakage from chroma into luminance, especially in the green leaves:

Big Buck Bunny, frame 660, luma squashed with SD matrix

by David Schleef at March 24, 2012 03:07 AM

Ben Schwartz : Ethics in an unethical world: Ethics Offsets

The recent hubbub regarding the (admirably public) debate within Mozilla about codec support has set me thinking about how to deal with untenable situations. After rightly railing against H.264 on the web for several years, and pushing free codecs with the full thrust of the organization, Mozilla may now be approaching consensus that they cannot win, and that continued refusal to capitulate to the cartel is tantamount to organizational suicide.

So what can you do, when you find yourself compelled to do something that goes against your ethics? To make a choice that you feel is wrong on its own because it benefits you in other ways, a choice you would like to make only when really necessary and never otherwise? Any thinking person will have this problem, to greater and lesser degrees, throughout their lives. We are not martyrs, so we do what we have to do to survive and try to keep in mind our need to escape from the trap.

Organizations cannot simply keep something in mind, but they can adopt structures that remind their members of their values even when those values are compromised. A common structure of this type is the sin tax, a tax designed (in a democracy) by members of a state to help them break or prevent their own bad habits. Sin taxes work by countering the locally perceived benefit of some action that’s harmful in a larger way, by reminding us of less visible but still important negative considerations. Some of their effect is straightforwardly economic, but some is psychological, to help us remember the bigger picture.

Sin taxes are more or less involuntary, but when the government does not impose these reminders, we often choose to remind ourselves. One currently popular implementation of this concept is the Carbon offset, a payment typically made when burning fuel to counter the effect of global warming. Organizations that buy carbon offsets for their fuel consumption do so to send a message, both internally and externally, that they place real value on minimizing carbon emissions. They may send this message both explicitly (by publicizing the purchase) and implicitly (by its effect on internal and external economic incentives).

Carbon offsets may be in fashion this decade, but there are many older forms of this concept. Maybe the most quotidian is the Curse Jar*, traditionally a place in a home or small office where individuals may make a small payment when using discouraged vocabulary. The Curse Jar provides a disincentive to coarse language despite being strictly voluntary, and despite not purchasing any effect on the linguistic environment (although the coffee fund may help for some). The Curse Jar works simply by reminding group members which behaviors are accepted and which are not.

For Mozilla, the difficulty is not emissions, verbal or vaporous, but ethical behavior. How can Mozilla publicly commit to a standard of behavior while violating it? I humbly submit that the answer is to balance its karmic books, by introducing an Ethics Offset**. When Mozilla finds itself cornered, it may take the necessary unfortunate action … and introduce a proportionate positive action as a reminder about its real values.

In the case at hand, a reasonable Ethics Offset might look like an internal “tax” on all uses of patented codecs. For example, for every Boot2Gecko device that is sold, Mozilla could commit to an offset equal to double the amount spent on patent licenses for the device. The offset could be donated to relevant worthy causes, like organizations that oppose software patents or contribute to the development of patent-free multimedia … but the actual recipient matters much less than the commitment. By accumulating and periodically (and publicly) “losing” this money, Mozilla would remind us all about its commitment to freedom in the multimedia realm. A similar scheme may be appropriate for Firefox Mobile if it is also configured for H.264 support.

Without a reminder of this kind, Mozilla risks becoming dangerously complacent about, and complicit in, the cartel-controlled multimedia monopolies. As long as H.264 support appears to serve Mozilla’s other goals, Mozilla’s commitment to multimedia freedom will remain uncomfortable, inconvenient, and tempting to forget. Greater organizations have slid down off their ethical peaks, on paths paved all along with good intentions.

Most companies would not even consider a public and persistent admission of compromise, but Mozilla is not most companies. Neither are the companies that produce free operating systems, and many other components of the free software ecosystem. None of them should be ashamed to admit when they are forced to compromise their values and support enterprises that, on ethical grounds, they despise … but they should make their position clear, by committing to an Ethics Offset until they can escape from the compromise entirely.

*: Why is there no Wikipedia entry for “Curse Jar”!?
**: Let’s not call it an indulgence.

by Ben at March 14, 2012 04:43 AM

David Schleef : New Schrödinger Release

I recently added support for 10- and 16-bit encoding and decoding to Schrödinger, so I did a little release. Presenting Schrödinger-1.0.11. Also pushed changes to GStreamer to handle the new features. Although these changes have been in the works for some time, a little prompting from j-b caused me to finish this off, so this will probably appear in VLC soon, too.
This was the last piece needed to create a 10-bit master of Sintel, which I’ve been planning to do for some time.

by David Schleef at January 23, 2012 04:23 PM

Chris Pearce : Changes to DOM full-screen API in Firefox 11

We've made some changes to how the HTML full-screen API exits full-screen mode in Firefox 11, which is scheduled to ship in March 2012. Previously Document.mozCancelFullScreen() would fully-exit full-screen and return the browser to "normal" mode. Starting in Firefox 11, Document.mozCancelFullScreen() will restore full-screen state to the element that was previously full-screen. If there is no previous full-screen element in either the document or a parent document (full-screen mode isn't restored to former full-screen elements in child documents), then the browser will "fully-exit full-screen", and return the browser to normal mode.

To see how this is useful, consider the case of a PowerPoint clone or presentation web app that wants to run full-screen. One way to implement such a web app would be to have a full-screen <div> element where the slides are shown. The developer may want to be able to switch full-screen mode seamlessly between the slide deck <div> and (say) a <video>, and then return to having the slide deck <div> as the full-screen element so that the user can carry on with the presentation. Before this change, if the <video> was in a cross-origin subdocument (like a YouTube embedded player in an <iframe>) returning full-screen mode to the slide deck <div> from the <video> was a two-step process; users would have to fully-exit full-screen, and re-request full-screen mode on the slide deck element. Now developers can simply call Document.mozCancelFullScreen() and seamlessly switch back. The browser won't drop out of full-screen mode during the transition.

Note that if users press the escape key they will always fully-exit full-screen, i.e. Firefox won't restore the previous full-screen element to full-screen state on escape key press. So to seamlessly restore full-screen to the previous full-screen element, developers must explicitly call Document.mozCancelFullScreen(); they can't rely on the user pressing the escape key.
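
The difference between cancelling and pressing escape can be pictured as a stack of full-screen elements. The following is only a toy model of the behavior described above, written in Python rather than JavaScript. It is not how Firefox implements it, it covers a single document only, and the class and method names are invented for illustration.

```python
class FullScreenModel:
    """Toy model of the Firefox 11 full-screen behavior for one document."""

    def __init__(self):
        self._stack = []  # elements made full-screen, oldest first

    def request_full_screen(self, element):
        self._stack.append(element)

    def cancel_full_screen(self):
        """Like Document.mozCancelFullScreen(): fall back to the previous element."""
        if self._stack:
            self._stack.pop()

    def press_escape(self):
        """The escape key always fully exits full-screen."""
        self._stack.clear()

    @property
    def full_screen_element(self):
        return self._stack[-1] if self._stack else None


model = FullScreenModel()
model.request_full_screen("slide deck <div>")
model.request_full_screen("<video>")
model.cancel_full_screen()
print(model.full_screen_element)  # the slide deck is full-screen again
model.press_escape()
print(model.full_screen_element)  # nothing is full-screen any more
```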

We've also added webconsole logging upon full-screen request failures to Firefox 11, to make debugging denied full-screen requests easier.

Another change coming in Firefox 11 is that we'll no longer deny full-screen requests in web pages which contain windowed plugins. Now we'll exit full-screen when a windowed plugin is focused instead (on Windows and Linux; Mac OS X is unaffected).

by Chris Pearce (noreply@blogger.com) at December 19, 2011 08:48 PM