
Jean-Marc Valin : RNNoise: Learning Noise Suppression



This demo presents the RNNoise project, showing how deep learning can be applied to noise suppression. The main idea is to combine classic signal processing with deep learning to create a real-time noise suppression algorithm that's small and fast. No expensive GPUs required — it runs easily on a Raspberry Pi. The result is much simpler (easier to tune) and sounds better than traditional noise suppression systems (been there!).


September 27, 2017 02:51 AM

Monty : Opus 1.2 released!


I'm about to get an official press release together, but in the meantime, I'm pleased to announce we've released Opus 1.2!

Quoting Jean-Marc Valin, the Opus lead developer:

Opus gets another major upgrade with the release of version 1.2. This release brings quality improvements to both speech and music, while remaining fully compatible with RFC 6716. There are also optimizations, new options, as well as many bug fixes. This Opus 1.2 demo describes a few of the upgrades that users and implementers will care about the most. You can download the code from the Opus website.

June 21, 2017 12:36 AM

Jean-Marc Valin : Opus 1.2 is out



Opus gets another major upgrade with the release of version 1.2. This release brings quality improvements to both speech and music, while remaining fully compatible with RFC 6716. There are also optimizations, new options, as well as many bug fixes. This Opus 1.2 demo describes a few of the upgrades that users and implementers will care about the most. You can download the code from the Opus website.

June 20, 2017 07:01 PM

Monty : MP3: It's Free, Not Dead.


Last week, Fraunhofer and Thomson suspended their MP3 patent licensing program because the patents expired. We can finally welcome MP3 into the family of truly Free codecs!

Then came a press push calling MP3 dead. That's dumb. Fraunhofer is only calling MP3 dead to push unwary customers into 'upgrading' to AAC for which they can still charge patent fees.

This is a bit like the family pediatrician telling you that your perfectly healthy child in college is dead-- and solemnly suggesting you have another child immediately. Just to keep making money off of you.

I would call that disingenuous at best.

No, MP3 isn't dead, and it's not pining for any fjords. The money that Thomson and Fraunhofer were previously collecting in patent royalties now stays in your (and everyone else's) bottom line. Don't license something new and unnecessary just to spend more money.

If you really do need something more advanced than MP3, the best alternatives are also open and royalty-free. Vorbis is the mature alternative with 20 years of wide deployment under its belt. Better yet, consider Opus, the world's most advanced officially standardized codec.

That said, the network effects that have kept MP3 dominant for so long just got stronger. Nothing beats its level of interoperability and support. There's no reason to jump off a thoroughbred that’s still increasing its lead.

May 22, 2017 07:00 PM

Monty : Gentlemen, we have a new naming scheme


Seriously, there are so many winners on the full list I don't even know where to start.

May 20, 2017 12:13 AM

Silvia Pfeiffer : Annual Release of External-Videos plugin – we’ve hit v1.0


This is the annual release of my external-videos wordpress plugin and with the help of Andrew Nimmolo I’m proud to announce we’ve reached version 1.0!

So yes, my external-videos wordpress plugin is now roughly 7 years old, who would have thought! During the year, I don’t get the luxury of spending time on maintaining this open source love child of mine, but at Christmas, my bad conscience catches up with me  – every year! I then spend some time going through bug reports, upgrading the plugin to the latest wordpress version, upgrading to the latest video site APIs, testing functionality and of course making a new release.

This year has been quite special. The power of open source has kicked in and a new developer took an interest in external-videos. Andrew Nimmolo submitted patches over all of 2016. He decided to bring the external-videos plugin into the new decade with a huge update to the layout of the settings pages, general improvements, and an all-round update of all the video site APIs which included removing their overly complex SDKs and going straight for the REST APIs.

Therefore, I’m very proud to be able to release version 1.0 today. Thanks, Andrew!

Enjoy – and I look forward to many more contributions – have a Happy 2017!

NOTE: If you’re upgrading from an older version, you might need to remove and re-add your social video sites because the API details have changed a bit. Also, we noticed that there were layout issues on WordPress 4.3.7, so try and make sure your WordPress version is up to date.

by silvia at January 13, 2017 10:55 PM

Ben Schwartz : The Clock


I watched The Clock at the Boston Museum of Fine Arts from about 4:45 to 6:30 on Friday. My thoughts on The Clock:

The experiment works in part because scenes with clocks in them are usually frenetic. In a movie, the presence of a clock usually means someone is in a rush, and so most of the sequences convey urgency.

It’s often hard to spot the clock in each scene. In less exciting sequences, this serves as a game to pass the time. In many cases the clock in question is never in focus, or is moving too fast for the viewer to notice. The editors must have done careful freeze-frames and zoomed in on wristwatches to work out the indicated time.

The selected films are mostly in English, with a fair number in French and very few in any other languages. This feels fairly arbitrary to me.

Scenes from multiple films are often mixed within each segment. It seems like the editors adopted a relaxed rule, maybe something like: “if a clock appeared in an original, then a one minute window around the moment of appearance is fair game to include during that minute of The Clock, spliced together with other clips in any order”.

The editing makes heavy use of L cuts and audio crossfades to make the fairly random assortment of sources feel more cohesive.

I swear I saw a young Michael Caine at least twice in two different roles.

Some of the sources were distinctly low-fidelity, often due to framerate matching issues. I think this might be the first production I’ve seen that would really have benefited from a full Variable Frame Rate render and display pipeline.

I started to wonder about connections to deep learning. Could we train an image captioning network to identify images of clocks and watches, then run it on a massive video corpus to generate The Clock automatically?

Or, could we construct a spatial analogue to The Clock’s time-for-time conceit? How about a service that notifies you of a film clip shot at your current location? With a large GPS-tagged corpus (or a location-finder neural network) it might be possible to do this with pretty broad coverage.

by Ben at November 28, 2016 03:03 AM

Silvia Pfeiffer : WebRTC predictions for 2016


I wrote these predictions in the first week of January and meant to publish them as encouragement to think about where WebRTC still needs some work. I’d like to be able to compare the state of WebRTC in the browser a year from now. Therefore, without further ado, here are my thoughts.

WebRTC Browser support

I’m quite optimistic when it comes to browser support for WebRTC. We have seen Edge bring in initial support last year and Apple looking to hire engineers to implement WebRTC. My prediction is that we will see the following developments in 2016:

  • Edge will become interoperable with Chrome and Firefox, i.e. it will publish VP8/VP9 and H.264/H.265 support
  • Firefox of course continues to support both VP8/VP9 and H.264/H.265
  • Chrome will follow the spec and implement H.264/H.265 support (to add to their already existing VP8/VP9 support)
  • Safari will enter the WebRTC space but only with H.264/H.265 support

Codec Observations

With Edge and Safari entering the WebRTC space, there will be a larger focus on H.264/H.265. It will help with creating interoperability between the browsers.

However, since there are so many flavours of H.264/H.265, I expect that when different browsers are used at different endpoints, we will get poor quality video calls because of having to negotiate a common denominator. Certainly, baseline will work interoperably, but better encoding quality and lower bandwidth will only be achieved if all endpoints use the same browser.

Thus, we will get to the funny situation where we buy ourselves interoperability at the cost of video quality and bandwidth. I’d call that a “degree of interoperability” and not the best possible outcome.

I’m going to go out on a limb and say that at this stage, Google will strongly consider strengthening the case for VP8/VP9 by improving its bandwidth adaptability: I think they will buy themselves some SVC capability and make VP9 the best quality codec for live video conferencing. Thus, when Safari eventually follows the standard and also implements VP8/VP9 support, the interoperability win of H.264/H.265 will prove only temporary, overshadowed by a vastly better video quality when using VP9.

The Enterprise Boundary

Like all video conferencing technology, WebRTC is having a hard time dealing with the corporate boundary: firewalls and proxies get in the way of setting up video connections from within an enterprise to people outside.

The telco world has come up with the concept of SBCs (session border controllers). SBCs come packed with functionality to deal with security, signalling protocol translation, Quality of Service policing, regulatory requirements, statistics, billing, and even media services like transcoding.

SBCs are a total overkill for a world where a large number of Web applications simply want to add a WebRTC feature – probably mostly to provide a video or audio customer support service, but it could be a live training session with call-in, or an interest group conference call.

We cannot install a custom SBC solution for every WebRTC service provider in every enterprise. That’s like saying we need a custom Web proxy for every Web server. It doesn’t scale.

Cloud services thrive on their ability to sell directly to an individual in an organisation on their credit card without that individual having to ask their IT department to put special rules in place. WebRTC will not make progress in the corporate environment unless this is fixed.

We need a solution that allows all WebRTC services to get through an enterprise firewall and enterprise proxy. I think the WebRTC standards have done pretty well with firewalls and connecting to a TURN server on port 443 will do the trick most of the time. But enterprise proxies are the next frontier.

What it takes is some kind of media packet forwarding service that sits on the firewall or in a proxy and allows WebRTC media packets through – maybe with some configuration that is necessary in the browsers or the Web app to add this service as another type of TURN server.

I don’t have a full understanding of the problems involved, but I think such a solution is vital before WebRTC can go mainstream. I expect that this year we will see some clever people coming up with a solution for this and a new type of product will be born and rolled out to enterprises around the world.

Summary

So these are my predictions. In summary, they address the key areas where I think WebRTC still has to make progress: interoperability between browsers, video quality at low bitrates, and the enterprise boundary. I’m really curious to see where we stand with these a year from now.

It’s worth mentioning Philipp Hancke’s tweet reply to my post:

— we saw some clever people come up with a solution already. Now it needs to be implemented 🙂

by silvia at February 17, 2016 09:48 PM

Silvia Pfeiffer : SWAY at RFWS using Coviu


A SWAY session by Joanne of Royal Far West School. http://sway.org.au/ via https://coviu.com/ SWAY is an oral language and literacy program based on Aboriginal knowledge, culture and stories. It has been developed by Educators, Aboriginal Education Officers and Speech Pathologists at the Royal Far West School in Manly, NSW.

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at February 14, 2016 09:02 PM

Silvia Pfeiffer : My journey to Coviu


My new startup just released our MVP – this is the story of what got me here.

I love creating new applications that let people do their work better or in a manner that wasn’t possible before.

My first such passion was as a student intern when I built a system for a building and loan association’s monthly customer magazine. The group I worked with was managing their advertiser contacts through a set of paper cards and I wrote a dBase based system (yes, that long ago) that would manage their customer relationships. They loved it – until it got replaced by an SAP system that cost 100 times what I cost them, had really poor UX, and only gave them half the functionality. It was a corporate system with ongoing support, which made all the difference to them.

The story repeated itself with a CRM for my Uncle’s construction company, and with a resume and quotation management system for Accenture right after Uni, both of which I left behind when I decided to go into research.

Even as a PhD student, I never lost sight of challenges that people were facing and wanted to develop technology to overcome problems. The aim of my PhD thesis was to prepare for the oncoming onslaught of audio and video on the Internet (yes, this was 1994!) by developing algorithms to automatically extract and locate information in such files, which would enable users to structure, index and search such content.

Many of the use cases that we explored are now part of products or continue to be challenges: finding music that matches your preferences, identifying music or video pieces e.g. to count ads on the radio or to mark copyright infringement, or the automated creation of video summaries such as trailers.


This continued when I joined the CSIRO in Australia – I was working on segmenting speech into words or talk spurts since that would simplify captioning & subtitling, and on MPEG-7 which was a (slightly over-engineered) standard to structure metadata about audio and video.

In 2001 I had the idea of replicating the Web for videos: i.e. creating hyperlinked and searchable video-only experiences. We called it “Annodex” for annotated and indexed video and it needed full-screen hyperlinked video in browsers – man were we ahead of our time! It was my first step into standards, got several IETF RFCs to my name, and started my involvement with open codecs through Xiph.

Around the time that YouTube was founded in 2006, I founded Vquence – originally a video search company for the Web, but pivoted to a video metadata mining company. Vquence still exists and continues to sell its data to channel partners, but it lacks the user impact that has always driven my work.

As the video element started being developed for HTML5, I had to get involved. I contributed many use cases to the W3C, became a co-editor of the HTML5 spec and focused on video captioning with WebVTT while contracting to Mozilla and later to Google. We made huge progress and today the technology exists to publish video on the Web with captions, making the Web more inclusive for everybody. I contributed code to YouTube and Google Chrome, but was keen to make a bigger impact again.

The opportunity came when a couple of former CSIRO colleagues who now worked for NICTA approached me to get me interested in addressing new use cases for video conferencing in the context of WebRTC. We worked on a kiosk-style solution to service delivery for large service organisations, particularly targeting government. The emerging WebRTC standard posed many technical challenges that we addressed by building rtc.io, by contributing to the standards, and registering bugs on the browsers.

Fast-forward through the development of a few further custom solutions for customers in health and education and we are starting to see patterns of need emerge. The core learning that we’ve come away with is that to get things done, you have to go beyond “talking heads” in a video call. It’s not just about seeing the other person, but much more about having a shared view of the things that need to be worked on and a shared way of interacting with them. Also, we learnt that the things that are being worked on are quite varied and may include multiple input cameras, digital documents, Web pages, applications, device data, controls, forms.

So we set out to build a solution that would enable productive remote collaboration to take place. It would need to provide an excellent user experience, it would need to be simple to work with, provide for the standard use cases out of the box, yet be architected to be extensible for specialised data sharing needs that we knew some of our customers had. It would need to be usable directly on Coviu.com, but also able to integrate with specialised applications that some of our customers were already using, such as the applications that they spend most of their time in (CRMs, practice management systems, learning management systems, team chat systems). It would need to require our customers to sign up, yet allow their clients to join a call without signing up.

Collaboration is a big problem. People are continuing to get more comfortable with technology and are less and less inclined to travel distances just to get a service done. In a country as large as Australia, where 12% of the population lives in rural and remote areas, people may not even be able to travel distances, particularly to receive or provide recurring or specialised services, or to achieve work/life balance. To make the world a global village, we need to be able to work together better remotely.

The need for collaboration is being recognised by specialised Web applications already, such as the LiveShare feature of Invision for Designers, Codassium for pair programming, or the recently announced Dropbox Paper. Few go all the way to video – WebRTC is still regarded as a complicated feature to support.

Coviu in action

With Coviu, we’d like to offer a collaboration feature to every Web app. We now have a Web app that provides a modern and beautifully designed collaboration interface. To enable other Web apps to integrate it, we are now developing an API. Integration may entail customisation of the data sharing part of Coviu – something Coviu has been designed for. How to replicate the data and keep it consistent when people collaborate remotely – that is where Coviu makes a difference.

We have started our journey and have just launched free signup to the Coviu base product, which allows individuals to own their own “room” (i.e. a fixed URL) in which to collaborate with others. A huge shout out goes to everyone in the Coviu team – a pretty amazing group of people – who have turned the app from an idea to reality. You are all awesome!

With Coviu you can share and annotate:

  • images (show your mum photos of your last holidays, or get feedback on an architecture diagram from a customer),
  • pdf files (give a presentation remotely, or walk a customer through a contract),
  • whiteboards (brainstorm with a colleague), and
  • share an application window (watch a YouTube video together, or work through your task list with your colleagues).

All of these are regarded as “shared documents” in Coviu and thus have zooming and annotation features and are listed in a document tray for ease of navigation.

This is just the beginning of how we want to make working together online more productive. Give it a go and let us know what you think.

http://coviu.com/

by silvia at October 27, 2015 10:08 AM

Silvia Pfeiffer : WebVTT Audio Descriptions for Elephants Dream


When I set out to improve accessibility on the Web and we started developing WebSRT – later to be renamed to WebVTT – I needed an example video to demonstrate captions / subtitles, audio descriptions, transcripts, navigation markers and sign language.

I needed a freely available video with spoken text that either already had such data available or that I could create it for. Naturally I chose “Elephants Dream” by the Orange Open Movie Project , because it was created under the Creative Commons Attribution 2.5 license.

As it turned out, the Blender Foundation had already created a collection of SRT files that would represent the English original as well as the translated languages. I was able to reuse them by merely adding a WEBVTT header.
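A minimal sketch of that conversion in Python. The helper name `srt_to_vtt` and the sample cue are made up for illustration. Beyond prepending the header, the sketch also rewrites the comma that SRT files conventionally use before the milliseconds into the period that the WebVTT timestamp syntax expects; treat that extra step as an assumption about what a given player needs, not as part of the workflow described above.

```python
import re

# SRT-style timestamp such as "00:00:15,000" (comma before the milliseconds).
_SRT_TIME = re.compile(r"(\d{2,}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Return WebVTT text for the given SRT text (hypothetical helper)."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    # Only touch timing lines, so commas inside the cue text are preserved.
    lines = [
        _SRT_TIME.sub(r"\1.\2", line) if "-->" in line else line
        for line in body.split("\n")
    ]
    return "WEBVTT\n\n" + "\n".join(lines)


# Made-up cue, not taken from the Elephants Dream subtitle files.
sample = "1\n00:00:15,000 --> 00:00:18,000\nHello, world\n"
print(srt_to_vtt(sample))
```

The numeric cue identifiers from the SRT file are left in place, since WebVTT allows an optional identifier line before each cue's timing line.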

Then there was a need for a textual audio description. I read up on the plot online and finally wrote up a time-aligned audio description. I’m hereby making that file available under the Creative Commons Attribution 4.0 license. I’ve added a few lines to the metadata headers so it doesn’t confuse players. Feel free to reuse at will – I know there are others out there that have a similar need to demonstrate accessibility features.

by silvia at March 10, 2015 12:50 PM

Silvia Pfeiffer : Progress with rtc.io


At the end of July, I gave a presentation about WebRTC and rtc.io at the WDCNZ Web Dev Conference in beautiful Wellington, NZ.

webrtc_talk

Putting that talk together reminded me about how far we have come in the last year both with the progress of WebRTC, its standards and browser implementations, as well as with our own small team at NICTA and our rtc.io WebRTC toolbox.

WDCNZ presentation page5

One of the most exciting opportunities is still under-exploited: the data channel. When I talked about the above slide and pointed out Bananabread, PeerCDN, Copay, PubNub and also later WebTorrent, that’s where I really started to get Web Developers excited about WebRTC. They can totally see the shift in paradigm to peer-to-peer applications away from the Server-based architecture of the current Web.

Many were also excited to learn more about rtc.io, our own npm modules based approach to a JavaScript API for WebRTC.

rtcio_modules

We believe that the World of JavaScript has reached a critical stage where we can no longer code by copy-and-paste of JavaScript snippets from all over the Web universe. We need a more structured module reuse approach to JavaScript. Node with JavaScript on the back end really only motivated this development. However, we’ve needed it for a long time on the front end, too. One big library (jquery anyone?) that does everything that anyone could ever need on the front-end isn’t going to work any longer with the amount of functionality that we now expect Web applications to support. Just look at the insane growth of npm compared to other module collections:

Packages per day across popular platforms (Shamelessly copied from: http://blog.nodejitsu.com/npm-innovation-through-modularity/)

For those that – like myself – found it difficult to understand how to tap into the sheer power of npm modules as a front end developer, simply use browserify. npm modules are prepared following the CommonJS module definition spec. Browserify works natively with that and “compiles” all the dependencies of an npm module into a single bundle.js file that you can use on the front end through a script tag as you would in plain HTML. You can learn more about browserify and module definitions and how to use browserify.

For those of you not quite ready to dive in with browserify we have prepared the rtc module, which exposes the most commonly used packages of rtc.io through an “RTC” object from a browserified JavaScript file. You can also directly download the JavaScript file from GitHub.

Using rtc.io rtc JS library

So, I hope you enjoy rtc.io and I hope you enjoy my slides and large collection of interesting links inside the deck, and of course: enjoy WebRTC! Thanks to Damon, Jeff, Cathy, Pete and Nathan – you’re an awesome team!

On a side note, I was really excited to meet the author of browserify, James Halliday (@substack) at WDCNZ, whose talk on “building your own tools” seemed to take me back to the times where everything was done on the command-line. I think James is using Node and the Web in a way that would appeal to a Linux Kernel developer. Fascinating!!

by silvia at August 12, 2014 07:06 AM

Silvia Pfeiffer : Nodebots Day Sydney 2014 (take 2)


Nodebots Day Sydney 2014 (take 2)

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at July 25, 2014 11:00 PM

Silvia Pfeiffer : Nodebots Day 2014 Sydney


Nodebots and WebRTC Day in Sydney, NICTA http://www.meetup.com/WebRTC-Sydney/events/173234022/

Uploaded by: Silvia Pfeiffer
Hosted: youtube

by silvia at July 25, 2014 05:26 PM

Silvia Pfeiffer : AppRTC : Google’s WebRTC test app and its parameters


If you’ve been interested in WebRTC and haven’t lived under a rock, you will know about Google’s open source testing application for WebRTC: AppRTC.

When you go to the site, a new video conferencing room is automatically created for you and you can share the provided URL with somebody else and thus connect (make sure you’re using Google Chrome, Opera or Mozilla Firefox).

We’ve been using this application forever to check whether any issues with our own WebRTC applications are due to network connectivity issues, firewall issues, or browser bugs, in which case AppRTC breaks down, too. Otherwise we’re pretty sure to have to dig deeper into our own code.

Now, AppRTC creates a pretty poor quality video conference, because the browsers use a 640×480 resolution by default. However, there are many query parameters that can be added to the AppRTC URL through which the connection can be manipulated.

Here are my favourite parameters:

  • hd=true : turns on high definition, i.e. minWidth=1280,minHeight=720
  • stereo=true : turns on stereo audio
  • debug=loopback : connect to yourself (great to check your own firewalls)
  • tt=60 : by default, the channel is closed after 30min – this gives you 60 (max 1440)

For example, here’s what a stereo, HD loopback test would look like: https://apprtc.appspot.com/?r=82313387&hd=true&stereo=true&debug=loopback .
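If you build such test links often, the query string can be assembled programmatically. The following is a small Python sketch; the helper name `apprtc_url` is made up for illustration, while the base URL, room number and parameter names are the ones quoted above.

```python
from urllib.parse import urlencode

APPRTC_BASE = "https://apprtc.appspot.com/"


def apprtc_url(room: str, **params: str) -> str:
    """Build an AppRTC link for a room plus query parameters (hypothetical helper)."""
    query = {"r": room, **params}  # dicts keep insertion order, so "r" comes first
    return APPRTC_BASE + "?" + urlencode(query)


url = apprtc_url("82313387", hd="true", stereo="true", debug="loopback")
print(url)
```

Note that `urlencode` percent-encodes characters such as `=` and `,` inside a value, so a compound value like the `audio=googEchoCancellation=false,...` form would come out escaped; whether AppRTC treats the escaped form identically has not been checked here.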

These are not the only available parameters, though. Here are some others that you may find interesting for some more in-depth geekery:

  • ss=[stunserver] : in case you want to test a different STUN server to the default Google ones
  • ts=[turnserver] : in case you want to test a different TURN server to the default Google ones
  • tp=[password] : password for the TURN server
  • audio=true&video=false : audio-only call
  • audio=false : video-only call
  • audio=googEchoCancellation=false,googAutoGainControl=true : disable echo cancellation and enable gain control
  • audio=googNoiseReduction=true : enable noise reduction (more Google-specific parameters)
  • asc=ISAC/16000 : preferred audio send codec is ISAC at 16kHz (use on Android)
  • arc=opus/48000 : preferred audio receive codec is opus at 48kHz
  • dtls=false : disable datagram transport layer security
  • dscp=true : enable DSCP
  • ipv6=true : enable IPv6

AppRTC’s source code is available here. And here is the file with the parameters (in case you want to check if they have changed).

Have fun playing with the main and always up-to-date WebRTC application: AppRTC.

UPDATE 12 May 2014

AppRTC now also supports the following bitrate controls:

  • arbr=[bitrate] : set audio receive bitrate
  • asbr=[bitrate] : set audio send bitrate
  • vsbr=[bitrate] : set video send bitrate
  • vrbr=[bitrate] : set video receive bitrate

Example usage: https://apprtc.appspot.com/?r=&asbr=128&vsbr=4096&hd=true
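Going the other way, a quick way to check which parameters a given link actually sets is to parse its query string. A minimal Python sketch using only the standard library, applied to the example URL above:

```python
from urllib.parse import parse_qs, urlsplit

example = "https://apprtc.appspot.com/?r=&asbr=128&vsbr=4096&hd=true"

# keep_blank_values=True keeps the empty room parameter "r=" in the result.
params = parse_qs(urlsplit(example).query, keep_blank_values=True)
flat = {name: values[0] for name, values in params.items()}
print(flat)
```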

by silvia at July 23, 2014 05:02 AM

Ben Schwartz : Gooseberry


Anyone who loves video software has probably caught more than one glimpse of the Blender Foundation’s short films: Elephants Dream, Big Buck Bunny, Sintel, and Tears of Steel. I’ve enjoyed them from the beginning, and never paid a dime, on account of their impeccable Creative Commons licensing.

I always hoped that the little open source project would one day grow up enough to make a full length feature film. Now they’ve decided to try, and they’ve raised more than half their funding target … with only two days to go. You can donate here. I think of it like buying a movie ticket, except that what you get is not just the right to watch the movie, but actually ownership of the movie itself.

by Ben at May 07, 2014 04:16 AM

Ben Schwartz : Stingy/stylish


My home for the next two nights is the Hotel 309, right by the office.  It’s the stingiest hotel I’ve ever stayed in.  Nothing is free: the wi-fi is $8/night and the “business center” is 25 cents/minute after the first 20.  There’s no soap by the bathroom sink, just the soap dispenser over the tub.  Even in the middle of winter, there is no box of tissues.  Its status as a 2-star hotel is well-deserved.

The rooms are also very stylish.  There’s a high-contrast color scheme that spans from the dark wood floors and rug to the boldly matted posters and high-concept lamps.  The furniture has high design value, or at least it did before it got all beat up.

These two themes come together beautifully for me in the (custom printed?) shower curtain, which features a repeating pattern of peacocks and crowns … with severe JPEG artifacts!  The luma blocks are almost two centimeters across.  Someone should tell the artist that bits are cheap these days.

 

by Ben at February 18, 2014 02:58 AM

Ben Schwartz : Efficiency


So you’re trying to build a DVD player using Debian Jessie and an Atom D2700 on a Poulsbo board, and you’ve even biked down to the used DVD warehouse and picked up a few $3 90’s classics for test materials.  Here’s what will happen next:

  1. Gnome 3 at 1920×1080.  The interface is sluggish even on static graphics.  Video is right out, since the graphics is unaccelerated, so every pixel has to be pushed around by the CPU.
  2. Reduce mode to 1280×720 (less than half the pixels to push), and try VLC in Gnome 3.  Playback is totally choppy.  Sigh.  Not really surprising, since Gnome is running in composited mode via OpenGL commands, which are then being faked on the low-power CPU using llvmpipe.  God only knows how many times each pixel is getting copied.  top shows half the CPU time is spent inside the gnome-shell process.
  3. Switch to XFCE.  Now VLC runs, and nothing else is stealing CPU time.  Still VLC runs out of CPU when expanded to full screen.  top shows it using 330% of CPU time, which is pretty impressive for a dual-core system.
  4. Switch to Gnome-mplayer, because someone says it’s faster.  Aspect is initially wrong; switch to “x11” output mode to fix it.  Video playback finally runs smooth, even at full screen.  OK, there’s a little bit of tearing, but just pretend that it’s 1999.  top shows … wait for it … 67% CPU utilization, or about one fifth of VLC’s.  (Less, actually, since at that usage VLC was dropping frames.)  Too bad Gnome-mplayer is buggy as heck: buttons like “pause” and “stop” do nothing, and the rest of the user interface is a crapshoot at best.

On a system like this, efficiency makes a big difference.  Now if only we could get efficiency and functionality together…
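The ratios quoted in the steps above are easy to check. A short Python sketch, using only the resolutions and the CPU percentages reported there:

```python
full_hd = 1920 * 1080      # pixels per frame in step 1
hd_720 = 1280 * 720        # pixels per frame in step 2

pixel_ratio = hd_720 / full_hd
cpu_ratio = 67 / 330       # Gnome-mplayer vs. VLC, as reported by top

print(f"1280x720 pushes {pixel_ratio:.0%} of the pixels of 1920x1080")
print(f"Gnome-mplayer uses {cpu_ratio:.0%} of VLC's CPU time")
```

So the mode change removes a bit more than half of the pixels, and the "about one fifth" figure for CPU usage holds up.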

by Ben at January 21, 2014 07:00 AM

Thomas Daede : Zynq DMA performance


I finally got my inverse transform up to snuff, complete with hardware matrix transposer. The transposer was trivial to implement – I realized that I could use the Xilinx data width converter IP to register entire 4×4 blocks at once, allowing my transposer to simply be a bunch of wires (assign statements in Verilog).
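To see why the transposer can be "just wires": once all 16 coefficients of a block are available at the same time, a transpose is a fixed permutation of positions with no arithmetic involved. The Python sketch below models that permutation in software; it illustrates the idea and is not the Verilog from this project, and the names `WIRING` and `transpose_block` are made up.

```python
N = 4  # the block is N x N

# WIRING[out_index] = in_index. Output position (r, c) takes input position (c, r).
WIRING = [(i % N) * N + (i // N) for i in range(N * N)]


def transpose_block(flat_block):
    """Transpose a row-major 4x4 block purely by re-indexing."""
    return [flat_block[src] for src in WIRING]


block = list(range(16))  # row-major: row r, column c lives at index 4*r + c
print(transpose_block(block))
```

Each entry of `WIRING` corresponds to one fixed connection from an input register to an output position, which is what a set of `assign` statements expresses.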

Screenshot from 2014-01-08 22:04:27

Unfortunately, I wasn’t getting the performance I was expecting. At a clock speed of 100MHz and a 64-bit width, I expected to be able to perform 25 million transforms per second. However, I was having trouble even getting 4 million. To debug the problem, I used the Xilinx debug cores in Vivado:

daala_memory_latency

There are several problems. Here’s an explanation of what is happening in the above picture:

  1. The CPU configures the DMA registers and starts the transfer. This works for a few clock cycles.
  2. The Stream to Memory DMA (s2mm) starts a memory transfer, but its FIFOs fill up almost immediately and it has to stall (tready goes low).
  3. The transform stream pipeline also stalls, making its tready go low.
  4. The s2mm DMA is able to start its first burst transfer, and everything goes smoothly.
  5. The CPU sees that the DMA has completed, and schedules the second pass. The turnaround time for this is extremely large, and ends up taking the majority of the time.
  6. The same process happens again, but the latency is even larger due to writing to system memory.

Fortunately, the solution isn’t that complicated. I am going to switch to a scatter-gather DMA engine, which allows me to construct a request chain, and then the DMA will execute the operations without CPU intervention, avoiding the CPU latency. In addition, a FIFO can be used to reduce the impact of the initial write latency somewhat, though this costs FPGA area and it might be better just to strive for longer DMA requests.
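
A back-of-the-envelope model shows why longer request chains help. The clock and bus width are from the text above; the block size (a 4×4 block of 16-bit coefficients, i.e. four 64-bit beats) and the overhead figures are assumptions chosen only for illustration, not measurements.

```python
def peak_transforms_per_second(clock_hz, bus_bits, block_bits):
    """Transforms per second if the bus never stalls."""
    cycles_per_block = block_bits // bus_bits
    return clock_hz // cycles_per_block


def effective_transforms_per_second(clock_hz, bus_bits, block_bits,
                                    blocks_per_request, overhead_cycles):
    """Throughput when every DMA request pays a fixed setup cost in cycles."""
    cycles_per_block = block_bits // bus_bits
    busy_cycles = blocks_per_request * cycles_per_block
    return clock_hz * blocks_per_request / (busy_cycles + overhead_cycles)


CLOCK_HZ = 100_000_000     # from the post
BUS_BITS = 64              # from the post
BLOCK_BITS = 4 * 4 * 16    # assumption: 4x4 block of 16-bit coefficients

peak = peak_transforms_per_second(CLOCK_HZ, BUS_BITS, BLOCK_BITS)  # 25_000_000

# Hypothetical per-request turnaround of 20,000 cycles, for two request sizes.
short_requests = effective_transforms_per_second(
    CLOCK_HZ, BUS_BITS, BLOCK_BITS, blocks_per_request=1024, overhead_cycles=20_000)
long_requests = effective_transforms_per_second(
    CLOCK_HZ, BUS_BITS, BLOCK_BITS, blocks_per_request=16_384, overhead_cycles=20_000)
```

In this model the fixed cost per request is amortized over more data as requests get longer, so throughput approaches the peak. Chaining requests with scatter-gather attacks the same term from the other side by shrinking the per-request overhead.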

There are other problems with my memory access at the moment – the most egregious being that my hardware expects a tiled buffer, but the Daala reference implementation uses linear buffers everywhere. This is the problem that I plan to tackle next.

by Thomas Daede at January 09, 2014 04:26 AM

Silvia Pfeiffer : Use deck.js as a remote presentation tool


deck.js is one of the new HTML5-based presentation tools. It’s simple to use, in particular for your basic, everyday presentation needs. You can also create more complex slides with animations etc. if you know your HTML and CSS.

Yesterday at linux.conf.au (LCA), I gave a presentation using deck.js. But I didn’t give it from the lectern in the room in Perth where LCA is being held – instead I gave it from the comfort of my home office at the other end of the country.

I used my laptop with in-built webcam and my Chrome browser to give this presentation. Beforehand, I had uploaded the presentation to a Web server and shared the link with the organiser of my speaker track, who was on site in Perth and had set up his laptop in the same fashion as myself. His screen was projecting the Chrome tab in which my slides were loaded and he had hooked up the audio output of his laptop to the room speaker system. His camera was pointed at the audience so I could see their reaction.

I loaded a slide master URL:
http://html5videoguide.net/presentations/lca_2014_webrtc/?master
and the room loaded the URL without query string:
http://html5videoguide.net/presentations/lca_2014_webrtc/.

Then I gave my talk exactly as I would if I was in the same room. Yes, it felt exactly as though I was there, including nervousness and audience feedback.

How did we do that? WebRTC (Web Real-time Communication) to the rescue, of course!

We used one of the modules of the rtc.io project called rtc-glue to add the video conferencing functionality and the slide navigation to deck.js. It was actually really really simple!

Here are the few things we added to deck.js to make it work:

  • Code added to index.html to make the video connection work:
    <meta name="rtc-signalhost" content="http://rtc.io/switchboard/">
    <meta name="rtc-room" content="lca2014">
    ...
    <video id="localV" rtc-capture="camera" muted></video>
    <video id="peerV" rtc-peer rtc-stream="localV"></video>
    ...
    <script src="glue.js"></script>
    <script>
    glue.config.iceServers = [{ url: 'stun:stun.l.google.com:19302' }];
    </script>
    

    The iceServers config is required to punch through firewalls – you may also need a TURN server. Note that you need a signalling server – in our case we used http://rtc.io/switchboard/, which runs the code from rtc-switchboard.

  • Added glue.js library to deck.js:

    Downloaded from https://raw.github.com/rtc-io/rtc-glue/master/dist/glue.js into the source directory of deck.js.

  • Code added to index.html to synchronize slide navigation:
    glue.events.once('connected', function(signaller) {
      if (location.search.slice(1) !== '') {
        $(document).bind('deck.change', function(evt, from, to) {
          signaller.send('/slide', {
            idx: to,
            sender: signaller.id
          });
        });
      }
      signaller.on('slide', function(data) {
        console.log('received notification to change to slide: ', data.idx);
        $.deck('go', data.idx);
      });
    });
    

    This simply registers a callback on the slide master end to send a slide position message to the room end, and a callback on the room end that initiates the slide navigation.

And that’s it!

You can find my slide deck on GitHub.

Feel free to write your own slides in this manner – I would love to have more users of this approach. It should also be fairly simple to extend this to share pointer positions, so you can actually use the mouse pointer to point to things on your slides remotely. Would love to hear your experiences!

Note that the slides are actually a talk about the rtc.io project, so if you want to find out more about these modules and what other things you can do, read the slide deck or watch the talk when it has been published by LCA.

Many thanks to Damon Oehlman for his help in getting this working.

BTW: somebody should really fix that print style sheet for deck.js – I’m only ever getting the one slide that is currently showing. 😉

by silvia at January 08, 2014 02:28 AM

Maik Merten : Tinkering with a H.261 encoder


On the rtcweb mailing list the struggle regarding what Mandatory To Implement (MTI) video codec should be chosen rages on. One camp favors H.264 ("We cannot have VP8"), the other VP8 ("We cannot have H.264"). Some propose that there should be a "safe" fallback codec anyone can implement, and H.261 is "as old and safe" as it can get. H.261 was specified in the final years of the 1980s and is generally believed to have no non-expired patents left standing. Roughly speaking, this old gem of coding technology can transport CIF resolution (352x288) video at full framerate (>= 25 fps) with (depending on your definition) acceptable quality starting roughly in the 250 to 500 kbit/s range (basically, I've witnessed quite a few Skype calls with similar perceived effective resolution, mostly driven by mediocre webcams, and I can live with that as long as the audio part is okay). From today's perspective, H.261 is very very light on computation, memory, and code footprint.

H.261 is, of course, outgunned by any semi-decent more modern video codec, which can, for instance, deliver video with higher resolution at similar bitrates. Those, however, don't have the luxury of having their patents expired with "as good as it can be" certainty.

People on the rtcweb list were quick to point out that having an encoder with modern encoding techniques may by itself be problematic regarding patents. Thankfully, for H.261, a public domain reference-style encoder and decoder from 1993 can be found, e.g., at http://wftp3.itu.int/av-arch/video-site/h261/PVRG_Software/P64v1.2.2.tar - so that's a nice reference on what encoding technologies were state-of-the-art in the early 1990s.

With some initial patching done by Ron Lee this old code builds quite nicely on modern platforms - and as it turns out, the encoder also produces intact video, even on x86_64 machines or on a Raspberry Pi. Quite remarkably portable C code (although not the cleanest style-wise). The original code is slow, though: It barely does realtime encoding of 352x288 video on a 1.65 GHz netbook, and I can barely imagine having it encode video on machines from 1993! Some fun was had in making it faster (it's now about three times faster than before) and the program can now encode from and decode to YUV4MPEG2 files, which is quite a lot more handy than the old mode of operation (still supported), where each frame would consist of three files (.Y, .U, .V).
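
For readers who haven't met YUV4MPEG2: it is a plain-text header followed by raw frames, each introduced by a `FRAME` line, with the Y, U and V planes stored back to back. The sketch below shows the container layout in Python. It is not code from the p64 tree, and the frame rate and the `C420jpeg` chroma tag are assumptions chosen for the example.

```python
import io


def y4m_header(width, height, fps_num, fps_den):
    """Build a YUV4MPEG2 stream header for progressive 4:2:0 video."""
    return (f"YUV4MPEG2 W{width} H{height} F{fps_num}:{fps_den} "
            f"Ip C420jpeg\n").encode("ascii")


def frame_size_420(width, height):
    """Bytes in one 8-bit 4:2:0 frame: full-size Y plus quarter-size U and V."""
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return width * height + 2 * chroma


def write_y4m(out, width, height, fps_num, fps_den, frames):
    """Write frames, each a (y, u, v) tuple of bytes, to a binary stream."""
    out.write(y4m_header(width, height, fps_num, fps_den))
    for y, u, v in frames:
        if len(y) + len(u) + len(v) != frame_size_420(width, height):
            raise ValueError("plane sizes do not match the frame dimensions")
        out.write(b"FRAME\n")
        out.write(y + u + v)


# One mid-grey CIF frame, standing in for the contents of .Y/.U/.V files.
W, H = 352, 288
grey_frame = (bytes([128]) * (W * H),
              bytes([128]) * (W * H // 4),
              bytes([128]) * (W * H // 4))
buf = io.BytesIO()
write_y4m(buf, W, H, 25, 1, [grey_frame])
```

One file per clip instead of three files per frame is what makes the format so much handier to pipe between tools.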

For those interested, the patched up code is available at https://github.com/maikmerten/p64 - however, be aware that the original coding style (yay, global variables) is still alive and well.

So, is it "useful" to resurrect such an old codebase? Depends on the definition of "useful". For me, as long as it is fun and teaches me something, it's reasonably useful.

So is it fun to toy around with this ancient coding technology? Yes, especially as most modern codecs still follow the same overall design, but H.261 is the most basic instance of "modern" video coding and thus easiest to grasp. Who knows, with some helpful comments here and there that old codebase could be used for teaching basic principles of video coding.

January 07, 2014 07:35 PM

Thomas Daede : Off by one


My hardware had a bug that made about 50% of the outputs off by one. I compared my Verilog code to the original C and it was a one-for-one match, with the exception of a few OD_DCT_RSHIFT that I translated into arithmetic shifts. That turned out to break the transform. Looking at the definition of OD_DCT_RSHIFT:

/*This should translate directly to 3 or 4 instructions for a constant _b:
#define OD_UNBIASED_RSHIFT(_a,_b) ((_a)+(((1<<(_b))-1)&-((_a)<0))>>(_b))*/
/*This version relies on a smart compiler:*/
# define OD_UNBIASED_RSHIFT(_a, _b) ((_a)/(1<<(_b)))

I had always thought a divide by a power of two equals a shift, but this is wrong: an integer divide rounds towards zero, whereas a shift rounds towards negative infinity. The solution is simple: if the value to be shifted is less than zero, add a mask before shifting. Rather than write this logic in Verilog, I simply switched my code to the / operator as in the C code above, and XST inferred the correct logic. After verifying operation with random inputs, I also wrote a small benchmark to test the performance of my hardware:

[root@alarm hwtests]# ./idct4_test 
Filling input buffer... Done.
Running software benchmark... Done.
Time: 0.030000 s
Running hardware benchmark... Done.
Time: 0.960000 s

Not too impressive, but the implementation is super basic, so it’s not surprising. Most of the time is spent shuffling data across the extremely slow MMIO interface.
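
Coming back to the rounding issue above, the difference between a shift and a divide is easy to reproduce in software. The sketch below uses Python, whose `>>` on negative integers behaves like an arithmetic shift; the C-style division is emulated by hand because Python's own `//` operator rounds down rather than toward zero. The function names are made up for this example.

```python
def floor_shift(a, b):
    """Arithmetic right shift: rounds toward negative infinity."""
    return a >> b


def c_divide(a, b):
    """Emulate C integer division by 2**b, which rounds toward zero."""
    q = abs(a) // (1 << b)
    return -q if a < 0 else q


def unbiased_rshift(a, b):
    """Shift-based form from the commented-out macro: add a mask when a < 0."""
    return (a + (((1 << b) - 1) & -(a < 0))) >> b


for a in (-8, -7, -1, 0, 7):
    print(a, floor_shift(a, 2), c_divide(a, 2), unbiased_rshift(a, 2))
```

The plain shift only disagrees with the divide for negative values that are not exact multiples of the divisor, which is why just a portion of the outputs were off, and always by exactly one.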

At the same time I was trying to figure out why the 16-bit version of the intra predictor performed uniformly worse than the double precision version – I thought 16 bits ought to be enough. The conversion is done by expanding the range to fill a 16 bit integer and then rounding:

OD_PRED_WEIGHTS_4x4_FIXED[i] = (od_coeff)floor(OD_PRED_WEIGHTS_4x4[i] * 32768 + 0.5);

The 16 bit multiplies with the 16 bit coefficients are summed in a 32 bit accumulator. The result is then truncated to the original range. I did this with a right shift – after the previous ordeal, I tried swapping it with the “round towards zero” macro. Here are the results:

ssim

The new 16 bit version even manages to outperform the double version slightly. I believe the reason why the “round to zero” does better than simply rounding down is that rounding down tends to create a slightly negative bias in the encoded coefficients, decreasing coding gain.
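
The bias is easy to demonstrate with a toy model. The sketch below quantizes two weights to 16 bits with the same floor(+0.5) conversion, accumulates the products as integers, and compares the two ways of scaling the result back down. The weights and input range are hypothetical and this is not Daala's predictor; it only illustrates the average error of each rounding mode.

```python
import math

Q = 15  # fractional bits: weights are scaled by 2**15 = 32768


def quantize_weight(w):
    """Same conversion as in the post: floor(w * 32768 + 0.5)."""
    return math.floor(w * (1 << Q) + 0.5)


def scale_down(acc, round_to_zero):
    """Bring the accumulator back to the original range."""
    if round_to_zero:
        q = abs(acc) >> Q              # behaves like C's acc / 32768
        return -q if acc < 0 else q
    return acc >> Q                    # plain shift: rounds toward -infinity


def mean_error(weights, round_to_zero, span=20):
    """Average (fixed-point - floating-point) result over symmetric inputs."""
    weights_q = [quantize_weight(w) for w in weights]
    total, count = 0.0, 0
    for x in range(-span, span + 1):
        for y in range(-span, span + 1):
            exact = weights[0] * x + weights[1] * y
            acc = weights_q[0] * x + weights_q[1] * y  # 32-bit accumulator in hardware
            total += scale_down(acc, round_to_zero) - exact
            count += 1
    return total / count


WEIGHTS = [0.3, -0.45]   # hypothetical, not real Daala weights
shift_bias = mean_error(WEIGHTS, round_to_zero=False)
trunc_bias = mean_error(WEIGHTS, round_to_zero=True)
```

With inputs that are symmetric around zero, the errors of the round-to-zero version cancel out, while the plain shift comes out about half a step too low on average.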

by Thomas Daede at January 03, 2014 12:39 AM

Thomas Daede : Daala 4-input idct (tentatively) working!


I’ve implemented Daala’s smallest inverse transform in Verilog code. It appears as an AXI-Lite slave, with two 32-bit registers for input and two for output. Right now it can do one transform per clock cycle, though at a pitiful 20MHz. I also haven’t verified that all of its output values are identical to the C version yet, but it passes preliminary testing with peek and poke.

Screenshot from 2014-01-01 23:23:22

I also have yet to figure out why XST only uses one DSP block even though my design has 3 multipliers…

by Thomas Daede at January 02, 2014 05:25 AM

Thomas Daede : Quantized intra predictor


Daala’s intra predictor currently uses doubles. Floating point math units are really expensive in hardware, and so is loading 64 bit weights. Therefore, I modified Daala to see what would happen if the weights were rounded to signed 16 bit. The result is below:

ssim

Red is before quantization, green after. This is too much loss – I’ll have to figure out why this happened. Worst case I move to 32 bit weights, though maybe my floor(+0.5) method of rounding is also suspect? Maybe the intra weights should be trained taking quantization into account?

by Thomas Daede at December 31, 2013 11:37 PM

Thomas Daede : First Zynq bitstream working!


Screenshot from 2013-12-31 14:51:20

I got my first custom PL hardware working! Following the Zedboard tutorials, it was relatively straightforward, though using Vivado 2013.3 required a bit of playing around – I ended up making my own clock sources and reset controller until I realized that the Zynq PS had them if you enabled them. Next up: ChipScope or whatever it’s called in Vivado.

I crashed the chip numerous times until realizing that the bitstream file name had changed somewhere in the process, so I was uploading an old version of the bitstream….

by Thomas Daede at December 31, 2013 08:56 PM

Thomas Daede : Daala profiling on ARM


I reran the same decoding as yesterday, but this time on the Zynq Cortex-A9 instead of x86. Following is the histogram data, again with the functions I plan to accelerate highlighted:

 19.60%  lt-dump_video  [.] od_intra_pred16x16_mult
  6.66%  lt-dump_video  [.] od_intra_pred8x8_mult
  6.02%  lt-dump_video  [.] od_bin_idct16
  4.88%  lt-dump_video  [.] .divsi3_skip_div0_test
  4.54%  lt-dump_video  [.] od_bands_from_raster
  4.21%  lt-dump_video  [.] laplace_decode
  4.03%  lt-dump_video  [.] od_chroma_pred
  3.92%  lt-dump_video  [.] od_raster_from_bands
  3.66%  lt-dump_video  [.] od_post_filter16
  3.20%  lt-dump_video  [.] od_intra_pred4x4_mult
  3.09%  lt-dump_video  [.] od_apply_filter_cols
  3.08%  lt-dump_video  [.] od_bin_idct8
  2.60%  lt-dump_video  [.] od_post_filter8
  2.00%  lt-dump_video  [.] od_tf_down_hv
  1.69%  lt-dump_video  [.] od_intra_pred_cdf
  1.55%  lt-dump_video  [.] od_ec_decode_cdf_unscaled
  1.46%  lt-dump_video  [.] od_post_filter4
  1.45%  lt-dump_video  [.] od_convert_intra_coeffs
  1.44%  lt-dump_video  [.] od_convert_block_down
  1.28%  lt-dump_video  [.] generic_model_update
  1.24%  lt-dump_video  [.] pvq_decoder
  1.21%  lt-dump_video  [.] od_bin_idct4

As expected, the results are very similar to x86; however, there are a few oddities. One is that the intra prediction is even slower than on x86, and another is that the software division routine shows up relatively high in the list. It turns out that the division comes from the inverse lapping filters – although division by a constant can be replaced by a fixed point multiply, the compiler seems not to have done this, which hurts performance a lot.
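
For reference, this is what replacing a division by a constant with a fixed point multiply looks like. The sketch covers unsigned values only and uses the well-known round-up construction with a wide multiplier. The lapping filters work on signed values, which need an extra sign correction, and the divisors below are arbitrary examples rather than Daala's constants.

```python
def magic_for(divisor, bits=32):
    """Return (multiplier, shift) such that (x * multiplier) >> shift == x // divisor
    for every unsigned x below 2**bits."""
    shift = bits + (divisor - 1).bit_length()      # bits + ceil(log2(divisor))
    multiplier = -(-(1 << shift) // divisor)       # ceil(2**shift / divisor)
    return multiplier, shift


def divide_by_constant(x, multiplier, shift):
    """One multiply and one shift instead of a division."""
    return (x * multiplier) >> shift


MUL3, SHIFT3 = magic_for(3)
print(divide_by_constant(100, MUL3, SHIFT3))   # same result as 100 // 3
```

The multiplier and shift are computed once per constant, so at run time the division costs a single multiply and a shift, both of which are cheap on the Cortex-A9 compared to a call into the software division routine.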

For fun, let’s see what happens when I remove the costly transforms and force 4×4 block sizes only:

 26.21%  lt-dump_video  [.] od_intra_pred4x4_mult
  7.35%  lt-dump_video  [.] od_intra_pred_cdf
  6.28%  lt-dump_video  [.] od_post_filter4
  6.17%  lt-dump_video  [.] od_chroma_pred
  5.77%  lt-dump_video  [.] od_bin_idct4
  4.04%  lt-dump_video  [.] od_bands_from_raster
  3.94%  lt-dump_video  [.] generic_model_update
  3.86%  lt-dump_video  [.] od_apply_filter_cols
  3.64%  lt-dump_video  [.] od_raster_from_bands
  3.29%  lt-dump_video  [.] .divsi3_skip_div0_test
  2.47%  lt-dump_video  [.] od_convert_intra_coeffs
  2.07%  lt-dump_video  [.] od_intra_pred4x4_get
  1.95%  lt-dump_video  [.] od_apply_postfilter
  1.82%  lt-dump_video  [.] od_tf_up_hv_lp
  1.81%  lt-dump_video  [.] laplace_decode
  1.74%  lt-dump_video  [.] od_ec_decode_cdf
  1.67%  lt-dump_video  [.] pvq_decode_delta
  1.61%  lt-dump_video  [.] od_apply_filter_rows
  1.55%  lt-dump_video  [.] od_bin_idct4x4

The 4×4 intra prediction has now skyrocketed to the top, with the transforms and filters increasing as well. I was surprised by the intra prediction decoder (od_intra_pred_cdf) taking up so much time, but it can be explained by the smaller blocks: much more prediction data is coded relative to the image size. The transform still doesn’t take much time, which I suppose shouldn’t be surprising given how simple it is – my hardware can even do it in 1 cycle.

by Thomas Daede at December 30, 2013 12:23 AM

Thomas Daede : Daala profiling on x86


Given that the purpose of my hardware acceleration is to run Daala at realtime speeds, I decided to benchmark the Daala player on my Core 2 Duo laptop. I used a test video at 720p24, encoded with -v 16 and no reference frames (intra only). The following is the perf annotate output:

 19.49%  lt-player_examp  [.] od_state_upsample8
 11.64%  lt-player_examp  [.] od_intra_pred16x16_mult
  5.74%  lt-player_examp  [.] od_intra_pred8x8_mult
...

20 percent for od_state_upsample8? Turns out that the results of this aren’t even used in intra only mode, so commenting it out yields a more reasonable result:

 14.50%  lt-player_examp  [.] od_intra_pred16x16_mult
  7.17%  lt-player_examp  [.] od_intra_pred8x8_mult
  6.37%  lt-player_examp  [.] od_bin_idct16
  5.09%  lt-player_examp  [.] od_post_filter16
  4.63%  lt-player_examp  [.] laplace_decode
  4.41%  lt-player_examp  [.] od_bin_idct8
  4.10%  lt-player_examp  [.] od_post_filter8
  3.86%  lt-player_examp  [.] od_apply_filter_cols
  3.28%  lt-player_examp  [.] od_chroma_pred
  3.18%  lt-player_examp  [.] od_raster_from_bands
  3.14%  lt-player_examp  [.] od_intra_pred4x4_mult
  2.84%  lt-player_examp  [.] pvq_decoder
  2.76%  lt-player_examp  [.] od_ec_decode_cdf_unscaled
  2.71%  lt-player_examp  [.] od_tf_down_hv
  2.58%  lt-player_examp  [.] od_post_filter4
  2.45%  lt-player_examp  [.] od_bands_from_raster
  2.13%  lt-player_examp  [.] od_intra_pred_cdf
  1.98%  lt-player_examp  [.] od_intra_pred16x16_get
  1.89%  lt-player_examp  [.] pvq_decode_delta
  1.50%  lt-player_examp  [.] od_convert_intra_coeffs
  1.43%  lt-player_examp  [.] generic_model_update
  1.37%  lt-player_examp  [.] od_convert_block_down
  1.21%  lt-player_examp  [.] od_ec_decode_cdf
  1.18%  lt-player_examp  [.] od_ec_dec_normalize
  1.18%  lt-player_examp  [.] od_bin_idct4

I have bolded the functions that I plan to implement in hardware. As you can see, they sum to only about 23% of the total execution time – this means that accelerating these functions alone won’t bring me into realtime decoding performance. Obvious other targets include the intra prediction matrix multiplication, though this might be better handled by NEON acceleration for now – I’m not too familiar with that area of the code yet.
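
The limit on what acceleration can achieve follows from Amdahl's law: if only a fraction of the run time is accelerated, the rest still has to execute at the old speed. A quick calculation with the 23% figure from above (the accelerator speedup factors are hypothetical):

```python
def overall_speedup(accelerated_fraction, accelerator_speedup):
    """Amdahl's law: whole-program speedup when only part of the run time gets faster."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


FRACTION = 0.23  # share of execution time quoted in the post

for factor in (2, 10, float("inf")):
    print(f"{factor}x accelerator -> {overall_speedup(FRACTION, factor):.2f}x overall")
```

Even an infinitely fast accelerator for those functions gives at most 1 / 0.77, or roughly a 1.3× overall speedup, which is why further targets are needed to reach realtime.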

by Thomas Daede at December 29, 2013 03:44 AM

Thomas Daede : Senior Honors Thesis – Daala in Hardware


not actually the daala logo

For my honors thesis, I am implementing part of the Daala decoder in hardware. This is not only a way for me to learn more about video coding and hardware, but also a way to provide feedback to the Daala project and create a reference hardware implementation.

The Chip

Part of the reason for a hardware implementation of any video codec is to make it possible to decode on an otherwise underpowered chip, such as the mobile processors common in smart phones and tablets. A very good model of this sort of chip is the Xilinx Zynq processor, which has two midrange ARM Cortex cores, and a large FPGA fabric surrounding them. The custom video decoder will be implemented in the FPGA, with high speed direct-memory-access providing communication with the ARM cores.

The Board

Image from zedboard.org

I will be using the ZedBoard, a low cost prototyping board based on the Zynq 7020 system-on-chip. It includes 512MB of DDR, both HDMI and VGA video output, Ethernet, serial, and boots Linux out of the box. The only thing that could make it better would be a cute kitten on the silkscreen.

Choosing what to accelerate

For now, parts of the codec will still run in software. This is because many of them would be very complicated state machines in hardware, and more importantly, it allows me to incrementally add hardware acceleration while maintaining a functional decoder. To make a particular algorithm a good candidate for hardware acceleration, it needs to have these properties:

  • Stable – Daala is a rapidly changing codec, and while much of it is expected to change, it takes time to implement hardware, and it’s much easier if the reference isn’t changing under my feet.
  • Parallel – Compared to a CPU, hardware can excel at exploiting parallelism. CPUs can do it too with SIMD instructions, but hardware can be tailor made to the application.
  • Independent – The hardware accelerator can act much like a parallel thread, which means that locking and synchronization comes into play. Ideally the hardware and CPU should rarely have to wait for each other.
  • Interesting – The hardware should be something unique to Daala.

The best fit that I have found for these is the transform stage of Daala. The transform stage is a combination of the discrete cosine transform (actually an integer approximation), and a lapping filter. While the DCT is an old concept, the 2D lapping filter is pretty unique to Daala, and implementing both in tightly coupled hardware can create a large performance benefit. More info on the transform stage can be found on Monty’s demo pages.

by Thomas Daede at November 26, 2013 03:04 AM

Thomas Daede : Inside a TTL crystal oscillator


Inside Crystal View 1

In case you ever wanted to know what is inside an oscillator can… I used a dremel so that now you can know. The big transparent disc on the right is the precisely cut quartz resonator, suspended on springs. On the left is a driver chip and pads for loading capacitors to complete the oscillator circuit. The heat from my dremel was enough to melt the solder and remove the components. Your average crystal can won’t have the driver chip or capacitors – most microcontrollers now have the driver circuitry built-in.

 

by Thomas Daede at November 25, 2013 10:49 PM

Thomas B. Rücker : Icecast – How to prevent listeners from being disconnected


A set of hints for common questions and situations people encounter while setting up a radio station using Icecast.

  • The source client is on an unstable IP connection.
    • You really want to avoid such situations. If this is a professional setup you might even consider a dedicated internet connection for the source client. The least would be QoS with guaranteed bandwidth.
      I’ve seen it far too often over the years that people complain about ‘Icecast drops my stream all the time’ and after some digging we find that the same person or someone on their network runs a BitTorrent client or some other bandwidth intensive application.
    • If the TCP connection just stalls from time to time for a few seconds, then you might improve the situation by increasing the source-timeout, but don’t increase it too much as that will lead to weird behaviour for your listeners too. Beyond 15-20s I’d highly recommend considering the next option instead.
    • If the TCP connection breaks, so the source client gets fully disconnected and has to reconnect to Icecast, then you really want to look into setting up fallbacks for your streams, as otherwise this also immediately disconnects all listeners.
      What this does is it transfers all listeners to a backup stream (or static file) instead of disconnecting them. Preferably you’d have that fallback stream generated locally on the same machine as the Icecast server, e.g. some ‘elevator music’ with an announcement of ‘technical difficulties, regular programming will resume soon’.
      There is a special case: If you opt for ‘silence’ as your fallback and you stream in e.g. Ogg/Vorbis, then it will compress digital silence to a few bits per second, as it’s incredibly compressible. This completely messes up most players. The workaround is to inject some -90dB AWGN, which is below hearing level, but keeps the encoder busy and the bit-rate up.
      Important note: You should match the parameters of both streams closely to avoid problems in playback, as many players won’t cope well if the codec, sample rate or other things change mid stream…

 

  • You want to have several different live shows during the week and in between some automated playlist-driven programming.
    • Use (several) fallbacks:
      primary mountpoint: live shows
      secondary mountpoint: playlist
      tertiary mountpoint: -90dB AWGN (optional, e.g. as insurance if the playlist fails)
    • If you want to have this all completely hidden from the listeners with one mount point, that is automatically also listed on http://dir.xiph.org, then you need to add one more trick:
      Set up a local relay of the primary mountpoint, set it to force YP submission.
      Set all the three other mounts to hidden and force disable YP submission.
      This gives you one visible mountpoint for your users and all the ‘magic’ is hidden.
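
As an aside on the -90dB AWGN mentioned above: such a noise bed can be generated with a few lines of Python. This is only a sketch under stated assumptions. It takes the level to be relative to 16-bit full scale (dBFS), which the post does not spell out, and it produces an uncompressed WAV that would still have to be encoded and fed to Icecast by a source client.

```python
import io
import math
import random
import struct
import wave


def awgn_samples(level_dbfs, count, seed=0):
    """Gaussian noise as 16-bit integers with an RMS level of roughly level_dbfs."""
    sigma = 32767 * 10 ** (level_dbfs / 20)    # RMS amplitude in LSBs
    rng = random.Random(seed)
    return [max(-32768, min(32767, round(rng.gauss(0.0, sigma))))
            for _ in range(count)]


def rms_dbfs(samples):
    """Measured RMS level relative to 16-bit full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32767)


def write_wav(out, samples, rate=44100):
    """Write mono 16-bit PCM to a file path or a binary stream."""
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))


noise = awgn_samples(-90.0, 44100)   # one second at 44.1 kHz
buf = io.BytesIO()
write_wav(buf, noise)
```

At 16 bits, -90 dBFS works out to roughly one least significant bit of RMS amplitude, so the noise is just barely representable, yet it is enough to keep the encoder from collapsing to near-zero bitrate.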

 

I’ll expand this post by some simple configuration examples later today/tomorrow.

by Thomas B. Rücker at July 08, 2013 07:08 AM

Thomas B. Rücker : Icecast · analyzing your logs to create useful statistics


There are many ways to analyze Icecast usage and generate statistics. It has a special statistics TCP stream, it has hooks that trigger on source or listener connect/disconnect events, you can query it over custom XSLT or read the stats.xml or you can analyze its log files. This post however will be only about the latter. I’ll be writing separate posts on the other topics soon.

  • Webalizer streaming version
    A fork of webalizer 2.10 extended to support icecast2 logs and produce nice listener statistics.
    I’ve used this a couple of years ago. It could use some love (it expects a manual input of ‘average bitrate’ while that could be calculated), but otherwise does a very good job of producing nice statistics.
  • AWStats – supports ‘streaming’ logs.
    Haven’t used this one myself for Icecast analysis, but it’s a well established and good open source log analysis tool.
  • Icecast log parser – Log parser to feed a MySQL database
    I ran into this one recently, it seems to be a small script that helps you feed a MySQL database from your log files. Further analysis is then possible through SQL queries. A nice building block if you want to build a custom solution.

In case you want to parse the access log yourself you should be aware of several things. Largely the log is compatible with other httpd log formats, but there are slight differences. Please look at these two log entries:

203.0.113.42 - - [02/Jun/2013:20:35:50 +0200] "GET /test.webm HTTP/1.1" 200 3379662 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0" 88
198.51.100.23 - - [02/Jun/2013:20:38:13 +0200] "PUT /test.webm HTTP/1.1" 200 19 "-" "-" 264

The first one is a listener (in this case Firefox playing a WebM stream). You can see it received 3379662 bytes, but the interesting part is the last entry on that line “88”. It’s the number of seconds it was connected. Something important for streaming, less important for a file serving httpd.

The second entry is a source client (in this case using the HTTP PUT method we’re adding for Icecast 2.4). Note that it only says “19” after the HTTP status code. That might seem very low at first, but it’s important as we’re not logging the number of bytes sent by the client to the server, but from the server to the client. If necessary you could extract this from the error.log though. The “264” at the end once again indicates it was streaming for 264 seconds.
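
A minimal parser for lines in this shape could look like the following. It is a sketch written against the two sample entries above, not an official tool; the regular expression and field names are this example's own, and real logs may contain variations it doesn't handle.

```python
import re
from datetime import datetime

# Combined-log-style line with the connection duration in seconds appended.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<seconds>\d+)$'
)


def parse_line(line):
    """Return a dict for one access.log line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    method, _, rest = entry["request"].partition(" ")
    entry["method"] = method
    entry["mount"] = rest.split(" ")[0] if rest else ""
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    for key in ("status", "bytes", "seconds"):
        entry[key] = int(entry[key])
    return entry


def listener_seconds(lines):
    """Total connected time of listeners (GET requests) per mountpoint."""
    totals = {}
    for line in lines:
        entry = parse_line(line)
        if entry and entry["method"] == "GET":
            totals[entry["mount"]] = totals.get(entry["mount"], 0) + entry["seconds"]
    return totals
```

Filtering on the request method separates listeners from source clients, and summing the trailing duration field gives listener time per mountpoint, the kind of figure the tools above report.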

Another thing, as we’ve heard this question pop up repeatedly: the log entry is only generated once the listener/source actually disconnects, as only then we know all the details that should go into that line. If you are looking for real time statistics, then the access.log is the wrong place, sorry.

There are also some closed source solutions, but I’ve never used them and I don’t think they provide significant benefit over the available open source solutions mentioned above.

If you know a good Icecast tool, not only for log analysis, please write a comment or send a mail to one of the Icecast mailing lists.

by Thomas B. Rücker at June 02, 2013 06:52 PM

Chris Pearce : HTML5 video playbackRate and Ogg chaining support landed in Firefox


Paul Adenot has recently landed patches in Firefox to enable the playbackRate attribute on HTML5 <audio> and <video> elements.

This is a cool feature that I've been looking forward to for a while; it means users can speed up playback of videos (and audio), for example to watch presentations sped up and only slow down for the interesting bits. Currently Firefox's <video> controls don't have support for playbackRate, but it is accessible from JavaScript, and hopefully we'll get support added to our built-in controls soon.

Paul has also finally landed support for Ogg chaining. This has been strongly desired for quite some time by community members, and the final patch also had contributions from "oneman" (David Richards), who also was a strong advocate for this feature.

We decided to reduce the scope of our chaining implementation in order to make it easier and quicker to implement. We targeted the features most desired by internet radio providers, and so we only support chaining in Ogg Vorbis and Opus audio files and we disable seeking in chained files.

Thanks Paul and David for working on these features!

by Chris Pearce (noreply@blogger.com) at December 23, 2012 02:28 AM

Thomas Daede : Centaurus 3 Lights


Centaurus 3’s lights gathered some attention during FSGP 2012. Lights are a surprisingly hard thing to get right on a solar car – they need to be bright, efficient, aerodynamic, and pretty.

Previous solar cars have tried many things, with varying success. A simple plastic window and row of T1 LEDs works well, but if the window needs to be a strange shape, it can get difficult and ugly. Acrylic light pipes make very thin and aerodynamic lights, but it can be difficult to get high efficiency, or even acceptable brightness.

There are two general classes of LEDs. One is the low power, highly focused, signaling types of LEDs. These generally draw less than 75mA, and usually come with focused or diffuse lenses, such as a T1 3/4 case. This is also the type of LED in the ASC 2012 reference lights. The other type of LED is the high brightness lighting-class LED, such as those made by Philips, Cree, or Seoul Semiconductor. These usually draw 350mA or more, are rated for total lumen output rather than millicandelas+viewing angle, and require a heat sink.

We eventually chose the low-current type of LEDs, because the lumen count we needed was low, and the need for heatsinking would significantly complicate matters. We ended up using the 30-LED version of the ASC 2012 reference lighting. The outer casing is ugly, so the LED PCB was removed using a utility knife.

Now we need a new lens. I originally looked for acrylic casting, but settled on Alumilite Water-Clear urethane casting material. This material is much stronger than acrylic and supposedly easier to use.

This product is designed to be used with rubber molds. Instead, I used our team’s sponsorship from Stratasys to 3D print some molds:

The molds are made as two halves which bolt together with AN3 fasteners. They are sanded, painted, and sanded again, then coated with mold release wax and PVA. Each part of the Alumilite urethane is placed in a vacuum pot to remove bubbles, and then the two parts are mixed and poured into the mold. Next, the lights are dipped into the mold, suspended by two metal pins. The wires are bent and taped around the outside of the mold to hold the LED lights in place. The entire mold is then placed in the vacuum pot for about one minute to pull out more bubbles, moved to an oven, and allowed to cure for 4 more hours at 60C. Using an oven is not strictly necessary but makes the process happen faster and makes sure it reaches full hardness.

Once cured, the lights are removed from the mold, epoxied into place in the shell, and bondo’d. They are then masked before being sent off to paint. The masked area is somewhat smaller than the size of the cast part, to make the lights look more seamless and allow room for bondo to fix any bumps.

The lights generally come out with a rather diffuse surface, but the surface becomes water clear after the clear coat of automotive paint. The finished product looks like this:

 

by Thomas Daede at October 27, 2012 03:19 PM

Ben Schwartz : It’s Google


I’m normally reticent to talk about the future; most of my posts are in the past tense. But now the plane tickets are purchased, apartment booked, and my room is gradually emptying itself of my furniture and belongings. The point of no return is long past.

A few days after Independence Day, I’ll be flying to Mountain View for a week at the Googleplex, and from there to Seattle (or Kirkland), to start work as a software engineer on Google’s WebRTC team, within the larger Chromium development effort. The exact project I’ll be working on initially isn’t yet decided, but a few very exciting ideas have floated by since I was offered the position in March.

Last summer I told a friend that I had no idea where I would be in a year’s time, and when I listed places I might be — Boston, Madrid, San Francisco, Schenectady — Seattle wasn’t even on the list. It still wasn’t in March, when I was offered this position in the Cambridge (MA) office. It was an unfortunate coincidence that the team I’d planned to join was relocated to Seattle shortly after I’d accepted the offer.

My recruiters and managers were helpful and gracious in two key ways. First, they arranged for me to meet with ~5 different leaders in the Cambridge office whose teams I might be able to join instead of moving. Second, they flew me out to Seattle (I’d never been to the city, nor the state, nor any of the states or provinces that it borders) and arranged for meetings with various managers and developers in the Kirkland office, just so I could learn more about the office and the city. I spent the afternoon wandering the city and (with help from a friend of a friend), looking at as many districts as I could squeeze between lunch and sleep.

The visit made all the difference. It made the city real to me … and it seemed like a place that I could live. It also confirmed an impressive pattern: every single Google employee I met, at whichever office, seemed like someone I would be happy to work alongside.

When I returned there were yet more meetings scheduled, but I began to perceive that the move was essentially inevitable. The hiring committee had done their job well, and assigned me to the best fitting position. Everything else was second best at best.

It’s been an up and down experience, with the drudgery of packing and schlepping an unwelcome reminder of the feeling of loss that accompanies leaving history, family, and friends behind. I am learning in the process that, having never really moved, I have no idea how to move.

But there’s also sometimes a sense of joy in it. I am going to be an independent, free adult, in a way that cannot be achieved by even the happiest potted plant.

After signing the same lease on the same student apartment for the seventh time, I worried about getting stuck, in some metaphysical sense, about failure to launch from my too-comfortable cocoon. It was time for a grand adventure.

This is it.

by Ben at June 29, 2012 05:10 PM

David Schleef : GStreamer Streaming Server Library


Introducing the GStreamer Streaming Server Library, or GSS for short.

This post was originally intended to be a release announcement, but I started to wander off to work on other projects before the release was 100% complete.  So perhaps this is a pre-announcement.  Or it’s merely an informational piece with the bonus that the source code repository is in a pretty stable and bug-free state at the moment.  I tagged it with “gss-0.5.0”.

What it is

GSS is a standalone HTTP server implemented as a library.  Its special focus is to serve live video streams to thousands of clients, mainly for use inside an HTML5 video tag.  It’s based on GStreamer, libsoup, and json-glib, and uses Bootstrap and BrowserID in the user interface.

GSS comes with a streaming server application that is essentially a small wrapper around the library.  This application is referred to as the Entropy Wave Streaming Server (ew-stream-server); the code that is now GSS was originally split out of this application.  The app can be found in the tools/ directory in the source tree.

Features

  • Streaming formats: WebM, Ogg, MPEG-TS.  (FLV support is waiting for a flvparse element in GStreamer.)
  • Streams in different formats/sizes/bitrates are bundled into a single “program”.
  • Streaming to Flash via HTTP.
  • Authentication using BrowserID.
  • Automatic conversion from properly formed MPEG-TS to HTTP Live Streaming.
  • Automatic conversion to RTP/RTSP.  (Experimental; works for Ogg/Theora/Vorbis only.)
  • Stream upload via HTTP PUT (3 different varieties), Icecast, raw TCP socket.
  • Stream pull from another HTTP streaming server.
  • Content protection via automatic one-time URLs.
  • (Experimental) Video-on-Demand stream types.
  • Per-stream, per-program, and server metrics.
  • An HTTP configuration interface and REST API are used to control the server, allowing standalone operation and easy integration with other web servers.

What’s not there?

  • Other types of authentication, LDAP or other authorization.
  • RTMP support.  (Maybe some day, but there are several good open-source Flash servers out there already.)
  • Support for upload using HTTP PUT with no 100-Continue header.  Several HTTP libraries do this.
  • Decent VOD support, with rate-controlled streaming, burst start, and seeking.

The details

 

by David Schleef at June 14, 2012 06:21 PM

David Schleef : GStreamer backend for video in Firefox


Good news: the GStreamer backend for video playback in Firefox has landed, thanks to a flurry of work by Alessandro Decina in the last few months.  This isn’t part of the standard Firefox build (maybe some day?), but it’s very useful for putting Firefox on mobile and embedded platforms, since GStreamer has a well-established ecosystem of vendor-provided plugins for hardware decoding.

by David Schleef at April 29, 2012 10:19 PM

David Schleef : OggStreamer: audio capture and streaming device


I recently learned about a cool new open hardware project called OggStreamer.  They’re designing and making a small device that records an analog audio signal and streams it using Ogg/Vorbis.  It’s an open hardware project, so all the schematics and PCB layouts are provided.

by David Schleef at April 06, 2012 12:30 AM

David Schleef : Update on the GStreamer DeckLink Elements


A little more than a year ago, I posted about GStreamer support for SDI and HD-SDI using DeckLink hardware from BlackMagic Design.  In the meantime, the decklinksrc and decklinksink elements have grown up a bit, and work with most devices in the DeckLink and Intensity line of hardware.  A laundry list of features:

  • Multiple device support
  • Multiple input and output support on a single device
  • HDMI, component analog, and composite input and output with Intensity Pro
  • Analog, AES/EBU, and embedded (HDMI/SDI) audio input
  • SDI, HD-SDI, and Optical SDI input and output with DeckLink
  • Works on Linux, OS/X (new), and Windows
  • 8-bit and 10-bit support for SDI/HD-SDI
  • Supports most video modes in the DeckLink SDK
  • Implements GstPropertyProbe interface for proper detection as a source element
  • Lots of bug fixes from previous releases

Kudos to Blake Tregre and Joshua Doe for submitting several of the patches implementing the above list.  There are still a bunch of outstanding bug reports (some with patches) that need to be fixed.  Several of these relate to output, which is currently rather clumsy and broken.

People have asked me about automatically detecting the video mode for input.  Some DeckLink hardware has this capability, but not any of the hardware I have to test with.  However, I’ve had some success with cycling through the video modes at the application level, with a 200 ms timeout between modes, stopping when it finds a mode that generates output.  This works OK, except that it tends to confuse 60i and 30p modes (and 50i with 25p), which can be differentiated with a bit of processing on the images.  At some point I’d like to integrate this functionality into decklinksrc, but wouldn’t be upset if someone else did it first.
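The mode-cycling approach can be sketched in a few lines. This is only an illustration of the loop described above, not code from decklinksrc: the `tryMode` callback (which would reconfigure the source and report whether a frame arrived before the timeout) and the mode names such as "1080i60" are hypothetical.

```javascript
// Hypothetical sketch: cycle through candidate modes until one produces output.
// `tryMode(mode, timeoutMs)` is an assumed callback that switches the capture
// source to `mode` and resolves to true if a frame arrives within the timeout.
async function detectVideoMode(modes, tryMode, timeoutMs = 200) {
  for (const mode of modes) {
    if (await tryMode(mode, timeoutMs)) {
      return mode; // first mode that generates output
    }
  }
  return null; // none of the candidate modes generated output
}

// Mode pairs that this approach tends to confuse (60i vs 30p, 50i vs 25p).
// The mode names are made up for the example.
const AMBIGUOUS_WITH = new Map([
  ["1080i60", "1080p30"],
  ["1080p30", "1080i60"],
  ["1080i50", "1080p25"],
  ["1080p25", "1080i50"],
]);

// True if the detected mode needs a follow-up check on the captured images
// to tell interlaced from progressive.
function needsInterlaceCheck(mode) {
  return AMBIGUOUS_WITH.has(mode);
}
```

An application would call `detectVideoMode` with its own list of modes and, when `needsInterlaceCheck` returns true, run whatever image analysis it uses to separate the interlaced mode from the progressive one.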

by David Schleef at April 03, 2012 04:56 AM

David Schleef : HDTV Color Matrix


Digital video is a time series of pictures, and each picture is comprised of an array of pixels, and each pixel is comprised of three numbers representing how brightly the red, green, and blue LCD dots (or CRT phosphors, if you’re old school) glow.  The representation in memory, however, is not of RGB values, but of YCbCr values, which one calculates by multiplying a 3×3 matrix with the RGB values, and then adding/subtracting some offsets.  This converts the components into a gray value (Y, or luma) and Cb and Cr (chroma blue and chroma red). The reason for doing this is that the human visual system is more sensitive to variations in luma than to variations in chroma (er, actually luminance and chrominance, see below).  For the same reason, typically half or 3/4 of the chroma values are dropped and not stored; the missing ones are interpolated when converting back to RGB for display.

There are various theoretical reasons for choosing a particular matrix, and I’ve recently become interested in whether these reasons are actually valid.  For historical reasons, early digital video copied analog precedent and used a matrix that is theoretically suboptimal.  This matrix is used in standard definition (SD) video, but was changed to the theoretically correct matrix for high-definition (HD) video.  There are other technical differences between SD and HD video, but this is the most significant for color accuracy.

For some time, I’ve been curious how much of a visual difference there is between the two matrices.  Here are two stills from Big Buck Bunny, the first is the original, correct image, and the second is the same picture converted to YCbCr with the HDTV matrix and then back to RGB with the SDTV matrix.  (To best see the differences, open the images in separate browser tabs and flip between them.)

Big Buck Bunny frame 660, original

Big Buck Bunny frame 660, wrong matrix

If you are like me, you probably have trouble seeing the difference side by side, but flipping between them makes it fairly obvious.  I chose this image because it has relatively saturated green and greenish-yellow, which shows off some of the largest differences.
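To make the mismatch concrete, here is a minimal sketch of the two matrices, assuming the usual BT.601 (SD) and BT.709 (HD) luma coefficients and normalized values in the 0 to 1 range. It leaves out the offsets and integer quantization that real video uses, and the sample pixel is an arbitrary saturated green, not one taken from the stills above.

```javascript
// Luma coefficients (Kr, Kb); Kg is implied as 1 - Kr - Kb.
const BT601 = { kr: 0.299, kb: 0.114 };   // SD matrix
const BT709 = { kr: 0.2126, kb: 0.0722 }; // HD matrix

// R'G'B' in [0, 1] -> [Y', Cb, Cr], with Cb and Cr in [-0.5, 0.5].
function rgbToYCbCr([r, g, b], { kr, kb }) {
  const kg = 1 - kr - kb;
  const y = kr * r + kg * g + kb * b;
  return [y, (b - y) / (2 * (1 - kb)), (r - y) / (2 * (1 - kr))];
}

// Inverse of rgbToYCbCr for the same coefficients.
function yCbCrToRgb([y, cb, cr], { kr, kb }) {
  const kg = 1 - kr - kb;
  const r = y + 2 * (1 - kr) * cr;
  const b = y + 2 * (1 - kb) * cb;
  const g = (y - kr * r - kb * b) / kg;
  return [r, g, b];
}

const green = [0.2, 0.8, 0.1]; // arbitrary saturated green for illustration

// Encode and decode with the same matrix: the pixel comes back unchanged.
const roundTrip = yCbCrToRgb(rgbToYCbCr(green, BT709), BT709);

// Encode with the HD matrix, decode with the SD matrix: the colors shift.
const mismatched = yCbCrToRgb(rgbToYCbCr(green, BT709), BT601);

console.log(roundTrip, mismatched);
```

With these numbers the green channel rises from 0.8 to roughly 0.91 in the mismatched case, while neutral grays are unaffected because their Cb and Cr are zero under either matrix. That is consistent with the differences showing up mostly in saturated areas.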

The RGB values for the pixels that are used in computation are not proportional to the actual amount of power output by a monitor.  This is known as gamma correction, and is a clever byproduct of the fact that the response curve of television phosphors (the amount of light output for a given voltage) is approximately similar to the response curve of the eye (the perceived brightness based on the amount of light).  Thus voltage became synonymous with perceived brightness, televisions had fewer vacuum tubes, and we’re left with that legacy.  But it’s not a bad legacy, because just like dropping chroma values, it makes it easier to compress images.

However, color comes along and messes with that simplicity a bit.  Luminance in color theory is used to describe how the brain interprets the brightness of a particular pixel, which is proportional to the RGB values in linear light space, i.e., the amount of light emanating from a display.  Luma is proportional to the RGB values in gamma-corrected (actually, gamma-compressed) space.  This means that luma doesn’t simply depend on luminance, and contains some variation due to color.  This messes with our idea that matrixing RGB values will separate variations in brightness from variations in color.  How visible is it?  I took the above picture and squashed the luma to one value, leaving chroma values the same (HD matrix):

Big Buck Bunny, frame 660, luma squashed

What you see here is that saturated areas appear brighter than the grey areas.  This is chroma (i.e., the color values we use in calculations) feeding into luminance (i.e., the perception of brightness).
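A small numeric sketch shows the same effect. It assumes the BT.709 weights and a plain power-law gamma of 2.2, which is a simplification of the real transfer function; the two sample pixels are made up for the example.

```javascript
const KR = 0.2126, KG = 0.7152, KB = 0.0722; // BT.709 weights
const GAMMA = 2.2; // assumed simple power-law gamma

// Luma: weighted sum of the gamma-compressed R'G'B' values.
const luma = ([r, g, b]) => KR * r + KG * g + KB * b;

// Relative luminance: the same weights applied to linear-light values.
const luminance = ([r, g, b]) =>
  KR * r ** GAMMA + KG * g ** GAMMA + KB * b ** GAMMA;

const gray = [0.5, 0.5, 0.5];
const green = [0, 0.5 / KG, 0]; // pure green chosen to have the same luma as the gray

console.log(luma(gray), luma(green));           // equal luma
console.log(luminance(gray), luminance(green)); // different luminance
```

Both pixels have a luma of 0.5, yet under these assumptions the saturated green emits noticeably more light than the gray, which is the leakage visible in the luma-squashed image.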

How much does this matter for image and video compression efficiency?  It’s a minor inefficiency of a subtle visual difference.  In other words, not very much.

Earlier I mentioned that the HD matrix was theoretically more correct than the SD matrix.  What about in practice?  Here’s the same luma-squashed image with the SD matrix.  Notice that there’s a lot more leakage from chroma into luminance, especially in the green leaves:

Big Buck Bunny, frame 660, luma squashed with SD matrix

by David Schleef at March 24, 2012 03:07 AM

Ben Schwartz : Ethics in an unethical world: Ethics Offsets


The recent hubbub regarding the (admirably public) debate within Mozilla about codec support has set me thinking about how to deal with untenable situations. After rightly railing against H.264 on the web for several years, and pushing free codecs with the full thrust of the organization, Mozilla may now be approaching consensus that they cannot win, and that continued refusal to capitulate to the cartel is tantamount to organizational suicide.

So what can you do, when you find yourself compelled to do something that goes against your ethics? To make a choice that you feel is wrong on its own because it benefits you in other ways, a choice you would like to make only when really necessary and never otherwise? Any thinking person will have this problem, to greater and lesser degrees, throughout their lives. We are not martyrs, so we do what we have to do to survive and try to keep in mind our need to escape from the trap.

Organizations cannot simply keep something in mind, but they can adopt structures that remind their members of their values even when those values are compromised. A common structure of this type is the sin tax, a tax designed (in a democracy) by members of a state to help them break or prevent their own bad habits. Sin taxes work by countering the locally perceived benefit of some action that’s harmful in a larger way, by reminding us of less visible but still important negative considerations. Some of their effect is straightforwardly economic, but some is psychological, to help us remember the bigger picture.

Sin taxes are more or less involuntary, but when the government does not impose these reminders, we often choose to remind ourselves. One currently popular implementation of this concept is the Carbon offset, a payment typically made when burning fuel to counter the effect of global warming. Organizations that buy carbon offsets for their fuel consumption do so to send a message, both internally and externally, that they place real value on minimizing carbon emissions. They may send this message both explicitly (by publicizing the purchase) and implicitly (by its effect on internal and external economic incentives).

Carbon offsets may be in fashion this decade, but there are many older forms of this concept. Maybe the most quotidian is the Curse Jar*, traditionally a place in a home or small office where individuals may make a small payment when using discouraged vocabulary. The Curse Jar provides a disincentive to coarse language despite being strictly voluntary, and despite not purchasing any effect on the linguistic environment (although the coffee fund may help for some). The Curse Jar works simply by reminding group members which behaviors are accepted and which are not.

For Mozilla, the difficulty is not emissions, verbal or vaporous, but ethical behavior. How can Mozilla publicly commit to a standard of behavior while violating it? I humbly submit that the answer is to balance its karmic books, by introducing an Ethics Offset**. When Mozilla finds itself cornered, it may take the necessary unfortunate action … and introduce a proportionate positive action as a reminder about its real values.

In the case at hand, a reasonable Ethics Offset might look like an internal “tax” on all uses of patented codecs. For example, for every Boot2Gecko device that is sold, Mozilla could commit to an offset equal to double the amount spent on patent licenses for the device. The offset could be donated to relevant worthy causes, like organizations that oppose software patents or contribute to the development of patent-free multimedia … but the actual recipient matters much less than the commitment. By accumulating and periodically (and publicly) “losing” this money, Mozilla would remind us all about its commitment to freedom in the multimedia realm. A similar scheme may be appropriate for Firefox Mobile if it is also configured for H.264 support.

Without a reminder of this kind, Mozilla risks becoming dangerously complacent and complicit to the cartel-controlled multimedia monopolies. As long as H.264 support appears to serve Mozilla’s other goals, Mozilla’s commitment to multimedia freedom will remain uncomfortable, inconvenient, and tempting to forget. Greater organizations have slid down off their ethical peaks, on paths paved all along with good intentions.

Most companies would not even consider a public and persistent admission of compromise, but Mozilla is not most companies. Neither are the companies that produce free operating systems, and many other components of the free software ecosystem. None of them should be ashamed to admit when they are forced to compromise their values and support enterprises that, on ethical grounds, they despise … but they should make their position clear, by committing to an Ethics Offset until they can escape from the compromise entirely.

*: Why is there no Wikipedia entry for “Curse Jar”!?
**: Let’s not call it an indulgence.

by Ben at March 14, 2012 04:43 AM

David Schleef : New Schrödinger Release


I recently added support for 10- and 16-bit encoding and decoding to Schrödinger, so I did a little release. Presenting Schrödinger-1.0.11. Also pushed changes to GStreamer to handle the new features. Although these changes have been in the works for some time, a little prompting from j-b caused me to finish this off, so this will probably appear in VLC soon, too.
This was the last piece needed to create a 10-bit master of Sintel, which I’ve been planning to do for some time.

by David Schleef at January 23, 2012 04:23 PM

Chris Pearce : Changes to DOM full-screen API in Firefox 11


We've made some changes to how the HTML full-screen API exits full-screen mode in Firefox 11, which is scheduled to ship in March 2012. Previously Document.mozCancelFullScreen() would fully-exit full-screen and return the browser to "normal" mode. Starting in Firefox 11, Document.mozCancelFullScreen() will restore full-screen state to the element that was previously full-screen. If there is no previous full-screen element in either the document or a parent document (full-screen mode isn't restored to former full-screen elements in child documents), then the browser will "fully-exit full-screen", and return the browser to normal mode.

To see how this is useful, consider the case of a PowerPoint clone or presentation web app that wants to run full-screen. One way to implement such a web app would be to have a full-screen <div> element where the slides are shown. The developer may want to be able to switch full-screen mode seamlessly between the slide deck <div> and (say) a <video>, and then return to having the slide deck <div> as the full-screen element so that the user can carry on with the presentation. Before this change, if the <video> was in a cross-origin subdocument (like a YouTube embedded player in an <iframe>) returning full-screen mode to the slide deck <div> from the <video> was a two-step process; users would have to fully-exit full-screen, and re-request full-screen mode on the slide deck element. Now developers can simply call Document.mozCancelFullScreen() and seamlessly switch back. The browser won't drop out of full-screen mode during the transition.

Note that if users press the escape key they will always fully-exit full-screen, i.e. Firefox won't restore the previous full-screen element to full-screen state on escape key press. So to seamlessly restore full-screen to the previous full-screen element, developers must explicitly call Document.mozCancelFullScreen(); they can't rely on the user pressing the escape key.
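One way to picture the new behaviour is as a stack of full-screen elements. The class below is a toy model for illustration only: it is not how Firefox implements the feature, the class and method names are invented, and it ignores the parent/child document rules described above.

```javascript
// Toy model of the Firefox 11 exit behaviour; names are made up for illustration.
class FullScreenModel {
  constructor() {
    this.stack = []; // elements that have been granted full-screen, oldest first
  }

  // The current full-screen element, or null in "normal" mode.
  get fullScreenElement() {
    return this.stack.length ? this.stack[this.stack.length - 1] : null;
  }

  // Models a granted element.mozRequestFullScreen().
  request(element) {
    this.stack.push(element);
  }

  // Models document.mozCancelFullScreen(): restore the previous full-screen
  // element if there is one, otherwise fully exit.
  cancel() {
    this.stack.pop();
  }

  // Models the escape key: always fully exit.
  escape() {
    this.stack.length = 0;
  }
}

const model = new FullScreenModel();
model.request("slide deck <div>");
model.request("<video>");
model.cancel(); // back to the slide deck, still full-screen
console.log(model.fullScreenElement);
```

In the presentation example, the cancel call returns full-screen to the slide deck, whereas an escape key press at the same point would leave nothing full-screen.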

We've also added webconsole logging upon full-screen request failures to Firefox 11, to make debugging denied full-screen requests easier.

Another change coming in Firefox 11 is that we'll no longer deny full-screen requests in web pages which contain windowed plugins. Instead, we'll exit full-screen when a windowed plugin is focused (on Windows and Linux; MacOSX is unaffected).

by Chris Pearce (noreply@blogger.com) at December 19, 2011 08:48 PM

Ben Schwartz : Route9.js


I was really impressed by Michael Bebenita’s Broadway.js, the recent port of an H.264 decoder to pure Javascript using Emscripten, an LLVM-based C-to-JS converter … but of course this is the opposite of what we want! Who needs H.264? We want WebM!

I’ve spent the past few weekends digging into Broadway.js, stripping out the H.264 bits and replacing them with libvpx and libnestegg. Now it’s working, to a degree. You can see it for yourself at the demo page (so far tested only in Firefox 7…).

I’m not going to be able to take this much further … at least not right now. It’s been a fun exercise though. I invite all interested comers to read some more details and then fork the repo.

Take this thing, and make it your own.

Reactions: Hacker News, r/programming, and BadassJS, Twitter.

by Ben at November 30, 2011 05:18 AM

Chris Pearce : Firefox's HTML full-screen API enabled in Nightly builds


A few days ago I enabled the HTML full-screen API in Firefox nightly builds. This enables developers to make an arbitrary HTML element "full-screen", hiding the browser's UI and stretching the element to encompass the entire screen. This will be particularly useful for HTML5 video and games.

If all goes well, this feature will ship in Firefox 10 at the end of January.

The API has changed slightly since I last blogged about it. The current API is Mozilla-specific, but is similar to the W3C's Fullscreen draft specification.

To enter full-screen mode, call the following method on the HTML Element you'd like to enter full-screen:
  • void mozRequestFullScreen() : posts an asynchronous request to make the HTML element the full-screen element. If the request is granted, some time later a bubbling "mozfullscreenchange" event is dispatched to the element which requested full-screen. If the request is denied, a "mozfullscreenerror" event is dispatched to the element's owning document. We only grant requests for full-screen when:
    • mozRequestFullScreen() is called in a user-generated event handler, e.g. a mouse click handler, and
    • the requesting element is in its document, and
    • there are no windowed plugins present in any document/iframe in the current page, and
    • all iframes containing the requesting element (if any) have the mozallowfullscreen attribute.
We added the following method and attributes to HTML Document:
  • void mozCancelFullScreen() : exits the document from full-screen mode. This dispatches a "mozfullscreenchange" event to the document containing the (now former) full-screen element. Note that the "mozfullscreenchange" event which is dispatched when you enter full-screen is targeted at the full-screen element, so if you want to receive the "mozfullscreenchange" on both entering and exiting full-screen in the same listener you should add your listener to the document, rather than the full-screen element.
  • readonly attribute boolean mozFullScreen : true when the document is in full-screen mode.
  • readonly attribute Element mozFullScreenElement : reference to the current full-screen element.
  • readonly attribute boolean mozFullScreenEnabled : returns true if calls to mozRequestFullScreen() would be granted in the current document. This returns false if there are any windowed plugins present in any document/iframe in the current page, or if any iframes containing this document don't have the mozallowfullscreen attribute present, or if the user has disabled the API by preference. If this returns false you may want to not show the user your enter-full-screen button in your page, since you know it won't work!
We also added the :-moz-full-screen css pseudo class, which applies to the full-screen element while in full-screen mode.
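Putting the pieces above together, a page might wrap the API in two small helpers. This is a sketch rather than a recommended pattern: the helper names are made up, and the document and element are passed in as parameters only so the functions can be tried outside a browser. The `moz`-prefixed members are the ones listed above.

```javascript
// Hypothetical helper: toggle full-screen for `element` in document `doc`.
// Must be called from a user-generated event handler (e.g. a click handler),
// otherwise the request will be denied.
function toggleFullScreen(doc, element) {
  if (!doc.mozFullScreenEnabled) {
    return false; // requests would be denied, so don't even try
  }
  if (doc.mozFullScreen) {
    doc.mozCancelFullScreen();
  } else {
    element.mozRequestFullScreen(); // asynchronous; result arrives as an event
  }
  return true;
}

// Hypothetical helper: listen on the document so that one listener sees the
// "mozfullscreenchange" event on both entering and exiting full-screen.
// Returns a function that removes the listener again.
function watchFullScreen(doc, onChange) {
  const listener = () => onChange(doc.mozFullScreen, doc.mozFullScreenElement);
  doc.addEventListener("mozfullscreenchange", listener, false);
  return () => doc.removeEventListener("mozfullscreenchange", listener, false);
}
```

In a page this would typically be wired up as `button.onclick = () => toggleFullScreen(document, video)`, with the button hidden whenever `document.mozFullScreenEnabled` is false.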

We added the mozallowfullscreen attribute to iframe elements. Without this, full-screen requests made by script in the iframe's content (e.g. embedded ads, or a YouTube player in an iframe for that matter) will be denied.

While in full-screen mode, the user can press the ESC key (or F11) to exit. Alpha-numeric keyboard input while in full-screen mode causes a warning message to pop up, to guard against phishing attacks. The only keys which don't cause the warning message to pop up are: left, right, up, down, space, shift, control, alt, page up, page down, end, home, tab, and meta.

Navigating, changing tabs, or changing apps (ALT+TAB) while in full-screen mode will cause full-screen mode to exit.

Here's a simple example, which will work in current Firefox nightly builds:




(Press ESC to exit full-screen)

The code for that button's onclick handler is simply:
document.getElementById('bruce_video').mozRequestFullScreen();

How is Firefox's full-screen API different from Webkit/Chrome/Safari's full-screen API? Firefox's API adds a "width: 100%; height: 100%;" CSS rule to the element which requests full-screen, so that it's stretched to occupy the entire screen. Chrome's API does not do this, but instead it centers the full-screen element in the window and blacks-out the underlying webpage. So the full-screen element won't occupy the entire screen with Chrome's API unless you specify a "width: 100%; height: 100%;" rule yourself. Conversely if you want to vertically and horizontally center something while in full-screen with Firefox's API, you need to make the containing element of your desired centered element full-screen instead, and apply CSS rules to vertically and horizontally center the contained element.

For a cross-browser full-screen API example, see html5-demos.appspot.com's full-screen demo.

Edit: 11 Nov 2011, clarified Document.mozCancelFullScreen() and Document.mozFullScreenEnabled, fixed typos.

by Chris Pearce (noreply@blogger.com) at November 10, 2011 11:01 PM

Chris Pearce : Mozilla full-screen API progress update


Update 10 November 2011: the full-screen API has been changed slightly and enabled in Firefox Nightly builds, see http://blog.pearce.org.nz/2011/11/firefoxs-html-full-screen-api-enabled.html for details.

I've been working on implementing Robert O'Callahan's HTML full-screen API proposal in Firefox (bug 545812). Support for the base API has landed, disabled by default, in Firefox nightly builds. To enable the full-screen API, set the pref full-screen-api.enabled to true.

We have implemented a general-purpose full-screen API which can make any HTML element the full-screen element (it seems WebKit-based browsers' full-screen APIs only allow making <video> elements full-screen).

This feature makes the following API changes to HTML Element:
  1. void mozRequestFullScreen() : makes an HTML element the full-screen element. Causes browser chrome to hide, and expands the element to encompass the entire screen. Upon success, this dispatches a "mozfullscreenchange" event to the requesting full-screen element, or the element's owner document if the element is not in a document. We only grant requests for full-screen when running in user-generated event handlers, e.g. a mouse click handler.
This feature makes the following API changes to HTML Document:
  1. void mozCancelFullScreen() : exits the document from full-screen mode.
  2. readonly attribute mozFullScreen : true when the document is in full-screen mode.
  3. readonly attribute mozFullScreenElement : reference to the current full-screen element, if it's in the current document.
This feature adds the :-moz-full-screen css pseudo class, which applies to the full-screen element while in full-screen mode.

For a request for full-screen to be granted in content inside an iframe, the containing iframe needs to have the mozallowfullscreen attribute present. This is a boolean attribute, so the attribute only needs to be present, it doesn't matter what value it's set to.

Keyboard input is restricted in full-screen mode. When alpha-numeric key input occurs in full-screen mode, full-screen mode immediately exits. This is to help protect against phishing attacks.

We also plan to deny requests for full-screen mode when windowed plugins are present (since we can't easily monitor key events to windowed plugins on non-MacOSX platforms). We will exit full-screen mode when a windowed plugin is added to a document as well. I have a patch for this, but its dependencies haven't landed yet.

Work remaining to be done before this can be enabled:
  1. Adding a warning message when we enter DOM full-screen mode (on desktop Firefox, and on Fennec too).
  2. Making the full-screen API work in multi-process Firefox/Fennec (bug 684620). This requires a way of getting the PBrowserParent from C++ in the chrome process, which unfortunately doesn't exist yet.
  3. Make change/open tab cause full-screen mode to exit (bug 685402).
  4. A security review must be completed, and concerns raised there must be addressed. This could involve changing the API.
We also want a clearer transition effect when entering full-screen, to somehow show the full-screen element "stretching out" to encompass the screen.

You can test out our work-in-progress full-screen implementation, by grabbing the latest Firefox nightly build, setting the pref full-screen-api.enabled to true, and pointing your browser at my not-very-exciting full-screen API demo page.

by Chris Pearce (noreply@blogger.com) at November 09, 2011 10:43 PM

Conrad Parker : Iteratees at Tsuru Capital


Tsuru Capital is a small company. We build our internal systems for live trading and offline analysis in Haskell, and we're proud to be sponsoring ICFP 2011. We use iteratees throughout our systems, and have actively encouraged all our staff to contribute changes upstream and participate in community design discussions. By being part of the open source community and taking part in peer-review, we all end up with better software.

Over time various Tsuru staff members have worked on tools using iteratees, including (grepping the CONTRIBUTORS files): Bryan Buecking, Michael Baikov, Elliott Pace, Conrad Parker, Akio Takano, and Maciej Wos. There have been some lively discussions and many small patches providing functions that we use in production every day.

Last year Conal Elliott provided some mentoring to Tsuru staff, during which we worked through a denotational semantics for iteratees. This resulted in discussions on both the iteratee project list and haskell-cafe about Semantics of iteratees, enumerators, enumeratees.

By using iteratees in production we've contributed various simple but practical functions, including:

  • enumFdFollow, an enumerator (data source) which allows you to process the growing tail of a log file as it is being written.
  • ioIter, an iteratee that uses an IO action to determine what to do. Typically this action involves some user interaction, such as a user issuing commands like play/pause/next/prev.
  • ListLike functions last (an iteratee that efficiently returns the last element of a stream), mapM_ and foldM.
  • mapChunksM_, a more efficient version of mapM_ that operates on the underlying chunks, e.g. logger = mapChunksM_ (liftIO . print).
  • takeWhile, and its enumeratee variant takeWhileE.


  • endianRead8, an iteratee for reading 64-bit values with a given endianness. I've used this in ght as well as an internal project.

Stream conversion

We've done quite a bit of work on stream conversion, as we use a few different layers of data processing. The iteratee architecture allows you to isolate the data source, conversion and processing functions; much of what we've worked on involves ensuring the converters (enumeratees) can control or translate control messages, so that commands like "seek" do not get lost. We've also built combinators to simplify the task of creating new stream converters.
  • convStateStream, which converts one stream into another while continually updating an internal state. Importantly for variable bitrate binary data, it can produce elements of the output stream from data that spans stream chunks.
  • (><>) and (<><). These allow stream converters to be composed without rewriting boilerplate. John Lato gives a good example using these in the StackOverflow answer to Attoparsec Iteratee.
  • zip, zip[345], sequence_ for using multiple iteratees to process a single stream instance, and (for zip*) collecting the results.
  • eneeCheckIfDone*: This family of functions (eneeCheckIfDoneHandle, eneeCheckIfDonePass, eneeCheckIfDoneIgnore) can be used with unfoldConvStreamCheck to make a version of unfoldConvStream which respects seek messages.


Parallel stream processing

We often want to do multiple unrelated analysis tasks on a data stream. Whereas sequence_ takes a list of iteratees to run simultaneously and handles each input chunk by mapM across that list, psequence_ runs each input iteratee in a separate forkIO thread. For a real-world example, see Michael Baikov's post about psequence, psequence_, parE, parI.

Thanks

Thanks to John Lato for consistently and reliably maintaining the iteratee package, providing thoughtful feedback and graciously suggesting improvements.

by Conrad Parker (noreply@blogger.com) at September 18, 2011 09:10 AM

Chris Pearce : New media element APIs and better media seeking resolution


French intern Paul Adenot has recently implemented the seekable and played attributes on the HTML5 video and audio elements in Firefox. The seekable attribute enables script to see what regions of the media can be seeked into (particularly handy with live streams), and the played attribute enables script to see what regions of the media have already been played. Paul has also done some work improving the built-in controls on media elements. Thanks for your hard work, Paul! These should be available in release builds in November (Firefox 8).

Also in Firefox 8 are my changes to media seeking resolution. Now media seeking should be accurate to the nearest microsecond. It's been reported elsewhere how important accurate seeking for video is. We were previously accurate to the nearest video frame, but we could still be up to one audio packet off (often between 4 and 8 ms out). Now we prune audio samples when seeking so we're down to microsecond resolution.
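To make the pruning arithmetic concrete, here is a minimal sketch in Python. It is not Firefox's code: the function name prune_for_seek and its parameters are hypothetical, and it assumes a decoded audio packet arrives as a list of samples with a known start time in microseconds.

```python
def prune_for_seek(samples, packet_start_us, seek_target_us, sample_rate_hz):
    """Drop the samples that lie before the seek target (illustrative only).

    `samples` holds one decoded audio packet whose first sample plays at
    `packet_start_us`. All times are integers in microseconds.
    """
    # How far into this packet the seek target falls (never negative).
    offset_us = max(0, seek_target_us - packet_start_us)
    # Convert that time offset into a whole number of samples.
    drop = offset_us * sample_rate_hz // 1_000_000
    # Never drop more than the packet contains.
    drop = min(len(samples), drop)
    return samples[drop:]
```

For example, a seek target 5 ms into a 48 kHz packet means discarding the first 240 samples of that packet rather than starting playback at the packet boundary.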

by Chris Pearce (noreply@blogger.com) at August 24, 2011 11:08 PM

Ben Schwartz : Evolution


So I wrote this song, sort of. Maybe you’ll like it.

YouTube version
Sheet Music
Reference files at Archive.org

After about 6 years of covering pop songs in my a cappella groups, I really wanted to sing some original music. In part, I was motivated by the US’s aggressively restrictive copyright regime, which always prevented us from freely sharing recordings of our own performances.

I tried to write a song from scratch for a while, but it wasn’t working out, mostly because I don’t have anything interesting to say. Then I struck upon the idea of using the text of an old out-of-copyright poem (which, because of the US’s effectively perpetual copyright, has to be very old indeed). I started browsing through the poetry section of WikiSource, until I stumbled across this brilliant 1895 poem by Langdon Smith. The choice was clear.

I drew up a thoroughly derivative 4-part a cappella arrangement in MuseScore, and VoiceLab indulged me by adding it to the repertoire. We’ve sung it twice so far, but the first time we didn’t have a good recording, and then this time I had to solve this audio-video alignment problem… but now it’s here.

The recordings and sheet music are all CC0 dedicated to the public domain. I would appreciate attribution as the arranger, but I find threats of legal action to be just as distasteful as plagiarism. I wouldn’t want to do anything to discourage people from adopting and adapting the music as they see fit. Maybe someone will make a recording with a soloist who can really sing!

by Ben at August 05, 2011 05:12 PM

Chris Pearce : Simple rate limited HTTP server for testing HTML5 media/streaming


While working on the Firefox HTML5 video and audio support, I've found it extremely useful to have an HTTP server on which the transfer rate is reliably limited. Existing servers are either too heavyweight, like Apache, or have inconsistent rate limiting, like lighttpd, whose rate limiting I found to be very "bursty".

I ended up taking the educational route, and implementing a simple HTTP server in C++. It supports the following features:

  1. Support for HTTP/1.1 Byte Range Requests. This means you can seek into unbuffered data when watching HTML5 video.
  2. Rate limiting, configurable on a per request basis by passing the "rate=x" HTTP query parameter, where x is the transfer rate of the connection in kilobytes per second. The server will send x/10 KB ten times per second to maintain this rate smoothly.
  3. Simulated live streaming, configurable on a per request basis by passing the "live" query parameter. When in "live" mode, no Content-Length header is sent, and the server doesn't advertise or perform byte range requests - so you can't seek into unbuffered video/audio, just like in a live stream.
  4. Cross platform; tested on Windows (runs on port 80) and Linux (runs on port 8080). I haven't tested it on MacOS yet.
  5. Simply serves all files in the program's working directory, making it easy to use (and abuse).
  6. Open source! Get the code at https://github.com/cpearce/HttpMediaServer, or download a pre-built win32 binary.
For example, if you wanted to simulate a live stream being served at 100KB/s, your test URL might look something like http://localhost:80/video.ogg?rate=100&live.
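The "x/10 KB ten times per second" pacing from the feature list can be sketched in a few lines of Python. This is only an illustration, not the C++ code from the repository: send_rate_limited, write and sleep are hypothetical names, and the sketch assumes 1 KB means 1024 bytes, which the description above does not specify.

```python
import time


def send_rate_limited(data, write, rate_kb_per_s, ticks_per_second=10,
                      sleep=time.sleep):
    """Write `data` in equal slices, one slice per tick (illustrative only).

    `write` is any callable that accepts a bytes object, such as a socket's
    sendall method. Returns the slice size in bytes.
    """
    # Assumption: 1 KB = 1024 bytes. One slice is a tenth of a second's data.
    chunk = max(1, rate_kb_per_s * 1024 // ticks_per_second)
    interval = 1.0 / ticks_per_second
    sent = 0
    while sent < len(data):
        write(data[sent:sent + chunk])
        sent += chunk
        if sent < len(data):
            # Wait for the next tick before sending more.
            sleep(interval)
    return chunk
```

With rate=100 this sends 10240 bytes every 100 ms. The sketch ignores the time spent inside write, so a real server would probably want to measure elapsed time and sleep only for the remainder of each tick.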

I've been using it for quite a while, and over the weekend I finally cleaned it up and put it up on GitHub. Check it out.

by Chris Pearce (noreply@blogger.com) at August 03, 2011 07:03 AM

Ben Schwartz : An Auto-Aligner for PiTiVi


It’s rare to get exactly one recording of an a cappella concert. Usually someone’s parents have a fancy but outdated camcorder, someone in the front row has a cell phone video with a great angle but terrible quality, and there’s a beautiful audio-only recording, maybe straight from the mixing board. All the recordings are independent, starting and stopping at different times. Some are only one song long, or are broken into many short pieces.

If you want to combine all these inputs into a video that anyone could watch, you’ll first have to line them up correctly in a video editor. This is a painful process of dragging clips around on the timeline with the mouse, trying to figure out if they’re in sync or not. The usual trick to making this achievable is to look at the audio waveform visualization, but even so, the process can be tedious and irritating.

This year, when I got three recordings from the VoiceLab spring concert, I resolved to solve the problem once and for all. I set about writing an automatic clip alignment algorithm as a patch to PiTiVi, a beautiful (if not mature) free software video editor written in Python.

Today, after about two months of nights and weekends, the result is ready for testing in PiTiVi mainline. Jean-François Fortin Tam has a great writeup explaining how it works from a user’s perspective.

I hadn’t looked into it until after the fact, but of course this is not the first auto-alignment function in a video editor. Final Cut Pro appears to have a similar function built in, and there are also plug-ins such as “Plural Eyes” for many editors. However, to the best of my knowledge, this is the first free implementation, and the first available on Linux. Comparing features in PiTiVi vs. the proprietary giants, I think of this as “one down, 20,000 to go”.

I guess this is as good a place as any to talk about the algorithm, which is almost The Simplest Thing that could Possibly Work. Alignment works by analyzing the audio tracks, relying on every video camera to have a microphone of its own. The most direct approach might be to compute the cross-correlation of these audio tracks and look for the peak … but this could require storing multi-gigabyte audio files in memory, and performing impossibly large FFTs. On computers of today, the direct approach is technologically infeasible.

The algorithm I settled on resembles the method a human uses when looking at the waveform view. First, it breaks each input audio stream into 40 ms blocks and computes the mean absolute value of each block. The resulting 25 Hz signal is the “volume envelope”. The code subtracts the mean volume from each track’s envelope, then performs a cross-correlation between tracks and looks for the peak, which identifies the relative shift. To avoid performing N^2 cross-correlations, one clip is selected as the fixed reference, and all others are compared to it. The peak position is quantized to the block duration (creating an error of +/- 20ms), so to improve accuracy a parabolic fit is used to interpolate the true maximum. I don’t know the exact residual error, but I expect it’s typically less than 5 ms, which should be plenty good enough, seeing as sound travels about 1 foot per ms.
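The steps above can be sketched in plain Python. This is a reading of the description, not the code that went into PiTiVi: the function names are made up for illustration, the cross-correlation is computed directly rather than with an FFT, and the samples are assumed to be a flat list of mono values.

```python
def volume_envelope(samples, sample_rate, block_ms=40):
    """Mean absolute value per block, with the track's mean volume removed."""
    block = int(sample_rate * block_ms / 1000)
    n_blocks = len(samples) // block
    if n_blocks == 0:
        raise ValueError("clip is shorter than one block")
    env = [
        sum(abs(s) for s in samples[i * block:(i + 1) * block]) / block
        for i in range(n_blocks)
    ]
    mean = sum(env) / len(env)
    return [v - mean for v in env]


def cross_correlate(ref, other):
    """Direct cross-correlation; returns (lags, values), with lags in blocks."""
    lags = list(range(-(len(other) - 1), len(ref)))
    values = []
    for lag in lags:
        total = 0.0
        for j, v in enumerate(other):
            i = j + lag
            if 0 <= i < len(ref):
                total += ref[i] * v
        values.append(total)
    return lags, values


def find_shift_ms(ref_env, other_env, block_ms=40):
    """Estimate how many ms after the reference clip the other clip starts."""
    lags, values = cross_correlate(ref_env, other_env)
    k = max(range(len(values)), key=values.__getitem__)
    offset = 0.0
    if 0 < k < len(values) - 1:
        # Fit a parabola through the peak and its neighbours and take the
        # vertex as the interpolated maximum.
        y0, y1, y2 = values[k - 1], values[k], values[k + 1]
        denom = y0 - 2 * y1 + y2
        if denom != 0:
            offset = 0.5 * (y0 - y2) / denom
    return (lags[k] + offset) * block_ms


def align_all(envelopes, block_ms=40):
    """Use the first clip as the fixed reference and compare the rest to it."""
    ref = envelopes[0]
    return [0.0] + [find_shift_ms(ref, env, block_ms) for env in envelopes[1:]]
```

A positive result from find_shift_ms means the other clip begins that many milliseconds after the reference. Because only the 25 Hz envelopes are correlated, even hour-long clips shrink to tens of thousands of values; the direct double loop is still quadratic in that count, so an FFT-based correlation of the envelopes would be the faster choice for long recordings.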

My original intent was to compensate for clock skew as well, because all these recording devices are using independent sample clocks that are running at slightly different rates due to manufacturing variation. There’s even code in the commit for a far more complex algorithm that can measure this clock skew. At the moment, this code is disused, for two reasons: none of our test clips actually showed appreciable skew, and PiTiVi doesn’t actually support changing the speed of clips, especially audio.

If you want to help, just stop by the PiTiVi mailing list or IRC channel. We can use more test clips, a real testing framework, a cancel button, UI improvements, conversion to C for speed, and all sorts of general bug squashing. For this feature, and throughout PiTiVi, there’s always more to be done. I’ve found the developer community to be extremely welcoming of new contributions … come and join us.

by Ben at July 26, 2011 04:12 AM