Good news to hear that the GStreamer backend for video playback in Firefox has landed, due to a flurry of work by Alessandro Decina in the last few months. Of course, this isn’t part of the standard Firefox build (but maybe some day?), but it’s very useful for putting Firefox on mobile and embedded platforms, since GStreamer has a well-established ecosystem of vendor-provided plugins for hardware decoding.
From Sunday 15th April through to 29th April I’ll be mostly offline as I take some leave to visit Pitcairn Island, one of the remotest inhabited islands with a population of about 50 people.
My first stop is flying from New Zealand to Tahiti where I spend a couple of days, then I fly to Mangareva on the 17th to board the Xplore sailing yacht for the approximately two day trip to Pitcairn. I spend a couple of days on the island itself, then return on the Xplore back to Mangareva, followed by flying back to Tahiti for a few more days.
The trip was easy to organise through Pitcairn Travel. Longer trips are available than the one I’m taking but none are scheduled at this time of year. Assuming this trip goes well I hope to go for longer, and maybe to the other Islands in the Pitcairn group, in the future.
Why Pitcairn? Pitcairn is the island that was settled by the Bounty mutineers. My grandmother was born on the island and through her I’m a descendant of three mutineers (Fletcher Christian, John Mills, Ned Young and their Tahitian wives are my sixth great grandparents). I’m looking forward to visiting the Bounty monument in Tahiti and the Bounty Plaque on Pitcairn.
Electricity is available on Pitcairn for about 10 hours per day which limits laptop/gadget usage time. Luckily I plan to spend as much time as possible exploring the island, weather permitting.
I’ll have internet access while in Tahiti but I suspect access to be a bit hit or miss on Pitcairn. In the past internet access was available by sharing satellite internet that was provided by a United States Geological Survey station on the island. A description of the setup is available here.
Later the Pitcairn Island Government arranged their own satellite internet capability. Recently speeds have been improved to 512 kilobits per second - shared amongst the approximately 50 people on the island. Costs for residents of the island are around $40 per 400MB of usage from what I hear. I would imagine that if someone wanted to regularly access the internet there for work they’d require a dedicated satellite internet connection just for that (Something like Pactel’s VSAT internet maybe). I’ll be sure to do a later post on what using the modern web is like in this part of the world.
Anyone in the area of Tahiti, Mangareva or Pitcairn, let me know, I’d be keen to meet.
Recently learned about a cool new open hardware project called OggStreamer. They’re designing and making a small device that records an analog audio signal and streams it using Ogg/Vorbis. It’s an open hardware project, so all the schematics and PCB layouts are provided.
A little more than a year ago, I posted about GStreamer support for SDI and HD-SDI using DeckLink hardware from BlackMagic Design. In the meantime, the decklinksrc and decklinksink elements have grown up a bit, and work with most devices in the DeckLink and Intensity line of hardware. A laundry list of features:
Multiple device support
Multiple input and output support on a single device
HDMI, component analog, and composite input and output with Intensity Pro
Analog, AES/EBU, and embedded (HDMI/SDI) audio input
SDI, HD-SDI, and Optical SDI input and output with DeckLink
Works on Linux, OS X (new), and Windows
8-bit and 10-bit support for SDI/HD-SDI
Supports most video modes in the DeckLink SDK
Implements GstPropertyProbe interface for proper detection as a source element
Lots of bug fixes from previous releases
Kudos to Blake Tregre and Joshua Doe for submitting several of the patches implementing the above list. There are still a bunch of outstanding bug reports (some with patches) that need to be fixed. Several of these relate to output, which is currently rather clumsy and broken.
People have asked me about automatically detecting the video mode for input. Some DeckLink hardware has this capability, but not any of the hardware I have to test with. However, I’ve had some success with cycling through the video modes at the application level, with a 200 ms timeout between modes, stopping when it finds a mode that generates output. This works ok, except that it tends to confuse 60i and 30p modes (and 50i with 25p), which can be differentiated with a bit of processing on the images. At some point I’d like to integrate this functionality into decklinksrc, but wouldn’t be upset if someone else did it first.
Digital video is a time series of pictures, and each picture is comprised of an array of pixels, and each pixel is comprised of three numbers representing how brightly the red, green, and blue LCD dots (or CRT phosphors, if you’re old school) glow. The representation in memory, however, is not of RGB values, but of YCbCr values, which one calculates by multiplying a 3×3 matrix with the RGB values, and then adding/subtracting some offsets. This converts the components into a gray value (Y, or luma) and Cb and Cr (chroma blue and chroma red). The reason for doing this is because the human visual system is more sensitive to variations in luma compared to variations in chroma (er, actually luminance and chrominance, see below). Furthermore, for this reason, typically half or 3/4 of the chroma values are dropped and not stored — the missing ones are interpolated when converting back to RGB for display.
There are various theoretical reasons for choosing a particular matrix, and I’ve recently become interested in whether these reasons are actually valid. For historical reasons, early digital video copied analog precedent and used a matrix that is theoretically suboptimal. This matrix is used in standard definition (SD) video, but was changed to the theoretically correct matrix for high-definition (HD) video. There are other technical differences between SD and HD video, but this is the most significant for color accuracy.
For some time, I’ve been curious how much of a visual difference there is between the two matrices. Here are two stills from Big Buck Bunny, the first is the original, correct image, and the second is the same picture converted to YCbCr with the HDTV matrix and then back to RGB with the SDTV matrix. (To best see the differences, open the images in separate browser tabs and flip between them.)
If you are like me, you probably have trouble seeing the difference side by side, but flipping between them makes it fairly obvious. I chose this image because it has relatively saturated green and greenish-yellow, which shows off some of the largest differences.
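For anyone who wants to reproduce the mismatch numerically, here is a minimal sketch. It assumes the usual BT.601 (SD) and BT.709 (HD) luma coefficients and works on normalized full-range values without the offsets mentioned earlier; the function names are made up for this illustration and this is not the code used to produce the images.

```javascript
// Luma coefficients (Kr, Kb) for the two matrices; Kg = 1 - Kr - Kb.
const BT601 = { kr: 0.299, kb: 0.114 };   // SD video
const BT709 = { kr: 0.2126, kb: 0.0722 }; // HD video

// RGB (each 0..1, gamma-encoded) -> [Y, Cb, Cr], with Cb and Cr in -0.5..0.5.
function rgbToYcbcr([r, g, b], { kr, kb }) {
  const kg = 1 - kr - kb;
  const y = kr * r + kg * g + kb * b;
  return [y, (b - y) / (2 * (1 - kb)), (r - y) / (2 * (1 - kr))];
}

// Inverse of rgbToYcbcr for the same coefficients.
function ycbcrToRgb([y, cb, cr], { kr, kb }) {
  const kg = 1 - kr - kb;
  const r = y + 2 * (1 - kr) * cr;
  const b = y + 2 * (1 - kb) * cb;
  const g = (y - kr * r - kb * b) / kg;
  return [r, g, b];
}

// Encode with the HD matrix, decode with the SD matrix.
function hdToSdMismatch(rgb) {
  return ycbcrToRgb(rgbToYcbcr(rgb, BT709), BT601);
}

console.log(hdToSdMismatch([0, 1, 0]));       // saturated green comes back shifted
console.log(hdToSdMismatch([0.5, 0.5, 0.5])); // grays carry no chroma, so they barely move
```

Saturated green is a useful test colour here because the two matrices differ most in how much weight they give to green.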
The RGB values for the pixels that are used in computation are not proportional to the actual amount of power output by a monitor. This is known as gamma correction, and is a clever byproduct of the fact that the response curve of television phosphors (the amount of light output for a given voltage) is approximately similar to the response curve of the eye (the perceived brightness based on the amount of light). Thus voltage became synonymous with perceived brightness, televisions had fewer vacuum tubes, and we’re left with that legacy. But it’s not a bad legacy, because just like dropping chroma values, it makes it easier to compress images.
However, color comes along and messes with that simplicity a bit. Luminance in color theory is used to describe how the brain interprets the brightness of a particular pixel, which is proportional to the RGB values in linear light space, i.e., the amount of light emanating from a display. Luma is proportional to the RGB values in gamma-corrected (actually, gamma-compressed) space. This means that luma doesn’t simply depend on luminance, and contains some variation due to color. This messes with our idea that matrixing RGB values will separate variations in brightness from variations in color. How visible is it? I took the above picture and squashed the luma to one value, leaving chroma values the same (HD matrix):
What you see here is that saturated areas appear brighter than the grey areas. This is chroma (i.e., the color values we use in calculations) feeding into luminance (i.e., the perception of brightness).
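The same effect can be shown with two pixels. This is only a sketch: it assumes a plain power-law gamma of 2.2 instead of the exact transfer function, and uses the HD coefficients for both luma and luminance.

```javascript
// HD (BT.709) luma coefficients for R, G, B.
const K = [0.2126, 0.7152, 0.0722];

// Assumed display gamma: a plain power law, used here only for illustration.
const GAMMA = 2.2;

// Luma: weighted sum of the gamma-encoded values, as used in YCbCr.
function luma(rgb) {
  return rgb.reduce((sum, v, i) => sum + K[i] * v, 0);
}

// Relative luminance: weighted sum of the linear-light values.
function luminance(rgb) {
  return rgb.reduce((sum, v, i) => sum + K[i] * Math.pow(v, GAMMA), 0);
}

const green = [0, 1, 0];
const gray = [luma(green), luma(green), luma(green)]; // same luma as green

console.log(luma(green), luma(gray));           // equal luma (up to rounding)...
console.log(luminance(green), luminance(gray)); // ...but green emits more light
```

Both pixels would be squashed to the same luma value, yet the saturated one is brighter in linear light, which is the leakage visible in the picture.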
How much does this matter for image and video compression efficiency? It’s a minor inefficiency of a subtle visual difference. In other words, not very much.
Earlier I mentioned that the HD matrix was theoretically more correct than the SD matrix. What about in practice? Here’s the same luma-squashed image with the SD matrix. Notice that there’s a lot more leakage from chroma into luminance, especially in the green leaves:
Update 2012-03-29 - The Nexus S port has moved to an ICS base system and the existing Gingerbread base no longer works correctly. I’ve adjusted the instructions below to build the ICS based system. It just involves using ‘config-nexuss-ics’ instead of ‘config-nexus’.
I’ve built and run B2G on the emulator before but I wanted to try it out on real hardware to test the video support and play around with the OS. I upgraded my main phone to a Galaxy Note recently, leaving my Nexus S spare for trying different ROMs on it. Support for the Nexus S has started becoming available for B2G (previously the main consumer phone for testing was the Galaxy S II) so I gave it a try. The Nexus S I have is the GSM (non-4G) version. The steps to get the source code:
$ git clone git://github.com/andreasgal/B2G
$ cd B2G
$ make sync
This takes a long time (on New Zealand networks anyway…). Multiple gigabytes of git submodules are cloned.
You’ll want to make sure you have a build environment set up, as per this MDN article so that ‘adb’ and other android tools work. Once done, configure for a Nexus S build:
$ make config-nexuss-ics
This will download binaries for the phone and get you to confirm a bunch of licenses. To build:
$ make gonk
...
$ make
...
The ‘gonk’ make invocation builds the underlying android layer. The following ‘make’ builds gecko and related parts of B2G. Once those are completed you can flash the phone with the result. Note, you do the following at your own risk! You’re flashing your phone, overwriting everything, with experimental, possibly buggy software.
To flash a Nexus S you need to have unlocked the bootloader. If you haven’t done this yet, boot into the bootloader (hold the up volume key down while pressing the power button) and run:
$ make unlock-bootloader
This runs “fastboot oem unlock” which is the command to unlock the Nexus S bootloader. You’ll need to agree to it on the phone. I also installed the CyanogenMod recovery firmware but this is optional. Instructions to do that are here.
To flash, boot the phone into recovery mode (hold down the up volume key while pressing power, then choose ‘Recovery’ from the menu that appears) and plug the USB cable into your PC. Run:
$ make flash-only
‘flash-only’ will flash your B2G build onto the phone. You’ll lose everything on the phone, sorry.
If your Nexus S was running a 2.3 based Android this should just work. If it was running ICS (as mine was) then you might get an error about unsupported baseband and/or bootloader versions. If you get this, I fixed it by editing ‘glue/gonk/device/samsung/crespo/board-info.txt’ to add the versions that my bootloader and baseband had. My file looked like:
Note the addition of ‘I9020XXKL1’ and ‘I9020XXKI1’ to the bootloader and baseband lines respectively. After editing I redid “make gonk”, “make” and “make flash-only”.
After ‘flash-only’ your phone will reboot. If it boots into an error box saying “no homescreen found” then do the following while the USB cable is connected and that error is showing:
$ make install-gaia
$ adb reboot
This will install the user files and reboot the phone. B2G should now be running on the device. Enjoy! I’ve found calls, text messages, Wifi and Web Browsing work.
The recent hubbub regarding the (admirably public) debate within Mozilla about codec support has set me thinking about how to deal with untenable situations. After rightly railing against H.264 on the web for several years, and pushing free codecs with the full thrust of the organization, Mozilla may now be approaching consensus that they cannot win, and that continued refusal to capitulate to the cartel is tantamount to organizational suicide.
So what can you do, when you find yourself compelled to do something that goes against your ethics? To make a choice that you feel is wrong on its own because it benefits you in other ways, a choice you would like to make only when really necessary and never otherwise? Any thinking person will have this problem, to greater and lesser degrees, throughout their lives. We are not martyrs, so we do what we have to do to survive and try to keep in mind our need to escape from the trap.
Organizations cannot simply keep something in mind, but they can adopt structures that remind their members of their values even when those values are compromised. A common structure of this type is the sin tax, a tax designed (in a democracy) by members of a state to help them break or prevent their own bad habits. Sin taxes work by countering the locally perceived benefit of some action that’s harmful in a larger way, by reminding us of less visible but still important negative considerations. Some of their effect is straightforwardly economic, but some is psychological, to help us remember the bigger picture.
Sin taxes are more or less involuntary, but when the government does not impose these reminders, we often choose to remind ourselves. One currently popular implementation of this concept is the Carbon offset, a payment typically made when burning fuel to counter the effect of global warming. Organizations that buy carbon offsets for their fuel consumption do so to send a message, both internally and externally, that they place real value on minimizing carbon emissions. They may send this message both explicitly (by publicizing the purchase) and implicitly (by its effect on internal and external economic incentives).
Carbon offsets may be in fashion this decade, but there are many older forms of this concept. Maybe the most quotidian is the Curse Jar*, traditionally a place in a home or small office where individuals may make a small payment when using discouraged vocabulary. The Curse Jar provides a disincentive to coarse language despite being strictly voluntary, and despite not purchasing any effect on the linguistic environment (although the coffee fund may help for some). The Curse Jar works simply by reminding group members which behaviors are accepted and which are not.
For Mozilla, the difficulty is not emissions, verbal or vaporous, but ethical behavior. How can Mozilla publicly commit to a standard of behavior while violating it? I humbly submit that the answer is to balance its karmic books, by introducing an Ethics Offset**. When Mozilla finds itself cornered, it may take the necessary unfortunate action … and introduce a proportionate positive action as a reminder about its real values.
In the case at hand, a reasonable Ethics Offset might look like an internal “tax” on all uses of patented codecs. For example, for every Boot2Gecko device that is sold, Mozilla could commit to an offset equal to double the amount spent on patent licenses for the device. The offset could be donated to relevant worthy causes, like organizations that oppose software patents or contribute to the development of patent-free multimedia … but the actual recipient matters much less than the commitment. By accumulating and periodically (and publicly) “losing” this money, Mozilla would remind us all about its commitment to freedom in the multimedia realm. A similar scheme may be appropriate for Firefox Mobile if it is also configured for H.264 support.
Without a reminder of this kind, Mozilla risks becoming dangerously complacent and complicit with the cartel-controlled multimedia monopolies. As long as H.264 support appears to serve Mozilla’s other goals, Mozilla’s commitment to multimedia freedom will remain uncomfortable, inconvenient, and tempting to forget. Greater organizations have slid down off their ethical peaks, on paths paved all along with good intentions.
Most companies would not even consider a public and persistent admission of compromise, but Mozilla is not most companies. Neither are the companies that produce free operating systems, and many other components of the free software ecosystem. None of them should be ashamed to admit when they are forced to compromise their values and support enterprises that, on ethical grounds, they despise … but they should make their position clear, by committing to an Ethics Offset until they can escape from the compromise entirely.
*: Why is there no Wikipedia entry for “Curse Jar”!?
**: Let’s not call it an indulgence.
Articles last month revealed that musician Neil Young and Apple's
Steve Jobs discussed offering digital music downloads of
'uncompromised studio quality'. Much of the press and user
commentary was particularly enthusiastic about the prospect of
uncompressed 24 bit 192kHz downloads. 24/192 featured prominently
in my own conversations with Mr. Young's group several months
ago.
Unfortunately, there is no point to distributing music in
24-bit/192kHz format. Its playback fidelity is slightly inferior
to 16/44.1 or 16/48, and it takes up 6 times the space.
It's fairly long... but hearing,
perception and fidelity are complicated topics. Shysters and
charlatans exploit that nuance (and misunderstanding) to bilk
unsuspecting consumers of their money, all the while convincing
them they're paying for 'quality'.
During LCA 2012, I got to meet face-to-face (for only the second time) with David Rowe and discuss Codec2. This led to a hacking session where we figured out how to save about 10 bits on LSP quantization by using vector quantization (VQ). This may not sound like a lot, but for a 2 kb/s codec, 10 bits every 20 ms is 500 b/s, so one quarter of the bit-rate. That new code is now in David's hands and he's been doing a good job of tweaking it to get optimal quality/bitrate. This led me to look at the rest of the bits, which are taken mostly by the pitch frequency (between 50 Hz and 400 Hz) and the excitation energy (between -10 dB and 40 dB). The pitch is currently coded linearly (constant spacing in Hz) with 7 bits, while the energy is coded linearly in dB using 5 bits. That's a total of 12 bits for pitch and energy. Now, how can we improve that?
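To make the arithmetic concrete, here is a small sketch of the bit-rates and of the step sizes those scalar quantizers imply. It assumes a uniform quantizer with 2^bits levels whose first and last levels sit on the ends of the range, which may not match the actual codec2 tables exactly.

```javascript
const FRAME_MS = 20; // frame period quoted above

// Bit-rate contributed by `bits` bits sent once per frame.
function bitsPerSecond(bits, frameMs = FRAME_MS) {
  return bits * 1000 / frameMs;
}

// Step size of a uniform scalar quantizer with 2^bits levels spanning
// [min, max], assuming both end points are levels.
function uniformStep(min, max, bits) {
  return (max - min) / (Math.pow(2, bits) - 1);
}

console.log(bitsPerSecond(10));       // 500 b/s saved on the LSPs
console.log(bitsPerSecond(7 + 5));    // 600 b/s spent on pitch and energy
console.log(uniformStep(50, 400, 7)); // pitch step in Hz
console.log(uniformStep(-10, 40, 5)); // energy step in dB
```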
The first assumption I make here is that David already checked that both pitch and energy are encoded at the "optimal" resolution that balances bitrate and coding artefacts. To reduce the rate, we need a smarter quantizer. Below is the distribution of the pitch and energy for my training database.
So what if we were to use vector quantization to reduce the bit-rate? In theory, we could reduce the rate (for equal error) by having more codevectors in areas where the figure above shows more data. Same error, lower rate, but still a bad idea. It would be bad because it would mean that for some people, whose pitch falls into the range that is less likely, codec2 wouldn't work well. It would also mean that just changing the audio gain could make codec2 do worse. That is clearly not acceptable. We need to not just care about the mean square error (MSE), but also about the outliers. We need to be able to encode any amplitude with increments of 1-2 dB and any pitch with an increment around 0.04-0.08 octave (between half a semitone and a semitone). So it looks like we're stuck and the best we could do is to have uniform VQ, which wouldn't save much compared to scalar quantization.
The key here is to relax our resolution constraint above. In practice, we only need such good resolution when the signal is stationary. For example, when the pitch in unvoiced frames jumps around randomly, it's not really important to encode it accurately. Similarly, energy errors are much more perceivable when the energy is stable than when it's fluctuating. So this is where prediction becomes very useful, because stationary signals are exactly the ones that are easily predicted. By using a simple first-order recursive predictor (prediction = alpha*previous_value), we can reduce the range for which we need good resolution by a factor (1-alpha). For example, if we have a signal that ranges from 0 to 100 and we want a resolution of 1, then using alpha=0.9, the prediction error (current_value-prediction) will have a range of 0 to 10 when the signal is stationary. We still need to have quantizer values outside that range to encode variations, but we don't need a good resolution.
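A quick sketch of that predictor shows the effect. It assumes the prediction is formed from the exact previous value and starts from zero; a real codec would more likely predict from the previously decoded value. The function name and the test signals are made up for illustration.

```javascript
// Residual of a first-order recursive predictor:
//   prediction = alpha * previous value, residual = current - prediction.
// Assumption: the predictor sees the exact previous value.
function predictionResiduals(signal, alpha) {
  let previous = 0; // assumed initial state
  return signal.map((current) => {
    const residual = current - alpha * previous;
    previous = current;
    return residual;
  });
}

const stationary = new Array(5).fill(100);
const jumpy = [100, 20, 90, 5, 70];

console.log(predictionResiduals(stationary, 0.9)); // settles near (1 - 0.9) * 100
console.log(predictionResiduals(jumpy, 0.9));      // stays large when the signal jumps
```

After the first sample, the stationary signal leaves a residual of about one tenth of its value, so fine resolution is only needed over one tenth of the original range.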
Now that we have reduced the domain for which we need good resolution, we can actually start using vector quantization too. By combining prediction and vector quantization, it's possible to have a good enough quantizer using only 8 bits for both the energy and the pitch, saving 4 bits, so 200 b/s. The figure below illustrates how the quantizer is trained, with the distribution of the prediction residual (actual value minus prediction) in blue, and the distribution of the code vectors in red. The prediction coefficients are 0.8 for pitch and 0.9 for energy.
First thing we notice from the residual distribution is that it's much less uniform and there are two higher-density areas that stand out. The first is around (0.3,0), which corresponds to the case where the pitch and energy are stationary and is about one fifth of the range for pitch (which has a prediction coefficient of 4/5) and one tenth of the range for energy (which has a prediction coefficient of 9/10). The second higher-density area is a line around residual energy of -2.5, and it corresponds to silence. Now looking at the codebook in red, we can see a very high density of vectors in the area of stationary speech, enough for a resolution of 1-2 dB energy and 1/2 to 1 semitone for pitch. The difference is that this time the high resolution is only needed for a much smaller range. Now, the reason we see such a high density of code vectors around stationary speech and not so much around the "silence line" is the last detail of this quantizer: weighting. The whole codebook training procedure uses weighting based on how important the quantization error is. The weight given to pitch and energy error on stationary voiced speech is much higher than it is for non-stationary speech or silence. This is why this quantizer is able to give good enough quality with 8 bits instead of 12.
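One generic way to get that behaviour is a weighted version of the Lloyd (k-means) iteration, sketched below. This is not the actual codec2 training code: the function names, the example residuals and the weights are all invented for illustration, and the weighting rule itself is left out.

```javascript
// Squared Euclidean distance between two vectors of equal length.
function dist2(a, b) {
  return a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0);
}

// Index of the code vector nearest to `point`.
function nearest(point, codebook) {
  let best = 0;
  for (let i = 1; i < codebook.length; i++) {
    if (dist2(point, codebook[i]) < dist2(point, codebook[best])) best = i;
  }
  return best;
}

// One weighted Lloyd (k-means) iteration: assign each training vector to
// its nearest code vector, then move each code vector to the weighted mean
// of its assigned vectors. Code vectors with no assigned weight are kept.
function weightedLloydStep(points, weights, codebook) {
  const dims = codebook[0].length;
  const sums = codebook.map(() => new Array(dims).fill(0));
  const totals = codebook.map(() => 0);
  points.forEach((p, n) => {
    const c = nearest(p, codebook);
    totals[c] += weights[n];
    for (let d = 0; d < dims; d++) sums[c][d] += weights[n] * p[d];
  });
  return codebook.map((cv, c) =>
    totals[c] > 0 ? sums[c].map((s) => s / totals[c]) : cv.slice());
}

// Hypothetical residuals: [pitch residual, energy residual], with a larger
// weight on the frame treated as stationary voiced speech.
const residuals = [[0.30, 0.0], [0.34, 0.2], [1.5, -2.5]];
const weights = [4, 1, 0.5];
console.log(weightedLloydStep(residuals, weights, [[0, 0], [2, -2]]));
```

Because heavily weighted frames pull the code vectors towards themselves, repeating this step concentrates the codebook where errors matter most.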
With the latest developments in HTML5 and the still fairly new ARIA (Accessible Rich Internet Applications) attributes introduced by the W3C WAI (Web Accessibility Initiative), browsers have now implemented many features that allow you to make your JavaScript-heavy Web applications accessible.
Since I began working on making a complex web application accessible just over a year ago, I discovered that there was no step-by-step guide to approaching the changes necessary for creating an accessible Web application. Therefore, many people believe that it is still hard, if not impossible, to make Web applications accessible. In fact, it can be approached systematically, as this article will describe.
This post is based on a talk that Alice Boxhall and I gave at the recent Linux.conf.au titled “Developing accessible Web apps – how hard can it be?” (slides, video), which in turn was based on a Google Developer Day talk by Rachel Shearer (slides).
These talks, and this article, introduce a process that you can follow to make your Web applications accessible: each step will take you closer to having an application that can be accessed using a keyboard alone, and by users of screenreaders and other accessibility technology (AT).
The recommendations here only roughly conform to the requirements of WCAG (Web Content Accessibility Guidelines), which is the basis of legal accessibility requirements in many jurisdictions. The steps in this article may or may not be sufficient to meet a legal requirement. It is focused on the practical outcome of ensuring users with disabilities can use your Web application.
Step-by-step Approach
The steps to follow to make your Web apps accessible are as follows:
Use native HTML tags wherever possible
Make interactive elements keyboard accessible
Provide extra markup for AT (accessibility technology)
If you are a total newcomer to accessibility, I highly recommend installing a screenreader and just trying to read/navigate some Web pages. On Windows you can install the free NVDA screenreader, on Mac you can activate the pre-installed VoiceOver screenreader, on Linux you can use Orca, and if you just want a browser plugin for Chrome try installing ChromeVox.
1. Use native HTML tags
As you implement your Web application with interactive controls, try to use as many native HTML tags as possible.
HTML5 provides a rich set of elements which can be used to both add functionality and provide semantic context to your page. HTML4 already included many useful interactive controls, like <a>, <button>, <input> and <select>, and semantic landmark elements like <h1>. HTML5 adds richer <input> controls, and a more sophisticated set of semantic markup elements such as <time>, <progress>, <meter>, <nav>, <header>, <article> and <aside>. (Note: check browser support for the new tags).
Using as much of the rich HTML5 markup as possible means that you get all of the accessibility features which have been implemented in the browser for those elements, such as keyboard support, short-cut keys and accessibility metadata, for free. For generic tags you have to implement them completely from scratch.
What exactly do you miss out on when you use a generic tag such as <div> over a specific semantic one such as <button>?
Generic tags are not focusable. That means you cannot reach them using the [TAB] key on the keyboard.
You cannot activate them with the space bar or enter key or perform any other keyboard interaction that would be regarded as typical with such a control.
Since the role that the control represents is not specified in code but is only exposed through your custom visual styling, screenreaders cannot express to their users what type of control it is, e.g. button or link.
Neither can screenreaders add the control to the list of controls on the page that are of a certain type, e.g. to navigate to all headers of a certain level on the page.
And finally you need to manually style the element in order for it to look distinctive compared to other elements on the page; using a default control will allow the browser to provide the default style for the platform, which you can still override using CSS if you want.
Example:
Compare these two buttons. The first one is implemented using a <div> tag, the second one using a <button> tag. Try using a screenreader to experience the difference.
2. Make interactive elements keyboard accessible
Many sophisticated web applications have some interactive controls that just have no appropriate HTML tag equivalent. In this case, you will have had to build an interactive element with JavaScript and <div> and/or <span> tags and lots of custom styling. The good news is, it’s possible to make even these custom controls accessible, and as a side benefit you will also make your application smoother to use for power users.
The first thing you can do to test usability of your control, or your Web app, is to unplug the mouse and try to use only the [TAB] and [ENTER] keys to interact with your application.
Try the following:
Can you reach all interactive elements with [TAB]?
Can you activate interactive elements with [ENTER] (or [SPACE])?
Are the elements in the right tab order?
After interaction: is the right element in focus?
Is there a keyboard shortcut that activates the element (accesskey)?
No? Let’s fix it.
2.1. Reaching interactive elements
If you have an element on your page that cannot be reached with [TAB], put a @tabindex attribute on it.
Example:
Here we have a <span> tag that works as a link (don’t do this – it’s just a simple example). The first one cannot be reached using [TAB] but the second one has a tabindex and is thus part of the tab order of the HTML page.
(Note: since we experiment lots with the tabindex in this article, to avoid confusion, click on some text in this paragraph and then hit the [TAB] key to see where it goes next. The click will set your keyboard focus in the DOM.)
You set @tabindex=0 to add an element into the native tab order of the page, which is the DOM order.
2.2. Activating interactive elements
Next, you typically want to be able to use the [ENTER] and [SPACE] keys to activate your custom control. To do so, you will need to implement an onkeydown event handler. Note that the keyCode for [ENTER] is 13 and for [SPACE] is 32.
Example:
Let’s add this functionality to the <span> tag from before. Try tabbing to it and hit the [ENTER] or [SPACE] key.
<span class="customlink" onclick="alert('activated!')" tabindex="0"
onkeydown="handlekey(event);">
Click
</span>
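<script>
// Activate the control when [ENTER] (13) or [SPACE] (32) is pressed.
function handlekey(event) {
  var target = event.target || event.srcElement; // srcElement: older IE
  if (event.keyCode == 13 || event.keyCode == 32) {
    // Stop [SPACE] from also scrolling the page.
    if (event.preventDefault) { event.preventDefault(); }
    target.onclick();
  }
}
</script>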
Note that there are some controls that might need support for keys other than [tab] or [enter] to be able to use them from the keyboard alone, for example a custom list box, menu or slider should respond to arrow keys.
2.3. Elements in the right tab order
Have you tried tabbing to all the elements on your page that you care about? If so, check if the order of tab stops seems right. The default order is given by the order in which interactive elements appear in the DOM. For example, if your page’s code has a right column that is coded before the main article, then the links in the right column will receive tab focus first before the links in the main article.
You could change this by re-ordering your DOM, but oftentimes this is not possible. So, instead give the elements that should be the first ones to receive tab focus a positive @tabindex. The tab access will start at the smallest positive @tabindex value. If multiple elements share the same @tabindex value, these controls receive tab focus in DOM order. After that, interactive elements and those with @tabindex=0 will receive tab focus in DOM order.
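The ordering rules above can be written down as a small function. This is a simplified model, not browser code: it assumes every listed element is either natively interactive (no tabindex given) or carries an explicit tabindex, and the element ids are invented.

```javascript
// Compute the sequential focus order described above.
// `elements` lists focusable candidates in DOM order; `tabindex` is the
// attribute value, or undefined for natively interactive elements.
function tabOrder(elements) {
  const indexed = elements.map((el, domIndex) => ({
    el,
    domIndex,
    tabindex: el.tabindex === undefined ? 0 : el.tabindex,
  }));
  const positive = indexed
    .filter((e) => e.tabindex > 0)
    .sort((a, b) => a.tabindex - b.tabindex || a.domIndex - b.domIndex);
  const rest = indexed.filter((e) => e.tabindex === 0); // negatives are skipped
  return positive.concat(rest).map((e) => e.el.id);
}

// Hypothetical form where the table layout puts the address between the names.
const form = [
  { id: "first-name", tabindex: 1 },
  { id: "street" },
  { id: "last-name", tabindex: 2 },
  { id: "city" },
  { id: "help-popup", tabindex: -1 },
];
console.log(tabOrder(form));
```

For this form the order comes out as first-name, last-name, street, city, with the popup left out of the tab sequence.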
Example:
The one thing that always annoys me the most is if the tab order in forms that I am supposed to fill in is illogical. Here is an example where the first and last name are separated by the address because they are in a table. We could fix it by moving to a <div> based layout, but let’s use @tabindex to demonstrate the change.
Be very careful with using non-zero tabindex values. Since they change the tab order on the page, you may get side effects that you might not have intended, such as having to give other elements on the page a non-zero tabindex value to avoid skipping too many other elements as I would need to do here.
2.4. Focus on the right element
Some of the controls that you create may be rather complex and open elements on the page that were previously hidden. This is particularly the case for drop-downs, pop-ups, and menus in general. Oftentimes the hidden element is not defined in the DOM right after the interactive control, such that a [TAB] will not put your keyboard focus on the next element that you are interacting with.
The solution is to manage your keyboard focus from JavaScript using the .focus() method.
Example:
Here is a menu that is declared ahead of the menu button. If you tab onto the button and hit enter, the menu is revealed. But your tab focus is still on the menu button, so your next [TAB] will take you somewhere else. We fix it by setting the focus on the first menu item after opening the menu.
You will notice that there are still some things you can improve on here. For example, after you close the menu again with one of the menu items, the focus does not move back onto the menu button.
Also, after opening the menu, you may prefer not to move the focus onto the first menu item but rather just onto the menu <div>. You can do so by giving that div a @tabindex and then calling .focus() on it. If you do not want to make the div part of the normal tabbing order, just give it a @tabindex=-1 value. This will allow your div to receive focus from script, but be exempt from accidental tabbing onto (though usually you just want to use @tabindex=0).
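The focus handling discussed here can be sketched as three small helpers. This is an illustration under assumptions, not the code of the example page: the function names are hypothetical, and the arguments are whatever elements your page uses for the menu, its first item, and the menu button.

```javascript
// Hypothetical helpers. Only .style, .focus() and .setAttribute() are
// assumed on the objects passed in, so any DOM element will do.
function openMenu(menu, firstItem) {
  menu.style.display = 'block';
  // Move keyboard focus into the newly revealed menu.
  firstItem.focus();
}

function closeMenu(menu, button) {
  menu.style.display = 'none';
  // Return focus to the control that opened the menu.
  button.focus();
}

// Make a container focusable from script without adding a tab stop.
function focusContainer(container) {
  container.setAttribute('tabindex', '-1');
  container.focus();
}
```

In a page you would call these from the button's click and keydown handlers, for example `openMenu(document.getElementById('menu'), firstMenuItem)`, where the id and the variable are placeholders for your own markup.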
Bonus: If you want to help keyboard users even more, you can also put outlines on the element that is currently in focus using CSS’s outline property. If you want to avoid the outlines for mouse users, you can dynamically add a class that removes the outline in mouseover events but leaves it for :focus.
2.5. Provide sensible keyboard shortcuts
At this stage your application is actually keyboard accessible. Congratulations!
However, it’s still not very efficient. Like power users, screenreader users love keyboard shortcuts: imagine being forced to tab through an entire page, or to navigate back to a menu tree at the top of the page, to reach each control you were interested in. And anything that makes navigating the app via the keyboard more efficient for screenreader users will benefit all power users as well, just like the ubiquitous keyboard shortcuts for cut, copy and paste.
HTML4 introduced so-called accesskeys for this. In HTML5 @accesskey is now allowed on all elements.
The @accesskey attribute takes the value of a keyboard key (e.g. @accesskey="x") and is activated through platform- and browser-specific activation keys. For example, on the Mac it’s generally the [Ctrl] key, in IE it’s the [Alt] key, in Firefox on Windows [Shift]-[Alt], and in Opera on Windows [Shift]-[ESC]. You press the activation key and the accesskey together, which either activates or focuses the element with the @accesskey attribute.
Example:
<script>
var button = document.getElementById('accessbutton');
if (button.accessKeyLabel) {
button.innerHTML += ' (' + button.accessKeyLabel + ')';
}
</script>
Now, the idea behind this is clever, but the execution is pretty poor. Firstly, the different activation keys between different platforms and browsers make it really hard for people to get used to the accesskeys. Secondly, the key combinations can conflict with browser and screenreader shortcut keys: in the first case the browser shortcut becomes unusable, and in the second the accesskey is effectively lost.
In the end it is up to the Web application developer whether to use the accesskey attribute or whether to implement explicit shortcut keys for the application through key event handlers on the window object. In either case, make sure to provide a help list for your shortcut keys.
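If you go the explicit route, a small shortcut table keeps the key handling and the help list in one place. The sketch below is one possible shape, not a recommended library: `createShortcuts`, its methods, and the two sample shortcuts are all hypothetical. It reads the standard `key` property of the keyboard event; older browsers may only offer `keyCode`.

```javascript
// Hypothetical shortcut table: key -> description and action.
function createShortcuts() {
  var table = {};
  return {
    add: function (key, description, action) {
      table[key.toLowerCase()] = { description: description, action: action };
    },
    // Call this from a keydown listener. Returns true when a shortcut ran.
    handle: function (evt) {
      // Leave modified keys alone to avoid clashing with browser and
      // screenreader shortcuts.
      if (evt.ctrlKey || evt.altKey || evt.metaKey) { return false; }
      var entry = table[String(evt.key).toLowerCase()];
      if (!entry) { return false; }
      entry.action();
      if (evt.preventDefault) { evt.preventDefault(); }
      return true;
    },
    // One line of text per shortcut, for the help list.
    help: function () {
      return Object.keys(table).sort().map(function (key) {
        return key + ': ' + table[key].description;
      });
    }
  };
}

var shortcuts = createShortcuts();
var menuOpened = 0;
shortcuts.add('m', 'Open the menu', function () { menuOpened++; });
shortcuts.add('?', 'Show this help list', function () {});

// In a browser you would register it on the window object:
// window.addEventListener('keydown', shortcuts.handle, false);
console.log(shortcuts.help().join('\n'));
```

A real application would also want to ignore key presses while the user is typing in a text field; that check is left out here to keep the sketch short.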
Also note that a page with a really good hierarchical heading layout and use of ARIA landmarks can help to eliminate the need for accesskeys to jump around the page, since there are typically default navigations available in screen readers to jump directly to headings, hyperlinks, and ARIA landmarks.
3. Provide markup for AT
Having made the application keyboard accessible also has advantages for screenreaders, since they can now reach the controls individually and activate them. So, next we will use a screenreader and close our eyes to find out where we only provide visual cues to understand the necessary interaction.
Here are some of the issues to consider:
Role may need to get identified
States may need to be kept track of
Properties may need to be made explicit
Labels may need to be provided for elements
This is where the W3C’s ARIA (Accessible Rich Internet Applications) standard comes in. ARIA attributes provide semantic information to screen readers and other AT that is otherwise conveyed only visually.
Note that using ARIA does not automatically implement the standard widget behavior – you’ll still need to add focus management, keyboard navigation, and change aria attribute values in script.
3.1. ARIA roles
After implementing a custom interactive widget, you need to add a @role attribute to indicate what type of control it is, e.g. that it is playing the role of a standard tag such as a button.
Example:
This menu button is implemented as a <div>, but with a role of “button” it is announced as a button by a screenreader.
<div tabindex="0" role="button">Menu</div>
ARIA roles also describe composite controls that do not have a native HTML equivalent.
Example:
This menu with menu items is implemented as a set of <div> tags, but with a role of “menu” and “menuitem” items.
Some interactive controls represent different states, e.g. a checkbox can be checked or unchecked, or a menu can be expanded or collapsed.
Example:
The following menu has states on the menu items, which are here not just used to give an aural indication through the screenreader, but also a visual one through CSS.
Some of the functionality of interactive controls cannot be captured by the role attribute alone. We have ARIA properties to add features that the screenreader needs to announce, such as aria-label, aria-haspopup, aria-activedescendant, or aria-live.
Example:
The following drop-down menu uses aria-haspopup to tell the screenreader that there is a popup hidden behind the menu button together with an ARIA state of aria-expanded to track whether it’s open or closed.
<script>
// Note: this script must run after the menu markup has been parsed.
var button = document.getElementById("button");
var menu = document.getElementById("menu");
var items = document.getElementsByClassName("menuitem");
var focused = 0;

// Reveal the menu and move focus to the currently checked item.
function showMenu(evt) {
  evt.stopPropagation();
  // The menu markup is hidden with an inline "display: none".
  menu.style.display = 'block';
  button.setAttribute('aria-expanded', 'true');
  focused = getSelected();
  items[focused].focus();
}

// Hide the menu and return focus to the menu button.
function hideMenu(evt) {
  evt.stopPropagation();
  menu.style.display = 'none';
  button.setAttribute('aria-expanded', 'false');
  button.focus();
}

// Index of the item with aria-checked="true" (first item if none is checked).
function getSelected() {
  for (var i = 0; i < items.length; i++) {
    if (items[i].getAttribute('aria-checked') == 'true') {
      return i;
    }
  }
  return 0;
}

function setSelected(elem) {
  var curSelected = getSelected();
  items[curSelected].setAttribute('aria-checked', 'false');
  elem.setAttribute('aria-checked', 'true');
}

function selectItem(evt) {
  setSelected(evt.target);
  hideMenu(evt);
}

// Previous and next item index, wrapping around at the ends.
function getPrevItem(index) {
  var prev = index - 1;
  if (prev < 0) {
    prev = items.length - 1;
  }
  return prev;
}

function getNextItem(index) {
  var next = index + 1;
  if (next == items.length) {
    next = 0;
  }
  return next;
}

function handleButtonKeys(evt) {
  evt.stopPropagation();
  var key = evt.keyCode;
  switch (key) {
    case (13): /* ENTER */
    case (32): /* SPACE */
      showMenu(evt);
      break;
    default:
  }
}

function handleMenuKeys(evt) {
  evt.stopPropagation();
  var key = evt.keyCode;
  switch (key) {
    case (38): /* UP */
      focused = getPrevItem(focused);
      items[focused].focus();
      break;
    case (40): /* DOWN */
      focused = getNextItem(focused);
      items[focused].focus();
      break;
    case (13): /* ENTER */
    case (32): /* SPACE */
      setSelected(evt.target);
      hideMenu(evt);
      break;
    case (27): /* ESC */
      hideMenu(evt);
      break;
    default:
  }
}

button.addEventListener('click', showMenu, false);
button.addEventListener('keydown', handleButtonKeys, false);
for (var i = 0; i < items.length; i++) {
  items[i].addEventListener('click', selectItem, false);
  items[i].addEventListener('keydown', handleMenuKeys, false);
}
</script>
<div class="custombutton" id="button" tabindex="0" role="button"
aria-expanded="false" aria-haspopup="true">
<span>Justify</span>
</div>
<div role="menu" class="menu" id="menu" style="display: none;">
<div tabindex="0" role="menuitem" class="menuitem" aria-checked="true">
Left
</div>
<div tabindex="0" role="menuitem" class="menuitem" aria-checked="false">
Center
</div>
<div tabindex="0" role="menuitem" class="menuitem" aria-checked="false">
Right
</div>
</div>
[CSS and JavaScript for example omitted]
3.4. Labelling
The main issue that people know about accessibility seems to be that they have to put alt text onto images. This is only one means to provide labels to screenreaders for page content. Labels are short informative pieces of text that provide a name to a control.
There are actually several ways of providing labels for controls:
on img elements use @alt
on input elements use the label element
use @aria-labelledby if there is another element that contains the label
use @title if you also want a label to be used as a tooltip
otherwise use @aria-label
I'll provide examples for the first two use cases - the other use cases are simple to deduce.
Example:
The following two images show the rough concept for providing alt text for images: images that provide information should be transcribed, images that are just decorative should receive an empty @alt attribute.
When marking up decorative images with an empty @alt attribute, the image is actually completely removed from the accessibility tree and does not confuse the blind user. This is a desired effect, so do remember to mark up all your images with @alt attributes, even those that don't contain anything of interest to AT.
Example:
In the example form above in Section 2.3, when tabbing directly on the input elements, the screen reader will only say "edit text" without announcing what meaning that text has. That's not very useful. So let's introduce a label element for the input elements. We'll also add checkboxes with a label.
In this example we use several different approaches to show what a difference it makes to use the <label> element to mark up input boxes.
The first two fields just have a <label> element next to an <input> element. When using a screenreader you will not notice a difference between this and not using the <label> element because there is no connection between the <label> and the <input> element.
In the third field we use the @for attribute to create that link. Now the input field isn't just announced as "edit text", but rather as "Lastname edit text", which is much more useful. Also, the screenreader can now skip the labels and go straight to the input element.
In the fourth and fifth field we actually encapsulate the <input> element inside the <label> element, thus avoiding the need for a @for attribute, though it doesn't hurt to explicitly add it.
Finally we look at the checkbox. By including a referenced <label> element with the checkbox, we change the screenreader's announcement from just "checkbox not checked" to "Remember me checkbox not checked". Also notice that the click target now includes the label, making the checkbox more usable not only for screenreader users, but also for mouse users.
4. Conclusions
This article introduced a process that you can follow to make your Web applications accessible. As you do that, you will notice that there are other things that you may need to do in order to give the best experience to a power user on a keyboard, a blind user using a screenreader, or a vision-impaired user using a screen magnifier. But once you've made a start, you will notice that it's not all black magic and a lot can be achieved with just a little markup.
I spoke about the video and audio element in HTML5, how to provide fallback content, how to encode content, how to control them from JavaScript, and briefly about Drupal video modules, though the next presentation provided much more insight into those. I explained how to make the HTML5 media elements accessible, including accessible controls, captions, audio descriptions, and the new WebVTT file format. I ran out of time to introduce the last section of my slides which are on WebRTC.
Linux.conf.au
On the first day of LCA I gave a talk both in the Multimedia Miniconf and the Browser Miniconf.
Browser Miniconf
In the Browser Miniconf I talked about “Web Standardisation – how browser vendors collaborate, or not” (slides). Maybe the most interesting part about this was that I tried out a new slide “deck” tool called impress.js. I’m not yet sure if I like it but it worked well for this talk, in which I explained how the HTML5 spec is authored and who has input.
I also sat on a panel of browser developers in the Browser Miniconf (more as a standards than as a browser developer, but that’s close enough). We were asked about all kinds of latest developments in HTML5, CSS3, and media standards in the browser.
Multimedia Miniconf
In the Multimedia Miniconf I gave a “HTML5 media accessibility update” (slides). I talked about the accessibility problems of Flash, how native HTML5 video players will be better, about accessible video controls, captions, navigation chapters, audio descriptions, and WebVTT. I also provided a demo of how to synchronize multiple video elements using a polyfill for the multitrack API.
Finally, and most importantly, Alice Boxhall and I gave a talk in the main linux.conf.au titled “Developing Accessible Web Apps – how hard can it be?” (video, slides). I spoke about a process that you can follow to make your Web applications accessible. I’m writing a separate blog post to explain this in more detail. In her part, Alice dug below the surface of browsers to explain how the accessibility markup that Web developers provide is transformed into data structures that are handed to accessibility technologies.
I recently added support for 10- and 16-bit encoding and decoding to Schrödinger, so I did a little release. Presenting Schrödinger-1.0.11. Also pushed changes to GStreamer to handle the new features. Although these changes have been in the works for some time, a little prompting from j-b caused me to finish this off, so this will probably appear in VLC soon, too.
This was the last piece needed to create a 10-bit master of Sintel, which I’ve been planning to do for some time.
I just got back from linux.conf.au 2012 in Ballarat. The video for the talk I gave,
Opus, the Swiss Army Knife of Audio Codecs, is now available on the
Opus presentations page.
For the Ogg-impaired, a lower-quality version is also available on YouTube.
For those who are into speech codecs, I also recommend watching David Rowe's
presentation: Codec 2 -
Open Source Speech Coding at 2400 bit/s and Below. His presentation was
selected as one of the four best talks at LCA this year -- well worth watching.
Camilla forwarded a
necessary tip for installing the XiphQT components on a 64 bit
Mac OS X so that it works with iTunes. This is a reasonably well
known tip, but it wasn't in our FAQ or installation
instructions (well it is now as of about ten minutes ago) so I'm
passing it along now too...
I upgraded to Lion, and my ogg files stopped being able to play in iTunes (silently). Here's how to make it go:
"show in finder" your iTunes binary (either navigate to the Applications folder, or right/control click on it in the dock, and choose "show in finder")
right/control click on iTunes in the finder, and select "Get Info"
Under General, check the box marked "Open in 32-bit mode"
You should put the above on something linked from: http://www.xiph.org/quicktime/download.html I paraphrased it from roaringapps.com.
If XiphQT can be rebuilt in 64 bit mode, and that shipped that way to Lion users, that would also be a good solution.
That last comment is actually a bit of an embarrassment for us at the
moment; neither the XiphQT builds nor code have been updated since
2009 or so, despite multiple releases, fundamental improvements and new features in
the Xiph codecs since. There are actually more recent beta builds of
updated Mac OS X and Win32 XiphQT components that never got bumped
to the official
XiphQT download page, but even these builds are from mid 2009.
We don't have any high-powered Mac OS hackers in the
core Xiph group at the moment. I have some relatively insignificant
amount of experience coding for Mac OS X and Quicktime, but I've been
hoping for a volunteer with more chops. Any takers?
Turns out I missed blogging about the latest Ghost update... back in November...
Ghost
Demo4 is up on the demo list showing the sinusoidal extractor
doing some very early sinusoidal tracking frame to frame, and a
very early example of the analysis performing real
sinusoidal/non-sinusoidal audio splitting. Pictures and
interactive listening, oh my!
It looks like I'll be putting a month or two into transOgg before getting back to Ghost work (and demo 5). The work that went into demo4 raised a number of questions I'm not sure how to approach answering yet, so I'm going to let that percolate for a bit.
I made a quick change to the Xiph.Org front page that a few people
have suggested now over the past few years.
The top few blog posts
aggregated by Planet Xiph now appear as a five-item teaser list near
the top of the Xiph.Org home page. The idea is both to get some more
live content on the front page as well as
to draw more attention to both the Planet and our developer community.
Those who have been following the Opus git repository in the past few weeks probably haven't noticed much work going on. The reason is pretty simple: most of the work has been going on elsewhere, in an experimental branch (named exp_wip3 for now) of my private repository. The reason it's in an experimental branch is that it's not fully converted to fixed-point and hasn't been tested on any frame size other than 20 ms. Here's an (incomplete) list of changes for now:
Really unconstrained VBR (not trying to keep the same average rate)
Tonality detection to give highly tonal audio a boost in bit-rate
(yet another) rewrite of the transient detection code
New dynamic allocation code that boosts the rate of bands that have significant spectral leakage caused by short blocks
Thanks to these changes, the quality has (as far as we can tell) gone up compared to the current master branch. I invite you to judge for yourself by comparing the audio coded with the current master branch with the audio coded with the new exp_wip3 experimental branch. This is 64 kb/s, so fairly low rate for stereo music. The original is here. Let me know what you think.
We've made some changes to how the HTML full-screen API exits full-screen mode in Firefox 11, which is scheduled to ship in March 2012. Previously Document.mozCancelFullScreen() would fully-exit full-screen and return the browser to "normal" mode. Starting in Firefox 11, Document.mozCancelFullScreen() will restore full-screen state to the element that was previously full-screen. If there is no previous full-screen element in either the document or a parent document (full-screen mode isn't restored to former full-screen elements in child documents), then the browser will "fully-exit full-screen", and return the browser to normal mode.
To see how this is useful, consider the case of a PowerPoint clone or presentation web app that wants to run full-screen. One way to implement such a web app would be to have a full-screen <div> element where the slides are shown. The developer may want to be able to switch full-screen mode seamlessly between the slide deck <div> and (say) a <video>, and then return to having the slide deck <div> as the full-screen element so that the user can carry on with the presentation. Before this change, if the <video> was in a cross-origin subdocument (like a YouTube embedded player in an <iframe>) returning full-screen mode to the slide deck <div> from the <video> was a two-step process; users would have to fully-exit full-screen, and re-request full-screen mode on the slide deck element. Now developers can simply call Document.mozCancelFullScreen() and seamlessly switch back. The browser won't drop out of full-screen mode during the transition.
Note that if users press the escape key they will always fully-exit full-screen, i.e. Firefox won't restore the previous full-screen element to full-screen state on escape key press. So to seamlessly restore full-screen to the previous full-screen element, developers must explicitly call Document.mozCancelFullScreen(); they can't rely on the user pressing the escape key.
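The difference between cancelling and pressing escape can be pictured with a toy model. This is not Firefox code and it ignores the parent/child document rules mentioned above; `createFullScreenModel` and its methods are hypothetical names that only mirror the behaviour described in this post.

```javascript
// Toy model, not Firefox code: a stack of full-screen elements.
function createFullScreenModel() {
  var stack = [];
  return {
    // Models a granted mozRequestFullScreen() call on an element.
    request: function (element) { stack.push(element); },
    // Models Document.mozCancelFullScreen() as of Firefox 11:
    // the previous full-screen element, if any, is restored.
    cancel: function () { stack.pop(); },
    // Models the user pressing the escape key: always fully exit.
    escape: function () { stack.length = 0; },
    // Models mozFullScreenElement; null means "normal" mode.
    current: function () {
      return stack.length ? stack[stack.length - 1] : null;
    }
  };
}

var model = createFullScreenModel();
model.request('slide deck <div>');
model.request('<video>');
model.cancel();
console.log(model.current());
```

After the `cancel()` call the model reports the slide deck as the full-screen element again, which is the seamless switch back described above; calling `escape()` instead would leave `current()` at null.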
We've also added webconsole logging upon full-screen request failures to Firefox 11, to make debugging denied full-screen requests easier.
Another change coming in Firefox 11 is that we'll no longer deny full-screen requests in web pages which contain windowed plugins. Instead, we'll exit full-screen when a windowed plugin is focused (on Windows and Linux; Mac OS X is unaffected).
In a project I’m working on I’m using linear lists. This is the list_vt type in the ATS prelude. list_vt is similar to the list types in Lisp and functional programming languages except it is linear. The memory for the list is not managed by the garbage collector and the type system enforces the rule that only one reference to the linear object can exist. This sometimes requires a bit of extra effort when using pattern matching against the list_vt instances.
Pattern Matching
When pattern matching against linear objects you can do a destructive match or a non-destructive match. The former will destroy and free the memory allocated for the object automatically. The latter will not. Destructive matches are done by having the pattern match clause prefixed with a ~. For example, the following will print an integer list and destroy the list while it does it:
fun print_list (l: List_vt (int)): void =
case+ l of
| ~list_vt_nil () => printf("nil\n", @())
| ~list_vt_cons (x, xs) => (printf("cons %d\n", @(x)); print_list(xs))
fun test1 (): void = {
val a = list_vt_cons {int} (1, list_vt_nil)
val () = print_list (a)
}
Things get complicated when doing non-destructive matches. The following won’t typecheck:
fun print_list2 (l: !List_vt (int)): void =
case+ l of
| list_vt_nil () => printf("nil\n", @())
| list_vt_cons (x, xs) => (printf("cons %d\n", @(x)); print_list2(xs))
fun test2 (): void = {
val a = list_vt_cons {int} (1, list_vt_nil)
val () = print_list2 (a)
val () = list_vt_free (a)
}
The problem with this example is that when the match is made we are effectively taking the linear object out of the variable l. This leaves l with a different type, but we’ve stated in the function signature for print_list2 that the type is not modified or consumed. We need a way of putting the linear object back into l once we’re done using the match. The primitive to do this is fold@ which I briefly introduced in my linear datatypes post. fold@ will change the type of l back to the original and prevent access to the pattern match variables. Usage looks like this:
fun print_list2 (l: !List_vt (int)): void =
case+ l of
| list_vt_nil () => (fold@ l; printf("nil\n", @()))
| list_vt_cons (x, !xs) => (printf("cons %d\n", @(x)); print_list2(!xs); fold@ l)
fun test2 (): void = {
val a = list_vt_cons {int} (1, list_vt_nil)
val () = print_list2 (a)
val () = list_vt_free (a)
}
You’ll notice with this version that the match for list_vt_cons has changed the xs parameter to be !xs. The second argument in the cons constructor is a linear object. If the object itself is matched against xs then it is another example of aliasing the linear object. It is taken out of the l and needs to be put back. The way ATS handles this is to require pattern matching with a ! prefixed. This makes xs be a pointer to the object rather than the object itself. So in this example xs has the type ptr addr where addr is the address of the actual List_vt object. This is why the xs is prefixed by ! in the recursive call to print_list2. The ! means dereference the pointer, so the List_vt it is pointing to is passed as the argument to the recursive call.
In this way the linear object is never taken out; we only access it via its pointer. The fold@ call in this clause will change xs back to the List_vt object. The fold@ call is done after the usage of !xs. If it was done before then we wouldn’t have access to the view for xs to be able to dereference it. print_list2 is still tail recursive as the fold@ call is only used during typechecking and is erased afterwards.
Filtering a linear list
In my project I needed to filter a linear list. Unfortunately ATS doesn’t have a filter implementation in the standard prelude for linear lists (it does for persistent lists). My first attempt at writing a list_vt_filter looked like:
fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) =
case+ l of
| list_vt_nil () => (fold@ l; list_vt_nil)
| list_vt_cons (x, !xs) when f (x) => let
val r = list_vt_cons (x, list_vt_filter (!xs, f))
in
fold@ l; r
end
| list_vt_cons (x, !xs) => let
val r = list_vt_filter (!xs, f)
in
fold@ l; r
end
This should look familiar since it’s very similar to the print_list2 code shown previously in the way it uses non-destructive matching and fold@. The function list_vt_filter takes a list_vt as an argument and a function to apply to each element in the list. That function returns true if the element should be included in the result list. Usage looks like:
val a = list_vt_cons (1, list_vt_cons (2, list_vt_cons (3, list_vt_cons (4, list_vt_nil ()))))
val b = list_vt_filter (a, lam (x) => x mod 2 = 0)
val () = list_vt_foreach_fun<int> (a, lam(x) =<> $effmask_all (printf("Value: %d\n", @(x))))
val () = list_vt_free (b)
val () = list_vt_free (a)
One issue with this implementation is it is not tail recursive. It has stack growth proportional to the size of the result list.
Tail Recursive Filtering
In Lisp code I’d often build the result list tail recursively by passing an accumulator, with each new element in the result being prepended to the accumulator. This builds a list in the reverse order so before returning it the list would be reversed. The ATS code for this is:
fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) = let
fun loop (l: !List_vt (int), accum: List_vt (int)):<cloptr1> List_vt (int) =
case+ l of
| list_vt_nil () => (fold@ l; accum)
| list_vt_cons (x, !xs) when f (x) => let
val r = loop (!xs, list_vt_cons (x, accum))
in
(fold@ l; r)
end
| list_vt_cons (x, !xs) => let
val r = loop (!xs, accum)
in
(fold@ l; r)
end
in
list_vt_reverse (loop (l, list_vt_nil))
end
The cloptr1 function annotation marks the inner function as being a closure where the memory for the closure’s environment is managed by the compiler using malloc and free instead of the garbage collector (which is what cloref1 would signify). See my post on closures in ATS for more about the different closure and function types used by ATS.
Unfortunately the requirement to use fold@ after we’ve finished using the pattern matched variables makes the code slightly more verbose, as we need to do the tail recursion, obtain the result, then do the fold@ and return the result. Remember that the fold@ is erased at type checking time, which is how this code remains tail recursive even though the code structure makes it look like it isn’t.
One downside to this approach is we iterate over the list twice. Once to build the result, and once over the result to reverse it.
Single Pass Tail Recursive Filtering
The creation of the result list could be done in a single pass if we could create a cons with no second argument, and fill in that argument later when we have a result to store there that passes filtering. ATS allows construction of datatypes with a ‘hole’ that can be filled in later. The ‘hole’ is an uninitialized type and we get a pointer to it. An example of doing this is:
var x = list_vt_cons {int} {0} (1, ?)
This creates a list_vt_cons with the data set to 1 but no second parameter. Instead of that parameter being of type List_vt (int) it is of type List_vt (int)?, the ? signifying it is uninitialized. For this example we have to pass the universal type parameters explicitly (the {int} {0}) as the ATS type inference algorithm can’t compute them.
To get a pointer to the ‘hole’ we have to pattern match:
val+ list_vt_cons (_, !xs) = x
val () = !xs := list_vt_nil
val () = fold@ x
In this example the xs is a pointer, pointing to the List_vt (int)?. It assigns a list_vt_nil to this, making the tail of the cons a list_vt_nil. Just like in our previous pattern matching examples using case, the code has to do a fold@ to change the type of x back to that containing a linear object once we’ve finished using xs.
Now that we can get pointers to the tail of the list we can implement a single pass tail recursive filter function:
fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) = let
fun loop (l: !List_vt (int), res: &List_vt (int)? >> List_vt (int)):<cloptr1> void =
case+ l of
| list_vt_nil () => (fold@ l; (res := list_vt_nil))
| list_vt_cons (x, !xs) when f (x) => let
val () = res := list_vt_cons {int} {0} (x, ?)
val+ list_vt_cons (_, !p_xs) = res
in
loop (!xs, !p_xs); fold@ l; fold@ res
end
| list_vt_cons (x, !xs) => (loop (!xs, res); fold@ l)
var res: List_vt (int)?
val () = loop (l, res)
in
res
end
The loop function here no longer returns a result. Instead the result is passed via a reference (the & signifies ‘by reference’). When there is something that needs to be stored in the list, a cons is created with a hole in the tail position. This cons is stored in the result we are passing by reference and we tail recursively call with the hole as the new result. ATS converts this to nice C code that is a simple loop rather than recursive function calls.
Miscellaneous
The code examples in this post use List_vt (a). This is actually a typedef for list_vt (a,n) where a is the type and n is the length of the list. The typedef allows shorter examples without needing to specify the sorts for the list length. Using the full type though has the advantage of being able to specify a bit more type safety. For example, the original filter function would be declared as:
fun list_vt_filter {n:nat} (l: !list_vt (int,n), f: int -<> bool): [r:nat | r <= n] list_vt (int, r)
This defines the type of the result as having a length equal to or less than that of the original list. This helps prevent errors in the implementation of the filter - it can’t accidentally leave extra items in the list. I cover this type of thing in my post on dependent types.
Another addition to safety that adding the extra sorts can provide is the ability to check that the function terminates. This can be done by adding a termination metric to the function definition:
fun list_vt_filter {n:nat} .<n>. (l: !list_vt (int,n), f: int -<> bool): [r:nat | r <= n] list_vt (int, r)
A description of how fold@ works is in the ATS/Anairats User’s Guide PDF. It’s in the ‘Dataviewtypes’ section of the ‘Programming with Linear Types’ chapter and is referred to as folding and unfolding a linear type.
It’s the usage of linear types and dealing with their restrictions that makes my examples a bit more complex. If you use ATS mainly with non-linear types and link with the garbage collector then it becomes very much like using any other functional programming language, but with additional features in the type system. My interest has been around avoiding using a garbage collector and having the compiler give errors when memory is not allocated or free’d correctly. Don’t be put off from using ATS by these complex examples if you’re fine with using garbage collection and non-linear datatypes. You might never need to deal with the cases that bring in the extra complexity.
I was really impressed by Michael Bebenita’s Broadway.js, the recent port of an H.264 decoder to pure Javascript using Emscripten, a LLVM-based C-to-JS converter … but of course this is the opposite of what we want! Who needs H.264? We want WebM!
I’ve spent the past few weekends digging into Broadway.js, stripping out the H.264 bits and replacing them with libvpx and libnestegg. Now it’s working, to a degree. You can see it for yourself at the demo page (so far tested only in Firefox 7…).
I’m not going to be able to take this much further … at least not right now. It’s been a fun exercise though. I invite all interested comers to read some more details and then fork the repo.
A few days ago I enabled the HTML full-screen API in Firefox nightly builds. This enables developers to make an arbitrary HTML element "full-screen", hiding the browser's UI and stretching the element to encompass the entire screen. This will be particularly useful for HTML5 video and games.
If all goes well, this feature will ship in Firefox 10 at the end of January.
To enter full-screen mode, call the following method on the HTML Element you'd like to enter full-screen:
void mozRequestFullScreen() : posts an asynchronous request to make the HTML element the full-screen element. If the request is granted, some time later a bubbling "mozfullscreenchange" event is dispatched to the element which requested full-screen. If the request is denied, a "mozfullscreenerror" event is dispatched to the element's owning document. We only grant requests for full-screen when:
mozRequestFullScreen() is called in a user-generated event handler, e.g. a mouse click handler, and
the requesting element is in its document, and
there are no windowed plugins present in any document/iframe in the current page, and
all iframes containing the requesting element (if any) have the mozallowfullscreen attribute.
We added the following method and attributes to HTML Document:
void mozCancelFullScreen() : exits the document from full-screen mode. This dispatches a "mozfullscreenchange" event to the document containing the (now former) full-screen element. Note that the "mozfullscreenchange" event which is dispatched when you enter full-screen is targeted at the full-screen element, so if you want to receive the "mozfullscreenchange" on both entering and exiting full-screen in the same listener you should add your listener to the document, rather than the full-screen element.
readonly attribute boolean mozFullScreen : true when the document is in full-screen mode.
readonly attribute Element mozFullScreenElement : reference to the current full-screen element.
readonly attribute boolean mozFullScreenEnabled : returns true if calls to mozRequestFullScreen() would be granted in the current document. This returns false if there are any windowed plugins present in any document/iframe in the current page, or if any iframes containing this document don't have the mozallowfullscreen attribute present, or if the user has disabled the API by preference. If this returns false you may want to not show the user your enter-full-screen button in your page, since you know it won't work!
We also added the :-moz-full-screen CSS pseudo class, which applies to the full-screen element while in full-screen mode.
We added the mozallowfullscreen attribute to iframe elements. Without this, full-screen requests made by script in the iframe's content (e.g. embedded ads, or a YouTube player in an iframe for that matter) will be denied.
While in full-screen mode, the user can press the ESC key (or F11) to exit. Alpha-numeric keyboard input while in full-screen mode causes a warning message to pop up to guard against phishing attacks. The only keys which don't cause the warning message to pop up are: left, right, up, down, space, shift, control, alt, page up, page down, end, home, tab, and meta.
Navigating, changing tab, changing app (ALT+TAB) while in full-screen mode will cause full-screen mode to exit.
The code for that button's onclick handler is simply: document.getElementById('bruce_video').mozRequestFullScreen();
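As a slightly fuller sketch of the API described above, the following wires up a button, hides it when mozFullScreenEnabled is false, and listens on the document so that both entering and exiting full-screen are seen by one listener. The function name setupFullScreenButton and its onChange callback are made up for illustration; only the moz-prefixed members come from this post.

```javascript
// Sketch only: setupFullScreenButton and onChange are hypothetical names.
// doc, button and target are passed in so stand-in objects can be used.
function setupFullScreenButton(doc, button, target, onChange) {
  // Don't offer the button when a request could never be granted.
  if (!doc.mozFullScreenEnabled) {
    button.hidden = true;
    return false;
  }
  // Listen on the document, not the element, so the same listener
  // is notified both when entering and when exiting full-screen.
  doc.addEventListener("mozfullscreenchange", function () {
    onChange(doc.mozFullScreen, doc.mozFullScreenElement);
  }, false);
  // Requests are only granted inside a user-generated event handler.
  button.addEventListener("click", function () {
    target.mozRequestFullScreen();
  }, false);
  return true;
}
```

In a page this might be called as setupFullScreenButton(document, someButton, document.getElementById('bruce_video'), callback), where someButton and callback are whatever your page provides.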
How is Firefox's full-screen API different from Webkit/Chrome/Safari's full-screen API? Firefox's API adds a "width: 100%; height: 100%;" CSS rule to the element which requests full-screen, so that it's stretched to occupy the entire screen. Chrome's API does not do this, but instead it centers the full-screen element in the window and blacks-out the underlying webpage. So the full-screen element won't occupy the entire screen with Chrome's API unless you specify a "width: 100%; height: 100%;" rule yourself. Conversely if you want to vertically and horizontally center something while in full-screen with Firefox's API, you need to make the containing element of your desired centered element full-screen instead, and apply CSS rules to vertically and horizontally center the contained element.
We have implemented a general purpose full-screen API which can make any HTML element the full-screen element (it seems WebKit based browsers' full-screen API allows only making <video> elements full-screen).
This feature makes the following API changes to HTML Element:
void mozRequestFullScreen() : makes an HTML element the full-screen element. Causes browser chrome to hide, and expands the element to encompass the entire screen. Upon success, this dispatches a "mozfullscreenchange" event to the requesting full-screen element, or the element's owner document if the element is not in a document. We only grant requests for full-screen when running in user-generated event handlers, e.g. a mouse click handler.
This feature makes the following API changes to HTML Document:
void mozCancelFullScreen() : exits the document from full-screen mode.
readonly attribute mozFullScreen : true when the document is in full-screen mode.
readonly attribute mozFullScreenElement : reference to the current full-screen element, if it's in the current document.
This feature adds the :-moz-full-screen CSS pseudo class, which applies to the full-screen element while in full-screen mode.
For a request for full-screen to be granted in content inside an iframe, the containing iframe needs to have the mozallowfullscreen attribute present. This is a boolean attribute, so the attribute only needs to be present, it doesn't matter what value it's set to.
Keyboard input is restricted in full-screen mode. When alpha-numeric key input occurs in full-screen mode, full-screen mode immediately exits. This is to help protect against phishing attacks.
We also plan to deny requests for full-screen mode when windowed plugins are present (since we can't easily monitor key events to windowed plugins on non-MacOSX platforms). We will exit full-screen mode when a windowed plugin is added to a document as well. I have a patch for this, but its dependencies haven't landed yet.
Work remaining to be done before this can be enabled:
Making the full-screen API work in multi-process Firefox/Fennec (bug 684620). This requires a way of getting the PBrowserParent from C++ in the chrome process, and unfortunately there's no way to do that yet.
Make change/open tab cause full-screen mode to exit (bug 685402).
A security review must be completed, and concerns raised there must be addressed. This could involve changing the API.
ATS has record types which are like tuples but each item is referenced via a name. They closely approximate C structs and in the generated C code are represented in this way. The following uses a record to hold x and y values representing a point on a 2D plane:
fun print_point (p: @{x= int, y= int}): void =
printf("%d@%d\n", @(p.x, p.y))
implement main () = {
val p1 = @{x=10, y=20}
val () = print_point (p1)
}
A literal record object in this example is created using the @{ ... } syntax. Dereferencing record fields is done using ’.’. The generated C code shows that the representation for the record is a C struct:
The @ syntax used for the record literal and type marks the record as a ‘flat’ record. Flat records have value semantics and variables of this type have a size in bytes equivalent to the size of the underlying C structure. This is shown by the generated C code for the main function above:
Here the variable tmp4 is the p1 in our ATS code. It is an instance of the C struct representing the record and is created on the stack. It is initialized and passed to the print_point function by value:
Records can also be defined using a '{...} syntax (note the ' instead of @) for boxed records which are heap allocated and the memory managed by the garbage collector. Boxed records have pointer semantics and have a size in bytes equivalent to the size of a pointer:
implement main () = {
val x = sizeof<@{x=int}>
val y = sizeof<'{x=int}>
val a = int_of_size x
val b = int_of_size y
val () = printf ("%d %d\n", @(a, b))
}
This outputs “4 8” showing the flat type as having the size of an int and the boxed type having the size of a pointer. Most usage of records in ATS I’ve done is using flat records as I tend to avoid the use of the garbage collector.
To pass a reference to a flat record so it can be modified by a function you need to mark the function argument as ‘by reference’ using &:
typedef point = @{x= int, y= int}
fun print_point (p: point): void =
printf("%d@%d\n", @(p.x, p.y))
fun add1 (p: &point): void = {
val () = p.x := p.x + 1
val () = p.y := p.y + 1
}
implement main () = {
var p1 = @{x=10, y=20}
val () = add1 (p1)
val () = print_point (p1)
}
In this example I use typedef to create a type alias for the record so I can refer to the type as point. The add1 function takes a point by reference (as indicated by the & prefix). This works like C++ reference arguments. The function effectively takes a pointer to an instance of the struct and can modify the instance passed to the function. For this to work the point passed to the function must be an lvalue. That is, it must be mutable. This is done in ATS by making it a var vs a val.
Note that it’s a type error to create an uninitialized point object and pass it to add1. For example, the following gives a type check error:
implement main () = {
var p1: point?
val () = add1 (p1)
val () = print_point (p1)
}
Types that are uninitialized have a ? suffix added to them. The type of p1, due to it being uninitialized, is point?. Since add1 takes a point this fails type checking. Initializing it allows it to pass:
implement main () = {
var p1: point
val () = p1.x := 5
val () = p1.y := 10
val () = add1 (p1)
val () = print_point (p1)
}
As well as passing by reference to functions you can pass a pointer and deal with the pointer management directly. This requires using the proof system and I hope to go through dealing with pointers and their associated proofs in a later post:
fun add1 {l:agz} (pf: !point @ l | p: ptr l): void = {
val () = p->x := p->x + 1
val () = p->y := p->y + 1
}
implement main () = {
var p1 = @{x=10, y=20}
val () = add1 (view@ p1 | &p1)
val () = print_point (p1)
}
When interfacing with C APIs you often have to deal with C structures. In ATS you can declare a type as being defined as a struct in C with a matching ATS record definition so that ATS can be used to access the struct. The following example shows a struct declared in C, a function that uses it in C, and how this is wrapped in ATS:
%{^
typedef struct Point {
int x;
int y;
} Point;
void print_point (Point* p) {
printf("%d@%d\n", p->x, p->y);
}
%}
typedef point = $extype_struct "Point" of {x= int, y= int}
extern fun print_point (p: &point): void = "mac#print_point"
implement main () = {
var p1: point
val () = p1.x := 10;
val () = p1.y := 20;
val () = print_point (p1)
}
The $extype_struct keyword creates a type that is represented by a C struct with the given name. By using the of {x= int, y= int} suffix we define the record layout as seen via ATS. This will stop ATS from creating its own structure to map the type and instead uses the C structure. The generated C code for the main function looks like:
Earlier this year I posted about how I created my git mirror of the mozilla-central mercurial repository. I’ve been keeping a fork on github updated regularly since then that a number of people have started using.
Users of my mirror have wanted to be able to keep up to date with the mercurial mozilla-central repository themselves, or add updates from other branches of the Mozilla source. It takes a large amount of time to start a conversion from scratch but it’s possible to start from the existing git mirror, perform the incremental updates from mercurial yourself, and still stay compatible with the incremental updates I push to my github mirror.
hg-git uses a git-mapfile in the .hg directory of the mercurial clone to keep track of the mapping between mercurial and git commits. By using the same git-mapfile that I use for my mirror you can start your own incremental update from the latest data in the git-mapfile instead of from the beginning. I keep an up-to-date git-mapfile in git-mapfile.bz2. It’s compressed with bzip2 as the uncompressed version is quite large.
Clone my git mirror as a bare repository in .hg/git
Place the git-mapfile in .hg
Do an hg bookmark -f -r default master to mark the commit to convert to
Perform hg gexport to update .hg/git with recent mercurial commits
The .hg/git directory should now be up-to-date with respect to the mercurial repository. And the additional git commits will have the same SHA id as any commits I push into my mirror when I perform my own update.
The steps to do an incremental update are the usual ones:
Pull from the mercurial mozilla-central repository
Run hg bookmark -f -r default master
Run hg gexport
Push or pull to/from .hg/git as needed
As a working example, the following shell commands on a Linux system should set up your own repository ready for incremental updating:
Note that I use hg gexport and push from the .hg/git repository using the git command instead of doing an hg push and relying on hg-git to do the conversion and push. In my original article I did the latter. The hg-git push method uses slower python based routines to do the push which can take a long time and large amounts of memory on big repositories like mozilla-central. By splitting this up into gexport and using the native git command I save a lot of time and memory.
Update 2011-11-08 - The steps to build Rust have changed since I wrote this post. There’s now no need to build LLVM - it’s included as a submodule in the Rust git repository and will automatically be cloned and built.
A while ago I wrote a quick look at Rust post, describing how to build it and run some simple examples. Since that post the bootstrap compiler has gone away and the rustc compiler, written in Rust, is the compiler to use.
The instructions for building Rust in the wiki are good but I’ll briefly go through how I installed things on 64-bit Linux. The first step is to build the required version of LLVM from the LLVM source repository. I use a git mirror and install to a local directory so as not to clash with other programs that use older LLVM versions.
Note the use of prefix to ensure this custom LLVM build is a local install. I also do a git reset to go to the commit for SVN revision 142082 which is the current version that works with Rust according to the wiki. After installation add the install bin to the PATH and lib to LD_LIBRARY_PATH:
$ git clone git://github.com/graydon/rust
$ mkdir build
$ cd build
$ ../rust/configure
$ make
The build process downloads an existing build for your platform to bootstrap from. It uses this to build a stage1 version of the compiler. This stage1 is used to build a stage2, and that is then used to build a stage3. All the built compilers should work exactly the same if things are working correctly. The compiler can be run within the build directory, or outside it if you put the stage3/bin directory in the path:
$ export PATH=~/path/to/build/stage3/bin:$PATH
$ rustc
error: No input filename given.
Test with a simple ‘hello world’ program:
$ cat hello.rs
use std;
import std::io::print;
fn main () {
print ("hello\n");
}
$ rustc hello.rs
$ ./hello
hello
ATS allows overloading of functions where the function that is called is selected based on number of arguments and the type of the arguments. The following example shows overloading based on type to provide a generic print function:
symintr myprint
fun myprint_integer (a: int) = printf("%d\n", @(a))
fun myprint_double (a: double) = printf("%f\n", @(a))
fun myprint_string (a: string) = printf("%s\n", @(a))
overload myprint with myprint_integer
overload myprint with myprint_double
overload myprint with myprint_string
implement main() = {
val () = myprint ("hello")
val () = myprint (10)
val () = myprint (20.0)
}
The keyword symintr introduces a symbol that can be overloaded. The keyword overload will overload that symbol with an existing function. In this example we overload with three functions that take different types. The overload resolution is performed at compile time. The actual C code generated for the main function includes:
The ATS standard prelude includes a print function that is overloaded in this manner for most of the standard types. One downside to the way overloading works is the overload resolution sometimes fails in template functions. The following code gives a compile error for example:
fun {a:t@ype} printme (x: a) = print(x)
implement main() = {
val () = printme (10)
val () = printme (20.0)
}
The error given is that the symbol print cannot be resolved. The ATS compiler attempts to resolve the overload by looking up the t@ype sort. There is no overload for this so the resolution fails. This can be worked around using a template function to call the overloaded function and partially specialize the implementation of the new template function. The following code demonstrates this:
extern fun {a:t@ype} gprint (x: a):void
implement gprint<int> (x) = print_int(x)
implement gprint<double> (x) = print_double(x)
fun {a:t@ype} printme (x: a):void = gprint<a>(x)
implement main() = {
val () = printme (10)
val () = printme (20.0)
}
The print symbol can be overloaded with the new gprint function to allow print to be called over t@ype sorts:
extern fun {a:t@ype} gprint (x: a):void
implement gprint<int> (x) = print_int(x)
implement gprint<double> (x) = print_double(x)
overload print with gprint
fun {a:t@ype} printme (x: a) = print(x)
implement main() = {
val () = printme (10)
val () = printme (20.0)
}
This example is contrived in that you could just specialize printme but in real world code this issue comes up occasionally. The most common example for me has been using = in a template function, comparing arguments that are template type parameters. = is an overloaded function and the overload lookup fails in the same manner as above. A workaround is to create an equals template function specialized over the types you plan to compare as above.
The Open Video Conference that took place on 10-12 September was so overwhelming, I’ve still not been able to catch my breath! It was a dense three days for me, even though I only focused on the technology sessions of the conference and utterly missed out on all the policy and content discussions.
Roughly 60 people participated in the Open Media Software (OMS) developers track. This was an amazing group of people capable and willing to shape the future of video technology on the Web:
HTML5 video developers from Apple, Google, Opera, and Mozilla (though we missed the NZ folks),
codec developers from WebM, Xiph, and MPEG,
Web video developers from YouTube, JWPlayer, Kaltura, VideoJS, PopcornJS, etc.,
content publishers from Wikipedia, Internet Archive, YouTube, Netflix, etc.,
open source tool developers from FFmpeg, GStreamer, Flumotion, VideoLAN, PiTiVi, etc.,
and many more.
To provide a summary of all the discussions would be impossible, so I just want to share the key take-aways that I had from the main sessions.
Tim Terriberry (Mozilla), Serge Lachapelle (Google) and Ethan Hugg (Cisco) moderated this session together (slides). There are activities both at the W3C and at IETF – the ones at IETF are supposed to focus on protocols, while the W3C ones focus on HTML5 extensions.
The current proposal of a PeerConnection API has been implemented in WebKit/Chrome as open source. It is expected that Firefox will have an add-on by Q1 next year. It enables video conferencing, including media capture, media encoding, signal processing (echo cancellation etc), secure transmission, and a data stream exchange.
Current discussions are around the signalling protocol and whether SIP needs to be required by the standard. Further, the codec question is under discussion with a question whether to mandate VP8 and Opus, since transcoding gateways are not desirable. Another question is how to measure the quality of the connection and how to report errors so as to allow adaptation.
What always amazes me around RTC is the sheer number of specialised protocols that seem to be required to implement this. WebRTC does not disappoint: in fact, the question was asked whether there could be a lighter alternative than to re-use dozens of years of protocol development – is it over-engineered? Can desktop players connect to a WebRTC session?
We are already in a second or third revision of this part of the HTML5 specification and yet it seems the requirements are still being collected. I’m quietly confident that everything is done to make the lives of the Web developer easier, but it sure looks like a huge task.
Zohar Babin (Kaltura) and myself moderated this session and I must admit that this session was the biggest eye-opener for me amongst all the sessions. There was a large number of Flash developers present in the room and that was great, because sometimes we just don’t listen enough to lessons learnt in the past.
This session gave me one of those aha-moments: in the form of the Flash appendBytes() API function.
The appendBytes() function allows a Flash developer to take a byteArray out of a connected video resource and do something with it – such as feed it to a video for display. When I heard that Web developers want that functionality for JavaScript and the video element, too, I instinctively rejected the idea wondering why on earth would a Web developer want to touch encoded video bytes – why not leave that to the browser.
But as it turns out, this is actually a really powerful enabler of functionality. For example, you can use it to:
display mid-roll video ads as part of the same video element,
sequence playlists of videos into the same video element,
implement DVR functionality (high-speed seeking),
do mash-ups,
do video editing,
implement adaptive streaming.
This totally blew my mind and I am now completely supportive of having such a function in HTML5. Together with media fragment URIs you could even leave all the header download management for resources to the Web browser and just request time ranges from a video through an appendBytes() function. This would be easier on the Web developer than having to deal with byte ranges and making sure that appropriate decoding pipelines are set up.
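To make the media fragment idea concrete, here is a minimal sketch that builds a temporal media fragment URI using the "#t=start,end" form from the W3C Media Fragments URI work. The helper name timeRangeUrl is invented for illustration, and how a browser would combine such a URI with an appendBytes()-style function is exactly the open question raised above, so this only shows the addressing part.

```javascript
// Hypothetical helper: address a time range of a video resource with a
// temporal media fragment ("#t=start,end", times in seconds).
function timeRangeUrl(url, startSeconds, endSeconds) {
  if (!(startSeconds >= 0) || !(endSeconds > startSeconds)) {
    throw new RangeError("need 0 <= start < end");
  }
  // Drop any fragment already present before adding the temporal one.
  const base = url.split("#")[0];
  return base + "#t=" + startSeconds + "," + endSeconds;
}
```

For example, timeRangeUrl("http://example.com/video.webm", 10, 20) addresses seconds 10 to 20 of that (made-up) resource.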
Philip Jagenstedt (Opera) and myself moderated this session. We focused on the HTML5 track element and the WebVTT file format. Many issues were identified that will still require work.
One particular topic was to find a standard means of rendering the UI for caption, subtitle, and description selection. For example, what icons should be used to indicate that subtitles or captions are available. While this is not part of the HTML5 specification, it’s still important to get this right across browsers since otherwise users will get confused with diverging interfaces.
Chaptering was discussed and a particular need to allow URLs to directly point at chapters was expressed. I suggested the use of named Media Fragment URLs.
The use of WebVTT for descriptions for the blind was also discussed. A suggestion was made to use the voice tag <v> to allow for “styling” (i.e. selection) of the screen reader voice.
Finally, multitrack audio or video resources were also discussed and the @mediagroup attribute was explained. A question about how to identify the language used in different alternative dubs was asked. This is an issue because @srclang is not on audio or video, only on text, so it’s a missing feature for the multitrack API.
Beyond this session, there was also a breakout session on WebVTT and the track element. As a consequence, a number of bugs were registered in the W3C bug tracker.
This session was moderated by John Luther and John Koleszar, both of the WebM Project. They started off with a presentation on current work on WebM, which includes quality testing and improvements, and encoder speed improvement. Then they moved on to questions about how to involve the community more.
The community criticised that communication of what is happening around WebM is very scarce. More sharing of information was requested, including a move to using open Google+ hangouts instead of Google internal video conferences. More use of the public bug tracker can also help include the community better.
Another pain point of the community was that code is introduced and removed without much feedback. It was requested to introduce a peer review process. Also it was requested that example code snippets are published when new features are announced so others can replicate the claims.
This all indicates to me that the WebM project is increasingly open, but that there is still a lot to learn.
This session was moderated by Frank Galligan and Aaron Colwell (Google), and Mark Watson (Netflix).
Mark started off by giving us an introduction to MPEG DASH, the MPEG file format for HTTP adaptive streaming. MPEG has just finalized the format and he was able to show us some examples. DASH is XML-based and thus rather verbose. It covers all eventualities of what parameters could be switched during transmissions, which makes it very broad. These include trick modes e.g. for fast forwarding, 3D, multi-view and multitrack content.
MPEG have defined profiles – one for live streaming which requires chunking of the files on the server, and one for on-demand which requires keyframe alignment of the files. There are clear specifications for how to do these with MPEG. Such profiles would need to be created for WebM and Ogg Theora, too, to make DASH universally applicable.
Further, the Web case needs a more restrictive adaptation approach, since the video element’s API is already accounting for some of the features that DASH provides for desktop applications. So, a Web-specific profile of DASH would be required.
Then Aaron introduced us to the MediaSource API and in particular the webkitSourceAppend() extension that he has been experimenting with. It is essentially an implementation of the appendBytes() function of Flash, which the Web developers had been asking for just a few sessions earlier. This was likely the biggest announcement of OVC, albeit a quiet and technically-focused one.
Aaron explained that he had been trying to find a way to implement HTTP adaptive streaming into WebKit in a way in which it could be standardised. While doing so, he also came across other requirements around such chunked video handling, in particular around dynamic ad insertion, live streaming, DVR functionality (fast forward), constrained video editing, and mashups. While trying to sort out all these requirements, it became clear that it would be very difficult to implement strategies for stream switching, buffering and delivery of video chunks into the browser when so many different and likely contradictory requirements exist. Also, once an approach is implemented and specified for the browser, it becomes very difficult to innovate on it.
Instead, the easiest way to solve it right now and learn about what would be necessary to implement into the browser would be to actually allow Web developers to queue up a chunk of encoded video into a video element for decoding and display. Thus, the webkitSourceAppend() function was born (specification).
The proposed extension to the HTMLMediaElement is as follows:
partial interface HTMLMediaElement {
// URL passed to src attribute to enable the media source logic.
readonly attribute [URL] DOMString webkitMediaSourceURL;
bool webkitSourceAppend(in Uint8Array data);
// end of stream status codes.
const unsigned short EOS_NO_ERROR = 0;
const unsigned short EOS_NETWORK_ERR = 1;
const unsigned short EOS_DECODE_ERR = 2;
void webkitSourceEndOfStream(in unsigned short status);
// states
const unsigned short SOURCE_CLOSED = 0;
const unsigned short SOURCE_OPEN = 1;
const unsigned short SOURCE_ENDED = 2;
readonly attribute unsigned short webkitSourceState;
};
The code is already checked into WebKit, but commented out behind a command-line compiler flag.
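As a rough sketch of how a page might drive the interface listed above, the following pushes already-downloaded chunks into a media element and then signals end of stream. It uses only the member names from the listing; the function name appendChunks is made up, and it assumes the whole stream is already available in the chunks array and that the source has been opened by some earlier step not shown here.

```javascript
// Sketch only: appendChunks is a hypothetical helper. `media` is any object
// exposing the proposed webkitSource* members and constants.
function appendChunks(media, chunks) {
  let appended = 0;
  for (const chunk of chunks) {
    // Only append while the source is open.
    if (media.webkitSourceState !== media.SOURCE_OPEN) break;
    media.webkitSourceAppend(chunk); // chunk is a Uint8Array of encoded data
    appended++;
  }
  // Assumption: chunks holds the complete stream, so finish cleanly.
  if (appended === chunks.length &&
      media.webkitSourceState === media.SOURCE_OPEN) {
    media.webkitSourceEndOfStream(media.EOS_NO_ERROR);
  }
  return appended;
}
```

The return value is the number of chunks actually handed to the element, so a caller can tell whether the source closed part way through.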
Frank then stepped forward to show how webkitSourceAppend() can be used to implement HTTP adaptive streaming. His example uses WebM – there are no examples with MPEG or Ogg yet.
The chunks that Frank’s demo used were 150 video frames long (6.25s) and 5s long audio. Stream switching only switched video, since audio data is much lower bandwidth and more important to retain at high quality. Switching was done on multiplexed files.
Every chunk requires an XHR range request – this could be optimised if the connections were kept open per adaptation. Seeking works, too, but since decoding requires download of a whole chunk, seeking latency is determined by the time it takes to download and decode that chunk.
Similar to DASH, when using this approach for live streaming, the server has to produce one file per chunk, since byte range requests are not possible on a continuously growing file.
Frank did not use DASH as the manifest format for his HTTP adaptive streaming demo, but instead used a hacked-up custom XML format. It would be possible to use JSON or any other format, too.
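The JavaScript side of such a demo needs two small pieces: choosing which video stream to fetch next and building the byte range for the chunk's XHR request. The sketch below is not Frank's code. The manifest shape (objects with a bitrate in bits per second, chunks with start and length in bytes), the function names and the 0.8 safety factor are all assumptions made for illustration.

```javascript
// Hypothetical: pick the highest-bitrate stream that fits within the
// measured bandwidth, falling back to the lowest-bitrate stream.
function pickStream(streams, measuredBitsPerSecond, safetyFactor = 0.8) {
  const sorted = streams.slice().sort((a, b) => a.bitrate - b.bitrate);
  // Leave some headroom so small bandwidth dips don't stall playback.
  const budget = measuredBitsPerSecond * safetyFactor;
  let choice = sorted[0];
  for (const s of sorted) {
    if (s.bitrate <= budget) choice = s;
  }
  return choice;
}

// Value for the HTTP Range header covering one chunk (offsets inclusive).
function rangeHeader(chunk) {
  return "bytes=" + chunk.start + "-" + (chunk.start + chunk.length - 1);
}
```

The result of rangeHeader() would be passed to XMLHttpRequest's setRequestHeader("Range", ...), and the downloaded bytes handed to webkitSourceAppend().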
After this session, I was actually completely blown away by the possibilities that such a simple API extension allows. If I wasn’t sold on the idea of an appendBytes() function in the earlier session, this one completely changed my mind. While I still believe we need to standardise an HTTP adaptive streaming file format that all browsers will support for all codecs, and I still believe that a native implementation for support of such a file format is necessary, I also believe that this approach of webkitSourceAppend() is what HTML needs – and maybe it needs it faster than native HTTP adaptive streaming support.
This session was moderated by Zachary Ozer and Pablo Schklowsky (JWPlayer). Their motivation for the topic was, in fact, also HTTP adaptive streaming. Once you leave the decisions about when to do stream switching to JavaScript (through a function such as webkitSourceAppend()), you have to expose stream metrics to the JS developer so they can make informed decisions. The other use case is, of course, monitoring of the quality of video delivery for reporting to the provider, who may then decide to change their delivery environment.
The discussion found that we really care about metrics on three different levels:
measuring the network performance (bandwidth)
measuring the decoding pipeline performance
measuring the display quality
In the end, it seemed that work previously done by Steve Lacey on a proposal for video metrics was generally acceptable, except for the playbackJitter metric, which may be too aggregate to mean much.
I didn’t actually attend this session held by Anant Narayanan (Mozilla), but from what I heard, the discussion focused on how to manage permission of access to video camera, microphone and screen, e.g. when multiple applications (tabs) want access or when the same site wants access in a different session. This may apply to real-time communication with screen sharing, but also to photo sharing, video upload, or canvas access to devices e.g. for time lapse photography.
This was another session that I wasn’t able to attend, but I believe the creation of good open source video editing software and similar video creation software is really crucial to giving video a broader user appeal.
Jeff Fortin (PiTiVi) moderated this session and I was fascinated to later see his analysis of the lifecycle of open source video editors. It is shocking to see how many people/projects have tried to create an open source video editor and how many have stopped their project. It is likely that the creation of a video editor is such a complex challenge that it requires a larger and more committed open source project – single people will just run out of steam too quickly. This may be comparable to the creation of a Web browser (see the size of the Mozilla project) or a text processing system (see the size of the OpenOffice project).
Jeff also mentioned the need to create open video editor standards around playlist file formats etc. Possibly the Open Video Alliance could help. In any case, something has to be done in this space – maybe this would be a good topic to focus next year’s OVC on?
Monday’s Breakout Groups
The conference ended officially on Sunday night, but we had a third day of discussions / hackday at the wonderful New York Law School venue. We had collected issues of interest during the two previous days and organised the breakout groups in the morning (Schedule).
In the Content Protection/DRM session, Mark Watson from Netflix explained how their API works and that they believe that all we need in browsers is a secure way to exchange keys and an indicator of which protection scheme is used – the actual protection scheme would not be implemented by the browser, but be provided by the underlying system (media framework/operating system). I think that until somebody actually implements something in a browser fork and shows how this can be done, we won’t have much progress. In my understanding, we may also need to disable part of the video API for encrypted content, because otherwise you can always e.g. grab frames from the video element into canvas and save them from there.
In the Playlists and Gapless Playback session, there was massive brainstorming about what new cool things can be done with the video element in browsers if playback between snippets can be made seamless. Further discussions were about standard playlist file formats (such as XSPF, MRSS or M3U), media fragment URIs in playlists for mashups, and the need to expose track metadata for HTML5 media elements.
What more can I say? It was an amazing three days and the complexity of problems that we’re dealing with is a tribute to how far HTML5 and open video has already come and exciting news for the kind of applications that will be possible (both professional and community) once we’ve solved the problems of today. It will be exciting to see what progress we will have made by next year’s conference.
Thanks go to Google for sponsoring my trip to OVC.
The group has been created to work on many aspects of video text tracks of which captioning and the WebVTT format are key parts.
The main reason behind creating this group is to create a forum at the W3C for working on WebVTT to allow all browsers to support this format and be involved in its development.
We’ve not gone the full way to creating a Working Group, although that was the initial intention. We had objections from W3C members for going down that path, so are using the CG path for now.
This is actually a good thing because CGs are open for anyone to join, while WGs are only open to W3C members. The key difference is that specs coming out of WGs can become RECs (“standards”), while CG’s specs cannot.
If we eventually see a need to move WebVTT to a REC, that move will be straightforward, since there is a clear path for work to transition from a CG to a WG.
Curious about any new requirements that the TV community may have for HTML5 video, I attended the W3C Web and TV Workshop in Hollywood last week. It’s already the third of its kind and was also the largest to date showing an increasing interest of the TV community to converge with the Web community.
The Workshop Aim
I went into the Workshop not quite knowing what to expect. My previous contact with members of this community was restricted to email exchanges on the W3C Web and TV Interest Group (IG) mailing list. I knew there was some interest in video accessibility (well: particularly captions) and little knowledge of existing HTML5 specifications around text tracks and why the browsers were going with WebVTT. So I had decided to attend the workshop to get a better understanding of the community, its background, needs, and issues, and to hopefully teach some of the ways of HTML5. For that reason I had also submitted a WebVTT presentation/demo.
As it turned out, the workshop had as its key target the facilitation of communication between the TV and the HTML5 community. The aim was to identify features that need to be added to the HTML5 video element to satisfy the needs of the TV community. I obviously came to the right workshop.
The process that is being used by the W3C in the Interest Group is to have TV community members express their needs, then have HTML5 experts express how these needs can be satisfied with existing HTML5 features, then make trial implementations and identify any shortcomings, then move forward to progress these through HTML5 or HTML.next. This workshop clearly focused on the first step: expressing needs.
It was often painful for me to watch presenters defending their requirements and trying to impress on the audience how important a certain feature is to them when that feature actually already has an HTML5 specification, just not yet a browser implementation. That there were so few HTML5 video experts present and that they were given very little space to directly reply to the expressed needs and actually explain what is already possible (or specified to be possible) was probably one of the biggest drawbacks of the workshop.
To be fair, detailed technical discussions were not possible in a room with 150 attendees with a panel sitting at the front discussing topics and taking questions. Solving a use case with existing HTML5 markup and identifying the gaps requires smaller break-out groups of a maximum of maybe 20 people and sufficient HTML5 knowledge in the room. Ultimately they require a single person to try to implement it using JavaScript alone, and, failing that, writing browser extensions. Only such code actually proves that a feature is missing.
Now, the video features of HTML5 are still continuing to change almost on a daily basis. Much development is, for example, happening around real-time communication features and around the track element as we speak. So, focusing on further requirements finding around HTML5 video for now is probably a good thing.
The TV Community Approach
Before I move on to some of the topics covered by the workshop, I have to express some concern about the behaviour that I observed with lots of the TV community folks. Many people tried pushing existing solutions from other spaces into the Web unchanged with a claim of not re-inventing the wheel and following paved cowpaths, which are some of the underlying design principles for HTML5. I can understand where such behaviour originates thinking that having solved the same problems elsewhere before, those solutions should apply here, too. But I would like to warn people of this approach.
If we blindly apply solutions that were not developed for HTML5 into HTML we will end up with suboptimal solutions that will hurt us further down the track. The principles of not re-inventing the wheel and following paved cowpaths were introduced for features that were already implemented by browsers or in de-facto standard use by JavaScript libraries. They were not created for new features in HTML. The video element is a completely new feature in HTML thus everything around it is new.
I would therefore like to see some more respect given to HTML5 and the complexities involved in finding the best possible technical solutions for the Web given that the video element does not stand alone in HTML5, but is part of a much larger picture of technical capabilities on the Web where many of the requested features for TV applications may already be solved by existing HTML markup that is not part of the video element.
Also, HTML5 is not just about the HTML markup, but also about CSS and JavaScript and HTTP. There are several layers of technology involved in creating a Web application: not only a separation of work between client and servers, but also between the Operating System, the media framework, the browser, browser plugins, and JavaScript has to be balanced. To get this balance right is a fine art that will take many discussions, many experiments and sometimes several design approaches. We need patience and calm to work through this, not a rushed adoption of existing solutions from other spaces.
Session 1 / Content Provider and Consumer Perspective:
The session’s participants postulated that we will see the creation of application stores for TV applications similar to how we have experienced this for mobile phones and tablets. People enjoy collecting apps like they collect badges. Right now, the app store domain is dominated by native apps, not Web apps. The reason is that we haven’t got a standard platform for setting up Web app stores with Web apps that work in all browsers on all operating systems. Thus, developers have to re-deploy their app for many environments.
While essentially an orthogonal need to HTML standardisation, this seems to be one of the key issues that keep Web apps back from making big market inroads and W3C may do well in setting up a new WG to define a standard Web app manifest format and JS APIs.
Session 2+3 / Multi-screen TV in the Home Network:
Several technologies of hybrid TV broadcast and set-top-box Web content delivery were being pointed out, including the European HbbTV and the Japanese Hybridcast, the latter of which gave an in-depth demo.
Web purists would probably say that it would be simpler to just deliver all content over the Web and not have to worry about any further technical challenges encountered by having to synchronize content received via two vastly different delivery mechanisms. I personally believe this development is one of business models: we don’t yet know exactly how to earn money from TV content delivered over the Internet, but we do know how to do so with TV content. So, hybrids allow the continuation of existing income streams while allowing the features to be augmented with those people enjoy from the Internet.
Should requirements that emerge from such a use case for HTML5 video be taken seriously? I think they absolutely should. What I see happening is that a new way of using the Web is starting to emerge. The new way is video-focused rather than text-focused. We receive our Web content by watching video programming online – video channels, not Web pages are the core content that we consume in the living room. Video channels are where we start our browsing experience from. Search may still be our first point of call, but it will be search for video content or a video-centric app rather than search for a Web site.
And it will be a matter of many interconnected devices in the house that contribute to the experience: the 5.1 stereos that are spread all over the house and should receive our video’s sound, the different screens in the different areas of our house between which we move around, and remote controls, laptops or tablets that function as remote controls and preview stations and are used to determine our viewing experience and provide a back-channel to the publishers.
We have barely begun to identify how such interconnected devices within a home fit within the server-client-based view of the Web world, and the new Web Sockets functionality. The Home Networking Task Force of the Web and TV IG is looking at the issues and analysing existing protocols and standards that solve this picture. But I have a gnawing feeling that the best solution will be something new that is more Web-specific and fits better with the technology layers of the Web.
Session 4 / Synchronized Metadata:
The TV environment offers many data services, some of which have been legally prescribed. This session analysed TV needs and how they can be satisfied with current HTML5.
Subtitle and closed captioning support is one of the key requirements that have been legally prescribed to allow for equal access of non-native speakers, and deaf and hard-of-hearing users to TV content. After demonstration of some key features defined into the HTML5 track element and the WebVTT format, it was generally accepted that HTML5 is making big progress in this space, in particular that browsers are in the process of implementing support for the track element. A concern still exists for complete coverage of all the CEA-608/708 features in WebVTT.
Further concern was raised for support of audio descriptions and audio translations, in particular since no browser has as yet committed to implementing HTML5’s media multitrack API with the @mediagroup attribute. In this context I am excited to see first JavaScript polyfills emerge (see captionator.js & mediagroup.js).
Another concern was that many captions are actually delivered as raster images (in particular DVD captions) and how that would work in the Web context. The proposal was to use WebVTT and encode the raster images as data-URIs included in timed cues, then render them by JavaScript as an overlay. This is something to explore further.
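As a rough illustration of the data-URI proposal, the following Python sketch builds a WebVTT file whose cue payloads are base64 data-URIs. The helper names are my own, the default MIME type is an assumption, and the overlay rendering in JavaScript that the proposal relies on is not shown.

```python
import base64


def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def image_cues_to_webvtt(cues, mime="image/png"):
    """Build a WebVTT document from (start_s, end_s, image_bytes) tuples.

    Each cue payload is a single data-URI line. Base64 output contains
    neither blank lines nor "-->", so it is safe to use as cue text.
    """
    lines = ["WEBVTT", ""]
    for start, end, image_bytes in cues:
        payload = base64.b64encode(image_bytes).decode("ascii")
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(f"data:{mime};base64,{payload}")
        lines.append("")  # a blank line terminates the cue
    return "\n".join(lines)
```

A page script could then read each cue's text, assign it to the src of an image element, and position that element over the video while the cue is active.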
Demos were shown using WebVTT to synchronize ads with videos, to display related metadata from a user’s life log with videos, to display thumbnails along a video’s timeline, and to show the rendering of text descriptions through screen readers. General agreement by the panel was that WebVTT offers many opportunities and that this area will continue to need further development and that we will see new capabilities on the Web around metadata that were not previously possible on TV.
Session 5 / Content Format and Codecs: DASH and Codec standards
The introduction of HTTP adaptive streaming into HTML5 was one of the core issues that kept returning in the discussions. This panel focused on MPEG DASH, but also mentioned the need for programmatic implementation of adaptive streaming functionality.
The work around MPEG DASH would require specifications of how to use DASH with WebM and Ogg Theora, as well as a specification of a HTML5 profile for DASH, which would limit the functionality possible in DASH files to the ones needed in a HTML5 video element. One criticism of DASH was its verbosity. Another was its unclear patent position. Panel attendees, who included Qualcomm, Apple and Microsoft, made very clear that their position is pro a royalty-free use of DASH.
The work around a programmatic implementation for adaptive streaming would require at least a JavaScript API to measure the quality of service of a presented video element and a JavaScript API to feed the video element with chunks of (encrypted) video content on the fly. Interestingly enough, there are existing experiments both around Video metrics and MediaSource extensions, so we can expect some progress in this space, even if these are not yet a strong focus of the HTML WG.
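To show why the metrics matter for the programmatic approach, here is a small Python sketch of the switching decision a script would have to make: pick the highest-bitrate rendition that fits within the measured throughput. The rendition ladder, the names and the 0.8 safety factor are all assumptions made for illustration; real players use more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_bps: int  # average encoded bitrate of this variant


def pick_rendition(renditions, measured_bps, safety=0.8):
    """Return the best rendition whose bitrate fits the throughput budget.

    Only a fraction (safety) of the measured throughput is budgeted so
    that short dips do not immediately stall playback. If nothing fits,
    the lowest-bitrate rendition is returned.
    """
    ordered = sorted(renditions, key=lambda r: r.bitrate_bps)
    budget = measured_bps * safety
    best = ordered[0]
    for rendition in ordered:
        if rendition.bitrate_bps <= budget:
            best = rendition
    return best


# A hypothetical bitrate ladder.
LADDER = [
    Rendition("240p", 400_000),
    Rendition("480p", 1_200_000),
    Rendition("720p", 2_500_000),
]
```

With this ladder, a measured throughput of 2 Mbit/s leaves a budget of 1.6 Mbit/s, so the 480p rendition is selected.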
I would personally support the creation of a Community Group at the W3C around HTTP adaptive streaming and DASH. I think it would work towards alleviating the perceived patent issues around DASH and allow the right members of the community to participate in preparing a specification for HTML5 without requiring them to become W3C members.
Session 6 / Content Protection and DRM
A core concern of the TV community is around content protection. The requirements in this space seem, however, very confused.
The key assumption here is that Web browsers should support the decoding of DRM-protected content in the HTML5 video element because the video element provides a desirable JavaScript API, accessibility features (the track element), default controls, and the possibility to synchronize multiple media elements. However, at the same time, the video element is part of the core content of a Web page and thus allows direct access to the image content in a canvas etc, so some of its functionality is not desirable.
The picture is further confused by requests for authentication, authorization, encryption, obfuscation, same-origin, secure transmission, secure decryption key delivery, unique content identification and other “content protection” techniques without a clear understanding of what is already possible on the Web and what requirements content publishers actually have for delivering their content on the Web. This is further complicated by the fact that there are many competing solutions for DRM systems in the market with no clear standard that all browsers could support.
A thorough analysis of the technologies and solutions available in this space as well as an analysis of the needs for HTML5 is required before it becomes clear what solution HTML5 browsers may need to support. There seemed to be agreement in the group, though, that browsers would not need to implement DRM solutions, but rather only hand through the functionality of the platform on which they are running (including the media frameworks and operating system functionalities). How this is supposed to work was, however, unclear.
Session 7 / Web & TV: Additional Device & User Requirements
This was a catch-all session for topics that had not been addressed in other sessions. Among the topics addressed in this group were:
Parental Guidance: how to deal with ratings in an internationally inconsistent ratings landscape, how to deliver the ratings with the content, and how to enforce the viewing restrictions
Emergency Notifications: how to replicate on the Web the emergency notification functionality of TV by providing text overlays to alert users
TV channels: how to detect what channels of programming are available to users
Overall, the workshop was a worthwhile experience. It seems there is a lot of work still ahead for making HTML5 video the best it can be on the Web.
Tsuru Capital is a small company. We build our internal systems for live trading and offline analysis in Haskell, and we're proud to be sponsoring ICFP 2011. We use iteratees throughout our systems, and have actively encouraged all our staff to contribute changes upstream and participate in community design discussions. By being part of the open source community and taking part in peer-review, we all end up with better software.
Over time various Tsuru staff members have worked on tools using iteratees, including (grepping the CONTRIBUTORS files): Bryan Buecking, Michael Baikov, Elliott Pace, Conrad Parker, Akio Takano, and Maciej Wos. There have been some lively discussions and many small patches providing functions that we use in production every day.
Last year Conal Elliott provided some mentoring to Tsuru staff, during which we worked through a denotational semantics for iteratees. This resulted in discussions on both the iteratee project list and haskell-cafe about Semantics of iteratees, enumerators, enumeratees.
By using iteratees in production we've contributed various simple but practical functions, including:
enumFdFollow, an enumerator (data source) which allows you to process the growing tail of a log file as it is being written.
ioIter, an iteratee that uses an IO action to determine what to do. Typically this action involves some user interaction, such as a user issuing commands like play/pause/next/prev.
ListLike functions last (an iteratee that efficiently returns the last element of a stream), mapM_ and foldM.
mapChunksM_, a more efficient version of mapM_ that operates on the underlying chunks, eg. logger = mapChunksM_ (liftIO . print).
takeWhile, and its enumeratee variant takeWhileE
endianRead8, an iteratee for reading 64bit values with a given endianness. I've used this in ght as well as an internal project.
Stream conversion
We've done quite a bit of work on stream conversion, as we use a few different layers of data processing. The iteratee architecture allows you to isolate the data source, conversion and processing functions; much of what we've worked on involves ensuring the converters (enumeratees) can control or translate control messages, so that commands like "seek" do not get lost. We've also built combinators to simplify the task of creating new stream converters.
convStateStream, which converts one stream into another while continually updating an internal state. Importantly for variable bitrate binary data, it can produce elements of the output stream from data that spans stream chunks.
(><>) and (<><). These allow stream converters to be composed without rewriting boilerplate. John Lato gives a good example using these in the StackOverflow answer to Attoparsec Iteratee.
zip, zip[345], sequence_ for using multiple iteratees to process a single stream instance, and (for zip*) collecting the results.
eneeCheckIfDone*: This family of functions (eneeCheckIfDoneHandle, eneeCheckIfDonePass, eneeCheckIfDoneIgnore) can be used with unfoldConvStreamCheck to make a version of unfoldConvStream which respects seek messages.
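For readers who don't know iteratees, the idea behind a stateful converter such as convStateStream can be approximated with a Python generator. This is only an analogue under my own assumptions (fixed 8-byte records, leftover bytes as the state); it is not the iteratee API and it ignores control messages like seek.

```python
def convert_stream(chunks, record_size=8, byteorder="big"):
    """Convert a stream of byte chunks into a stream of integers.

    Records may span chunk boundaries, so the bytes left over at the end
    of one chunk are kept as internal state and prepended to the next.
    """
    leftover = b""  # the converter's internal state
    for chunk in chunks:
        data = leftover + chunk
        usable = len(data) - len(data) % record_size
        for offset in range(0, usable, record_size):
            yield int.from_bytes(data[offset:offset + record_size], byteorder)
        leftover = data[usable:]  # carried over to the next chunk
```

The source only has to hand over chunks and the consumer only sees whole values, which is the isolation between data source, conversion and processing described above.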
Parallel stream processing
We often want to do multiple unrelated analysis tasks on a data stream. Whereas sequence_ takes a list of iteratees to run simultaneously and handles each input chunk by mapM across that list, psequence_ runs each input iteratee in a separate forkIO thread. For a real-world example, see Michael Baikov's post about psequence, psequence_, parE, parI.
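The difference between the two strategies can be sketched in Python, again only as an analogue: consumers here are plain functions that take an iterable of chunks, and operating system threads stand in for Haskell's lightweight forkIO threads. None of the names below come from the iteratee package.

```python
import queue
import threading


def run_parallel(chunks, consumers):
    """Feed every chunk to every consumer, each running in its own thread.

    Each consumer is a function taking an iterable of chunks and returning
    a result. Results are returned in the same order as the consumers.
    """
    results = [None] * len(consumers)
    queues = [queue.Queue() for _ in consumers]

    def worker(index, consume):
        def stream():
            while True:
                chunk = queues[index].get()
                if chunk is None:  # end-of-stream marker
                    return
                yield chunk
        results[index] = consume(stream())

    threads = [
        threading.Thread(target=worker, args=(i, consume))
        for i, consume in enumerate(consumers)
    ]
    for thread in threads:
        thread.start()
    for chunk in chunks:
        for q in queues:
            q.put(chunk)  # every consumer sees every chunk
    for q in queues:
        q.put(None)
    for thread in threads:
        thread.join()
    return results
```

A slow consumer no longer holds up the others, at the cost of buffering chunks in its queue.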
Thanks
Thanks to John Lato for consistently and reliably maintaining the iteratee package, providing thoughtful feedback and graciously suggesting improvements.
I just got the news today that LCA 2012 has accepted my talk proposal: "Opus, the Swiss Army Knife of Audio Codecs". I'll be presenting it in Ballarat, Australia in January. If there's any specific topic you'd like me to include in the talk, please let me know (by email or comment on this post).
Since yesterday, the IETF audio codec requirements are now published as RFC 6366. While the requirements aren't by themselves interesting (why discuss abstract requirements when you can discuss actual running code?), it's an important milestone in that it's the first document published by the Working Group. It also means one less source of pointless arguments. The guidelines document is now next in line and should go to IETF last call soon.
Now for the interesting part: the Opus codec itself. That's the only document that really matters. That one should go to Working Group Last Call (WGLC) pretty soon (possibly in the next week or two). In the meantime, we're working on improving the clarity of the draft, cleaning up the code and fixing the last few issues that have been reported since the first WGLC. Stay tuned.
French intern Paul Adenot has recently implemented the seekable and played attributes on the HTML5 video and audio elements in Firefox. The seekable attribute enables script to see what regions of the media can be seeked into (particularly handy with live streams), and the played attribute enables script to see what regions of the media have already been played. Paul has also done some work improving the built in controls on media elements. Thanks for your hard work Paul! These should be available in release builds in November (Firefox 8).
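Both attributes report a list of time ranges. As a rough sketch of the bookkeeping involved (my own Python illustration, not Paul's Firefox code), recording a newly played span means merging it into a sorted list of non-overlapping ranges:

```python
def add_played_range(ranges, start, end):
    """Add the span [start, end] to a list of (start, end) tuples.

    Returns a new list that is sorted and has overlapping or touching
    ranges merged into one.
    """
    merged = []
    for s, e in sorted(list(ranges) + [(start, end)]):
        if merged and s <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

For example, after playing 0-10 seconds and 30-40 seconds, playing 8-32 seconds collapses everything into the single range 0-40.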
Also in Firefox 8 are my changes to media seeking resolution. Now media seeking should be accurate to the nearest microsecond. It's been reported elsewhere how important accurate seeking for video is. We were previously accurate to the nearest video frame, but we could still be up to one audio packet off (often between 4 and 8 ms out). Now we prune audio samples when seeking so we're down to microsecond resolution.
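The arithmetic behind the pruning is simple. The following Python sketch is an illustration under my own assumptions (integer microsecond timestamps, rounding down to whole samples), not the actual Firefox implementation: it computes how many leading samples of a decoded audio packet fall before the seek target.

```python
def samples_to_prune(packet_start_us, seek_target_us, sample_rate_hz):
    """Number of leading samples of a packet that precede the seek target.

    Times are in microseconds. The result is rounded down to a whole
    sample, so playback never starts after the requested time.
    """
    if seek_target_us <= packet_start_us:
        return 0  # the packet already starts at or after the target
    offset_us = seek_target_us - packet_start_us
    return offset_us * sample_rate_hz // 1_000_000
```

So if the seek target lies 5 ms into a 48 kHz packet, the first 240 samples of that packet are discarded before playback resumes.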
This year I’m really excited to announce that the workshop will be an integral part of the Open Video Conference on 10-12 September 2011.
FOMS 2011 will take place as the Open Media Developers track at OVC and I would like to see as many if not more open media software developers attend as we had in last year’s FOMS.
Why should you go?
Well, firstly of course the people. As in previous years, we will have some of the key developers in open media software attend – not as celebrities, but to work with other key developers on hard problems and to make progress.
Then, secondly we believe we have some awesome sessions in preparation:
I’m actually not quite satisfied with just these sessions. I’d like to be more flexible on how we make the three days a success for everyone. And this implies that there will continue to be room to add more sessions, even while at the conference, and create breakout groups to address really hard issues all the way through the conference.
I insist on this flexibility because I have seen in past years that the most productive outcomes are created by two or three people breaking away from the group, going into a corner and hacking up some demos or solutions to hard problems and taking that momentum away after the workshop.
To allow this to happen, we will have a plenary on the first day during which we will identify who is actually present at the workshop, what they are working on, what sessions they are planning on attending, and what other topics they are keen to learn about during the conference that may not yet be addressed by existing sessions.
We’ll repeat this exercise on the Monday after all the rest of the conference is finished and we get a quieter day to just focus on being productive.
But is it worth the effort?
As in the past years, whether the workshop is a success for you depends on you and you alone. You have the power to direct what sessions and breakout groups are being created, and you have the possibility to find others at the workshop that share an interest and drag them away for some productive brainstorming or coding.
I’m going to make sure we have an adequate number of rooms available to actually achieve such an environment. I am very happy to have the support of OVC for this and I am assured we have the best location with plenty of space.
Trip sponsorships
As in previous FOMSes, we have again made sure that travel and conference sponsorship is available to community software developers that would otherwise not be able to attend FOMS. We have several such sponsorships and I encourage you to email the FOMS committee or OVC about it. Mention what you’re working on and what you’re interested to take away from OVC and we can give you free entry, hotel and flight sponsorship.
The W3C has a Media Fragments Working Group whose mission is to specify temporal and spatial media fragments in the Web using URIs. The draft specification goes through in detail how these fragments work. I recently became a member of the working group and I’ve been working on adding support for the temporal dimension portion of the specification to Firefox.
In the most basic form you can specify a start time and an end time in the fragment part of a URI in an HTML video or audio element. For example:
The ‘#t’ portion of the URI identifies the fragment as being a temporal media fragment. In this example ‘50,100’ means start the video at a current time of 50 seconds, and stop playing at 100 seconds. There are various other formats for the temporal media fragment defined in the specification. Examples can be seen in the UA Test Cases.
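To illustrate how such a fragment breaks down, here is a minimal Python sketch of a parser for the NPT temporal syntax. It is an illustration only, not the code in the Firefox patch: the function names and the example.com URL in the usage note are made up, letting a later t= override an earlier one is my assumption, and error handling for malformed values is left out.

```python
from urllib.parse import urlsplit


def parse_npt(value):
    """Parse an NPT time such as '50', '1:30' or '01:02:03.5' into seconds."""
    parts = value.split(":")
    if len(parts) > 3:
        raise ValueError(f"not an NPT time: {value!r}")
    seconds = 0.0
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


def parse_temporal_fragment(url):
    """Return (start, end) in seconds from a '#t=' fragment, or None.

    A missing start defaults to 0.0; a missing end is reported as None,
    meaning "play to the end of the media".
    """
    result = None
    for pair in urlsplit(url).fragment.split("&"):
        name, _, value = pair.partition("=")
        if name != "t":
            continue  # some other dimension; ignored in this sketch
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_text, _, end_text = value.partition(",")
        start = parse_npt(start_text) if start_text else 0.0
        end = parse_npt(end_text) if end_text else None
        result = (start, end)  # assumption: a later t= overrides an earlier one
    return result
```

Calling parse_temporal_fragment("http://example.com/video.ogv#t=50,100") yields (50.0, 100.0), matching the start and end times described above.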
Development of this feature is being done in bug 648595. I’ve done test builds of the first iteration of the patch and they are available at my Firefox media fragment test builds page. The page has builds, an example, and a list of limitations which are currently:
Temporal syntax only. This means no spatial or track dimensions.
NPT time support only. No SMPTE time codes or Wall-clock time codes.
When changing the media fragment portion of a URL the media is not immediately updated. You need to refresh the page to see the change. This is most noticeable when directly navigating to the video and adding or changing a fragment.
The user interface for identifying the fragment in the standard controls is ugly and needs polish.
The HTML standard includes an ‘initialTime’ attribute for obtaining the start time. There is no way to obtain an end time so I’ve exposed a ‘mozFragmentEnd’ attribute on the video DOM object.
While using the libevent API from ATS I came across a scenario where it was important to call a function to release objects in a particular order. I wanted to have ATS enforce at compile time that the destruction occurs safely in the right order. The following example uses the built in libevent HTTP API for creating simple web servers. It uses the ATS libevent wrapper I wrote about previously.
fn cb {l1,l2:agz} (request: !evhttp_request l1, arg: !event_base l2): void = let
  val () = printf("here\n", @())
in
  ()
end

fn server () = let
  val _ = signal (SIGPIPE, SIG_IGN)
  val base = event_base_new ()
  val () = assert_errmsg (~base, "event_base_new failed")
  val http = evhttp_new (base)
  val () = assert_errmsg (~http, "evhttp_new failed")
  val r = evhttp_set_cb_with_base (http, "/", cb, base)
  val () = assert_errmsg (r = 0, "evhttp_set_cb_with_base failed")
  val r = evhttp_bind_socket (http, "0.0.0.0", uint16_of_int(8080))
  val () = assert_errmsg (r = 0, "evhttp_bind_socket failed")
  val r = event_base_dispatch (base)
  val () = assert_errmsg (r >= 0, "event_base_dispatch failed")
  val () = evhttp_free (http)
  val () = event_base_free (base)
in
  ()
end

implement main () = server ()
The call to evhttp_new requires an event_base object. Internally, inside the C API, the returned evhttp object holds a pointer to this event_base. This results in the pointer being shared in two places and requires careful destruction to prevent using a destroyed object.
Later I call evhttp_free to destroy and release resources associated with the evhttp object, followed by event_base_free to do the same with the event_base object. I do it in this order to prevent a dangling reference to the event_base inside the evhttp object. The evhttp object is associated with that particular event_base and it’s important not to pass an incorrect base to functions that use the http object. Ideally code like the following shouldn’t compile:
val base = event_base_new ()
val base2 = event_base_new ()
val http = evhttp_new (base)
// Wrong event_base passed, fail compilation
val r = evhttp_set_cb_with_base (http, "/", cb, base2)
// Destruction out of order, fail compilation
val () = event_base_free (base)
val () = evhttp_free (http)
To model this in ATS I changed the ATS definition of evhttp to have an event_base associated with it at the type level and modified evhttp_new to return this relationship (agz is an address that cannot be NULL, agez is an address that can be NULL):
With this change the type for evhttp depends on the value of the pointer to the event_base given as an argument. Now we can use this in the definition of evhttp_set_cb_with_base to ensure that the correct event_base object is used:
This definition states that the event_base given as the arg parameter, and as the second argument to the callback, must be the same as that associated with the http object. This is done by using the same l2 dependent type argument with these types. This gives us the desired error checking in the first ‘fail compilation’ test above.
The next step is to prevent out of order destruction. This is done by passing the event_base object as a proof argument to the evhttp_free function:
By adding this as a proof argument I’m stating that the caller of evhttp_free must have a usable event_base object that was used to create this evhttp object to prove we can safely destroy it. The following is now a compile time error:
val () = event_base_free (base)
val () = evhttp_free (base | http)
This is due to event_base_free consuming the base linear object. The type of base following this call is no longer defined. The evhttp_free call is now an error due to using an undefined base. The following will work:
val () = evhttp_free (base | http)
val () = event_base_free (base)
This method of ensuring correct order of destruction and resource usage requires no changes to the libevent C code. Everything occurs during type checking. The generated C code looks exactly like normal libevent C usage with no runtime overhead for tracking the association between evhttp and event_base objects.
Last month I went down to Wellington to give a joint talk at the WDCNZ conference. The topic was “HTML Media: Where we are and where we need to go”. The talk was shared between Nigel Parker, Mobile and Developer Evangelist for Microsoft, and myself. The conference was excellent with some great speakers and talks.
Nigel and I covered using HTML video and audio elements and how they can be used today across multiple browsers and mobile devices. We also covered upcoming APIs and directions in the web media area. Demos were shown on Microsoft Internet Explorer 9, Windows Phone 7 (running the Mango update which has a browser that supports HTML media) and Firefox. Nigel has slides and a summary in his blog post about the talk. I think it took a bit for attendees to get over the shock of a Mozilla and Microsoft representative sharing the stage and working together! Nigel’s post explains how this came about.
I spoke to a few Kiwi developers afterwards about using HTML video and there was a fair bit of interest in using it. The main obstacles seemed to be people unsure what codec to use and wanting support for adaptive streaming. The first is an education issue, getting people aware of what codecs to support for maximum coverage across browsers and how to encode to those formats. With regards to adaptive streaming there has been discussion between interested parties in various mailing lists and groups - it’s definitely something that is wanted.
At the time of the talk Nigel and I weren’t able to find any existing New Zealand based sites that use HTML video. Hopefully this will change in the future and Microsoft New Zealand are leading by example by using HTML video on their own site.
After about 6 years of covering pop songs in my a cappella groups, I really wanted to sing some original music. In part, I was motivated by the US’s aggressively restrictive copyright regime, which always prevented us from freely sharing recordings of our own performances.
I tried to write a song from scratch for a while, but it wasn’t working out, mostly because I don’t have anything interesting to say. Then I struck upon the idea of using the text of an old out-of-copyright poem (which, because of the US’s effectively perpetual copyright, has to be very old indeed). I started browsing through the poetry section of WikiSource, until I stumbled across this brilliant 1895 poem by Langdon Smith. The choice was clear.
I drew up a thoroughly derivative 4-part a cappella arrangement in MuseScore, and VoiceLab indulged me by adding it to the repertoire. We’ve sung it twice so far, but the first time we didn’t have a good recording, and then this time I had to solve this audio-video alignment problem… but now it’s here.
The recordings and sheet music are all CC0 dedicated to the public domain. I would appreciate attribution as the arranger, but I find threats of legal action to be just as distasteful as plagiarism. I wouldn’t want to do anything to discourage people from adopting and adapting the music as they see fit. Maybe someone will make a recording with a soloist who can really sing!
While working on the Firefox HTML5 video and audio support, I've found it extremely useful to have an HTTP server on which the transfer rate is reliably limited. Existing servers are either too heavyweight, like Apache, or have inconsistent rate limiting, like lighttpd, whose rate limiting I found to be very "bursty".
I ended up taking the educational route, and implementing a simple HTTP server in C++. It supports the following features:
Support for HTTP/1.1 Byte Range Requests. This means you can seek into unbuffered data when watching HTML5 video.
Rate limiting, configurable on a per request basis by passing the "rate=x" HTTP query parameter, where x is the transfer rate of the connection in kilobytes per second. The server will send x/10 KB ten times per second to maintain this rate smoothly.
Simulated live streaming, configurable on a request basis by passing the "live" query parameter. When in "live" mode, no Content-Length header is sent, and the server doesn't advertise or perform byte range requests - so you can't seek into unbuffered video/audio, just like in a live stream.
Cross platform; tested on Windows (runs on port 80) and Linux (runs on port 8080). I haven't tested it on MacOS yet.
Simply serves all files in the program's working directory, making it easy to use (and abuse).
For example, if you wanted to simulate a live stream being served at 100KB/s, your test URL might look something like http://localhost:80/video.ogg?rate=100&live.
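The rate-limiting arithmetic described above can be sketched in a few lines of JavaScript. This is only an illustration, not the server's actual C++ code: the function names are hypothetical, and it assumes 1 KB means 1024 bytes, which the feature list does not specify.

```javascript
// Assumption: 1 KB = 1024 bytes (the post does not say which definition is used).
const BYTES_PER_KB = 1024;
const TICKS_PER_SECOND = 10; // the server sends x/10 KB ten times per second

// Read the "rate" and "live" query parameters from a request URL.
function parseRequestOptions(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const rate = params.has('rate') ? Number(params.get('rate')) : null;
  return {
    // null means "no rate limit requested"
    rateKBps: Number.isFinite(rate) && rate > 0 ? rate : null,
    live: params.has('live'),
  };
}

// Number of bytes to send on each tick for a given rate in KB/s.
function bytesPerTick(rateKBps) {
  return Math.max(1, Math.floor((rateKBps * BYTES_PER_KB) / TICKS_PER_SECOND));
}

// Sizes of the chunks sent on successive ticks for a body of totalBytes.
function chunkSchedule(totalBytes, rateKBps) {
  const chunk = bytesPerTick(rateKBps);
  const sizes = [];
  for (let sent = 0; sent < totalBytes; sent += chunk) {
    sizes.push(Math.min(chunk, totalBytes - sent));
  }
  return sizes;
}
```

Sending many small chunks on a fixed tick, rather than one large chunk per second, is what keeps the transfer smooth instead of bursty.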
I've been using it for quite a while, and over the weekend I finally cleaned it up and put it up on GitHub. Check it out.
I spent my last week in Quebec City at the 81st IETF meeting. The most important meeting there for me was the codec WG. The good news is that there's been a lot of progress in that meeting. A few issues with the Opus bit-stream (e.g. padding, frame packing) were resolved and the chairs are planning a second working group last call in four weeks. After that if all goes well, the codec can go to IETF last call and then RFC.
My week at the IETF meeting was also my first week at my new job working for Mozilla. I've been hired specifically to work on Opus and other codec/multimedia development, so I should have a lot more time for that than I used to. First thing on my list: finishing the Ogg mapping for Opus and releasing an Ogg encoder and decoder.
It’s rare to get exactly one recording of an a cappella concert. Usually someone’s parents have a fancy but outdated camcorder, someone in the front row has a cell phone video with a great angle but terrible quality, and there’s a beautiful audio-only recording, maybe straight from the mixing board. All the recordings are independent, starting and stopping at different times. Some are only one song long, or are broken into many short pieces.
If you want to combine all these inputs into a video that anyone could watch, you’ll first have to line them up correctly in a video editor. This is a painful process of dragging clips around on the timeline with the mouse, trying to figure out if they’re in sync or not. The usual trick to making this achievable is to look at the audio waveform visualization, but even so, the process can be tedious and irritating.
This year, when I got three recordings from the VoiceLab spring concert, I resolved to solve the problem once and for all. I set about writing an automatic clip alignment algorithm as a patch to PiTiVi, a beautiful (if not mature) free software video editor written in Python.
Today, after about two months of nights and weekends, the result is ready for testing in PiTiVi mainline. Jean-François Fortin Tam has a great writeup explaining how it works from a user’s perspective.
I hadn’t looked into it until after the fact, but of course this is not the first auto-alignment function in a video editor. Final Cut Pro appears to have a similar function built in, and there are also plug-ins such as “Plural Eyes” for many editors. However, to the best of my knowledge, this is the first free implementation, and the first available on Linux. Comparing features in PiTiVi vs. the proprietary giants, I think of this as “one down, 20,000 to go”.
I guess this is as good a place as any to talk about the algorithm, which is almost The Simplest Thing that could Possibly Work. Alignment works by analyzing the audio tracks, relying on every video camera to have a microphone of its own. The most direct approach might be to compute the cross-correlation of these audio tracks and look for the peak … but this could require storing multi-gigabyte audio files in memory, and performing impossibly large FFTs. On computers of today, the direct approach is technologically infeasible.
The algorithm I settled on resembles the method a human uses when looking at the waveform view. First, it breaks each input audio stream into 40 ms blocks and computes the mean absolute value of each block. The resulting 25 Hz signal is the “volume envelope”. The code subtracts the mean volume from each track’s envelope, then performs a cross-correlation between tracks and looks for the peak, which identifies the relative shift. To avoid performing N^2 cross-correlations, one clip is selected as the fixed reference, and all others are compared to it. The peak position is quantized to the block duration (creating an error of +/- 20ms), so to improve accuracy a parabolic fit is used to interpolate the true maximum. I don’t know the exact residual error, but I expect it’s typically less than 5 ms, which should be plenty good enough, seeing as sound travels about 1 foot per ms.
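To make the steps above concrete, here is a small JavaScript sketch of the same idea. It is not the PiTiVi code (which is written in Python), the function names are made up for illustration, and it uses a direct correlation loop rather than an FFT, so it is only suitable for short envelopes.

```javascript
// Mean absolute value of each block of blockSize samples: the "volume envelope".
// At a 48 kHz sample rate, blockSize = 1920 corresponds to 40 ms blocks.
function volumeEnvelope(samples, blockSize) {
  const envelope = [];
  for (let start = 0; start + blockSize <= samples.length; start += blockSize) {
    let sum = 0;
    for (let i = start; i < start + blockSize; i++) sum += Math.abs(samples[i]);
    envelope.push(sum / blockSize);
  }
  return envelope;
}

function subtractMean(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return values.map((v) => v - mean);
}

// Returns the shift of otherEnv relative to refEnv, in blocks (may be fractional).
// A positive result means the same sound appears later in otherEnv.
function estimateShiftInBlocks(refEnv, otherEnv) {
  const ref = subtractMean(refEnv);
  const other = subtractMean(otherEnv);
  const minLag = -(ref.length - 1);
  const maxLag = other.length - 1;

  // Direct cross-correlation: score(lag) = sum over i of ref[i] * other[i + lag].
  const scores = [];
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0;
    for (let i = 0; i < ref.length; i++) {
      const j = i + lag;
      if (j >= 0 && j < other.length) sum += ref[i] * other[j];
    }
    scores.push(sum);
  }

  // Locate the peak.
  let best = 0;
  for (let k = 1; k < scores.length; k++) {
    if (scores[k] > scores[best]) best = k;
  }

  // Parabolic fit through the peak and its two neighbours to interpolate
  // the true maximum between block boundaries.
  let offset = 0;
  if (best > 0 && best < scores.length - 1) {
    const a = scores[best - 1];
    const b = scores[best];
    const c = scores[best + 1];
    const denominator = a - 2 * b + c;
    if (denominator !== 0) offset = (0.5 * (a - c)) / denominator;
  }
  return minLag + best + offset;
}
```

Multiplying the result by the block duration (40 ms here) gives the relative shift in milliseconds.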
My original intent was to compensate for clock skew as well, because all these recording devices are using independent sample clocks that are running at slightly different rates due to manufacturing variation. There’s even code in the commit for a far more complex algorithm that can measure this clock skew. At the moment, this code is disused, for two reasons: none of our test clips actually showed appreciable skew, and PiTiVi doesn’t actually support changing the speed of clips, especially audio.
If you want to help, just stop by the PiTiVi mailing list or IRC channel. We can use more test clips, a real testing framework, a cancel button, UI improvements, conversion to C for speed, and all sorts of general bug squashing. For this feature, and throughout PiTiVi, there’s always more to be done. I’ve found the developer community to be extremely welcoming of new contributions … come and join us.
People have been asking me lots of questions about WebVTT (Web Video Text Tracks) recently. Questions about its technical nature such as: are the features included in WebVTT sufficient for broadcast captions including positioning and colors? Questions about its standardisation level: when is the spec officially finished and when will it move from the WHATWG to the W3C? Questions about implementation: are any browsers supporting it yet and how can I make use of it now?
I’m going to answer all of these questions in this post to make it more efficient than answering tweets, emails, and Skype and other phone conference requests. It’s about time I did a proper post about it.
Implementations
I’m starting with the last area, because it is the simplest to answer.
However, you do not have to despair, because there are now a couple of JavaScript polyfill libraries for either just the track element or for video players with track support. You can start using these while you are waiting for the browsers to implement native support for the element and the file format.
Here are some of the libraries that I’ve come across that will support SRT and/or WebVTT (do leave a comment if you come across more):
Captionator – a polyfill for track and SRT parsing (WebVTT in the works)
js_videosub – a polyfill for track and SRT parsing
I am actually most excited about the work of Ronny Mennerich from LeanbackPlayer on WebVTT, since he has been the first to really attack full support of cue settings and to discuss their meaning with Ian, me and the WHATWG. His review notes with visual description of how settings are to be interpreted and his demo will be most useful to authors and other developers.
Standardisation
Before we dig into the technical progress that has been made recently, I want to answer the question of “maturity”.
The WebVTT specification is currently developed at the WHATWG. It is part of the HTML specification there. When development on it started (under its then name WebSRT), it was also part of the HTML5 specification of the W3C. However, there was a concern that HTML5 should be independent of the chosen captioning format and thus WebVTT currently only exists at the WHATWG.
In recent months – and particularly since browser vendors have indicated that they will indeed implement support for WebVTT as their implementation of the <track> element – the question of formal standardization of WebVTT at the W3C has arisen. I’m involved in this as a Google contractor and we’ve put together a proposed charter for a WebVTT Working Group at the W3C.
Many of the new features are about making the WebVTT format more useful for authoring and data management. The introduction of comments, inline CSS settings and default cue settings will help authors reduce the amount of styling they have to provide. File-wide metadata will help with the exchange of management information in professional captioning scenarios and archives.
But even without these new features, WebVTT already has all the features necessary to support professional captioning requirements. I’ve prepared a draft mapping of CEA-608 captions to WebVTT to demonstrate these capabilities (CEA-608 is the TV captioning standard in the US).
So, overall, WebVTT is in a great state for you to start implementing support for it in caption creation applications and in video players. There’s no need to wait any longer – I don’t expect fundamental changes to be made, but only new features to be added.
New WebVTT Features
This takes us straight to looking at the recently introduced new features.
Simpler File Magic:
Previously the magic file identifier for a WebVTT file was a single line with “WEBVTT FILE”. This has now been changed to a single line with just “WEBVTT”.
Cue Bold Span:
The <b> element has been introduced into WebVTT, thus aligning it somewhat more with SRT and with HTML.
CSS Selectors:
The spec already allowed the names of tags, the classes of <c> tags, and the voice annotations of <v> tags to be used as CSS selectors for ::cue. ID selector matching is now also available, where the cue identifier is used.
text-decoration support:
The spec now also supports the CSS text-decoration property for WebVTT cues, allowing functionality such as blinking and underlined text.
Further to this, the email identifies the ways in which WebVTT is extensible:
Header area:
The WebVTT header area is defined through the “WEBVTT” magic file identifier as a start and two empty lines as an end. It is possible to add file-wide header information into this area.
Cues:
Cues are defined to start with an optional identifier, and then a start/end time specification with a “-->” separator. They end with two empty lines. Cues that contain a “-->” separator but don’t parse as valid start/end time are currently skipped. Such “cues” can be used to contain inline command blocks.
Inline in cues:
Finally, within cues, everything that is within a “tag”, i.e. between “<” and “>”, and does not parse as one of the defined start or end tags is ignored, so we can use these to hide text. Further, text between such start and end tags is visible even if the tags are ignored, so we can introduce new markup tags in this way.
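The two parsing rules just described (skip any cue whose timing line does not parse, and drop anything between angle brackets from the displayed text) can be illustrated with a deliberately simplified JavaScript sketch. It is not a conforming WebVTT parser: the function names are hypothetical, cue settings and identifiers are ignored, and cues are assumed to be separated by a blank line.

```javascript
// Parse "mm:ss.ttt" or "hh:mm:ss.ttt" into seconds; returns null if invalid.
function parseTimestamp(text) {
  const m = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(text);
  if (!m) return null;
  const hours = m[1] ? Number(m[1]) : 0;
  return hours * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Remove everything between "<" and ">". Text between a start and an end tag
// stays visible; text placed inside the angle brackets is hidden.
function visibleText(payload) {
  return payload.replace(/<[^>]*>/g, '');
}

function parseCues(fileText) {
  const lines = fileText.split(/\r?\n/);
  if (lines[0].trim() !== 'WEBVTT') throw new Error('missing WEBVTT magic line');
  const cues = [];
  let i = 1;
  while (i < lines.length) {
    const arrow = lines[i].indexOf('-->');
    if (arrow === -1) {
      i++; // header text, cue identifier or blank line
      continue;
    }
    const start = parseTimestamp(lines[i].slice(0, arrow).trim());
    // Anything after the end time on this line would be cue settings.
    const end = parseTimestamp(lines[i].slice(arrow + 3).trim().split(/\s+/)[0]);
    i++;
    // Collect the payload up to the next blank line.
    const payload = [];
    while (i < lines.length && lines[i].trim() !== '') payload.push(lines[i++]);
    // Cues whose timing does not parse (e.g. "COMMENT -->") are skipped whole.
    if (start !== null && end !== null) {
      cues.push({ start, end, text: visibleText(payload.join('\n')) });
    }
  }
  return cues;
}
```

Because an unparseable timing line causes the whole block up to the next blank line to be discarded, such a block can safely carry comments or other extension data without affecting what older parsers display.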
Given this background, the following V2 extensions have been discussed:
Metadata:
Enter name-value pairs of metadata into the header area, e.g.
Inline Cue Settings:
Default cue settings can come in a “cue” of their own, e.g.
WEBVTT
DEFAULTS --> D:vertical A:end
00:00.000 --> 00:02.000
This is vertical and end-aligned.
00:02.500 --> 00:05.000
As is this.
DEFAULTS --> A:start
00:05.500 --> 00:07.000
This is horizontal and start-aligned.
Inline CSS:
Since CSS is used to format cue text, a means to do this directly in WebVTT without a need for a Web page and external style sheet is helpful and could be done in its own cue, e.g.
Comments:
Both comments within cues and completely commented-out cues are possible, e.g.
WEBVTT
COMMENT -->
00:02.000 --> 00:03.000
two; this is entirely
commented out
00:06.000 --> 00:07.000
this part of the cue is visible
<! this part isn't >
<and neither is this>
Finally, I believe we still need to add the following features:
Language tags:
I’d like to add a language tag that allows a subpart of cue text to be marked up as being in a different language. We need this feature for mixed-language cues (in particular where a different font may be necessary for the inline foreign-language text). But more importantly we will need this feature for cues that contain text descriptions rather than captions, such that a speech synthesizer can pick the correct language model to speak the foreign-language text. It was discussed that this could be done with a <lang jp>xxx</lang> type of markup.
Roll-up captions:
When we use timestamp objects and the future text is hidden, then is un-hidden upon reaching its time, we should allow the cue text to scroll up a line when the un-hidden text requires adding a new line. This is the typical way in which TV live captions have been displayed and so users are acquainted with this display style.
Inline navigation:
For chapter tracks the primary use of cues is for navigation. In other formats – in particular in DAISY-books for blind users – there are hierarchical navigation possibilities within media resources. We can use timestamp objects to provide further markers for navigation within cues, but in order to make these available in a hierarchical fashion, we will need a grouping tag. It would be possible to introduce a <nav> tag that can group several timestamp objects for navigation.
Default caption width:
At the moment, the default display size of a caption cue is 100% of the video’s width (height for vertical directions), which can be overruled with the “S” cue setting. I think it should by default rather be the width (height) of the bounding box around all the text inside the cue.
Aside from these changes to WebVTT, there are also some things that can be improved on the <track> element. I personally support the introduction of the source element underneath the track element, because that allows us to provide different caption files for different devices through the @media media queries attribute and it allows support for more than just one default captioning format. This change needs to be made soon so we don’t run into trouble with the currently empty track element.
I further think a oncuelistchange event would be nice as well in cases where the number of tracks is somehow changed – in particular when coming from within a media file.
Other than this, I’m really very happy with the state that we have achieved thus far.
If you want a hardware Ogg player you should consider buying a Trekstor or Samsung product (most of their MP3 players support the Ogg Vorbis and FLAC formats)!
Regardless, Xiph.Org has assembled an official comment document, and will be represented in person by at least Dr. Tim Terriberry and possibly a few other core members (I won't be there).
If you're interested in software patents, some of the US Government's thinking on the issue, and participating in the process, have a look at the above two links. Also, feel free to distribute our comments far and wide. It's somewhat more gripping than the usual, dry "Percy Q. Business Leader Advises the Federal Government".
I'd mentioned in the previous update that we're (Xiph is) using a chirp estimation algorithm that we published back in 2007, and that the original paper has precious little space to devote to describing in detail how the algorithm actually performed. One of the upshots of not having done extensive characterization tests of our own algorithm was that it has already surprised me a few times this year (in both good and bad ways).
Therefore, Ghost Update 20110604 concerns itself with describing and graphing algorithm behavior in mind-numbing detail.
In the last months, we’ve been working hard at the WHATWG and W3C to spec out new HTML markup and a JavaScript interface for dealing with audio or video content that has more than just one audio and video track.
This is particularly relevant when a Web page author wants to add a sign language track to a video or audio resource for deaf people, or an audio description track (i.e. a sound track in which a speaker explains the key things that can be seen on screen) for blind people. It is also relevant when a Web page author wants to publish a video with multiple audio tracks that are each a different language dub for the video and can be used for less common cases such as a director’s comment track, or making available different camera angles for an event.
Just to be clear: this is not a means to introduce video editing functionality into the Web browser. If you want to do edits, you’re better off with an application that will eventually render a new piece of content and includes fancy transitions etc. Similarly, this is not a means to introduce mixing functionality (as in what DJs do when they play with multiple audio recordings). You’re better off with an actual audio mixing or DJ application that will provide you all sorts of amazing effects and filters.
So, multi-track is squarely focused on synchronizing alternative or additional tracks to a single resource with a single timeline to which all tracks are slaved.
Two means of publishing such multi-track media content are possible:
1. In-band
Of the video file formats that Web browsers support, WebM is currently not defined to contain more than one audio or video track. However, since WebM is using the Matroska container format, which supports multi-track, it is possible to extend WebM for multi-track resources. I have seen multitrack Ogg, MP4 and Matroska files in the wild and most media players support their display.
The specification that has gone into HTML5 to support in-band multi-track looks as follows:
interface HTMLMediaElement : HTMLElement {
  [...]
  // tracks
  readonly attribute MultipleTrackList audioTracks;
  readonly attribute ExclusiveTrackList videoTracks;
};
interface TrackList {
  readonly attribute unsigned long length;
  DOMString getID(in unsigned long index);
  DOMString getKind(in unsigned long index);
  DOMString getLabel(in unsigned long index);
  DOMString getLanguage(in unsigned long index);
  attribute Function onchange;
};
interface MultipleTrackList : TrackList {
  boolean isEnabled(in unsigned long index);
  void enable(in unsigned long index);
  void disable(in unsigned long index);
};
interface ExclusiveTrackList : TrackList {
  readonly attribute unsigned long selectedIndex;
  void select(in unsigned long index);
};
You will notice that every audio and video track gets an index by which it can be addressed. You can enable() and disable() individual audio tracks and you can select() a single video track for display. This means that one or more audio tracks can be active at the same time (e.g. main audio and audio description), but only one video track will be active at a time (e.g. main video or sign language).
Through the getID(), getKind(), getLabel() and getLanguage() functions you can find out more about what actual content is available in the individual tracks so as to activate/deactivate them correctly and display the right information about them.
getKind() identifies the type of content that the track exposes such as “description” (for audio description), “sign” (for sign language), “main” (for the default displayed track), “translation” (for a dubbed audio track), and “alternative” (for an alternative to the default track).
getLabel() provides a human readable string that describes the content of the track aiming to be used in a menu.
getID() provides a short machine-readable string that can be used to construct a media fragment URI for the track. The use case for this will be discussed later.
getLanguage() provides a machine-readable language code to identify which language is spoken or signed in an audio or sign language video track.
Example 1:
The following uses a video file that has a main video track, a main audio track in English and French, and an audio description track in English and French. (It likely also has caption tracks, but we will ignore text tracks for now.) This code sample switches the French audio tracks on and all other audio tracks off.
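As a rough illustration only (not the original sample from this post), switching on every track of one language and switching off the rest could look like the following when written against the MultipleTrackList interface quoted above. The function name and the "fr" language code are assumptions made for the example.

```javascript
// Enable every audio track whose language matches, disable all others.
// audioTracks is expected to follow the MultipleTrackList interface above:
// length, getLanguage(index), enable(index), disable(index).
function enableOnlyLanguage(audioTracks, language) {
  for (let i = 0; i < audioTracks.length; i++) {
    if (audioTracks.getLanguage(i) === language) {
      audioTracks.enable(i);
    } else {
      audioTracks.disable(i);
    }
  }
}

// Hypothetical usage in a page, assuming the browser implements the interface:
//   const video = document.querySelector('video');
//   enableOnlyLanguage(video.audioTracks, 'fr');
```

With the file described above, this would leave both the French main audio and the French audio description enabled.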
Example 2:
The following uses an audio file that has a main audio track in English, no main video track, but sign language video tracks in ASL (American Sign Language), BSL (British Sign Language), and ASF (Australian Sign Language). This code sample switches the Australian sign language track on and all other video tracks off.
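Again as a hedged sketch rather than the original sample: because video tracks form an ExclusiveTrackList, selecting one track implicitly deselects the others. The function name and the "asf" language code below are assumptions for illustration.

```javascript
// Select the first video track whose language matches.
// videoTracks is expected to follow the ExclusiveTrackList interface above:
// length, getLanguage(index), select(index).
// Returns the selected index, or -1 when no track matches.
function selectVideoTrackByLanguage(videoTracks, language) {
  for (let i = 0; i < videoTracks.length; i++) {
    if (videoTracks.getLanguage(i) === language) {
      videoTracks.select(i);
      return i;
    }
  }
  return -1;
}

// Hypothetical usage in a page, assuming the browser implements the interface:
//   const media = document.querySelector('video');
//   selectVideoTrackByLanguage(media.videoTracks, 'asf');
```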
If you have more tracks in both examples that conflict with your intentions, you may need to further filter your activation / deactivation code using the getKind() function.
2. Synchronized resources
Sometimes the production process of media creates not a single resource with multiple contained tracks, but multiple resources that all share the same timeline. This is particularly useful for the Web, because it means the user can download only the required resources, typically saving a substantial amount of bandwidth.
For this situation, an attribute called @mediagroup can be added in markup to slave multiple media elements together. This is administered in the JavaScript API through a MediaController object, which provides events and attributes for the combined multi-track object.
The new IDL interfaces for HTMLMediaElement are as follows:
interface HTMLMediaElement : HTMLElement {
  [...]
  // media controller
  attribute DOMString mediaGroup;
  attribute MediaController controller;
};
interface MediaController {
  readonly attribute TimeRanges buffered;
  readonly attribute TimeRanges seekable;
  readonly attribute double duration;
  attribute double currentTime;
  readonly attribute boolean paused;
  readonly attribute TimeRanges played;
  void play();
  void pause();
  attribute double defaultPlaybackRate;
  attribute double playbackRate;
  attribute double volume;
  attribute boolean muted;
  attribute Function onemptied;
  attribute Function onloadedmetadata;
  attribute Function onloadeddata;
  attribute Function oncanplay;
  attribute Function oncanplaythrough;
  attribute Function onplaying;
  attribute Function onwaiting;
  attribute Function ondurationchange;
  attribute Function ontimeupdate;
  attribute Function onplay;
  attribute Function onpause;
  attribute Function onratechange;
  attribute Function onvolumechange;
};
You will notice that the MediaController replicates some of the states and events of the slave media elements. In general the approach is that the attributes represent the summary state from all the elements and the writable attributes when set are handed through to all the slave elements.
Importantly, if the individual media elements have @controls activated, then the displayed controls interact with the MediaController thus allowing synchronized playback and interaction with the combined multi-track object.
Example 3:
The following uses a video file that has a main video track and a main audio track in English. There is another video file with the ASL sign language for the video, and an audio file with the audio description in English. This code sample creates controls on the first file, which then also control the audio description and the sign language video, neither of which has controls. Since the audio description doesn’t have controls, it doesn’t get visually displayed. The sign language video will just sit next to the main video without controls.
Example 4:
We now accompany a main video with three sign language video tracks in ASL, BSL and ASF. We could just do this in JavaScript and replace the currentSrc of a second video element with the links to BSL and ASF as required, but then we need to run our own media controls to list the available tracks. So, instead, we create a video element for each one of the tracks and use CSS to remove the inactive ones from the page layout. The code sample activates the ASF track and deactivates the other sign language tracks.
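A minimal sketch of the show-one, hide-the-rest step might look like this. It is not the original sample: the function name, the object keyed by language code and the element ids in the usage comment are all hypothetical.

```javascript
// Show the sign language video for one language and remove the others from
// the page layout. videosByLanguage maps a language code to a video element
// (or any object with a style property).
function showOnlySignLanguage(videosByLanguage, activeLanguage) {
  for (const [language, video] of Object.entries(videosByLanguage)) {
    // An empty string restores the element's default display value.
    video.style.display = language === activeLanguage ? '' : 'none';
  }
}

// Hypothetical usage in a page:
//   showOnlySignLanguage({
//     asl: document.getElementById('sign-asl'),
//     bsl: document.getElementById('sign-bsl'),
//     asf: document.getElementById('sign-asf'),
//   }, 'asf');
```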
Example 5:
In this final example we look at what to do when we have an in-band multi-track resource with multiple video tracks that should all be displayed on screen. This is not a simple problem to solve because a video element is only allowed to display a single video track at a time. Therefore for this problem you need to use both approaches: in-band and synchronized resources.
We take an in-band multitrack resource with a main video and audio track and three sign language tracks in ASL, BSL and ASF. The second resource will be made up from the URI of the first resource with a media fragment address of the sign language tracks. (If required, these can be discovered using the getID() function on the first resource.) The markup will look as follows:
Note that with multiple video elements you can always style them in the way that you want them displayed on screen. E.g. if you want a picture-in-picture display, you scale the second video down and absolutely position it on top of the first one in the appropriate location. You can even grab the second video into a canvas, chroma-key your sign language speaker on a green or blue screen and remove that background through some canvas processing before popping it on top of the video.
The world is all yours!
HOWEVER: There is one big caveat on all these specs – while they have all found entry into the HTML5 specification, it would be expecting a bit much to have browser support already.
Note: there is something weird going on with the WordPress plugins site, which still shows version 0.7 as the current one, but when you download it, it gets the latest version 0.12. If somebody knows how to fix this, that would be awesome. I think it also stops people from auto-updating this plugin, which is sad with this many improvements.
(I think I fixed it by actually changing the version number in the external-videos.php file – how silly of me – and thanks to the WordPress Forum person who pointed it out to me! Download 0.13 now.)
Not actually last month, but pretty close at this point.
I never publicly released my previous Ghost update, delivered internally to Red Hat at the beginning of the month, because I hadn't finished some of the diagrams I wanted to do for the last section on chirp coding. Well, the diagrams are done! Here's the latest Ghost demo update, just in time for the next one to almost be due!
The WebM folks have finally finished up their work on the WebM Community Cross-License project and announced the license launch. This is a FOSS defensive license/pool similar to what a couple other groups are trying out (and similar to the defensive patent license that Xiph is already using for our parts of Opus within the IETF).
The basic idea of the cross-license is:
"Everyone is free to use any known or unknown WebM patents. Unless you sue over patents related to WebM. In that case, we all agree to yank your license."
In short, it's sort of a NATO for FOSS patents; a free license with an agreed-upon mutual defense clause that tries to enforce everyone playing nice. This strategy is not a new idea, but it's interesting that several different FOSS groups, Xiph and WebM included, are finally trying the idea for real in practice.
Greg Maxwell has just posted a nice second 'demo' page for Opus. It mostly covers the recent listening testing done by volunteers at Hydrogen Audio. Pretty colors and interactive listening/comparison scripting!
For those of you new to Opus, it's the FOSS/RF codec we're working on within the IETF codec working group. Opus is a collaborative hybrid speech / high-fidelity audio codec built using primarily Xiph's CELT codec and the Skype SILK voice codec as inputs. That makes Opus similar in some ways to what MPEG is trying to achieve with USAC (Unified Speech and Audio Coding), though Opus is also ultra-low latency, so it looks like we're considerably ahead of MPEG here.