Planet Xiph

November 06, 2009

Ben Schwartz

Old bugs fixed

In the process of testing Cortado on old operating systems, we discovered that using a recent compiler produced bytecode that wouldn’t run on Sun JDK 1.1. Instead, we got IllegalMonitorStateException errors in an infinite loop.

A little bit of searching made it clear that we weren’t the only ones who’d experienced this problem. There were reports going back to 2001 that Sun had introduced some sort of bug in their compiler in version 1.4. We verified that going back to an old compiler produced code that worked for us, again.

Today Greg Maxwell constructed a minimal test case and printed out the disassembled bytecode produced with old and new compilers. One difference stood out: the new compiler introduced a circular exception handler at the end of a synchronized block. I looked around, and sure enough, this behavior drew complaints when it first appeared over eight years ago.

Rather than attempt to convince the compiler authors that their code has a logical fallacy, or somehow fix ten-year-old versions of closed-source software, we instead decided to add a workaround into ProGuard, a bytecode post-processor that we are already using to shrink Cortado by 30% for faster downloads.

There’s an interesting question here as to what, exactly, the bug is. Is it a code generation bug, in which the compiler produces bytecode that will not run correctly on the Java 1.1 target? Or is it a JVM bug, exposed by newer compilers that make use of previously untested edge cases? This is a case of Software Development Relativity: the number of bugs is conserved, but their precise location depends on your reference frame.

Anyway, I think this is a nice short story about the power of an open development model. We found a bug somewhere in a complex system, and wound up putting a fix in the component whose maintainers, we hope, will be most receptive to it. When one avenue is cut off, open source finds another route.

by Ben at November 06, 2009 05:38 AM

November 03, 2009

Ben Schwartz

Cortado

A project I’ve been playing with recently is Ogg Theora’s Cortado, a free video player designed to be able to run on an extremely wide variety of computers, including old, obsolete systems. How old, you ask?

Really old:

Screenshot of Cortado playing a video in SheepShaver

This is a picture of Cortado running on Mac OS 7.5.5, in the Macintosh Runtime for Java 2.0, playing the video from the FSF’s freedom testimonials campaign. This operating system was released in 1996. The system is emulated in SheepShaver, which makes playback far too slow to be usable. Someone will have to test on real hardware to see what happens.

Nonetheless, I think this is strong evidence regarding how serious we are about backwards compatibility and inclusive software. Serious, or at least, enthusiastic.

by Ben at November 03, 2009 03:57 AM

November 01, 2009

Silvia Pfeiffer

Best economy flight evva!

Over the years, I have flown a lot – mainly between Sydney and Frankfurt or Sydney and San Francisco. Today, for the first time in a long time, I had a flight with Qantas from Sydney to San Francisco. And I must say: it was the most productive and most comfortable economy flight I have had in a long time.

This is gonna feel awkward, since it’s not one of my usual technical posts. But I just have to say “Thank you” to Qantas. When I fly to the US, I tend to catch a US airline because they usually turn up as the cheapest. This time, Qantas was the second cheapest, so I decided to spend the extra hundred bucks on getting a modern airline. Yes, get that, US airlines: no matter which of you I take, I always feel like I am thrown back into the last century. Legroom is rare, seats are uncomfortable, food is crap, service is poor, oh … and have you ever heard of personal entertainment screens? Yes, I know, your planes are from the last century. But honestly: I had a personal entertainment screen on my Singapore Airlines flight when coming to Australia for the first time in 1998! Couldn’t you at least upgrade the inside of your planes?

Anyway, back to this flight. It all started with the question: would you like to sit in the centre aisle in front of the baby bassinet? Oh, I usually take a window seat to get some peace and quiet – but hey, I’m not going to say “no” to space! And, man, did I use it!

I settled in with a good book and a little nap until the first meal and after that felt strengthened and awake enough to start hacking. With my new MacBook Pro, I was bound to get a few hours in before the battery would die on me. Not the 7 hours that Apple claims, but that’s because I was going to do lots of compiles of Firefox. Anyway – without a seat in front of me, without the personal entertainment screen pulled out, and with the nice thick cushion that Qantas supply on my lap, protecting me from the laptop heat, I almost felt like I was back home in my living room.

On top of that – and unfortunately for Qantas, but fortunately for me – the plane was only two thirds full, so I had the middle seat on my left empty, which I immediately used to extend my table space. I had continuing catering service for the next 4-5 hours of compiling, applying OggK patches to the new Chris Double Firefox codebase, and fixing compile errors (all configuration based – I have yet to get to writing actual code). Ongoing catering service, no need to cook for myself, uninterrupted coding time, good music from the inflight entertainment service – I think I’ll move my office into a Qantas plane! Not been this productive in ages!

Everywhere around me the lights were out, people were watching movies, but I was working and really enjoying it. And then, the battery was empty, halfway into the flight. Bummer! But I didn’t give up this easily. Thought it’d be worth asking if there was a way to recharge without occupying a toilet for two hours. And as with everything else, Qantas inflight personnel made an extra effort to please: they found me an empty seat in business class and hooked up the laptop for an hour to recharge. Totally, utterly awesome! I got it back after another nice reading break – cannot start watching movies, since that makes the brain go mush. I got another few hours of compiling in before my body forced me to catch a few hours of sleep.

Now, I’m about an hour away from San Fran and the laptop claims 40min of power left. Funnily, that number seems to go up rather than down, so I’m sure it will last until arrival (uh! It’s now at 1:24min – oh, compilation just finished!). Hopefully I will be able to find out why some of the Ogg Theora/Vorbis/Kate videos that I created using kateenc and oggz-merge don’t play in the patched Firefox. After all, it would be awesome to be able to show it off in the upcoming HTML5 Video Accessibility workshop!

by silvia at November 01, 2009 04:34 AM

October 29, 2009

Silvia Pfeiffer

New proposal for captions and other timed text for HTML5

The first specification for how to include captions, subtitles, lyrics, and similar time-aligned text with HTML5 media elements has received a lot of feedback – probably because there are several demos available.

The feedback has encouraged me to develop a new specification that addresses the concerns raised and makes it easier to associate out-of-band time-aligned text (i.e. subtitles stored in files separate from the video/audio file). A simple example of the new specification using srt files is this:

<video src="video.ogv" controls>
  <itextlist category="CC">
    <itext src="caption_en.srt" lang="en"/>
    <itext src="caption_de.srt" lang="de"/>
    <itext src="caption_fr.srt" lang="fr"/>
    <itext src="caption_jp.srt" lang="ja"/>
  </itextlist>
</video>

By default, the charset of the itext file is UTF-8, and the default format is text/srt (incidentally a MIME type that still needs to be registered). Also by default, the browser is expected to select for display the track that matches the browser’s default language setting. This has worked well in the previous experiments.
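
To make the default selection rule concrete, here is a minimal sketch of it. It is written in Python rather than browser code, purely for illustration. The function name select_track, the (lang, src) pairs, and the fallback to the primary language subtag are all assumptions of this sketch and not part of the proposal.

```python
from typing import List, Optional, Tuple


def select_track(tracks: List[Tuple[str, str]], browser_lang: str) -> Optional[str]:
    """Return the src of the track matching the browser's default language.

    tracks is a list of (lang, src) pairs in document order.
    Hypothetical helper: not part of the itext proposal itself.
    """
    wanted = browser_lang.lower()
    # First pass: exact match on the full language tag.
    for lang, src in tracks:
        if lang.lower() == wanted:
            return src
    # Second pass (assumption): match on the primary subtag, so "de-AT" finds "de".
    primary = wanted.split("-")[0]
    for lang, src in tracks:
        if lang.lower().split("-")[0] == primary:
            return src
    # No matching track: nothing is selected by default.
    return None


tracks = [
    ("en", "caption_en.srt"),
    ("de", "caption_de.srt"),
    ("fr", "caption_fr.srt"),
]
print(select_track(tracks, "de-AT"))
```

What should happen when no track matches (show nothing, or fall back to the first track) is not settled by the text above; this sketch simply returns None.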

Check out the new itext specification, read on to get an introduction to what has changed, and leave me your feedback if you can!

The itextlist element
You will have noticed that in comparison to the previous specification, this specification contains a grouping element called “itextlist”. This is necessary because we have to distinguish between alternative time-aligned text tracks and ones that can be additional, i.e. displayed at the same time. In the first specification this was done by inspecting each itext element’s category and grouping them together, but that resulted in much repetition and unreadable specifications.

Also, it was not clear which itext elements were to be displayed in the same region and which in different ones. Now, their styling can be controlled uniformly.

The final advantage is that association of callbacks for entering and leaving text segments as extracted from the itext elements can now be controlled from the itextlist element in a uniform manner.

This change also makes it simple for a parser to determine the structure of the menu that is created and included in the controls element of the audio or video element.

Incidentally, a patch for Firefox already exists that makes this part of the browser. It does not yet support this new itext specification, but here is a screenshot that Felipe Corrêa da Silva Sanches created to demonstrate it:

screenshot of subtitle menu included in Firefox

If several itextlist elements are specified, that menu will receive sub-menus – one for each itextlist. An example is the following:

<video src="video.ogv" aria-label="test video" controls>
  <itextlist category="SUB" name="subtitles">
    <itext src="sub_en.srt" lang="en"/>
    <itext src="sub_de.srt" lang="de"/>
    <itext src="sub_fr.srt" lang="fr"/>
    <itext src="sub_jp.srt" lang="ja"/>
  </itextlist>
  <itextlist category="TAD" name="spoken transcript">
    <itext id="tad_en" src="tad_en.srt" lang="en"/>
    <itext id="tad_jp" src="tad_jp.srt" lang="ja"/>
  </itextlist>
</video>

which will result in the following menu structure:

text
- subtitles
-- English
-- German
-- French
-- Japanese
-- none
- spoken transcript
-- English
-- Japanese
-- none

Similarly, a context menu would use the same structure.
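
The mapping from itextlist elements to this menu is mechanical, which is the point of the grouping element. A rough sketch in Python follows. The function build_menu, the input shape, and the LANGUAGE_NAMES table are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical display names; a browser would use its own localisation tables.
LANGUAGE_NAMES: Dict[str, str] = {"en": "English", "de": "German", "fr": "French"}


def build_menu(itextlists: List[Tuple[str, List[str]]]) -> List[str]:
    """Turn (name, [lang, ...]) pairs, one per itextlist, into menu lines."""
    lines = ["text"]
    for name, langs in itextlists:
        # One sub-menu per itextlist element.
        lines.append(f"- {name}")
        for lang in langs:
            # Fall back to the raw language code if no display name is known.
            lines.append(f"-- {LANGUAGE_NAMES.get(lang, lang)}")
        # Each sub-menu ends with an entry that switches the list off.
        lines.append("-- none")
    return lines


menu = build_menu([
    ("subtitles", ["en", "de", "fr"]),
    ("spoken transcript", ["en"]),
])
print("\n".join(menu))
```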

Callbacks on timed text segments
This specification further introduces callbacks on time-aligned text segments: onenter and onleave. At this stage this is an idea I am experimenting with, but I believe it has lots of potential to allow people to do fancy things when subtitles appear or disappear. Some ideas are: to have a specific picture displayed that relates to the text segment, to have text in another area of the display change, e.g. because we have moved into a different part of the full text transcript, or to display Google ads that relate to the text in that particular text segment.

I am curious about feedback on this idea. It relates closely to the idea of cue ranges that was previously part of HTML5.

It is possible to achieve this effect simply through adding a timeupdate event listener, but proper callbacks like these are much more efficient.
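
To show what the timeupdate workaround involves, here is a small emulation of it. It is a sketch in Python rather than browser JavaScript: the handler stands in for a timeupdate listener, and the names make_timeupdate_handler, on_enter and on_leave are hypothetical. Treating each segment as the half-open interval from start to end is also an assumption.

```python
from typing import Callable, List, Set, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def make_timeupdate_handler(
    cues: List[Cue],
    on_enter: Callable[[str], None],
    on_leave: Callable[[str], None],
) -> Callable[[float], None]:
    """Return a handler to call with the current playback time.

    It fires on_enter/on_leave when playback moves into or out of a cue.
    Hypothetical emulation; cues are treated as half-open [start, end).
    """
    active: Set[int] = set()

    def handle(current_time: float) -> None:
        # Every call scans every cue, which is the cost native callbacks avoid.
        for index, (start, end, text) in enumerate(cues):
            inside = start <= current_time < end
            if inside and index not in active:
                active.add(index)
                on_enter(text)
            elif not inside and index in active:
                active.discard(index)
                on_leave(text)

    return handle


events: List[str] = []
handler = make_timeupdate_handler(
    [(1.0, 3.0, "Hello"), (4.0, 6.0, "World")],
    on_enter=lambda text: events.append("enter " + text),
    on_leave=lambda text: events.append("leave " + text),
)
for t in (0.0, 1.5, 2.5, 3.5, 4.2, 7.0):
    handler(t)
print(events)
```

Note that the emulation only notices a segment if an update happens to land inside it, so short segments can be missed entirely when updates are sparse.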

Synchronisation adjustments
Another addition to the itext element is the introduction of two attributes that together allow fixing synchronisation issues in the timing between the video (or audio) and the itext track. The two attributes are “delay” and “stretch”.

“delay” allows specification of a negative or positive float value that represents the number of seconds by which to delay the display of the itext text segments relative to the timing of the video (or audio) element.

“stretch” allows fixing a constant drift in the timing between the video (or audio) element and the text segments. It is given in percent, where 100% means no time stretch, 97% means getting the text segments 3% faster than their actual timing, and 108% means 8% slower.

These attributes are relevant since itext files are resources independent of the media resource and can therefore be synchronised to a different clock than the media files. This happens frequently with srt files that are being used for differently encoded video files.
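
As a worked illustration of the two attributes, the sketch below maps a text segment’s times onto the media timeline. The function name adjust_cue is hypothetical, and the order of operations (stretch first, then delay) is an assumption of this sketch, since the description above does not fix it.

```python
from typing import Tuple


def adjust_cue(
    start: float,
    end: float,
    delay: float = 0.0,
    stretch: float = 100.0,
) -> Tuple[float, float]:
    """Map a segment's (start, end) in seconds onto the media timeline.

    stretch is in percent (100 means unchanged), delay is in seconds.
    Assumption: the stretch is applied first, then the delay is added.
    """
    factor = stretch / 100.0
    return (start * factor + delay, end * factor + delay)


# A 1.5 second delay shifts the segment without changing its length.
print(adjust_cue(60.0, 62.0, delay=1.5))

# A stretch of 97% makes the segment appear earlier and shortens it slightly.
print(adjust_cue(100.0, 104.0, stretch=97.0))
```

A negative delay moves segments earlier, and a stretch above 100 pushes them later in proportion to their position in the file.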

Further feedback
I am currently experimenting with creating the same kind of JavaScript API for in-line annotation tracks through extending some Firefox patches. It is exciting to see it all come together.

At the same time, I am sure there is still feedback that will further improve the specification and I encourage you to contribute. I have set up a wiki page where you can leave your feedback. Also feel free to drop me an email or leave a comment on this blog post. Thanks!

UPDATE 30th Oct 2009:
There is now also a working implementation that demonstrates the approach with itextlist. Check out http://www.annodex.net/~silvia/itext/elephant_no_skin_v2.html, which will not look much different to the previous version, but does indeed behave very differently.

by silvia at October 29, 2009 01:27 PM

October 28, 2009

Silvia Pfeiffer

Cortado 0.5.0 released

Cortado is a Java applet that provides support for Ogg Theora/Vorbis to Web publishers. It’s particularly useful to publishers that want to use Ogg Theora/Vorbis in browsers that do not yet support the HTML5 video element with Ogg.

Cortado was originally developed by Fluendo SA under an LGPL license and contains a re-implementation of Theora and Vorbis in Java (jheora and jcraft). After a few years of low maintenance, the Wikimedia Foundation took it upon themselves to dust off the code for use in Wikimedia Commons, where only unencumbered open video formats are acceptable.

As Ralph states in his announcement of the new release: earlier this year, Xiph.org took over maintenance of the Cortado Java applet to help concentrate interest and expertise on this important component of the free media codec infrastructure. Therefore, the official website for Cortado is now part of Xiph.org. [If somebody could update the Wikipedia article - that would be awesome!]

So, I am very happy to point to the first Cortado release in three years. Source and sample builds are available from the Xiph.org download site.

Ralph writes further:

The new version is tagged 0.5.0 to indicate both the change in hosting and the significant new support for files from the new libtheora encoder implementation and Kate embedded subtitles.

In particular, 0.5.0 has:

  • Support for files encoded with Theora 1.1
  • Faster YUV to RGB conversion with better results
  • Basic support for embedded Ogg Kate streams
  • Seeking fixed for files with an Ogg Skeleton track
  • Maintained compatibility with the Microsoft VM

This is an awesome example of the power of open source and what a group of people can achieve. Congratulations to everyone at Xiph, Wikipedia, and anyone else who contributed to the release!

by silvia at October 28, 2009 08:04 AM

October 17, 2009

Silvia Pfeiffer

Dealing with multi-track video (and audio)

We are slowly approaching the stage where we want to make multi-track video of the following type available and accessible:

  • original video track
  • original audio track
  • dubbed audio tracks in n different languages
  • audio description tracks in n different languages
  • sign language video tracks in n different sign languages
  • caption tracks in n different languages
  • multiple other time-aligned text tracks in different languages
  • audio and video tracks from different camera angles
  • music and speech tracks can be separate
  • different quality tracks are available
  • accompanying images, e.g. slides for a presentation

One of the issues with such a sizeable number of tracks is how to display them. Some of them are alternatives, some of them additions. Sign language is typically presented in a PiP (picture-in-picture) approach. If we have a music and a speech (or singing) track, we may want to have control over removing certain tracks – e.g. to be able to do karaoke. Caption and subtitle tracks in the same language are probably alternatives, while in different languages they could be additions. It is not a trivial challenge to handle such complex files in an application.

At this point, I am only trying to solve a sub-challenge. As we talk about a particular track in a multi-track media file, we will want to identify it by name. Should there be a standard for naming tracks, so that we can address them by a URL, e.g. with the intention of only delivering a subset of tracks from the larger file? We could introduce that for Ogg – but maybe there is an opportunity to do this across file formats?

To find some answers to these and related questions, I want to discuss two approaches.

The first approach is a simple numbering approach. In it, the audio, video, and annotation tracks are all ordered and then numbered through. This will result in the following sets of track names: video[0] … [n], audio[0] … [n], timed text[0] … [n], and possibly even timed images[0] … [n]. This approach is simple, easy to understand, and only requires ordering the tracks within their types. It allows addressing of a particular track – e.g. as required by the media fragment URI scheme for track addressing. However, it does not allow identification of alternatives, additions, or presentation styles.
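
The numbering approach is easy to sketch. In the Python below, the function name number_tracks and the plain list of track types as input are hypothetical; the output format follows the video[0], audio[0] pattern described above.

```python
from collections import defaultdict
from typing import Dict, List


def number_tracks(track_types: List[str]) -> List[str]:
    """Assign type[index] names to tracks, numbering each type separately.

    track_types lists the type of every track in file order.
    """
    counters: Dict[str, int] = defaultdict(int)
    names = []
    for track_type in track_types:
        # The index counts only earlier tracks of the same type.
        names.append(f"{track_type}[{counters[track_type]}]")
        counters[track_type] += 1
    return names


print(number_tracks(["video", "audio", "audio", "timed text", "video"]))
```

The names say nothing about language or role, which is exactly the limitation noted above.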

Should alternatives, additions, and presentation styles be encoded in the name of the track? Or should this information go into a meta description area of the multi-track video? Something like skeleton in Ogg? Or should it go a step further and be buried in an external information file such as an m3u file (or ROE for Ogg)?

I want to experiment here with the naming scheme and what we would need to specify to be able to decide which tracks to ignore and which to combine for a presentation. And I want to ask for your comments and advice.

This requires listing exactly what types of content tracks we may have to deal with.

In the video space, we have at minimum the following track types:

  • main video content – with alternative camera angles
  • subsidiary video content – with alternative camera angles
  • sign language videos – in alternative languages

Alternatives are defined by camera angle and language. Also, each track can be made available in a different quality. I’d also count additional image content, such as slides in a presentation, as subsidiary video content. So, here we could use a scheme such as video_[main,side,sign]_language_angle.

In the audio space, we have at minimum the following track types:

  • main audio content – in alternative languages
  • background audio content – e.g. music, SFX, noise
  • foreground speech or singing content – in alternative languages
  • audio descriptions – in alternative languages

Alternatives are defined by language and content type. Again, each track can be made available in a different quality. Here we could use a scheme such as audio_type_language.

In the text space, we have at minimum the following track types:

  • subtitles – in different languages
  • captions – in different languages
  • textual audio descriptions – in different languages
  • other time-aligned text – in different languages

Alternatives are defined by language and content type – e.g. lyrics, captions and subtitles really compete for the same screen space. Here we could use a scheme such as text_type_language.

A generic track naming scheme
It seems the generic naming scheme of

<content_type>_<track_type>_<language>[_<angle>]

can cover all cases.
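
To test the scheme against real track lists, it helps to be able to build and parse such names. The following Python sketch does both. The class TrackName, the two functions, and the sample values (such as the angle name “front”) are hypothetical, and the sketch assumes that no field contains an underscore itself.

```python
from typing import NamedTuple, Optional


class TrackName(NamedTuple):
    content_type: str            # e.g. "video", "audio", "text"
    track_type: str              # e.g. "main", "side", "sign"
    language: str                # e.g. "en"
    angle: Optional[str] = None  # optional camera angle


def format_track_name(name: TrackName) -> str:
    """Build <content_type>_<track_type>_<language>[_<angle>]."""
    parts = [name.content_type, name.track_type, name.language]
    if name.angle is not None:
        parts.append(name.angle)
    return "_".join(parts)


def parse_track_name(text: str) -> TrackName:
    """Split a track name into its fields.

    Assumption: fields never contain "_", so a plain split is enough.
    """
    parts = text.split("_")
    if len(parts) not in (3, 4):
        raise ValueError(f"not a valid track name: {text!r}")
    return TrackName(*parts)


print(format_track_name(TrackName("video", "main", "en", "front")))
print(parse_track_name("text_captions_de"))
```

One weakness shows up immediately: language tags with subtags, or track types made of several words, would need a different separator or some escaping rule.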

Are there further track types, further alternatives I have missed? What do you think?

by silvia at October 17, 2009 12:15 PM

October 13, 2009

Silvia Pfeiffer

MySQL, Snow Leopard and ruby

I got a shiny new MacBook Pro on the weekend, yay! After months of complaining about the slowness of my old MacBook and the heat radiating from it, I’m finally off to better grounds.

But then there was the annoying task of setting up the machine with all the software that I’m using. MySQL and ruby turned out to be particular problems. I installed MySQL for 10.5, since MySQL haven’t published one for OS 10.6 yet. I ran “gem install mysql”. And then the pain started.

I got all the errors that were reported elsewhere:
“uninitialized constant MysqlCompat::MysqlRes” and “undefined method `real_connect' for Mysql:Class (NoMethodError)”. I tried all the suggestions – including:
sudo env ARCHFLAGS="-arch x86" gem install mysql -- --with-mysql-config=/usr/local/mysql-5.1.39-osx10.5-x86/bin/mysql_config -V --debug
but just couldn’t get there.

My laptop reports in the System Software Overview: “64-bit Kernel and Extensions: No”, so I assumed I had to use the 32 bit versions. However, that was a wrong assumption. Even though my kernel seems to be 32 bit, applications seem to be 64 bit.

So, eventually I re-installed MySQL for Mac OS X 10.5 (x86_64) and ran the correct gem install command:
sudo env ARCHFLAGS="-arch x86_64" gem install mysql -- --with-mysql-config=/usr/local/mysql-5.1.39-osx10.5-x86
and things were fine.

Additionally, there was some fighting with the PrefPane and re-starting mysql. I had to kill it manually and I had to install the updated PrefPane of Swoon dot net to make it work.

Hope this helps somebody avoid the same pain!

by silvia at October 13, 2009 01:33 PM