Xiph logo

David Schleef : New Schrödinger Release


I recently added support for 10- and 16-bit encoding and decoding to Schrödinger, so I did a little release. Presenting Schrödinger-1.0.11. Also pushed changes to GStreamer to handle the new features. Although these changes have been in the works for some time, a little prompting from j-b caused me to finish this off, so this will probably appear in VLC soon, too.
This was the last piece needed to create a 10-bit master of Sintel, which I’ve been planning to do for some time.

by admin at January 23, 2012 04:23 PM

Jean-Marc Valin : Back From LCA, Video Available


I just got back from linux.conf.au 2012 in Ballarat. The video for the talk I gave, Opus, the Swiss Army Knife of Audio Codecs, is now available on the Opus presentations page. For the Ogg-impaired, a lower-quality version is also available on YouTube.

For those who are into speech codecs, I also recommend watching David Rowe's presentation: Codec 2 - Open Source Speech Coding at 2400 bit/s and Below. His presentation was selected as one of the four best talks at LCA this year -- well worth watching.

January 22, 2012 06:25 AM

Monty : XiphQT components, MAC OS X and 64 bit iTunes


Camilla forwarded a necessary tip for installing the XiphQT components on a 64 bit Mac OS X so that it works with iTunes. This is a reasonably well known tip, but it wasn't in our FAQ or installation instructions (well it is now as of about ten minutes ago) so I'm passing it along now too...

I upgraded to Lion, and my ogg files stopped being able to play in iTunes (silently). Here's how to make it go:
  1. "show in finder" your iTunes binary (either navigate to the Applications folder, or right/control click on it in the dock, and choose "show in finder")
  2. right/control click on iTunes in the finder, and select "Get Info"
  3. Under General, check the box marked "Open in 32-bit mode"

You should put the above on something linked from: http://www.xiph.org/quicktime/download.html I paraphrased it from roaringapps.com.

If XiphQT can be rebuilt in 64 bit mode, and that shipped that way to Lion users, that would also be a good solution.

That last comment is actually a bit of an embarrassment for us at the moment; neither the XiphQT builds nor code have been updated since 2009 or so, despite multiple releases, fundamental improvements and new features in the Xiph codecs since. There are actually more recent beta builds of updated Mac OS X and Win32 XiphQT components than never got bumped to the official XiphQT download page, but even these builds are from mid 2009.

We don't have any high-powered Mac OS hackers in the core Xiph group at the moment. I have some relatively insignificant amount of experience coding for Mac OS X and Quicktime, but I've been hoping for a volunteer with more chops. Any takers?

by Monty (monty@xiph.org) at January 21, 2012 08:24 AM

Monty : Ghost Update: Demo 4


Turns out I missed blogging about the latest Ghost update... back in November...

Ghost Demo4 is up on the demo list showing the sinusoidal extractor doing some very early sinusoidal tracking frame to frame, and a very early example of the analysis performing real sinusoidal/non-sinusoidal audio splitting. Pictures and interactive listening, oh my!

It looks like I'll be putting a month or two into transOgg before getting back to Ghost work (and demo 5). The work that went into demo4 raised a number of questions I'm not sure how to approach answering yet, so I'm going to let that percolate for a bit.

by Monty (monty@xiph.org) at January 21, 2012 07:19 AM

Monty : Planet Xiph posts now featured on Xiph.Org front page


I made a quick change to the Xiph.Org front page that a few people have suggested now over the past few years.

The top few blog posts aggregated by Planet Xiph now appear as a five-item teaser list near the top of the Xiph.Org home page. The idea is both to get some more live content on the front page as well as to draw more attention to both the Planet and our developer community.

Comments and feedback welcome!

by Monty (monty@xiph.org) at January 19, 2012 02:33 PM

Jean-Marc Valin : Opus quality update


Those who have been following the Opus git repository in the past few weeks probably haven't noticed much work going on. The reason is pretty simple, most of the work has been going on elsewhere in an experimental branch (exp_wip3 names for now) of my private repository. The reason it's in an experimental branch is that its not fully converted to fixed-point and hasn't been tested on any frame size other than 20 ms. Here's an (incomplete) list of changes for now:

  • Really unconstrained VBR (not trying to keep the same average rate)
  • Tonality detection to give highly tonal audio a boost in bit-rate
  • (yet another) rewrite of the transient detection code
  • New dynamic allocation code that boosts the rate of bands that have significant spectral leakage caused by short blocks

Thanks to these changes, the quality has (as far as we can tell) gone up compared to the current master branch. I invite you to judge for yourself by comparing the audio coded with the current master branch with the audio coded with the new exp_wip3 experimental branch. This is 64 kb/s, so fairly low rate for stereo music. The original is here. Let me know what you think.

January 10, 2012 09:39 AM

Chris Pearce : Changes to DOM full-screen API in Firefox 11


We've made some changes to how the HTML full-screen API exits full-screen mode in Firefox 11, which is scheduled to ship in March 2012. Previously Document.mozCancelFullScreen() would fully-exit full-screen and return the browser to "normal" mode. Starting in Firefox 11, Document.mozCancelFullScreen() will restore full-screen state to the element that was previously full-screen. If there is no previous full-screen element in either the document or a parent document (full-screen mode isn't restored to former full-screen elements in child documents), then the browser will "fully-exit full-screen", and return the browser to normal mode.

To see how this is useful, consider the case of a PowerPoint clone or presentation web app that wants to run full-screen. One way to implement such a web app would be to have a full-screen <div> element where the slides are shown. The developer may want to be able to switch full-screen mode seamlessly between the slide deck <div> and (say) a <video>, and then return to having the slide deck <div> as the full-screen element so that the user can carry on with the presentation. Before this change, if the <video> was in a cross-origin subdocument (like a YouTube embedded player in an <iframe>) returning full-screen mode to the slide deck <div> from the <video> was a two-step process; users would have to fully-exit full-screen, and re-request full-screen mode on the slide deck element. Now developers can simply call Document.mozCancelFullScreen() and seamlessly switch back. The browser won't drop out of full-screen mode during the transition.

Note that if users press the escape key they will always fully-exit full-screen, i.e. Firefox won't restore the previous full-screen element to full-screen state on escape key press. So to seamlessly restore full-screen to the previous full-screen element, developers must explicitly call Document.mozCancelFullScreen(), they can't rely on the user pressing the escape key.

We've also added webconsole logging upon full-screen request failures to Firefox 11, to make debugging denied full-screen requests easier.

Another change coming in Firefox 11 is we'll no longer deny full-screen requests in web pages which contain windowed plugins. Now we'll exit full-screen when a windowed plugin is focused instead (on Windows and Linux, MacOSX is unaffected).

by Chris Pearce (noreply@blogger.com) at December 19, 2011 08:48 PM

Chris Double : Pattern Matching Against Linear Objects in ATS


In a project I’m working on I’m using linear lists. This is the list_vt type in the ATS prelude. list_vt is similar to the list types in Lisp and functional programming languages except it is linear. The memory for the list is not managed by the garbage collector and the type system enforces the rule that only one reference to the linear object can exist. This sometimes requires a bit of extra effort when using pattern matching against the list_vt instances.

Pattern Matching

When pattern matching against linear objects you can do a destructive match or a non-destructive match. The former will destroy and free the memory allocated for the object automatically. The latter will not. Destructive matches are done by having the pattern match clause prefixed with a ~. For example, the following will print an integer list and destroy the list while it does it:

fun print_list (l: List_vt (int)): void =
  case+ l of
  | ~list_vt_nil () => printf("nil\n", @())
  | ~list_vt_cons (x, xs) => (printf("cons %d\n", @(x)); print_list(xs))

fun test1 (): void = {
  val a = list_vt_cons {int} (1, list_vt_nil)
  val () = print_list (a)
}

Things get complicated when doing non-destructive matches. The following won’t typecheck:

fun print_list2 (l: !List_vt (int)): void =
  case+ l of
  | list_vt_nil () => printf("nil\n", @())
  | list_vt_cons (x, xs) => (printf("cons %d\n", @(x)); print_list(xs))

fun test2 (): void = {
  val a = list_vt_cons {int} (1, list_vt_nil)
  val () = print_list2 (a)
  val () = list_vt_free (a)
}

The problem with this example is that when the match is made we are effectively taking the linear object out of the variable l. This leaves l with a different type, but we’ve stated in the function signature for print_list2 that the type is not modified or consumed. We need a way of putting the linear object back into l once we’re done using the match. This primitive to do this is fold@ which I briefly introduced in my linear datatypes post. fold@ will change the type of l back to the original and prevent access to the pattern match variables. Usage looks like this:

fun print_list2 (l: !List_vt (int)): void =
  case+ l of
  | list_vt_nil () => (fold@ l; printf("nil\n", @()))
  | list_vt_cons (x, !xs) => (printf("cons %d\n", @(x)); print_list2(!xs); fold@ l)

fun test2 (): void = {
  val a = list_vt_cons {int} (1, list_vt_nil)
  val () = print_list2 (a)
  val () = list_vt_free (a)
}

You’ll notice with this version that the match for list_vt_cons has changed the xs parameter to be !xs. The second argument in the cons constructor is a linear object. If the object itself is matched against xs then it is another example of aliasing the linear object. It is taken out of the l and needs to be put back. The way ATS handles this is to require pattern matching with a ! prefixed. This makes xs be a pointer to the object rather than the object itself. So in this example xs has the type ptr addr where addr is the address of the actual List_vt object. This is why the xs is prefixed by ! in the recursive call to print_list2. The ! means dereference the pointer, so the List_vt it is pointing to is passed as the argument to the recursive call.

In this way the linear object is never taken out, we only access it via its pointer. The fold@ call in this clause will change xs back to the List_vt object. The fold@ call is done after the usage of !xs. If it was done before then we wouldn’t have access to the view for xs to be able to derefence it. print_list2 is still tail recursive as the fold@ call is only used during typechecking and is erased afterwards.

Filtering a linear list

In my project I needed to filter a linear list. Unfortunately ATS doesn’t have a filter implementation in the standard prelude for linear lists (it does for persistent lists). My first attempt at writing a list_vt_filter looked like:

fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) =
  case+ l of
  | list_vt_nil () => (fold@ l; list_vt_nil)
  | list_vt_cons (x, !xs) when f (x) => let
                                          val r = list_vt_cons (x, list_vt_filter (!xs, f))
                                        in
                                          fold@ l; r
                                        end
  | list_vt_cons (x, !xs) => let
                                val r = list_vt_filter (!xs, f)
                              in
                                fold@ l; r
                              end

This should look familiar since it’s very similar to the print_list2 code shown previously in the way it uses non-destructive matching and fold@. The function list_vt_filter takes a list_vt as an argument and a function to apply to each element in the list. That function returns true if the element should be included in the result list. Usage looks like:

val a  = list_vt_cons (1, list_vt_cons (2, list_vt_cons (3, list_vt_cons (4, list_vt_nil ()))))
val b  = list_vt_filter (a, lam (x) => x mod 2 = 0)
val () = list_vt_foreach_fun<int> (a, lam(x) =<> $effmask_all (printf("Value: %d\n", @(x))))
val () = list_vt_free (b)
val () = list_vt_free (a)

One issue with this implementation is it is not tail recursive. It has stack growth proportional to the size of the result list.

Tail Recursive Filtering

In Lisp code I’d often build the result list tail recursively by passing an accumulator, with each new element in the result being prepended to the accumulator. This builds a list in the reverse order so before returning it the list would be reversed. The ATS code for this is:

fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) = let
  fun loop (l: !List_vt (int), accum: List_vt (int)):<cloptr1> List_vt (int)  =
    case+ l of
    | list_vt_nil () => (fold@ l; accum)
    | list_vt_cons (x, !xs) when f (x) => let
                                            val r = loop (!xs, list_vt_cons (x, accum))
                                          in
                                            (fold@ l; r)
                                          end
    | list_vt_cons (x, !xs) => let
                                 val r = loop (!xs, accum)
                                in
                                  (fold@ l; r)
                                end
in
  list_vt_reverse (loop (l, list_vt_nil))
end

The cloptr1 function annotation marks the inner function as being a closure where the memory for the closure’s environment is managed by the compiler using malloc and free instead of the garbage collector (which is what cloref1 would signify). See my post on closures in ATS for more about the different closure and function types used by ATS.

Unfortunately the requirement to use fold@ after we’ve finished with using the pattern matched variables makes the code slightly more verbose as we need to do the tail recursion, obtaining the result, then do the fold@ and return the result. Remember that the fold@ is erased at type checking type which is how this code remains tail recursive even though the code structure makes it look like it isn’t.

One downside to this approach is we iterate over the list twice. Once to build the result, and once over the result to reverse it.

Single Pass Tail Recursive Filtering

The creation of the result list can be done in a single pass if we could create a cons with no second argument, and fill in that argument later when we have a result to store there that passes filtering. ATS allows construction of datatypes with a ‘hole’ that can be filled in later. The ‘hole’ is an unintialized type and we get a pointer to it. An example of doing this is:

var x = list_vt_cons {int} {0} (1, ?)

This creates a list_vt_cons with the data set to 1 but no second parameter. Instead of that parameter being of type List_vt (int) it is of type List_vt (int)?, the ? signifying it is uninitialized. For this example we have to pass the universal type parameters explicitly (the {int} {0}) as the ATS type inference algorithm can’t compute them.

To get a pointer to the ‘hole’ we have to pattern match:

val+ list_vt_cons (_, !xs) = x
val () = !xs := list_vt_nil
val () = fold@ x

In this example the xs is a pointer, pointing to the List_vt (int)?. It assigns a list_vt_nil to this, making the tail of the cons a list_vt_nil. Just like in our previous pattern matching examples using case, the code has to do a fold@ to change the type of x back to that containing a linear object once we’ve finished using xs.

Now that we can get pointers to the tail of the list we can implement a single pass tail recursive filter function:

fun list_vt_filter (l: !List_vt (int), f: int -<> bool): List_vt (int) = let
  fun loop (l: !List_vt (int), res: &List_vt (int)? >> List_vt (int)):<cloptr1> void =
    case+ l of
    | list_vt_nil () => (fold@ l; (res := list_vt_nil))
    | list_vt_cons (x, !xs) when f (x) => let
                                            val () = res := list_vt_cons {int} {0} (x, ?)
                                            val+ list_vt_cons (_, !p_xs) = res
                                           in
                                             loop (!xs, !p_xs); fold@ l; fold@ res
                                           end
    | list_vt_cons  (x, !xs) => (loop (!xs, res); fold@ l)

  var res: List_vt (int)?
  val () = loop (l, res)
in
  res
end

The loop function here no longer turns a result. Instead the result is passed via a reference (the & signifies ‘by reference’). When there is something that needs to be stored in the list, a cons is created with a hole in the tail position. This cons is stored in the result we are passing by reference and we tail recursively call with the hole as the new result. ATS converts this to nice C code that is a simple loop rather than recursive function calls.

Miscellaneous

The code examples in this post use List_vt (a). This is actually a typedef for list_vt (a,n) where a is the type and n is the length of the list. The typedef allows shorter examples without needing to specify the sorts for the list length. Using the full type though has the advantage of being able to specifiy a bit more type safety. For example, the original filter function would be declared as:

fun list_vt_filter {n:nat} (l: !list_vt (int,n), f: int -<> bool): [r:nat | r <= n] list_vt (int, r)

This defines the type of the result as having a length equal to or less than that of the original list. This helps prevent errors in the implementation of the filter - it can’t accidentally leave extra items in the list. I cover this type of thing in my post on dependent types.

Another addition to safety that adding the extra sorts can provide is the ability to check that the function terminates. This can be done by adding a termination metric to the function definition:

fun list_vt_filter {n:nat} .<n>. (l: !list_vt (int,n), f: int -<> bool): [r:nat | r <= n] list_vt (int, r)

The compiler checks that n is decreasing on each recursive call. If this fails to happen the recursive calls may not terminate and it becomes a compile error. This is discussed in the Termination-Checking for Recursive Functions section of the Introduction to Programming in ATS book.

A description of how fold@ works is in the ATS/Anairats User’s Guide PDF. It’s in the ‘Dataviewtypes’ section of the ‘Programming with Linear Types’ chapter and is referred to as folding and unfolding a linear type.

It’s the usage of linear types and dealing with their restrictions that makes my examples a bit more complex. If you use ATS mainly with non-linear types and link with the garbage collector then it becomes very much like using any other functional programming language, but with additional features in the type system. My interest has been around avoiding using a garbage collector and having the compiler give errors when memory is not allocated or free’d correctly. Don’t be put off from using ATS by these complex examples if you’re fine with using garbage collection and non-linear datatypes. You might never need to deal with the cases that bring in the extra complexity.

December 16, 2011 05:00 AM

Ben Schwartz : Route9.js


I was really impressed by Michael Bebenita’s Broadway.js, the recent port of an H.264 decoder to pure Javascript using Emscripten, a LLVM-based C-to-JS converter … but of course this is the opposite of what we want! Who needs H.264? We want WebM!

I’ve spent the past few weekends digging into Broadway.js, stripping out the H.264 bits and replacing them with libvpx and libnestegg. Now it’s working, to a degree. You can see it for yourself at the demo page (so far tested only in Firefox 7…).

I’m not going to be able to take this much further … at least not right now. It’s been a fun exercise though. I invite all interested comers to read some more details and then fork the repo.

Take this thing, and make it your own.

Reactions: Hacker News, r/programming, and BadassJS, Twitter.

by Ben at November 30, 2011 05:18 AM

Chris Pearce : Firefox's HTML full-screen API enabled in Nightly builds


A few days ago I enabled the HTML full-screen API in Firefox nightly builds. This enables developers to make an arbitrary HTML element "full-screen", hiding the browser's UI and stretching the element to encompass the entire screen. This will be particularly useful for HTML5 video and games.

If all goes well, this feature will ship in Firefox 10 at the end of January.

The API has changed slightly since I last blogged about it. The current API is Mozilla-specific, but is similar to the W3C's Fullscreen draft specification.

To enter full-screen mode, call the following method on the HTML Element you'd like to enter full-screen:
  • void mozRequestFullScreen() : posts an asynchronous request to make the HTML element the full-screen element. If the request is granted, some time later a bubbling "mozfullscreenchange" event is dispatched to the element which requested full-screen. If the request is denied, a "mozfullscreenerror" event is dispatched to the element's owning document. We only grant requests for full-screen when:
    • mozRequestFullScreen() is called in a user-generated event handler, e.g. a mouse click handler, and
    • the requesting element is in its document, and
    • there are no windowed plugins present in any document/iframe in the current page, and
    • all iframes containing the requesting element (if any) have the mozallowfullscreen attribute.
We added the following method and attributes to HTML Document:
  • void mozCancelFullScreen() : exits the document from full-screen mode. This dispatches a "mozfullscreenchange" event to the document containing the (now former) full-screen element. Note that the "mozfullscreenchange" event which is dispatched when you enter full-screen is targeted at the full-screen element, so if you want to receive the "mozfullscreenchange" on both entering and exiting full-screen in the same listener you should add your listener to the document, rather than the full-screen element.
  • readonly attribute boolean mozFullScreen : true when the document is in full-screen mode.
  • readonly attribute Element mozFullScreenElement : reference to the current full-screen element.
  • readonly attribute boolean mozFullScreenEnabled : returns true if calls to mozRequestFullScreen() would be granted in the current document. This returns false if there are any windowed plugins present in any document/iframe in the current page, or if any iframes containing this document don't have the mozallowfullscreen attribute present, or if the user has disabled the API by preference. If this returns false you may want to not show the user your enter-full-screen button in your page, since you know it won't work!
We also added the :-moz-full-screen css pseudo class, which applies to the full-screen element while in full-screen mode.

We added the mozallowfullscreen attribute to iframe elements. Without this, full-screen requests made by script in the iframe's content (i.e embedded ads, or a YouTube player in an iframe for that matter) will be denied.

While in full-screen mode, the user can press the ESC key (or F11) to exit. Alpha-numeric keyboard input while in full-screen mode causes a warning message to pop-up to guard against phishing attacks. The only key input which doesn't cause the warning message to pop up are: left, right, up, down, space, shift, control, alt, page up, page down, end, home, tab, and meta.

Navigating, changing tab, changing app (ALT+TAB) while in full-screen mode will cause full-screen mode to exit.

Here's about a simple example, which will work in current Firefox nightly builds:




(Press ESC to exit full-screen)

The code for that button's onclick handler is simply:
document.getElementById('bruce_video').mozRequestFullScreen();

How is Firefox's full-screen API different from Webkit/Chrome/Safari's full-screen API? Firefox's API adds a "width: 100%; height: 100%;" CSS rule to the element which requests full-screen, so that it's stretched to occupy the entire screen. Chrome's API does not do this, but instead it centers the full-screen element in the window and blacks-out the underlying webpage. So the full-screen element won't occupy the entire screen with Chrome's API unless you specify a "width: 100%; height: 100%;" rule yourself. Conversely if you want to vertically and horizontally center something while in full-screen with Firefox's API, you need to make the containing element of your desired centered element full-screen instead, and apply CSS rules to vertically and horizontally center the contained element.

For a cross-browser full-screen API example, see html5-demos.appspot.com's full-screen demo.

Edit: 11 Nov 2011, clarified Document.mozCancelFullScreen() and Document.mozFullScreenEnabled, fixed typos.

by Chris Pearce (noreply@blogger.com) at November 10, 2011 11:01 PM

Chris Pearce : Mozilla full-screen API progress update


Update 10 November 2011: the full-screen API has been changed slightly and enabled in Firefox Nightly builds, see http://blog.pearce.org.nz/2011/11/firefoxs-html-full-screen-api-enabled.html for details.

I've been working on implementing Robert O'Callahan's HTML full-screen API proposal in Firefox (bug 545812). Support for the base API has landed, disabled by default, in Firefox nightly builds. To enable the full-screen API, set the pref full-screen-api.enabled to true.

We have implemented a general purpose full-screen API which can make any HTML element the full-screen element (it seems WebKit based browsers' full-screen API allow only making <video> elements full-screen).

This feature makes the following API changes to HTML Element:
  1. void mozRequestFullScreen() : makes an HTML element the full-screen element. Causes browser chrome to hide, and expands the element to encompass the entire screen. Upon success, this dispatches a "mozfullscreenchange" event to the requesting full-screen element, or the element's owner document if the element is not in a document. We only grant requests for full-screen when running in user-generated event handlers, e.g. a mouse click handler.
This feature makes the following API changes to HTML Document:
  1. void mozCancelFullScreen() : exits the document from full-screen mode.
  2. readonly attribute mozFullScreen : true when the document is in full-screen mode.
  3. readonly attribute mozFullScreenElement : reference to the current full-screen element, if it's in the current document.
This feature adds the :-moz-full-screen css pseudo class, which applies to the full-screen element while in full-screen mode.

For a request for full-screen to be granted in content inside an iframe, the containing iframe needs to have the mozallowfullscreen attribute present. This is a boolean attribute, so the attribute only needs to be present, it doesn't matter what value it's set to.

Keyboard input is restricted in full-screen mode. When alpha-numeric key input occurs in full-screen mode, full-screen mode immediately exits. This is to help protect against phishing attacks.

We also plan to deny requests for full-screen mode when windowed plugins are present (since we can't easily monitor key events to windowed plugins on non-MacOSX platforms). We will exit full-screen mode when a windowed plugin is added to a document as well. I have a patch for this, but its dependencies haven't landed yet.

Work remaining to be done before this can be enabled:
  1. Adding a warning message when we enter DOM full-screen mode (on desktop Firefox, and on Fennec too).
  2. Making the full-screen API work in multi-process Firefox/Fennec (bug 684620). This requires a way of getting the PBrowserParent from C++ in the chrome process to be implemented, there's not a way to do that yet unfortunately.
  3. Make change/open tab cause full-screen mode to exit (bug 685402).
  4. A security review must be completed, and concerns raised there must be addressed. This could involve changing the API.
We also want a clearer transition effect when entering full-screen, to somehow show the full-screen element "stretching out" to encompass the screen.

You can test out our work-in-progress full-screen implementation, by grabbing the latest Firefox nightly build, setting the pref full-screen-api.enabled to true, and pointing your browser at my not-very-exciting full-screen API demo page.

by Chris Pearce (noreply@blogger.com) at November 09, 2011 10:43 PM

Chris Double : Using Records and C structs in ATS


ATS has record types which are like tuples but each item is referenced via a name. They closely approximate C structs and in the generated C code are represented in this way. The following uses a record to hold x and y values representing a point on a 2D plane:

fun print_point (p: @{x= int, y= int}): void =
  printf("%d@%d\n", @(p.x, p.y))

implement main () = {
  val p1 = @{x=10, y=20}
  val () = print_point (p1)
}

A literal record object in this example is created using the @{ ... } syntax. Dereferencing record fields is done using ’.’. The generated C code shows that the representation for the record is a C struct:

typedef struct {
  ats_int_type atslab_x ;
  ats_int_type atslab_y ;
} anairiats_rec_0 ;

The @ syntax used for the record literal and type marks the record as a ‘flat’ record. Flat records have value semantics and variables of this type have a size in bytes equivalent to the size of the underlying C structure. This is shown by the generated C code for the main function above:

ATSlocal (anairiats_rec_0, tmp4) ;

__ats_lab_mainats:
tmp4.atslab_x = 10 ;
tmp4.atslab_y = 20 ;

/* tmp3 = */ print_point_0 (tmp4) ;

Here the variable tmp4 is the p1 in our ATS code. It is an instance of the C struct representing the record and is created on the stack. It is initialized and passed to the print_point function by value:

ats_void_type
print_point_0 (anairiats_rec_0 arg0) {
  ATSlocal (ats_int_type, tmp1) ;
  ATSlocal (ats_int_type, tmp2) ;

  ...
}

Records can also be defined using a '{...} syntax (note the ' instead of @) for boxed records which are heap allocated and the memory managed by the garbage collector. Boxed records have pointer semantics and have a size in bytes equivalent to the size of a pointer:

implement main () = {
  val x = sizeof<@{x=int}>
  val y = sizeof<'{x=int}>
  val a = int_of_size x
  val b = int_of_size y
  val () = printf ("%d %d\n", @(a, b))
}

This outputs “4 8” showing the flat type as having the size of an int and the boxed type having the size of a pointer. Most usage of records in ATS I’ve done is using flat records as I tend to avoid the use of the garbage collector.

To pass a reference to a flat record so it can be modified by a function you need to mark the function argument as ‘by reference’ using &:

typedef point = @{x= int, y= int}

fun print_point (p: point): void =
  printf("%d@%d\n", @(p.x, p.y))

fun add1 (p: &point): void = {
  val () = p.x := p.x + 1
  val () = p.y := p.y + 1
}

implement main () = {
  var p1 = @{x=10, y=20}
  val () = add1 (p1)
  val () = print_point (p1)
}

In this example I use typedef to create a type alias for the record so I can refer to the type as point. The add1 function takes a point by reference (as indicated by the & prefix). This works like C++ reference arguments. The function effectively takes a pointer to an instance of the struct and can modify the instance passed to the function. For this to work the point passed to the function must be an lvalue. That is, it must be mutable. This is done in ATS by making it a var vs a val.

Note that it’s a type error to create an uninitialized point object and pass it to add1. For example, the following gives a type check error:

implement main () = {
  var p1: point?
  val () = add1 (p1)
  val () = print_point (p1)
}

Types that are unintialized have a ? suffix added to it. The type of p1, due to it being unintialized, is point?. Since add1 takes a point this fails type checking. Initializing it allows it to pass:

implement main () = {
  var p1: point
  val () = p1.x := 5
  val () = p1.y := 10
  val () = add1 (p1)
  val () = print_point (p1)
}

As well as passing by reference to functions you can pass a pointer and deal with the pointer management directly. This requires using the proof system and I hope to go through dealing with pointers and their associated proofs in a later post:

fun add1 {l:agz} (pf: !point @ l | p: ptr l): void = {
  val () = p->x := p->x + 1
  val () = p->y := p->y + 1
}

implement main () = {
  var p1 = @{x=10, y=20}
  val () = add1 (view@ p1 | &p1)
  val () = print_point (p1)
}

When interfacing with C API’s you often have to deal with C structures. In ATS you can declare a type as being defined as a struct in C with a matching ATS record definition so that ATS can be used to access the struct. The following example shows a struct declared in C, a function that uses it in C, and how this is wrapped in ATS:

%{^
typedef struct Point {
  int x;
  int y;
} Point;

void print_point (Point* p) {
  printf("%d@%d\n", p->x, p->y);
}
%}

typedef point = $extype_struct "Point" of {x= int, y= int}
extern fun print_point (p: &point): void = "mac#print_point"

implement main () = {
  var p1: point
  val () = p1.x := 10;
  val () = p1.y := 20;
  val () = print_point (p1)
}

The $extype_struct keyword creates a type that is represented by a C struct with the given name. By using the of {x= int, y= int} suffix we define the record layout as seen via ATS. This will stop ATS from creating its own structure to map the type and instead uses the C structure. The generated C code for the main function looks like:

ATSlocal (Point, tmp1) ;

__ats_lab_mainats:
/* Point tmp1 ; */
ats_select_mac(tmp1, x) = 10 ;
ats_select_mac(tmp1, y) = 20 ;
print_point ((&tmp1)) ;

Note it uses the C struct type directly. I use this approach in my 0MQ wrapper to wrap the zmq_msg_t and zmq_pollitem_t structures.

November 01, 2011 05:00 AM

Chris Double : Performing Compatible Updates of Mozilla Central Git Forks


Earlier this year I posted about how I created my git mirror of the mozilla-central mercurial repository. I’ve been keeping a fork on github updated regularly since then that a number of people have started using.

Users of my mirror have wanted to be able to keep up to date with the mercurial mozilla-central repository themselves, or add updates from other branches of the Mozilla source. It takes a large amount of time to start a conversion from scratch but it’s possible to start from the existing git mirror, perform the incremental updates from mercurial yourself, and still stay compatible with the incremental updates I push to my github mirror.

hg-git uses a git-mapfile in the .hg directory of the mercurial clone to keep track of the mapping between mercurial and git commits. By using the same git-mapfile that I use for my mirror you can start your own incremental update from the latest data in the git-mapfile instead of from the beginning. I keep an up-to-date git-mapfile in git-mapfile.bz2. It’s compressed with bzip2 as the uncompressed version is quite large.

The steps to start your own update process are:

  1. Clone the mercurial mozilla-central repository
  2. Clone my git mirror as a bare repository in .hg/git
  3. Place the git-mapfile in .hg
  4. Do an hg bookmark -f -r default master to mark the commit to convert to
  5. Perform hg gexport to update .hg/git with recent mercurial commits

The .hg/git directory should now be up-to-date with respect to the mercurial repository. And the additional git commits will have the same SHA id as any commits I push into my mirror when I perform my own update.

The steps to do an incremental update are the normal:

  1. Pull from the mercurial mozilla-central repository
  2. Run hg bookmark -f -r default master
  3. Run hg gexport
  4. Push or pull to/from .hg/git as needed

As a working example, the following shell commands on a Linux system should set up your own repository ready for incremental updating:

$ hg clone https://hg.mozilla.org/mozilla-central
$ cd mozilla-central/.hg
$ git clone --bare git://github.com/doublec/mozilla-central.git git
$ wget http://www.bluishcoder.co.nz/git-mapfile.bz2
$ bunzip2 git-mapfile.bz2
$ cd ..
$ hg bookmark -f -r default master
$ hg gexport
$ cd .hg/git
$ git push --all ~/some/other/repo.git

Incremental updates just repeat the last few commands above:

$ hg pull -u
$ hg bookmark -f -r default master
$ hg gexport
$ cd .hg/git
$ git push --all ~/some/other/repo.git

Note that I use hg gexport and push from the .hg/git repository using the git command instead of doing an hg push and relying on hg-git to do the conversion and push. In my original article I did the latter. The hg-git push method uses slower python based routines to do the push which can take a long time and large amounts of memory on big repositories like mozilla-central. By splitting this up into gexport and using the native git command I save a lot of time and memory.

October 28, 2011 03:00 AM

Chris Double : Building Rust


Update 2011-11-08 - The steps to build Rust have changed since I wrote this post, there’s now no need to build LLVM - it’s included as a submodule in the Rust git repository and will automatically be cloned and built.

A while ago I wrote a quick look at Rust post, describing how to build it and run some simple examples. Since that post the bootstrap compiler has gone away and the rustc compiler, written in Rust, is the compiler to use.

The instructions for building Rust in the wiki are good but I’ll briefly go through how I installed things on 64-bit Linux. The first step is to build the required version of LLVM from the LLVM source repository. I use a git mirror and install to a local directory so as not to clash with other programs that use older LLVM versions.

$ git clone git://github.com/earl/llvm-mirror.git
$ cd llvm-mirror
$ git reset --hard 9af578
$ CXX='g++ -m32' CC='gcc -m32' CFLAGS=-m32 CXXFLAGS=-m32 LDFLAGS=-m32 ./configure \
    --{build,host,target}=i686-unknown-linux-gnu \
    --enable-targets=x86,x86_64,cbe --enable-optimized --prefix=/home/chris/rust
$ make && make install

Note the use of prefix to ensure this custom LLVM build is a local install. I also do a git reset to go to the commit for SVN revision 142082 which is the current version that works with Rust according to the wiki. After installation add the install bin to the PATH and lib to LD_LIBRARY_PATH:

$ export PATH=~/rust/bin:$PATH
$ export LD_LIBRARY_PATH=~/rust/lib:$LD_LIBRARY_PATH

Next step, clone and build Rust:

$ git clone git://github.com/graydon/rust
$ mkdir build
$ cd build
$ ../rust/configure
$ make

The build process downloads an existing build for your platform to bootstrap from. It uses this to build a stage1 version of the compiler. This stage1 is used to build a stage2, and that is then used to build a stage3. All the built compilers should work exactly the same if things are working correctly. The compiler can be run within the build directory, or outside it if you put the stage3/bin directory in the path:

$ export PATH=~/path/to/build/stage3/bin:$PATH
$ rustc
error: No input filename given.

Test with a simple ‘hello world’ program:

$ cat hello.rs
use std;
import std::io::print;

fn main () {
  print ("hello\n");
}
$ rustc hello.rs
$ ./hello
hello

October 24, 2011 05:00 AM

Chris Double : Overloading Functions in ATS


ATS allows overloading of functions where the function that is called is selected based on number of arguments and the type of the arguments. The following example shows overloading based on type to provide a generic print function:

symintr myprint
fun myprint_integer (a: int)    = printf("%d\n", @(a))
fun myprint_double  (a: double) = printf("%f\n", @(a))
fun myprint_string  (a: string) = printf("%s\n", @(a))

overload myprint with myprint_integer
overload myprint with myprint_double
overload myprint with myprint_string

implement main() = {
  val () = myprint ("hello")
  val () = myprint (10)
  val () = myprint (20.0)
}

The keyword symintr introduces a symbol that can be overloaded. The keyword overload will overload that symbol with an existing function. In this example we overload with three functions that take different types. The overload resolution is performed at compile time. The actual C code generated for the main function includes:

/* tmp4 = */ myprint_string_2 (ATSstrcst("hello")) ;
/* tmp5 = */ myprint_integer_0 (10) ;
/* tmp3 = */ myprint_double_1 (20.0) ;

The ATS standard prelude includes a print function that is overloaded in this manner for most of the standard types. One downside to the way overloading works is the overload resolution sometimes fails in template functions. The following code gives a compile error for example:

fun {a:t@ype} printme (x: a) = print(x)

implement main() = {
  val () = printme (10)
  val () = printme (20.0)
}

The error given is that the symbol print cannot be resolved. The ATS compiler attempts to resolve the overload by looking up the t@ype sort. There is no overload for this so the resolution fails. This can be worked around using a template function to call the overloaded function and partially specialize the implementation of the new template function. The following code demonstrates this:

extern fun {a:t@ype} gprint (x: a):void
implement gprint<int> (x) = print_int(x)
implement gprint<double> (x) = print_double(x)

fun {a:t@ype} printme (x: a):void = gprint<a>(x)

implement main() = {
  val () = printme (10)
  val () = printme (20.0)
}

The print symbol can be overloaded with the new gprint function to allow print to be called over t@ype sorts:

extern fun {a:t@ype} gprint (x: a):void
implement gprint<int> (x) = print_int(x)
implement gprint<double> (x) = print_double(x)

overload print with gprint

fun {a:t@ype} printme (x: a) = print(x)

implement main() = {
  val () = printme (10)
  val () = printme (20.0)
}

This example is contrived in that you could just specialize printme but in real world code this issue comes up occasionally. The most common example for me has been using = in a template function, comparing arguments that are template type parameters. = is an overloaded function and the overload lookup fails in the same manner as above. A workaround is to create an equals template function specialized over the types you plan to compare as above.

October 19, 2011 02:00 AM

Silvia Pfeiffer : Open Media Developers Track at OVC 2011


The Open Video Conference that took place on 10-12 September was so overwhelming, I’ve still not been able to catch my breath! It was a dense three days for me, even though I only focused on the technology sessions of the conference and utterly missed out on all the policy and content discussions.

Roughly 60 people participated in the Open Media Software (OMS) developers track. This was an amazing group of people capable and willing to shape the future of video technology on the Web:

  • HTML5 video developers from Apple, Google, Opera, and Mozilla (though we missed the NZ folks),
  • codec developers from WebM, Xiph, and MPEG,
  • Web video developers from YouTube, JWPlayer, Kaltura, VideoJS, PopcornJS, etc.,
  • content publishers from Wikipedia, Internet Archive, YouTube, Netflix, etc.,
  • open source tool developers from FFmpeg, gstreamer, flumotion, VideoLAN, PiTiVi, etc,
  • and many more.

To provide a summary of all the discussions would be impossible, so I just want to share the key take-aways that I had from the main sessions.

WebRTC: Realtime Communications and HTML5

Tim Terriberry (Mozilla), Serge Lachapelle (Google) and Ethan Hugg (CISCO) moderated this session together (slides). There are activities both at the W3C and at IETF – the ones at IETF are supposed to focus on protocols, while the W3C ones on HTML5 extensions.

The current proposal of a PeerConnection API has been implemented in WebKit/Chrome as open source. It is expected that Firefox will have an add-on by Q1 next year. It enables video conferencing, including media capture, media encoding, signal processing (echo cancellation etc), secure transmission, and a data stream exchange.

Current discussions are around the signalling protocol and whether SIP needs to be required by the standard. Further, the codec question is under discussion with a question whether to mandate VP8 and Opus, since transcoding gateways are not desirable. Another question is how to measure the quality of the connection and how to report errors so as to allow adaptation.

What always amazes me around RTC is the sheer number of specialised protocols that seem to be required to implement this. WebRTC does not disappoint: in fact, the question was asked whether there could be a lighter alternative than to re-use dozens of years of protocol development – is it over-engineered? Can desktop players connect to a WebRTC session?

We are already in a second or third revision of this part of the HTML5 specification and yet it seems the requirements are still being collected. I’m quietly confident that everything is done to make the lives of the Web developer easier, but it sure looks like a huge task.

The Missing Link: Flash to HTML5

Zohar Babin (Kaltura) and myself moderated this session and I must admit that this session was the biggest eye-opener for me amongst all the sessions. There was a large number of Flash developers present in the room and that was great, because sometimes we just don’t listen enough to lessons learnt in the past.

This session gave me one of those aha-moments: it the form of the Flash appendBytes() API function.

The appendBytes() function allows a Flash developer to take a byteArray out of a connected video resource and do something with it – such as feed it to a video for display. When I heard that Web developers want that functionality for JavaScript and the video element, too, I instinctively rejected the idea wondering why on earth would a Web developer want to touch encoded video bytes – why not leave that to the browser.

But as it turns out, this is actually a really powerful enabler of functionality. For example, you can use it to:

  • display mid-roll video ads as part of the same video element,
  • sequence playlists of videos into the same video element,
  • implement DVR functionality (high-speed seeking),
  • do mash-ups,
  • do video editing,
  • adaptive streaming.

This totally blew my mind and I am now completely supportive of having such a function in HTML5. Together with media fragment URIs you could even leave all the header download management for resources to the Web browser and just request time ranges from a video through an appendBytes() function. This would be easier on the Web developer than having to deal with byte ranges and making sure that appropriate decoding pipelines are set up.

Standards for Video Accessibility

Philip Jagenstedt (Opera) and myself moderated this session. We focused on the HTML5 track element and the WebVTT file format. Many issues were identified that will still require work.

One particular topic was to find a standard means of rendering the UI for caption, subtitle, und description selection. For example, what icons should be used to indicate that subtitles or captions are available. While this is not part of the HTML5 specification, it’s still important to get this right across browsers since otherwise users will get confused with diverging interfaces.

Chaptering was discussed and a particular need to allow URLs to directly point at chapters was expressed. I suggested the use of named Media Fragment URLs.

The use of WebVTT for descriptions for the blind was also discussed. A suggestion was made to use the voice tag <v> to allow for “styling” (i.e. selection) of the screen reader voice.

Finally, multitrack audio or video resources were also discussed and the @mediagroup attribute was explained. A question about how to identify the language used in different alternative dubs was asked. This is an issue because @srclang is not on audio or video, only on text, so it’s a missing feature for the multitrack API.

Beyond this session, there was also a breakout session on WebVTT and the track element. As a consequence, a number of bugs were registered in the W3C bug tracker.

WebM: Testing, Metrics and New features

This session was moderated by John Luther and John Koleszar, both of the WebM Project. They started off with a presentation on current work on WebM, which includes quality testing and improvements, and encoder speed improvement. Then they moved on to questions about how to involve the community more.

The community criticised that communication of what is happening around WebM is very scarce. More sharing of information was requested, including a move to using open Google+ hangouts instead of Google internal video conferences. More use of the public bug tracker can also help include the community better.

Another pain point of the community was that code is introduced and removed without much feedback. It was requested to introduce a peer review process. Also it was requested that example code snippets are published when new features are announced so others can replicate the claims.

This all indicates to me that the WebM project is increasingly more open, but that there is still a lot to learn.

Standards for HTTP Adaptive Streaming

This session was moderated by Frank Galligan and Aaron Colwell (Google), and Mark Watson (Netflix).

Mark started off by giving us an introduction to MPEG DASH, the MPEG file format for HTTP adaptive streaming. MPEG has just finalized the format and he was able to show us some examples. DASH is XML-based and thus rather verbose. It is covering all eventualities of what parameters could be switched during transmissions, which makes it very broad. These include trick modes e.g. for fast forwarding, 3D, multi-view and multitrack content.

MPEG have defined profiles – one for live streaming which requires chunking of the files on the server, and one for on-demand which requires keyframe alignment of the files. There are clear specifications for how to do these with MPEG. Such profiles would need to be created for WebM and Ogg Theora, too, to make DASH universally applicable.

Further, the Web case needs a more restrictive adaptation approach, since the video element’s API is already accounting for some of the features that DASH provides for desktop applications. So, a Web-specific profile of DASH would be required.

Then Aaron introduced us to the MediaSource API and in particular the webkitSourceAppend() extension that he has been experimenting with. It is essentially an implementation of the appendBytes() function of Flash, which the Web developers had been asking for just a few sessions earlier. This was likely the biggest announcement of OVC, alas a quiet and technically-focused one.

Aaron explained that he had been trying to find a way to implement HTTP adaptive streaming into WebKit in a way in which it could be standardised. While doing so, he also came across other requirements around such chunked video handling, in particular around dynamic ad insertion, live streaming, DVR functionality (fast forward), constraint video editing, and mashups. While trying to sort out all these requirements, it became clear that it would be very difficult to implement strategies for stream switching, buffering and delivery of video chunks into the browser when so many different and likely contradictory requirements exist. Also, once an approach is implemented and specified for the browser, it becomes very difficult to innovate on it.

Instead, the easiest way to solve it right now and learn about what would be necessary to implement into the browser would be to actually allow Web developers to queue up a chunk of encoded video into a video element for decoding and display. Thus, the webkitSourceAppend() function was born (specification).

The proposed extension to the HTMLMediaElement is as follows:

partial interface HTMLMediaElement {
  // URL passed to src attribute to enable the media source logic.
  readonly attribute [URL] DOMString webkitMediaSourceURL;

  bool webkitSourceAppend(in Uint8Array data);

  // end of stream status codes.
  const unsigned short EOS_NO_ERROR = 0;
  const unsigned short EOS_NETWORK_ERR = 1;
  const unsigned short EOS_DECODE_ERR = 2;

  void webkitSourceEndOfStream(in unsigned short status);

  // states
  const unsigned short SOURCE_CLOSED = 0;
  const unsigned short SOURCE_OPEN = 1;
  const unsigned short SOURCE_ENDED = 2;

  readonly attribute unsigned short webkitSourceState;
};

The code is already checked into WebKit, but commented out behind a command-line compiler flag.

Frank then stepped forward to show how webkitSourceAppend() can be used to implement HTTP adaptive streaming. His example uses WebM – there are no examples with MPEG or Ogg yet.

The chunks that Frank’s demo used were 150 video frames long (6.25s) and 5s long audio. Stream switching only switched video, since audio data is much lower bandwidth and more important to retain at high quality. Switching was done on multiplexed files.

Every chunk requires an XHR range request – this could be optimised if the connections were kept open per adaptation. Seeking works, too, but since decoding requires download of a whole chunk, seeking latency is determined by the time it takes to download and decode that chunk.

Similar to DASH, when using this approach for live streaming, the server has to produce one file per chunk, since byte range requests are not possible on a continuously growing file.

Frank did not use DASH as the manifest format for his HTTP adaptive streaming demo, but instead used a hacked-up custom XML format. It would be possible to use JSON or any other format, too.

After this session, I was actually completely blown away by the possibilities that such a simple API extension allows. If I wasn’t sold on the idea of a appendBytes() function in the earlier session, this one completely changed my mind. While I still believe we need to standardise a HTTP adaptive streaming file format that all browsers will support for all codecs, and I still believe that a native implementation for support of such a file format is necessary, I also believe that this approach of webkitSourceAppend() is what HTML needs – and maybe it needs it faster than native HTTP adaptive streaming support.

Standards for Browser Video Playback Metrics

This session was moderated by Zachary Ozer and Pablo Schklowsky (JWPlayer). Their motivation for the topic was, in fact, also HTTP adaptive streaming. Once you leave the decisions about when to do stream switching to JavaScript (through a function such a wekitSourceAppend()), you have to expose stream metrics to the JS developer so they can make informed decisions. The other use cases is, of course, monitoring of the quality of video delivery for reporting to the provider, who may then decide to change their delivery environment.

The discussion found that we really care about metrics on three different levels:

  • measuring the network performance (bandwidth)
  • measuring the decoding pipeline performance
  • measuring the display quality

In the end, it seemed that work previously done by Steve Lacey on a proposal for video metrics was generally acceptable, except for the playbackJitter metric, which may be too aggregate to mean much.

Device Inputs / A/V in the Browser

I didn’t actually attend this session held by Anant Narayanan (Mozilla), but from what I heard, the discussion focused on how to manage permission of access to video camera, microphone and screen, e.g. when multiple applications (tabs) want access or when the same site wants access in a different session. This may apply to real-time communication with screen sharing, but also to photo sharing, video upload, or canvas access to devices e.g. for time lapse photography.

Open Video Editors

This was another session that I wasn’t able to attend, but I believe the creation of good open source video editing software and similar video creation software is really crucial to giving video a broader user appeal.

Jeff Fortin (PiTiVi) moderated this session and I was fascinated to later see his analysis of the lifecycle of open source video editors. It is shocking to see how many people/projects have tried to create an open source video editor and how many have stopped their project. It is likely that the creation of a video editor is such a complex challenge that it requires a larger and more committed open source project – single people will just run out of steam too quickly. This may be comparable to the creation of a Web browser (see the size of the Mozilla project) or a text processing system (see the size of the OpenOffice project).

Jeff also mentioned the need to create open video editor standards around playlist file formats etc. Possibly the Open Video Alliance could help. In any case, something has to be done in this space – maybe this would be a good topic to focus next year’s OVC on?

Monday’s Breakout Groups

The conference ended officially on Sunday night, but we had a third day of discussions / hackday at the wonderful New York Lawschool venue. We had collected issues of interest during the two previous days and organised the breakout groups on the morning (Schedule).

In the Content Protection/DRM session, Mark Watson from Netflix explained how their API works and that they believe that all we need in browsers is a secure way to exchange keys and an indicator of protection scheme is used – the actual protection scheme would not be implemented by the browser, but be provided by the underlying system (media framework/operating system). I think that until somebody actually implements something in a browser fork and shows how this can be done, we won’t have much progress. In my understanding, we may also need to disable part of the video API for encrypted content, because otherwise you can always e.g. grab frames from the video element into canvas and save them from there.

In the Playlists and Gapless Playback session, there was massive brainstorming about what new cool things can be done with the video element in browsers if playback between snippets can be made seamless. Further discussions were about a standard playlist file formats (such as XSPF, MRSS or M3U), media fragment URIs in playlists for mashups, and the need to expose track metadata for HTML5 media elements.

What more can I say? It was an amazing three days and the complexity of problems that we’re dealing with is a tribute to how far HTML5 and open video has already come and exciting news for the kind of applications that will be possible (both professional and community) once we’ve solved the problems of today. It will be exciting to see what progress we will have made by next year’s conference.

Thanks go to Google for sponsoring my trip to OVC.

UPDATE: We actually have a mailing list for open media developers who are interested in these and similar topics – do join at http://lists.annodex.net/cgi-bin/mailman/listinfo/foms.

by silvia at October 11, 2011 04:12 AM

Silvia Pfeiffer : WebVTT at W3C


Today we started a community group (CG) at the W3C for “Web Media Text Tracks”: http://www.w3.org/community/texttracks/.

The group has been created to work on many aspects of video text tracks of which captioning and the WebVTT format are key parts.

The main reason behind creating this group is to create a forum at the W3C for working on WebVTT to allow all browsers to support this format and be involved in its development.

We’ve not gone the full way to creating a Working Group, although that was the initial intention. We had objections from W3C members for going down that path, so are using the CG path for now.

This is actually a good thing because CGs are open for anyone to join, while WGs are only open to W3C members. The key difference is that specs coming out of WGs can become RECs (“standards”), while CG’s specs cannot.

If we eventually see a need to move WebVTT to a REC, that move will be straight forward, since there is a clear path for work to transition from a CG to a WG.

by silvia at September 30, 2011 06:59 AM

Silvia Pfeiffer : 3rd W3C Web and TV Workshop, Hollywood


Curious about any new requirements that the TV community may have for HTML5 video, I attended the W3C Web and TV Workshop in Hollywood last week. It’s already the third of its kind and was also the largest to date showing an increasing interest of the TV community to converge with the Web community.

The Workshop Aim

I went into the Workshop not quite knowing what to expect. My previous contact with members of this community was restricted to email exchanges on the W3C Web and TV Interest Group (IG) mailing list. I knew there was some interest in video accessibility (well: particularly captions) and little knowledge of existing HTML5 specifications around text tracks and why the browsers were going with WebVTT. So I had decided to attend the workshop to get a better understanding of the community, it’s background, needs, and issues, and to hopefully teach some of the ways of HTML5. For that reason I had also submitted a WebVTT presentation/demo.

As it turned out, the workshop had as its key target the facilitation of communication between the TV and the HTML5 community. The aim was to identify features that need to be added to the HTML5 video element to satisfy the needs of the TV community. I obviously came to the right workshop.

The process that is being used by the W3C in the Interest Group is to have TV community members express their needs, then have HTML5 experts express how these needs can be satisfied with existing HTML5 features, then make trial implementations and identify any shortcomings, then move forward to progress these through HTML5 or HTML.next. This workshop clearly focused on the first step: expressing needs.

Often times it was painful for me to watch presenters defending their requirements and trying to impress on the audience how important a certain feature is to them when that features actually already has a HTML5 specification, but just not yet a browser implementations. That there were so few HTML5 video experts present and that they were given very little space to directly reply to the expressed needs and actually explain what is already possible (or specified to be possible) was probably one of the biggest drawbacks of the workshop.

To be fair, detailed technical discussions were not possible in a room with 150 attendees with a panel sitting at the front discussing topics and taking questions. Solving a use case with existing HTML5 markup and identifying the gaps requires smaller break-out groups of a maximum of maybe 20 people and sufficient HTML5 knowledge in the room. Ultimately they require a single person to try to implement it using JavaScript alone, and, failing that, writing browser extensions. Only such code actually proves that a feature is missing.

Now, the video features of HTML5 are still continuing to change almost on a daily basis. Much development is, for example, happening around real-time communication features and around the track element as we speak. So, focusing on further requirements finding around HTML5 video for now is probably a good thing.

The TV Community Approach

Before I move on to some of the topics covered by the workshop, I have to express some concern about the behaviour that I observed with lots of the TV community folks. Many people tried pushing existing solutions from other spaces into the Web unchanged with a claim of not re-inventing the wheel and following paved cowpaths, which are some of the underlying design principles for HTML5. I can understand where such behaviour originates thinking that having solved the same problems elsewhere before, those solutions should apply here, too. But I would like to warn people of this approach.

If we blindly apply solutions that were not developed for HTML5 into HTML we will end up with suboptimal solutions that will hurt us further down the track. The principles of not re-inventing the wheel and following paved cowpaths were introduced for features that were already implemented by browsers or in de-facto standard use by JavaScript libraries. They were not created for new features in HTML. The video element is a completely new feature in HTML thus everything around it is new.

I would therefore like to see some more respect given to HTML5 and the complexities involved in finding the best possible technical solutions for the Web given that the video element does not stand alone in HTML5, but is part of a much larger picture of technical capabilities on the Web where many of the requested features for TV applications may already be solved by existing HTML markup that is not part of the video element.

Also, HTML5 is not just about the HTML markup, but also about CSS and JavaScript and HTTP. There are several layers of technology involved in creating a Web application: not only a separation of work between client and servers, but also between the Operating System, the media framework, the browser, browser plugins, and JavaScript has to be balanced. To get this balance right is a fine art that will take many discussion, many experiments and sometimes several design approaches. We need patience and calm to work through this, not a rushed adoption of existing solutions from other spaces.

New Requirements

Now let’s get to the take-aways I had from the workshop’s sessions:

Session 1 / Content Provider and Consumer Perspective:

The sessions participants postulate that we will see the creation of application stores for TV applications similar to how we have experienced this for mobile phones and tablets. People enjoy collecting apps like they collect badges. Right now, the app store domain is dominated by native apps and now Web apps. The reason is that we haven’t got a standard platform for setting up Web app stores with Web apps that work in all browsers on all operating systems. Thus, developers have to re-deploy their app for many environments.

While essentially an orthogonal need to HTML standardisation, this seems to be one of the key issues that keep Web apps back from making big market inroads and W3C may do well in setting up a new WG to define a standard Web app manifest format and JS APIs.

Session 2+3 / Multi-screen TV in the Home Network:

Several technologies of hybrid TV broadcast and set-top-box Web content delivery were being pointed out, including the European HbbTV and the Japanese Hybridcast, the latter of which gave an in-depth demo.

Web purists would probably say that it would be simpler to just deliver all content over the Web and not have to worry about any further technical challenges encountered by having to synchronize content received via two vastly different delivery mechanisms. I personally believe this development is one of business models: we don’t yet know exactly how to earn money from TV content delivered over the Internet, but we do know how to do so with TV content. So, hybrids allow the continuation of existing income streams while allowing the features to be augmented with those people enjoy from the Internet.

Should requirements that emerge from such a use case for HTML5 video be taken seriously? I think they absolutely should. What I see happening is that a new way of using the Web is starting to emerge. The new way is video-focused rather than text-focused. We receive our Web content by watching video programming online – video channels, not Web pages are the core content that we consume in the living room. Video channels are where we start our browsing experience from. Search may still be our first point of call, but it will be search for video content or a video-centric app rather than search for a Web site.

And it will be a matter of many interconnected devices in the house that contribute to the experience: the 5.1 stereos that are spread all over the house and should receive our video’s sound, the different screens in the different areas of our house between which we move around, and remote controls, laptops or tablets that function as remote controls and preview stations and are used to determine our viewing experience and provide a back-channel to the publishers.

We have barely begun to identify how such interconnected devices within a home fit within the server-client-based view of the Web world, and the new Web Sockets functionality. The Home Networking Task Force of the Web and TV IG is looking at the issues and analysing existing protocols and standards that solve this picture. But I have a gnawing feeling that the best solution will be something new that is more Web-specific and fits better with the technology layers of the Web.

Session 4 / Synchronized Metadata:

The TV environment offers many data services, some of which have been legally prescribed. This session analysed TV needs and how they can be satisfied with current HTML5.

Subtitles and closed captioning support are one of the key requirements that have been legally prescribed to allow for equal access of non-native speakers, and blind and vision-impaired users to TV content. After demonstration of some key features defined into the HTML5 track element and the WebVTT format, it was generally accepted that HTML5 is making big progress in this space, in particular that browsers are in the process of implementing support for the track element. A concern still exists for complete coverage of all the CEA-608/708 features in WebVTT.

Further concern was raised for support of audio descriptions and audio translations, in particular since no browser has as yet committed to implementing the HTML5′s media multitrack API with the @mediagroup attribute. In this context I am excited to see first JavaScript polyfills emerge (see captionator.js & mediagroup.js).

Another concern was that many captions are actually delivered as raster images (in particular DVD captions) and how that would work in the Web context. The proposal was to use WebVTT and encode the raster images as data-URIs included in timed cues, then render them by JavaScript as an overlay. This is something to explore further.

Demos were shown using WebVTT to synchronize ads with videos, to display related metadata from a user’s life log with videos, to display thumbnails along a video’s timeline, and to show the rendering of text descriptions through screen readers. General agreement by the panel was that WebVTT offers many opportunities and that this area will continue to need further development and that we will see new capabilities on the Web around metadata that were not previously possible on TV.

Session 5 / Content Format and Codecs: DASH and Codec standards

The introduction of HTTP adaptive streaming into HTML5 was one of the core issues that kept returning in the discussions. This panel focused on MPEG DASH, but also mentioned the need for programmatic implementation of adaptive streaming functionality.

The work around MPEG DASH would require specifications of how to use DASH with WebM and Ogg Theora, as well as a specification of a HTML5 profile for DASH, which would limit the functionality possible in DASH files to the ones needed in a HTML5 video element. One criticism of DASH was its verbosity. Another was its unclear patent position. Panel attendees with included Qualcomm, Apple and Microsoft made very clear that their position is pro a royalty-free use of DASH.

The work around a programmatic implementation for adaptive streaming would require at least a JavaScript API to measure the quality of service of a presented video element and a JavaScript API to feed the video element with chunks of (encrypted) video content on the fly. Interestingly enough, there are existing experiments both around Video metrics and MediaSource extensions, so we can expect some progress in this space, even if these are not yet a strong focus of the HTML WG.

I would personally support the creation of Community Group at the W3C around HTTP adaptive streaming and DASH. I think it would work towards alleviating the perceived patent issues around DASH and allow the right members of the community to participate in preparing a specification for HTML5 without requiring them to become W3C members.

Session 6 / Content Protection and DRM

A core concern of the TV community is around content protection. The requirements in this space seem, however, very confused.

The key assumption here is that Web browsers should support the decoding of DRM-protected content in the HTML5 video element because the video element provides a desirable JavaScript API, accessibility features (the track element), default controls, and the possibility to synchronize multiple media elements. However, at the same time, the video element is part of the core content of a Web page and thus allows direct access to the image content in a canvas etc, so some of its functionality is not desirable.

The picture is further confused by requests for authentication, authorization, encryption, obfuscation, same-origin, secure transmission, secure decryption key delivery, unique content identification and other “content protection” techniques without a clear understanding of what is already possible on the Web and what requirements to content publishers actually have for delivering their content on the Web. This is further complicated by the fact that there are many competing solutions for DRM systems in the market with no clear standard that all browsers could support.

A thorough analysis of the technologies and solutions available in this space as well as an analysis of the needs for HTML5 is required before it becomes clear what solution HTML5 browsers may need to support. There seemed to be agreement in the group, though, that browsers would not need to implement DRM solutions, but rather only hand through the functionality of the platform on which they are running (including the media frameworks and operating system functionalities). How this is supposed to work was, however, unclear.

Session 7 / Web & TV: Additional Device & User Requirements

This was a catch-all session for topics that had not been addressed in other sessions. Among the topics addressed in this group were:

  • Parental Guidance: how to deal with ratings in an internationally inconsistent ratings landscape, how to deliver the ratings with the content, and how to enforce the viewing restrictions
  • Emergency Notifications: how to replicate on the Web the emergency notification functionality of TV by providing text overlays to alert users
  • TV channels: how to detect what channels of programming are available to users

Overall, the workshop was a worthwhile experience. It seems there is a lot of work still ahead for making HTML5 video the best it can be on the Web.

by silvia at September 29, 2011 08:16 AM

Conrad Parker : Iteratees at Tsuru Capital


Tsuru Capital is a small company. We build our internal systems for live trading and offline analysis in Haskell, and we're proud to be sponsoring ICFP 2011. We use iteratees throughout our systems, and have actively encouraged all our staff to contribute changes upstream and participate in community design discussions. By being part of the open source community and taking part in peer-review, we all end up with better software.

Over time various Tsuru staff members have worked on tools using iteratees, including (grepping the CONTRIBUTORS files): Bryan Buecking, Michael Baikov, Elliott Pace, Conrad Parker, Akio Takano, and Maciej Wos. There's been some lively discussions and many small patches providing functions that we use in production every day.

Last year Conal Elliott provided some mentoring to Tsuru staff, during which we worked through a denotational semantics for iteratees. This resulted in discussions on both the iteratee project list and haskell-cafe about Semantics of iteratees, enumerators, enumeratees.

By using iteratees in production we've contributed various simple but practical functions, including:

  • enumFdFollow, an enumerator (data source) which allows you to process the growing tail of a log file as it is being written.
  • ioIter, an iteratee that uses an IO action to determine what to do. Typically this is action involves some user interaction, such as a user issuing commands like play/pause/next/prev.
  • ListLike functions last (an iteratee that efficiently returns the last element of a stream), mapM_ and foldM.
  • mapChunksM_, a more efficient version of mapM_ that operates on the underlying chunks, eg. logger = mapChunksM_ (liftIO . print).
  • takeWhile, and its enumeratee variant takeWhileE


  • endianRead8, an iteratee for reading 64bit values with a given endianness. I've used this in ght as well as an internal project.

Stream conversion We've done quite a bit of work on stream conversion, as we use a few different layers of data processing. The iteratee architecture allows you to isolate the data source, conversion and processing functions; much of what we've worked on involves ensuring the converters (enumeratees) can control or translate control messages, so that commands like "seek" do not get lost. We've also built combinators to simplify the task of creating new stream converters.
  • convStateStream, which converts one stream into another while continually updating an internal state. Importantly for variable bitrate binary data, it can produce elements of the output stream from data that spans stream chunks.
  • (>) and (<). These allow stream converters to be composed without rewriting boilerplate. Jon Lato gives a good example using these in the StackOverflow answer to Attoparsec Iteratee.
  • zip, zip[345], sequence_ for using multiple iteratees to process a single stream instance, and (for zip*) collecting the results.
  • eneeCheckIfDone*: This family of functions (eneeCheckIfDoneHandle, eneeCheckIfDonePass, eneeCheckIfDoneIgnore) can be used with
    unfoldConvStreamCheck to make a version of unfoldConvStream which respects seek messages.


Parallel stream processing We often want to do multiple unrelated analysis tasks on a data stream. Whereas sequence_ takes a list of iteratees to run simultaneously and handles each input chunk by mapM across that list, psequence_ runs each input iteratee in a separate forkIO thread. For a real-world example, see Michael Baikov's post about psequence, psequence_, parE, parI.

Thanks

Thanks to John Lato for consistently and reliably maintaining the iteratee package, providing thoughtful feedback and graciously suggesting improvements.

<script src="http://reddit.com/static/button/button1.js" type="text/javascript"></script>

by Conrad Parker (noreply@blogger.com) at September 18, 2011 09:10 AM

Jean-Marc Valin : Opus, the Swiss Army Knife of Audio Codecs


I just got the news today that LCA 2011 has accepted my talk proposal: "Opus, the Swiss Army Knife of Audio Codecs". I'll be presenting it in Ballarat, Australia in January. If there's any specific topic you'd like me to include in the talk, please let me know (by email or comment on this post).

September 08, 2011 01:46 AM

Jean-Marc Valin : Codec Requirements Go RFC, IETF Update


Since yesterday, the IETF audio codec requirements are now published as RFC 6366. While the requirements aren't by themselves interesting (why discuss abstract requirements when you can discuss actual running code?), it's an important milestone in that it's the first document published by the Working Group. It also means one less source of pointless arguments. The guidelines document is now next in line and should go to IETF last call soon.

Now the interesting part of the Opus codec itself. That's the only document that really matters. That one should go to Working Group Last Call (WGLC) pretty soon (possibly next week or two). In the mean time, we're working on improving the clarity of the draft, cleaning up the code and fixing all the last few issues that have been reported since the first WGLC. Stay tuned.

September 01, 2011 04:39 PM

Chris Pearce : New media element APIs and better media seeking resolution


French intern Paul Adenot has recently implemented the seekable and played attributes on the HTML5 video and audio elements in Firefox. The seekable attribute enables script to see what regions of the media can be seeked into (particularly handy with live streams), and the played attribute enables script to see what regions of the media has already been played. Paul has also done some work improving the built in controls on media elements. Thanks for your hard work Paul! These should be available in release builds in November (Firefox 8).

Also in Firefox 8 are my changes to media seeking resolution. Now media seeking should be accurate to the nearest microsecond. It's been reported elsewhere how important accurate seeking for video is. We were previously accurate to the nearest video frame, but we could still be up to one audio packet off (often between 4 and 8 ms out). Now we prune audio samples when seeking so we're down to microsecond resolution.

by Chris Pearce (noreply@blogger.com) at August 24, 2011 11:08 PM

Silvia Pfeiffer : The new FOMS: Open Media Developers at OVC


Since 2007 I have organised the annual Foundations of Open Media Software (FOMS) developers workshop. Last year it was held for the first time in the northern hemisphere, in fact on the two days straight after the Open Video Conference (OVC).

This year I’m really excited to announce that the workshop will be an integral part of the Open Video Conference on 10-12 September 2011.

FOMS 2011 will take place as the Open Media Developers track at OVC and I would like to see as many if not more open media software developers attend as we had in last year’s FOMS.

Why should you go?

Well, firstly of course the people. As in previous years, we will have some of the key developers in open media software attend – not as celebrities, but to work with other key developers on hard problems and to make progress.

Then, secondly we believe we have some awesome sessions in preparation:

How we run it

I’m actually not quite satisfied with just these sessions. I’d like to be more flexible on how we make the three days a success for everyone. And this implies that there will continue to be room to add more sessions, even while at the conference, and create breakout groups to address really hard issues all the way through the conference.

I insist on this flexibility because I have seen in past years that the most productive outcomes are created by two or three people breaking away from the group, going into a corner and hacking up some demos or solutions to hard problems and taking that momentum away after the workshop.

To allow this to happen, we will have a plenary on the first day during which we will identify who is actually present at the workshop, what they are working on, what sessions they are planning on a attending, and what other topics they are keen to learn about during the conference that may not yet be addressed by existing sessions.

We’ll repeat this exercise on the Monday after all the rest of the conference is finished and we get a quieter day to just focus on being productive.

But is it worth the effort?

As in the past years, whether the workshop is a success for you depends on you and you alone. You have the power to direct what sessions and breakout groups are being created, and you have the possibility to find others at the workshop that share an interest and drag them away for some productive brainstorming or coding.

I’m going to make sure we have an adequate number of rooms available to actually achieve such an environment. I am very happy to have the support of OVC for this and I am assured we have the best location with plenty of space.

Trip sponsorships

As in previous FOMSes, we have again made sure that travel and conference sponsorship is available to community software developers that would otherwise not be able to attend FOMS. We have several such sponsorships and I encourage you to email the FOMS committee or OVC about it. Mention what you’re working on and what you’re interested to take away from OVC and we can give you free entry, hotel and flight sponsorship.

Oh, and don’t forget to Register for OVC!

by silvia at August 13, 2011 05:12 AM

Chris Double : Temporal Media Fragment Support in Firefox


The W3C has a Media Fragments Working Group whose mission is to specify temporal and media fragments in the Web using URI’s. The draft specification goes through in detail how these fragments work. I recently became a member of the working group and I’ve been working on adding support for the temporal dimension portion of the specification to Firefox.

In the most basic form you can specify a start time and an end time in the fragment part of a URI in an HTML video or audio element. For example:

<video src='/mediafrag/AudioAPI.webm#t=50,100'></video>

The ‘#t’ portion of the URI identifies the fragment as being a temporal media fragment. In this example ‘50,100’ means start the video at a current time of 50 seconds, and stop playing at 100 seconds. There are various other formats for the temporal media fragment defined in the specification. Examples can be seen in the UA Test Cases.

Development of this feature is being done in bug 648595. I’ve done test builds of the first iteration of the patch and they are available at my Firefox media fragment test builds page. The page has builds, an example, and a list of limitations which are currently:

  • Temporal syntax only. This means no spatial or track dimensions.
  • NPT time support only. No SMPTE time codes or Wall-clock time codes.
  • When changing the media fragment portion of a URL the media is not immediately updated. You need to refresh the page to see the change. This is most noticeable when directly navigating to the video and adding or changing a fragment.
  • The user interface for identifying the fragment in the standard controls is ugly and needs polish.
  • The HTML standard includes an ‘initialTime’ attribute for obtaining the start time. There is no way to obtain an end time so I’ve exposed a ‘mozFragmentEnd’ attribute on the video DOM object.

August 08, 2011 12:00 AM

Chris Double : Safe destruction in the presence of sharing in ATS


While using the libevent API from ATS I came across a scenario where it was important to call a function to release objects in a particular order. I wanted to have ATS enforce at compile time that the destruction occurs safely in the right order. The following example uses the built in libevent HTTP API for creating simple web servers. It uses the ATS libevent wrapper I wrote about previously.

fn cb {l1,l2:agz} (request: !evhttp_request l1, arg: !event_base l2): void = let
  val () = printf("here\n", @())
in
 ()
end

fn server () = let
  val _ = signal (SIGPIPE, SIG_IGN)
  val base = event_base_new ()
  val () = assert_errmsg (~base, "event_base_new failed")

  val http = evhttp_new (base)
  val () = assert_errmsg (~http, "evhttp_new failed")

  val r = evhttp_set_cb_with_base (http, "/", cb, base)
  val () = assert_errmsg (r = 0, "evhttp_set_cb_with_base failed")

  val r = evhttp_bind_socket (http, "0.0.0.0", uint16_of_int(8080))
  val () = assert_errmsg (r = 0, "evhttp_bind_socket failed")

  val r = event_base_dispatch (base)
  val () = assert_errmsg (r >= 0, "event_base_dispatch failed")

  val () = evhttp_free (http)
  val () = event_base_free (base)
in
  ()
end

implement main () = server ()

The call to http_new requires an event_base object. Internally, inside the C API, the returned evhttp object holds a pointer to this event_base. This results in the pointer being shared in two places and requires careful destruction to prevent using a destroyed object.

Later I call evhttp_free to destroy and release resources associated with the evhttp object, followed by event_base_free to do the same with the event_base object. I do it in this order to prevent a dangling reference to the event_base inside the evhttp object. The evhttp object is associated with that particular event_base and it’s important not to pass an incorrect base to functions that use the http object. Ideally code like the following shouldn’t compile:

val base = event_base_new ()
val base2 = event_base_new ()
val http = evhttp_new (base)

// Wrong event_base passed, fail compilation
val r = evhttp_set_cb_with_base (http, "/", cb, base2)

// Destruction out of order, fail compilation
val () = event_base_free (base)
val () = evhttp_free (http)

To model this in ATS I changed the ATS definition of evhttp to have an event_base associated with it at the type level and modified evhttp_new to return this relationship (agz is an address that cannot be NULL, agez is an address that can be NULL):

absviewtype evhttp (l:addr, l2:addr)
fun http_new {l1:agz} (base: !event_base l1): [l2:agez] evhttp (l2,l1) = "mac#evhttp_new"

With this change the type for evhttp depends on the value of the pointer to the event_base given as an argument. Now we can use this in the definition of evhttp_set_cb_with_base to ensure that the correct event_base object is used:

fun evhttp_set_cb_with_base {l1,l2:agz}
       (http: !evhttp (l1, l2),
        path: string,
        callback: {l3:agz} (!evhttp_request l3, !event_base l2) -> void,
        arg: !event_base l2): int = "mac#evhttp_set_cb"

This definition states that the event_base given as the arg paramater, and as the second argument to the callback, must be the same as that associated with the http object. This is done by using the same l2 dependent type argument with these types. This gives us the desired error checking in the first ‘fail compilation’ test above.

The next step is to prevent out of order destruction. This is done by passing the event_base object as a proof argument to the evhttp_free function:

fun evhttp_free {l1,l2:agz} (base: !event_base l1 | http: evhttp (l2, l1)): void = "mac#evhttp_free"

By adding this as proof argument I’m stating that the caller of evhttp_free must have a usable event_base object that was used to create this evhttp object to prove we can safely destroy it. The following is now a compile time error:

val () = event_base_free (base)
val () = evhttp_free (base | http)

This is due to event_base_free consuming the base linear object. The type of base following this call is no longer defined. The evhttp_free call is now an error due to using an undefined base. The following will work:

val () = evhttp_free (base | http)
val () = event_base_free (base)

This method of ensuring correct order of destruction and resource usage requires no changes to the libevent C code. Everything occurs during type checking. The generated C code looks exactly like normal libevent C usage with no runtime overhead for tracking the association between evhttp and event_base objects.

August 07, 2011 11:00 PM

Chris Double : WDCNZ HTML Media Presentation


Last month I went down to Wellington to give a joint talk at the WDCNZ conference. The topic was “HTML Media: Where we are and where we need to go”. The talk was shared between Nigel Parker, Mobile and Developer Evangelist for Microsoft, and myself. The conference was excellent with some great speakers and talks.

Nigel and I covered using HTML video and audio elements and how they can be used today across multiple browsers and mobile devices. We also covered upcoming API’s and directions in the web media area. Demo’s were shown on Microsoft Internet Explorer 9, Windows Phone 7 (running the Mango update which has a browser that supports HTML media) and Firefox. Nigel has slides and a summary in his blog post about the talk. I think it took a bit for attendees to get over the shock over a Mozilla and Microsoft representative sharing the stage and working together! Nigel’s post explains how this came about.

I spoke to a few Kiwi developers afterwards about using HTML video and there was a fair bit of interest in using it. The main obstacles seemed to be people unsure what codec to use and wanting support for adaptive streaming. The first is an education issue, getting people aware of what codecs to support for maximum coverage across browsers and how to encode to those formats. With regards to adaptive streaming there has been discussion between interested parties in various mailing lists and groups - it’s definitely something that is wanted.

At the time of the talk Nigel and I weren’t able to find any existing New Zealand based sites that use HTML video. Hopefully this will change in the future and Microsoft New Zealand are leading by example by using HTML video on their own site.

August 07, 2011 09:30 PM

Ben Schwartz : Evolution


So I wrote this song, sort of. Maybe you’ll like it.

YouTube version
Sheet Music
Reference files at Archive.org

After about 6 years of covering pop songs in my a cappella groups, I really wanted to sing some original music. In part, I was motivated by the US’s aggressively restrictive copyright regime, which always prevented us from freely sharing recordings of our own performances.

I tried to write a song from scratch for a while, but it wasn’t working out, mostly because I don’t have anything interesting to say. Then I struck upon the idea of using the text of an old out-of-copyright poem (which, because of the US’s effectively perpetual copyright, has to be very old indeed). I started browsing through the poetry section of WikiSource, until I stumbled across this brilliant 1895 poem by Langdon Smith. The choice was clear.

I drew up a thoroughly derivative 4-part a cappella arrangement in MuseScore, and VoiceLab indulged me by adding it to the repertoire. We’ve sung it twice so far, but the first time we didn’t have a good recording, and then this time I had to solve this audio-video alignment problem… but now it’s here.

The recordings and sheet music are all CC0 dedicated to the public domain. I would appreciate attribution as the arranger, but I find threats of legal action to be just as distasteful as plagiarism. I wouldn’t want to do anything to discourage people from adopting and adapting the music as they see fit. Maybe someone will make a recording with a soloist who can really sing!

by Ben at August 05, 2011 05:12 PM

Chris Pearce : Simple rate limited HTTP server for testing HTML5 media/streaming


While working on the Firefox HTML5 video and audio support, I've found it extremely useful to have an HTTP server on which the transfer rate is reliably limited. Existing servers are either too heavy weight, like apache, or have inconsistent rate-limiting, like lighttpd which I found to have very "bursty" rate limiting.

I ended up taking the educational route, and implementing a simple HTTP server in C++. It supports the following features:

  1. Support for HTTP1.1 Byte Range Requests. This means you can seek into unbuffered data when watching HTML5 video.
  2. Rate limiting, configurable on a per request basis by passing the "rate=x" HTTP query parameter, where x is the transfer rate of the connection in kilobytes per second. The server will send x/10 KB ten times per second to maintain this rate smoothly.
  3. Simulated live streaming, configurable on a request basis by passing the "live" query parameter. When in "live" mode, no Content-Length header is sent, and the server doesn't advertise or perform byte range requests - so you can't seek into unbuffered video/audio, just like in a live stream.
  4. Cross platform; tested on Windows (runs on port 80) and Linux (runs on port 8080). I haven't test it on MacOS yet.
  5. Simply serves all files in the program's working directory, making it easy to use (and abuse).
  6. Open source! Get the code at https://github.com/cpearce/HttpMediaServer, or download a pre-built win32 binary.
For example, if you wanted to simulate a live stream being served at 100KB/s, your test URL might look something like http://localhost:80/video.ogg?rate=100&live.

I've been using it for quite a while, and over the weekend I finally cleaned it up and put it up on GitHub. Check it out.

by Chris Pearce (noreply@blogger.com) at August 03, 2011 07:03 AM

Jean-Marc Valin : IETF update, Mozilla


I spent my last week in Quebec City at the 81th IETF meeting. The most important meeting there for me was the codec WG. The good news is that there's been a lot of progress in that meeting. A few issues with the Opus bit-stream (e.g. padding, frame packing) were resolved and the chairs are planning a second working group last call in four weeks. After that if all goes well, the codec can go to IETF last call and then RFC.

My week at the IETF meeting was also my first week at my new job working for Mozilla. I've been hired specifically to work on Opus and other codec/multimedia development, so I should have a lot more time for that than I used to. First thing on my list: finishing the Ogg mapping for Opus and releasing an Ogg encoder and decoder.

August 01, 2011 04:40 PM

Ben Schwartz : An Auto-Aligner for PiTiVi


It’s rare to get exactly one recording of an a capella concert. Usually someone’s parents have a fancy but outdated camcorder, someone in the front row has a cell phone video with a great angle but terrible quality, and there’s a beautiful audio-only recording, maybe straight from the mixing board. All the recordings are independent, starting and stopping at different times. Some are only one song long, or are broken into many short pieces.

If you want to combine all these inputs into a video that anyone could watch, you’ll first have to line them up correctly in a video editor. This is a painful process of dragging clips around on the timeline with the mouse, trying to figure out if they’re in sync or not. The usual trick to making this achievable is to look at the audio waveform visualization, but even so, the process can be tedious and irritating.

This year, when I got three recordings from the VoiceLab spring concert, I resolved to solve the problem once and for all. I set about writing an automatic clip alignment algorithm as a patch to PiTiVi, a beautiful (if not mature) free software video editor written in Python.

Today, after about two months of nights and weekends, the result is ready for testing in PiTiVi mainline. Jean-François Fortin Tam has a great writeup explaining how it works from a user’s perspective.

I hadn’t looked into it until after the fact, but of course this is not the first auto-alignment function in a video editor. Final Cut Pro appears to have a similar function built in, and there are also plug-ins such as “Plural Eyes” for many editors. However, to the best of my knowledge, this is the first free implementation, and the first available on Linux. Comparing features in PiTiVi vs. the proprietary giants, I think of this as “one down, 20,000 to go”.

I guess this is as good a place as any to talk about the algorithm, which is almost The Simplest Thing that could Possibly Work. Alignment works by analyzing the audio tracks, relying on every video camera to have a microphone of its own. The most direct approach might be to compute the cross-correlation of these audio tracks and look for the peak … but this could require storing multi-gigabyte audio files in memory, and performing impossibly large FFTs. On computers of today, the direct approach is technologically infeasible.

The algorithm I settled on resembles the method a human uses when looking at the waveform view. First, it breaks each input audio stream into 40 ms blocks and computes the mean absolute value of each block. The resulting 25 Hz signal is the “volume envelope”. The code subtracts the mean volume from each track’s envelope, then performs a cross-correlation between tracks and looks for the peak, which identifies the relative shift. To avoid performing N^2 cross-correlations, one clip is selected as the fixed reference, and all others are compared to it. The peak position is quantized to the block duration (creating an error of +/- 20ms), so to improve accuracy a parabolic fit is used to interpolate the true maximum. I don’t know the exact residual error, but I expect it’s typically less than 5 ms, which should be plenty good enough, seeing as sound travels about 1 foot per ms.

My original intent was to compensate for clock skew as well, because all these recording devices are using independent sample clocks that are running at slightly different rates due to manufacturing variation. There’s even code in the commit for a far more complex algorithm that can measure this clock skew. At the moment, this code is disused, for two reasons: none of our test clips actually showed appreciable skew, and PiTiVi doesn’t actually support changing the speed of clips, especially audio.

If you want to help, just stop by the PiTiVi mailing list or IRC channel. We can use more test clips, a real testing framework, a cancel button, UI improvements, conversion to C for speed, and all sorts of general bug squashing. For this feature, and throughout PiTiVi, there’s always more to be done. I’ve found the developer community to be extremely welcoming of new contributions … come and join us.

by Ben at July 26, 2011 04:12 AM

Silvia Pfeiffer : Recent developments around WebVTT


People have been asking me lots of questions about WebVTT (Web Video Text Tracks) recently. Questions about its technical nature such as: are the features included in WebVTT sufficient for broadcast captions including positioning and colors? Questions about its standardisation level: when is the spec officially finished and when will it move from the WHATWG to the W3C? Questions about implementation: are any browsers supporting it yet and how can I make use of it now?

I’m going to answer all of these questions in this post to make it more efficient than answering tweets, emails, and skype and other phone conference requests. It’s about time I do a proper post about it.

Implementations

I’m starting with the last area, because it is the simplest to answer.

No, no browser has as yet shipped support for the <track> element and therefore there is no support for WebVTT in browsers yet. However, implementations are in progress. For example, Webkit has recently received first patches for the track element, but there is still an open bug for a WebVTT parser. Similarly, Firefox can now parse the track element, but is still working on the element’s actual functionality.

However, you do not have to despair, because there are now a couple of JavaScript polyfill libraries for either just the track element or for video players with track support. You can start using these while you are waiting for the browsers to implement native support for the element and the file format.

Here are some of the libraries that I’ve come across that will support SRT and/or WebVTT (do leave a comment if you come across more):

  • Captionator – a polyfill for track and SRT parsing (WebVTT in the works)
  • js_videosub – a polyfill for track and SRT parsing
  • jscaptions – a polyfill for track and SRT parsing
  • LeanBack player – a video player with track and SRT, SUB, DFXP, and soon full WebVTT parsing support
  • playr – a video player that includes track and WebVTT parsing
  • MediaElementJS – a video player that includes track and SRT parsing
  • Kaltura’s video player – a video player that includes track and SRT parsing

I am actually most excited about the work of Ronny Mennerich from LeanbackPlayer on WebVTT, since he has been the first to really attack full support of cue settings and to discuss with Ian, me and the WHATWG about their meaning. His review notes with visual description of how settings are to be interpreted and his demo will be most useful to authors and other developers.

Standardisation

Before we dig into the technical progress that has been made recently, I want to answer the question of “maturity”.

The WebVTT specification is currently developed at the WHATWG. It is part of the HTML specification there. When development on it started (under its then name WebSRT), it was also part of the HTML5 specification of the W3C. However, there was a concern that HTML5 should be independent of the chosen captioning format and thus WebVTT currently only exists at the WHATWG.

In recent months – and particularly since browser vendors have indicated that they will indeed implement support for WebVTT as their implementation of the <track> element – the question of formal standardization of WebVTT at the W3C has arisen. I’m involved in this as a Google contractor and we’ve put together a proposed charter for a WebVTT Working Group at the W3C.

In the meantime, standardization progresses at the WHATWG productively. Much feedback has recently been brought together by Ian and changes have been applied or at least prepared for a second feature set to be added to WebVTT once the first lot is implemented. I’ve captured the potentially accepted and rejected new features in a wiki page.

Many of the new features are about making the WebVTT format more useful for authoring and data management. The introduction of comments, inline CSS settings and default cue settings will help authors reduce the amount of styling they have to provide. File-wide metadata will help with the exchange of management information in professional captioning scenarios and archives.

But even without these new features, WebVTT already has all the features necessary to support professional captioning requirements. I’ve prepared a draft mapping of CEA-608 captions to WebVTT to demonstrate these capabilities (CEA-608 is the TV captioning standard in the US).

So, overall, WebVTT is in a great state for you to start implementing support for it in caption creation applications and in video players. There’s no need to wait any longer – I don’t expect fundamental changes to be made, but only new features to be added.

New WebVTT Features

This takes us straight to looking at the recently introduced new features.

  • Simpler File Magic:
    Whereas previously the magic file identifier for a WebVTT file was a single line with “WEBVTT FILE”. This has now been changed to a single line with just “WEBVTT”.
  • Cue Bold Span:
    The <b> element has been introduced into WebVTT, thus aligning it somewhat more with SRT and with HTML.
  • CSS Selectors:
    The spec already allowed to use the names of tags, the classes of <c> tags, and the voice annotations of <v> tags as CSS selectors for ::cue. ID selector matching is now also available, where the cue identifier is used.
  • text-decoration support:
    The spec now also supports the CSS text-decoration property for WebVTT cues, allowing functionality such as blinking text and bold.

Further to this, the email identifies the means in which WebVTT is extensible:

  • Header area:
    The WebVTT header area is defined through the “WEBVTT” magic file identifier as a start and two empty lines as an end. It is possible to add into this area file-wide information header information.
  • Cues:
    Cues are defined to start with an optional identifier, and then a start/end time specification with “–>” separator. They end with two empty lines. Cues that contain a “–>” separator but don’t parse as valid start/end time are currently skipped. Such “cues” can be used to contain inline command blocks.
  • Inline in cues:
    Finally, within cues, everything that is within a “tag”, i.e. between “<" and ">“, and does not parse as one of the defined start or end tags is ignored, so we can use these to hide text. Further, text between such start and end tags is visible even if the tags are ignored, so wen can introduce new markup tags in this way.

Given this background, the following V2 extensions have been discussed:

  • Metadata:
    Enter name-value pairs of metadata into the header area, e.g.

    WEBVTT
    Language=zh
    Kind=Caption
    Version=V1_ABC
    License=CC-BY-SA
    
    1
    00:00:15.000 --> 00:00:17.950
    first cue
  • Inline Cue Settings:
    Default cue settings can come in a “cue” of their own, e.g.

    WEBVTT
    
    DEFAULTS --> D:vertical A:end
    
    00:00.000 --> 00:02.000
    This is vertical and end-aligned.
    
    00:02.500 --> 00:05.000
    As is this.
    
    DEFAULTS --> A:start
    
    00:05.500 --> 00:07.000
    This is horizontal and start-aligned.
    
  • Inline CSS:
    Since CSS is used to format cue text, a means to do this directly in WebVTT without a need for a Web page and external style sheet is helpful and could be done in its own cue, e.g.

    WEBVTT
    
      STYLE -->
      ::cue(v[voice=Bob]) { color: green; }
      ::cue(c.narration) { font-style: italic; }
      ::cue(c.narration i) { font-style: normal; }
    
      00:00.000 --> 00:02.000
      <v Bob>Welcome.
    
      00:02.500 --> 00:05.000
      <c .narration>To <i>WebVTT</i>.
    
  • Comments:
    Both, comments within cues and complete cues commented out are possible, e.g.

    WEBVTT
    
     COMMENT -->
     00:02.000 --> 00:03.000
     two; this is entirely
     commented out
    
     00:06.000 --> 00:07.000
     this part of the cue is visible
     <! this part isn't >
     <and neither is this>
    

Finally, I believe we still need to add the following features:

  • Language tags:
    I’d like to add a language tag that allows to mark up a subpart of cue text as being in a different language. We need this feature for mixed-language cues (in particular where a different font may be necessary for the inline foreign-language text). But more importantly we will need this feature for cues that contain text descriptions rather than captions, such that a speech synthesizer can pick the correct language model to speak the foreign-language text. It was discussed that this could be done with a <lang jp>xxx</lang> type of markup.
  • Roll-up captions:
    When we use timestamp objects and the future text is hidden, then is un-hidden upon reaching its time, we should allow the cue text to scroll up a line when the un-hidden text requires adding a new line. This is the typical way in which TV live captions have been displayed and so users are acquainted with this display style.
  • Inline navigation:
    For chapter tracks the primary use of cues are for navigation. In other formats – in particular in DAISY-books for blind users – there are hierarchical navigation possibilities within media resources. We can use timestamp objects to provide further markers for navigation within cues, but in order to make these available in a hierarchical fashion, we will need a grouping tag. It would be possible to introduce a <nav> tag that can group several timestamp objects for navigation.
  • Default caption width:
    At the moment, the default display size of a caption cue is 100% of the video’s width (height for vertical directions), which can be overruled with the “S” cue setting. I think it should by default rather be the width (height) of the bounding box around all the text inside the cue.

Aside from these changes to WebVTT, there are also some things that can be improved on the <track> element. I personally support the introduction of the source element underneath the track element, because that allows us to provide different caption files for different devices through the @media media queries attribute and it allows support for more than just one default captioning format. This change needs to be made soon so we don’t run into trouble with the currently empty track element.

I further think a oncuelistchange event would be nice as well in cases where the number of tracks is somehow changed – in particular when coming from within a media file.

Other than this, I’m really very happy with the state that we have achieved this far.

by silvia at June 27, 2011 08:23 AM

Cristian Adam : Maintainer for Directshow Filters for Ogg Vorbis, Speex, Theora and FLAC


I probably should have blogged sooner, but here it is: I'm the current maintainer for Directshow Filters for Ogg Vorbis, Speex, Theora and FLAC.

If you want a hardware Ogg Player you should consider buying a Trekstor Samsung (most of their MP3 players support Ogg Vorbis and FLAC formats) product!

by Cristian Adam (noreply@blogger.com) at June 21, 2011 06:06 AM

Monty : FTC "Patent Hold-Up" Workshop


Quite a few otherwise interested people may not have heard that the FTC (Federal Trade Commission) is holding a panel and workshop next week concerning how patent trolls are abusing standards body processes. This is our field, and we didn't find out about it until end of last week.

Regardless, Xiph.Org has assembled an official comment document, and will be represented in person by at least Dr. Tim Terriberry and possibly a few other core members (I won't be there).

If you're interested in software patents, some of the US Government's thinking on the issue, and participating in the process, have a look at the above two links. Also, feel free to distribute our comments far and wide. It's somewhat more gripping than the usual, dry "Percy Q. Business Leader Advises the Federal Goverment".

by Monty (monty@xiph.org) at June 15, 2011 09:36 AM

Monty : Death By Graphs (a new Ghost update/demo)


I'd mentioned in the previous update that we're (Xiph is) using a chirp estimation algorithm that we published back in 2007, and that the original paper has precious little space to devote to describing in detail how the algorithm actually performed. One of the upshots of not having done extensive characterization tests of our own algorithm was that it has already surprised me a few times this year (in both good and bad ways).

Therefore, Ghost Update 20110604 concerns itself with describing and graphing algorithm behavior in mind-numbing detail.

Death! By! Graphs!

by Monty (monty@xiph.org) at June 05, 2011 03:28 AM

Chris Double : Namecoin - A DNS alternative based on Bitcoin


Namecoin is a domain name system based on Bitcoin. It extends Bitcoin to add transactions for registering, updating and transferring names. The idea behind this is to provide an alternative to the existing DNS system where names can be taken from their owners by groups that control the DNS servers.

The project was originally announced in the bitcoin forums and has seen some uptake. The namecoin author, vinced, states in the post:

  • This is a new blockchain, separate from the main Bitcoin chain
  • Name/value pairs are stored in the blockchain attached to coins
  • Names are acquired through new transaction types - new, first-update and update
  • Names expire after 12000 blocks unless renewed with an update
  • No two unexpired names can be identical
  • Block validation is extended to reject transactions that do not follow the above rules
  • The code is here: https://github.com/vinced/namecoin

A number of projects have been created around this to provide a mapping from namecoin names to standard DNS. This allows resolving namecoin names to a ‘.bit’ suffixed domain. I’ll go through building the namecoin software, registering and updating names, then the software to use these names.

Building Namecoin

Namecoin needs to be built from source. The following steps on a Linux based system will build without UPNP support:

$ git clone git://github.com/vinced/namecoin.git
$ cd namecoin
namecoin $ make -f makefile.unix USE_UPNP=

Once built you’ll need to create a ~/.namecoin/bitcoin.conf file that contains entries for a username and password used for the JSON-RPC server that namecoind runs. Notice the name of the .conf file is bitcoin.conf even though this is namecoind. It won’t clash with an existing bitcoin installation as it is in a ~/.namecoin directory. To prevent conflict with an existing bitcoin install I suggest running namecoind on a different port. An example ~/.namecoin/bitcoin.conf is:

rpcuser=me
rpcpassword=password
rpcport=9332

Running namecoind will start the daemon and you can then use namecoind to execute commands:

$ ./namecoind
bitcoin server starting
$ ./namecoind getblockcount
2167

Yes, it prints out ‘bitcoin server starting’. There are still bitcoin references in the code that need to be changed apparently.

Getting Namecoins

To register a name you need to have some namecoins. These can be obtained via mining, just like bitcoins. Or you can buy them. To mine namecoins you can run any of the standard bitcoin miners and point them to the server and port that is running namecoind. The difficulty level for namecoin mining is currently very low (about 290 at the time of writing) so even CPU miners have a chance. Generating a block gets you 50 namecoins.

You can also buy namecoins as described here. The going rate seems to be about 1BTC for 50 namecoins.

Registering a name

The name_new command will register a name. An example invocation is:

$ ./namecoind name_new d/myname
[
    "1234567890123456789012345678901234567890",
    "0987654321"
]   

This will start the registration process for the name myname. Note the two hash values returned. Once this is done you need to wait for 12 blocks to be generated by the namecoin network. You then need to run a name_firstupdate command:

$ ./namecoind name_firstupdate d/myname 0987654321 '{"map":{"":"1.2.3.4"}}'

We pass to name_firstupdate the domain name we are updating, the shorter hash that we got from name_new and a JSON value that defines how that name is mapped to an IP address.

In this case the name is mapped to the IP address 1.2.3.4. Using the existing systems for mapping names this would make myname.bit resolve to 1.2.3.4. You can also do subdomains (See the update example later).

The cost to do a name_new, followed by a name_firstupdate, varies depending on how many blocks there are in the namecoin block chain. It started at 50 namecoins and slowly reduces. The formula is defined in the namecoin design document as:

  • Network fees start out at 50 NC per operation at the genesis block

  • Every block, the network fees decreases based on this algorithm, in 1e-8 NC:

      res = 500000000 >> floor(nBlock / 8192)
      res = res - (res >> 14)*(nBlock % 8192)
  • nBlock is zero at the genesis block

  • This is a decrease of 50% every 8192 blocks (about two months)

  • As 50 NC are generated per block, the maximum number of registrations in the first 8192 blocks is therefore 2/3 of 8192, which is 5461

  • Difficulty starts at 512

Updating a name

To update the domain mapping you use name_update:

$ ./namecoind name_update d/myname '{"map":{"":"1.2.3.4","www":"5.6.7.8"}}'

This example updates the value of myname so it includes a www subdomain. The name www.myname.bit will now map to 5.6.7.8.

There are other possibilities for the JSON mapping. See the namecoin README for details. Note that the JSON code must be valid JSON (ie. use double quotes, unlike the examples currently shown in the README unfortunately).

Transferring a name

To transfer a name to another person you need to get their namecoin address and do an update passing that address:

$ ./namecoind name_update d/myname '{"map":{"":"1.2.3.4"}}' NGZs7UndoWgpfTstoxryYEW8b1GtDLPwMa

Addresses can be generated with:

$ ./namecoind getnewaddress
N9jzzaptnQ28uiLgWm19WZAqrGqRVVGkFX

Transferring namecoins

You can transfer namecoins to other people by sending coins to their address just like bitcoin:

$ ./namecoind sendtoaddress N9jzzaptnQ28uiLgWm19WZAqrGqRVVGkFX 50

This will send 50 namecoins to N9jzzaptnQ28uiLgWm19WZAqrGqRVVGkFX.

Listing registered names

You can list all registered namecoin names using name_scan:

$ ./namecoind name_scan
{
    "name" : "d/bluishcoder",
    "value" : "{\"map\":{\"\":\"69.164.206.88\"}}",
    "txid" : "....",
    "expires_in" : 10874
},

You can also list only the names you’ve registered using name_list:

$ ./namecoind name_list
{
    "name" : "d/bluishcoder",
    "value" : "{\"map\":{\"\":\"69.164.206.88\"}}",
    "expires_in" : 10874
}

Using namecoin names

Software needs to be modified to use namecoind to lookup the name, or you can run DNS software that connects to namecoin to do lookups. To be able to try out namecoin I modified an HTTP proxy and later tried using DNS software.

HTTP Proxy

I modified the Polipo web proxy to use namecoin for lookups. The modified source is available at https://github.com/doublec/namecoin-polipo. This can be built and run with:

$ git clone https://github.com/doublec/namecoin-polipo
$ cd namecoin-polipo
$ make
$ ./polipo namecoindServer="127.0.0.1:9332" namecoindUsername=rpcuser namecoindPassword=rpcpassword

Changing your browser to point to the proxy on localhost, port 8123, will allow .bit domains to be used. See my forum post about it for more details.

dnsmasq

Another approach I tried was to write a program that generates a ‘host file’ from namecoind and uses dnsmasq to run a local DNS server that serves domains from this host file, falling back to the standard DNS server. The ‘quick and dirty’ code to generate the hosts file is in namecoin-hosts.c and uses libcurl and libjansson to build:

$ gcc -o namecoin-hosts namecoin-hosts.c -lcurl -ljansson

I added the following to my dnsmasq.conf:

local=/.bit/
local-ttl=300
addn-hosts=/tmp/hosts.txt

And created a shell script to update /tmp/hosts.txt with the namecoin related data:

while true; do
  ./namecoin-hosts 127.0.0.1:9332 rpcuser rpcpassword >/tmp/hosts.txt
  kill -HUP `cat /var/run/dnsmasq/dnsmasq.pid`
  echo `date`
  sleep 300
done

Pointing my OS DNS resolver to the dnsmasq IP address and port allowed .bit names to resolve.

Public .bit DNS servers

Details of a public .bit DNS server that doesn’t require you to run namecoin are available at namecoin.bitcoin-contact.org. That site also provides details on using namecoin.

More Information

Namecoin seems to be very much an experiment in having an alternative DNS like system. The developer has taken the approach of ‘release early’ and iterate towards a solution. As such it may fizzle out and go nowhere. Or it may prove a useful test-bed for ideas that make it into a successful DNS alternative.

More details about Namecoin can be obtained from:

Are there any other alternatives to DNS around with similar ideas?

May 12, 2011 06:00 AM

Chris Double : Building Mozart/Oz on Windows


The Mozart Programming System is an implementation of the Oz programming language. It’s the language used in the book Concepts, Techniques, and Models of Computer Programming by Peter Van Roy and Seif Haridi. From the Mozart website:

Mozart is based on the Oz language, which supports declarative programming, object-oriented programming, constraint programming, and concurrency as part of a coherent whole. For distribution, Mozart provides a true network transparent implementation with support for network awareness, openness, and fault tolerance. Mozart supports multi-core programming with its network transparent distribution and is an ideal platform for both general-purpose distributed applications as well as for hard problems requiring sophisticated optimization and inferencing abilities.

The last release of Mozart was a couple of years ago and the steps to build on Windows no longer seem to work. It required Cygwin to build but used the MingW compiler to get a native Windows build.

Mozart/Oz has recently seen a bit of a life with activity in the mailing lists and a move to github for source control and issue tracking. I was working on a project that needed Windows and Linux support so thought I’d have a try at converting Mozart to compile using MingW on Windows.

I’ve got a basic build working and have patches that I hope will eventually be merged. In the meantime I thought I’d post about how to build using MingW with my patches. The following steps will build Mozart, the standard library, Emacs (used as the IDE) and Tcl/Tk (for GUI).

  1. Download and install mingw-get-inst. Install the ‘MSYS Basic System’ and the ‘C++ Compiler’.

  2. From the ‘MingW Shell’, run the following commands to install the required support packages:

     $ mingw-get install mingw32-libgmp
     $ mingw-get install mingw32-gmp
     $ mingw-get install mingw32-libz
     $ mingw-get install msys-libgdbm
     $ mingw-get install msys-libregex
     $ mingw-get install mingw32-autoconf2.1
     $ mingw-get install mingw32-autoconf
     $ mingw-get install msys-flex
     $ mingw-get install msys-bison
     $ mingw-get install mingw32-gcc-g++
     $ mingw-get install msys-wget
  3. Download and install the msysgit package. I installed it in C:/git. It’s best to install it in a directory that doesn’t have spaces in the name.

  4. Make a directory to store the resulting ‘Mozart/Oz’ binaries. I used /p to keep my command lines short but you could also use /mingw to install alongside the existing MingW programs:

     $ mkdir /p
  5. Download and build Tcl/Tk 8.5 from source, configured to install in the directory above:

     $ wget http://prdownloads.sourceforge.net/tcl/tcl8.5.9-src.tar.gz
     $ wget http://prdownloads.sourceforge.net/tcl/tk8.5.9-src.tar.gz
     $ tar xvf tcl8.5.9-src.tar.gz
     $ tar xvf tk8.5.9-src.tar.gz
     $ cd tcl8.5.9
     tcl8.5.9 $ ./win/configure --prefix=/p
     tcl8.5.9 $ make && make install
     tcl8.5.9 $ cd ../tk8.5.9
     tk8.5.9 $ ./win/configure --prefix=/p --with-tcl=`pwd`/../tcl8.5.9
     tk8.5.8 $ make && make install
  6. Download and build emacs from source:

     $ wget http://ftp.gnu.org/gnu/emacs/emacs-23.3.tar.gz
     $ tar xvf emacs-23.3.tar.gz
     $ cd emacs-23.3
     emacs-23.3 $ cd nt
     emacs-23.3/nt $ cmd /c "configure --prefix=/p --without-xpm \
                      --without-png --without-jpeg --without-tiff --without-gif"
     emacs-23.3/nt $ make && make install
  7. Build my win32_build branch of Mozart. Note that I add the directory where I installed Git into the path. I also add the ‘bin’ directory of where I set Mozart to be installed.

     $ export PATH=$PATH:/c/git/bin:/p/bin
     $ git clone git://github.com/doublec/mozart.git
     $ cd mozart
     mozart $ git checkout -b win32_build origin/win32_build
     mozart $ cd ..
     $ mkdir build
     $ cd build
     build $ windlldir=/p ../mozart/configure --prefix=/p \
              --with-inc-dir=/p/include --with-lib-dir=/p/lib \
              --with-tcl=/p --with-tk=/p --disable-contrib-compat \
              --disable-contrib --enable-modules-static \
              --disable-doc --disable-chm
     build $ make && make install
  8. Build my win32_build branch of the Mozart standard library:

     $ git clone git://github.com/doublec/mozart-stdlib.git
     $ cd mozart-stdlib
     mozart-stdlib $ git checkout -b win32_build origin/win32_build
     mozart-stdlib $ cd ..
     $ mkdir build-stdlib
     $ cd build-stdlib
     build-stdlib $ ../mozart-stdlib/configure --prefix=/p
     build-stdlib $ make && make install
  9. You can now test Mozart by running the comand oz. This should start emacs with a Mozart/Oz system running. You can evaluate an example by entering {Browse 1+1} into the topmost emacs pane and evaluating with Ctrl+. Ctrl+l. You’ll need to set the OZEMACS environment variable to point to the location of emacs:

     $ export OZEMACS=/p/bin/emacs.exe
     $ oz

That’s a large number of steps but it gives a complete Mozart/Oz environment. From there you can work through the tutorial. There’s work to be done to make it easier and get more testing. One contributor is looking at creating a Visual Studio project to do the builds as well as to improve on the basic MingW support I’ve got working, so there’s hope for less steps in the future.

The Mozart/Oz system is interesting and there’s some neat projects written in it. A short list of some of them:

  • BeerNet, a P2P network.
  • TransDraw, a distributed shared drawing program. I run a live transdraw instance which you can connect too. Instructions here.
  • Roads, a web application framework.
  • EBL/Tk, an UI library that can do migration of user interface elements across the network.

May 11, 2011 08:00 AM

Silvia Pfeiffer : HTML5 multi-track audio or video


In the last months, we’ve been working hard at the WHATWG and W3C to spec out new HTML markup and a JavaScript interface for dealing with audio or video content that has more than just one audio and video track.

This is particularly relevant when a Web page author wants to add a sign language track to a video or audio resource for deaf people, or an audio description track (i.e. a sound track in which a speaker explains the key things that can be seen on screen) for blind people. It is also relevant when a Web page author wants to publish a video with multiple audio tracks that are each a different language dub for the video and can be used for less common cases such as a director’s comment track, or making available different camera angles for an event.

Just to be clear: this is not a means to introduce video editing functionality into the Web browser. If you want to do edits, you’re better off with an application that will eventually render a new piece of content and includes fancy transitions etc. Similarly, this is not a means to introduce mixing functionality (as in what DJs do when they play with multiple audio recordings). You’re better off with an actual audio mixing or DJ application that will provide you all sorts of amazing effects and filters.

So, multi-track is squarely focused on synchronizing alternative or additional tracks to a single resource with a single timeline to which all tracks are slaved.

Two means of publishing such multi-track media content are possible:

  • In-band multi-track
  • Synchronized resources

1. In-band multi-track

In in-band multi-track, there is a single file that has all all the tracks inside it. For this single file, there is now an API in HTML5 that allows addressing and controlling these tracks.

Of the video file formats that Web browsers support, WebM is currently not defined to contain more than one audio or video track. However, since WebM is using the Matroska container format, which supports multi-track, it is possible to extend WebM for multi-track resources. I have seen multitrack Ogg, MP4 and Matroska files in the wild and most media players support their display.

The specification that has gone into HTML5 to support in-band multi-track looks as follows:

interface HTMLMediaElement : HTMLElement {
  [...]
  // tracks
  readonly attribute MultipleTrackList audioTracks;
  readonly attribute ExclusiveTrackList videoTracks;
};

interface TrackList {
  readonly attribute unsigned long length;
  DOMString getID(in unsigned long index);
  DOMString getKind(in unsigned long index);
  DOMString getLabel(in unsigned long index);
  DOMString getLanguage(in unsigned long index);

           attribute Function onchange;
};

interface MultipleTrackList : TrackList {
  boolean isEnabled(in unsigned long index);
  void enable(in unsigned long index);
  void disable(in unsigned long index);
};

interface ExclusiveTrackList : TrackList {
  readonly attribute unsigned long selectedIndex;
  void select(in unsigned long index);
};

You will notice that every audio and video track gets an index to address them. You can enable() and disable() individual audio tracks and you can select() a single video track for display. This means that one or more audio tracks can be active at the same time (e.g. main audio and audio description), but only one video track will be active at a time (e.g. main video or sign language).

Through the getID(), getKind(), getLabel() and getLanguage() functions you can find out more about what actual content is available in the individual tracks so as to activate/deactivate them correctly and display the right information about them.

getKind() identifies the type of content that the track exposes such as “description” (for audio description), “sign” (for sign language), “main” (for the default displayed track), “translation” (for a dubbed audio track), and “alternative” (for an alternative to the default track).

getLabel() provides a human readable string that describes the content of the track aiming to be used in a menu.

getID() provides a short machine-readable string that can be used to construct a media fragment URI for the track. The use case for this will be discussed later.

getLanguage() provides a machine-readable language code to identify which language is spoken or signed in an audio or sign language video track.

Example 1:

The following uses a video file that has a main video track, a main audio track in English and French, and an audio description track in English and French. (It likely also has caption tracks, but we will ignore text tracks for now.) This code sample switches the French audio tracks on and all other audio tracks off.

<video id="v1" poster=“video.png” controls>
 <source src=“video.ogv” type=”video/ogg”>
 <source src=“video.mp4” type=”video/mp4”>
</video>

<script type="text/javascript">
video = document.getElementsByTagName("video")[0];

for (i=0; i< video.audioTracks.length; i++) {
  if (video.audioTracks.getLanguage(i) == "fr") {
    video.audioTracks.enable(i);
  } else {
    video.audioTracks.disable(i);
  }
}
</script>

Example 2:

The following uses a audio file that has a main audio track in English, no main video track, but sign language video tracks in ASL (American Sign Language), BSL (British Sign Language), and ASF (Australian Sign Language). This code sample switches the Australian sign language track on and all other video tracks off.

<video id="a1" controls>
 <source src=“audio_sign.ogg” type=”video/ogg”>
 <source src=“audio_sign.mp4” type=”video/mp4”>
</video>

<script type="text/javascript">
video = document.getElementsByTagName("video")[0];

for (i=0; i< video.videoTracks.length; i++) {
  if (video.videoTracks.getLanguage(i) == "asf") {
    video.videoTracks.select(i);
    break;
  }
}
</script>

If you have more tracks in both examples that conflict with your intentions, you may need to further filter your activation / deactivation code using the getKind() function.

2. Synchronized resources

Sometimes the production process of media creates not a single resource with multiple contained tracks, but multiple resources that all share the same timeline. This is particularly useful for the Web, because it means the user can download only the required resources, typically saving a substantial amount of bandwidth.

For this situation, an attribute called @mediagroup can be added in markup to slave multiple media elements together. This is administrated in the JavaScript API through a MediaController object, which provides events and attributes for the combined multi-track object.

The new IDL interfaces for HTMLMediaElement are as follows:

interface HTMLMediaElement : HTMLElement {
  [...]
  // media controller
           attribute DOMString mediaGroup;
           attribute MediaController controller;
};

interface MediaController {
  readonly attribute TimeRanges buffered;
  readonly attribute TimeRanges seekable;
  readonly attribute double duration;
           attribute double currentTime;

  readonly attribute boolean paused;
  readonly attribute TimeRanges played;
  void play();
  void pause();

           attribute double defaultPlaybackRate;
           attribute double playbackRate;

           attribute double volume;
           attribute boolean muted;

           attribute Function onemptied;
           attribute Function onloadedmetadata;
           attribute Function onloadeddata;
           attribute Function oncanplay;
           attribute Function oncanplaythrough;
           attribute Function onplaying;
           attribute Function onwaiting;
           attribute Function ondurationchange;
           attribute Function ontimeupdate;
           attribute Function onplay;
           attribute Function onpause;
           attribute Function onratechange;
           attribute Function onvolumechange;
};

You will notice that the MediaController replicates some of the states and events of the slave media elements. In general the approach is that the attributes represent the summary state from all the elements and the writable attributes when set are handed through to all the slave elements.

Importantly, if the individual media elements have @controls activated, then the displayed controls interact with the MediaController thus allowing synchronized playback and interaction with the combined multi-track object.

Example 3:

The following uses a video file that has a main video track, a main audio track in English. There is another video file with the ASL sign language for the video, and an audio file with the audio description in English. This code sample creates controls on the first file, which then also control the audio description and the sign language video, neither of which have controls. Since the audio description doesn’t have controls, it doesn’t get visually displayed. The sign language video will just sit next to the main video without controls.

<video id="v1" poster=“video.png” controls mediagroup="a11y_vid">
 <source src=“video.webm” type=”video/webm”>
 <source src=“video.mp4” type=”video/mp4”>
</video>

<video id="v2" poster=“sign.png” mediagroup="a11y_vid">
 <source src=“sign.webm” type=”video/webm”>
 <source src=“sign.mp4” type=”video/mp4”>
</video>

<audio id="a1" mediagroup="a11y_vid">
 <source src=“audio.ogg” type=”audio/ogg”>
 <source src=“audio.mp3” type=”audio/mp3”>
</video>

Example 4:

We now accompany a main video with three sign language video tracks in ASL, BSL and ASF. We could just do this in JavaScript and replace the currentSrc of a second video element with the links to BSL and ASF as required, but then we need to run our own media controls to list the available tracks. So, instead, we create a video element for each one of the tracks and use CSS to remove the inactive ones from the page layout. The code sample activates the ASF track and deactivates the other sign language tracks.

<style>
  video.inactive { display: none; }
</style>

<video id="v1" poster=“video.png” controls mediagroup="a11y_vid">
 <source src=“video.webm” type=”video/webm”>
 <source src=“video.mp4” type=”video/mp4”>
</video>

<video id="v2" poster=“sign_asl.png” mediagroup="a11y_vid" class="active">
 <source src=“sign_asl.webm” type=”video/webm”>
 <source src=“sign_asl.mp4” type=”video/mp4”>
</video>

<video id="v3" poster=“sign_bsl.png” mediagroup="a11y_vid" class="inactive">
 <source src=“sign_bsl.webm” type=”video/webm”>
 <source src=“sign_bsl.mp4” type=”video/mp4”>
</video>

<video id="v4" poster=“sign_asf.png” mediagroup="a11y_vid" class="inactive">
 <source src=“sign_asf.webm” type=”video/webm”>
 <source src=“sign_asf.mp4” type=”video/mp4”>
</video>

<script type="text/javascript">
videos = document.getElementsByTagName("video");

for (i=0; i< videos.length; i++) {
  if (video[i].videoTracks.getLanguage(0) == "asf") {
    video[i].setAttribute("class", "active");
  } else {
    video[i].setAttribute("class", "inactive");
  }
}
</script>

Example 5:

In this final example we look at what to do when we have a in-band multi-track resource with multiple video tracks that should all be displayed on screen. This is not a simple problem to solve because a video element is only allowed to display a single video track at a time. Therefore for this problem you need to use both approaches: in-band and synchronized resources.

We take a in-band multitrack resource with a main video and audio track and three sign language tracks in ASL, BSL and ASF. The second resource will be made up from the URI of the first resource with a media fragment address of the sign language tracks. (If required, these can be discovered using the getID() function on the first resource.) The markup will look as follows:

<video id="v1" poster=“video.png” controls mediagroup="a11y_vid">
 <source src=“video.ogv#track=v_main&track=a_main” type=”video/ogv”>
 <source src=“video.mp4#track=v_main&track=a_main” type=”video/mp4”>
</video>

<video id="v2" poster=“sign.png” controls mediagroup="a11y_vid">
 <source src=“video.ogv#track=asl&track=bsl&track=asf” type=”video/ogv”>
 <source src=“video.mp4#track=asl&track=bsl&track=asf” type=”video/mp4”>
</video>

Note that with multiple video elements you can always style them in the way that you want them displayed on screen. E.g. if you want a picture-in-picture display, you scale the second video down and absolutely position it on top of the first one in the appropriate location. You can even grab the second video into a canvas, chroma-key your sign language speaker on a green or blue screen and remove that background through some canvas processing before popping it on top of the video.

The world is all yours!

HOWEVER: There is one big caveat on all these specs – while they have all found entry into the HTML5 specification, it would be expecting a bit much to have browser support already. :-)

by silvia at May 01, 2011 08:12 AM

Silvia Pfeiffer : WordPress plugin for external videos updated


Over the last weeks I’ve updated my “external videos” wordpress plugin. I’ve fixed bugs and added some new functionality.

List of changes:

  • fixed a bug in attaching blog posts to videos for link-through from gallery overlays
  • allow re-attaching a different blog post to a video
  • added a shortcode that allows to link straight through to video pages instead of the overlay
  • fixed a bug on retrieval of keyframe for dotsub
  • added option to add the video posts to the site’s RSS feed
  • fixed a bug on image paths for the thickbox
  • made sure whenever a user goes to the admin page that the cron hook is active
  • changed some class names to avoid clashes with other plugins that people reported
  • turned simple_html_dom code into a class of its own to avoid clashes with other plugins that use this code, too
  • cleaning up entered data from surplus white space
  • styling fixes to the overlay on gallery
  • shielding against a bug with no videos on channels to retrieve yet

Download the new plugin version 0.13

Note: there is something weird going on with the wordpress plugins site, which still shows version 0.7 as the current one, but when you download it, it gets the latest version 0.12. If somebody knows how to fix this, that would be awesome. I think it also stops people from auto-updating this plugin, which is sad with this many improvements.
(I think I fixed it by actually changing the version number in the external-videos.php file – how silly of me – and thanks to the WordPress Forum person who pointed it out to me! Download 0.13 now.)

by silvia at April 29, 2011 09:10 PM

Monty : A Ghost update! For... last month!


Not actually last month, but pretty close at this point.

I never publically released the my previous Ghost update delivered internally to Red Hat at the beginning of the month because I hadn't finished some of the diagrams I wanted to do for the last section on chirp coding. Well, the diagrams are done! Here's the latest Ghost demo update, just in time for the next one to almost be due!

by Monty (monty@xiph.org) at April 27, 2011 05:27 PM

Monty : WebM Community Cross-License announced


In case folks have missed it (or worse, read about about it on Ars Technica)...

The WebM folks have finally finished up their work on the WebM Community Cross-License project and announced the license launch. This is a FOSS defensive license/pool similar to what a couple other groups are trying out (and similar to the defensive patent license that Xiph is already using for our parts of Opus within the IETF).

The basic idea of the cross-license is:

"Everyone is free to use any known or unknown WebM patents. Unless you sue over patents related to WebM. In that case, we all agree to yank your license."

In short, it's sort of a NATO for FOSS patents; a free license with an agreed-upon mutual defense clause that tries to enforce everyone playing nice. This strategy is not a new idea, but it's interesting that several different FOSS groups, Xiph and WebM included, are finally trying the idea for real in practice.

by Monty (monty@xiph.org) at April 27, 2011 01:09 PM

Monty : Opus (CELT) demo page 2: Listening tests


Greg Maxwell has just posted a nice second 'demo' page for Opus. It mostly covers the recent listening testing done by volunteers at Hydrogen Audio. Pretty colors and interactive listening/comaprison scripting!

For those of you new to Opus, it's the FOSS/RF codec we're working on within the IETF codec working group. Opus is a collaborative hybrid speech / high-fidelity audio codec built using primarily Xiph's CELT codec and the Skype SILK voice codec as inputs. That makes Opus similar in some ways to what MPEG is trying to achieve with USAC (Unified Speech and Audio Coding), though Opus is also ultra-low latency, so it looks like we're considerably ahead of MPEG here.

by Monty (monty@xiph.org) at April 15, 2011 03:24 AM

Silvia Pfeiffer : WebVTT explained


On Wednesday, I gave a talk at Google about WebVTT, the Web Video Text Track file format that is under development at the WHATWG for solving time-aligned text challenges for video.

I started by explaining all the features that WebVTT supports for captions and subtitles, mentioned how WebVTT would be used for text audio descriptions and navigation/chapters, and explained how it is included into HTML5 markup, such that the browser provides some default rendering for these purposes. I also mentioned the metadata approach that allows any timed content to be included into cues.

The talk slides include a demo of how the <track> element works in the browser. I’ve actually used the Captionator polyfill for HTML5 to make this demo, which was developed by Chris Giffard and is available as open source from GitHub.

The talk was recorded and has been made available as a Google Tech talk with captions and also a separate version with extended audio descriptions.

The slides of the talk are also available (best to choose the black theme).

I’ve also created a full transcript of the described video.

Get the WebVTT specification from the WHATWG Website.

by silvia at April 05, 2011 02:30 AM

Chris Pearce : HTML5 Video painting performance statistics in Firefox 5


I've landed video frame paint performance counters for HTML5 video onto mozilla-central. This should ship in Firefox 5, barring any disasters. This work was a combined effort by Chris Double and I. These are Mozilla specific fields which will only be available in Firefox.

The new statistics enable us to measure the performance of the video decoding and frame painting pipeline.

This adds the following fields to the HTMLVideoElement:
  • mozParsedFrames - A count of the number of video frames that have been demuxed/parsed from the media resource. If we were playing perfectly, we'd be able to paint this many frames.
  • mozDecodedFrames - A count of the number of deumxed/parsed video frames that have been decoded into Images. We skip decoding of parsed/demuxed frames if the decode is falling behind the playback position (this can happen if it takes a long time to decode a keyframe for example).
  • mozPresentedFrames - A count of the number of decoded frames that have been presented to the rendering pipeline for painting (set as the current Image on the video element's ImageContainer). We may not present decoded frames if the frame arrives for presentation late.
  • mozPaintedFrames - A count of the number of presented frames which were painted on screen. We may end up not painting presented frames if another frame is presented before the graphics pipeline has time to paint the previously presented frame, or if the video is off screen. 
  • mozFrameDelay - The time (as a floating point number in seconds) which the last painted video frame was rendered late by. This is the time duration between the decoder saying "paint frame X now", and the graphics pipeline physically getting frame X displayed on the screen. The value is accurate on desktop Firefox, but not on mobile. Improvements in the graphics pipeline, and the integration with the graphics pipeline, will show up as a decrease in this number.
Here's a demo of the video paint statistics in Firefox 5. You'll need a recent Firefox trunk nightly build for the demo to work.

by Chris Pearce (noreply@blogger.com) at March 31, 2011 02:58 AM

David Schleef : GStreamer SDI Capture Plugins


I’m getting ready to push several commits to the gst-plugins-bad source repository that add plugins for capturing SDI and HD-SDI using cards from two different manufacturers: BlackMagic Design‘s DeckLink, and Linear Systems SDI Master capture card.

The Linear Systems cards are probably better known by their reseller, DVEO. Entropy Wave uses both of these cards in the E1000 Live Encoder appliance, we’ve found that aside from some motherboard incompatibilities in the DeckLink cards, they both work great in Linux. While we’re primarily interested in live capture at the moment, output has also been implemented.

We slightly prefer the Linear Systems cards – mainly because the drivers are open source, but also because the API allows lower level access to the hardware, including SDI clocking and raw VANC and HANC data. It also allows subframe latency, although not implemented in the GStreamer plugin, it will be nice to use in the future.

In comparison, the DeckLink driver and SDK are not open source (which means I can’t fix any bugs), although they conveniently provide open source headers and shim code for interfacing with the SDK. This allows the GStreamer plugin to be completely open source and legally distributable separately from the SDK, but will only work if the SDK libraries and driver are present. Optical fiber connections are only available in the DeckLink, and the DeckLink cards tend to be less expensive.

It will take a few weeks for these to be available as part of a GStreamer release, however, they are available in the Media SDK now.

(Reposted from my Entropy Wave blog.)

by admin at March 24, 2011 07:35 PM

Silvia Pfeiffer : Ideas for new HTML5 apps


At the recent Linux conference in Brisbane, Australia, I promised a free copy of my book to the person that could send me the best idea for an HTML5 video application. I later also tweeted about it.

While I didn’t get many emails, I am still impressed by the things people want to do. Amongst the posts were the following proposals:

  • Develop a simple video cutting tool to, say setting cut points and having a very simple backend taking the cut points and generating quick enough output. The cutting doesn’t need to retranscode.
  • Develop a polyfill for the track element
  • Use HTML5 video, especially the tracking between video and text, to better present video from the NZ Parliament.
  • Making a small MMO game using WebGL, HTML5 audio and WebSockets. I also want to use the same code for desktop and web.

These are all awesome ideas and I found it really hard to decide whom to give the free book to. In the end, I decided to give it to Brian McKenna, who is working on the MMO game – simply because it it is really pushing the boundaries of several HTML5 technologies.

To everyone else: the book is actually not that expensive to buy from APRESS or Amazon and you can get the eBook version there, too.

Thanks to everyone who started really thinking about this and sent in a proposal!

by silvia at March 10, 2011 08:33 AM

Chris Pearce : Firefox 4 video decoder architecture


To assist others coming up to speed on the architecture of the video decoder, I've put together a diagram of Firefox 4's video playback engine. We rewrote our video architecture for Firefox 4 in order to give us better control over the complete stack.

Click on the image for a larger diagram.

The key classes in our architecture are:
  • nsHTMLMediaElement - This manages the JavaScript/HTML accessible HTMLMediaElement interface, and implements the resource selection, load, and preload logic.
  • nsBuiltinDecoder - Manages a main thread accessible snapshot of the state of the underlying decoder. The decoders run on non-main threads, and we don't want to block the main thread to dig into the decoders when JS queries playback state, so we maintain a snapshot of the playback engine's state in this class. This inherits from nsMediaDecoder. You can also implement playback support for a new format by inheriting and implementing nsMediaDecoder. nsWaveDecoder is currently implemented this way, but we're in the process of reimplementing that as a sublcass of nsBuiltinDecoderReader.
  • nsBuiltinDecoderStateMachine - Manages the decode, state machine, audio-push threads, frame queueing, A/V sync, and buffering logic. This ensures that all the HTML5 events get dispatched at the appropriate time, and that behaviour is consistent and sane across different media types. Demuxing is handled abstractly by subclasses of nsBuiltinDecoderReader. This way all media types can share as much playback logic as possible, reducing our maintenance overhead.
  • nsOggReader/nsWebMReader - Demuxing and codec specific functionality is implemented by subclassing nsBuiltinDecoderReader. This reduces the amount of work required to implement and maintain support for new codecs. When a new codec is implemented as a nsBuiltinDecoderReader subclass, support for HTML events, buffering, and playback logic does not need to be reimplemented, since it already exists in nsBuiltinDecoderStateMachine. To add support for a new codec, it's easiest to implement support as a new nsBuiltinDecoderReader subclass.
  • nsAudioStream - Our cross platform audio API wrapper. It is based on libsydneyaudio, which operates on a push model rather than a (more commonly used) callback-based model, which has brought in a whole raft of headaches. Matthew Gregan is in the process of rewriting our audio layer to a more sane callback based model. We also provide a cross-process nsAudioStreamRemote, which proxies audio commands to an audio stream in another process. This is required on mobile.
  • ImageContainer - When it comes time to present a video frame, nsBuiltinDecoderStateMachine sets it as the "current image" of the video element's ImageContainer object. This then propagates through the Layers/2D scene rendering system, and it eventually gets rendered on the screen. The Layers compositing runs on the main thread, and ImageContainer provides a thread-safe wrapper. The images contained in the ImageContainer can be in OpenGL/D3D surfaces, so we can take advantage of hardware accelerated scaling, rendering, and YCbCr to RGB conversion.
  • nsVideoFrame - This resides in layout, and manages the dimensions/reflow of the video, as well as its poster image.
  • nsMediaStream - Our network code runs on the main thread, but the underlying libraries we use for media decoding (libvpx, libtheora, etc) assume synchronous reads. We can't afford to do blocking reads on the main thread, so we cache the media data downloaded into the nsMediaCache, and provide a thread-safe wrapper synchronous wrapper for reading in the nsMediaStream class. We use Necko for our networking, so we can take advantage of all the existing security and load-group functionality it implements.
The advantage of controlling the entire playback engine are many. We can easily control frame dropping, memory allocation, the threading model, what, when, and how we decode, and we can integrate more tightly with our network stack.

by Chris Pearce (noreply@blogger.com) at February 21, 2011 01:58 AM

Silvia Pfeiffer : HTML5 Video Presentations at LCA 2011


Working in the WHAT WG and the W3C HTML WG, you sometimes forget that all the things that are being discussed so heatedly for standardization are actually leading to some really exciting new technologies that not many outside have really taken note of yet.

This week, during the Australian Linux Conference in Brisbane, I’ve been extremely lucky to be able to show off some awesome new features that browser vendors have implemented for the audio and video elements. The feedback that I got from people was uniformly plain surprise – nobody expected browser to have all these capabilities.

The examples that I showed off have mostly been the result of working on a book for almost 9 months of the past year and writing lots of examples of what can be achieved with existing implementations and specifications. They have been inspired by diverse demos that people made in the last years, so the book is linking to many more and many more amazing demos.

Incidentally, I promised to give a copy of the book away to the person with the best idea for a new Web application using HTML5 media. Since we ran out of time, please shoot me an email or a tweet (@silviapfeiffer) within the next 4 weeks and I will send another copy to the person with the best idea. The copy that I brought along was given to a student who wanted to use HTML5 video to display on surfaces of 3D moving objects.

So, let’s get to the talks.

On Monday, I gave a presentation on “Audio and Video processing in HTML5“, which had a strong focus on the Mozilla Audio API.

I further gave a brief lightning talk about “HTML5 Media Accessibility Update“. I am expecting lots to happen on this topic during this year.

Finally, I gave a presentation today on “The Latest and Coolest in HTML5 Media” with a strong focus on video, but also touching on audio and media accessibility.

The talks were streamed live – congrats to Ryan Verner for getting this working with support from Ben Hutchings from DebConf and the rest of the video team. The videos will apparently be available from http://linuxconfau.blip.tv/ in the near future.

UPDATE 4th Feb 2011: And here is my LCA talk …

with subtitles on YouTube:

by silvia at February 10, 2011 06:54 AM

David Schleef : FLAC code mirror


Since parts of SourceForge have been down for a while, I am making my git mirror of the FLAC CVS repository available.  It is here.  I’ve also made my fairly uninteresting patches available on the Entropy Wave Open Source site, here.

by admin at February 04, 2011 01:31 AM

Cristian Adam : Manifest this


In this blog entry I will talk about Visual Studio 2008 and the C runtime library.

Firstly we create a new Win32 hello world console application:

#include <iostream>

int main()
{
    std::cout << "Hello World" << std::endl;
}

The resulted hello.exe binary is 9KB large, and by using Dependency Walker we find that is dynamically liked to msvcp90.dll, msvcr90.dll, and kernel32.dll

One might think that in order to deploy this hello application you will only need msvcp90.dll and msvcr90.dll alongside the application, or to have them somehow installed in the system.

The latter option is preferred by most applications, you only need to install Microsoft Visual C++ 2008 Redistributable Package and that's it.

The redistributable package is 1.7MBytes in size, and is installed in c:\Windows\WinSxS directory, and it also includes MFC. But this is just a hello world application, surely we don't need to install MFC.

The good news is that you can copy from C:\Program Files\Microsoft Visual Studio 9.0\VC\redist\x86\Microsoft.VC90.CRT\ only msvcp90.dll, msvcr90.dll, and the funnily file named Microsoft.VC90.CRT.manifest. You don't need to copy msvcm90.dll because this is required for "Managed" C++ applications.

The manifest file is required because starting with Visual Studio 2005 the C runtime library (CRT) is packaged as Side-by-side Assembly. Applications need to have also a bit of magic in order to work with the new CRT namely the RT_MANIFEST resource embedded in the executable.

By using Resource Hacker we can take a peek at this RT_MANIFEST section:

<assembly manifestversion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <trustinfo xmlns="urn:schemas-microsoft-com:asm.v3">
    <security>
      <requestedprivileges>
        <requestedexecutionlevel level="asInvoker" uiaccess="false"></requestedexecutionlevel>
      </requestedprivileges>
    </security>
  </trustinfo>
  <dependency>
    <dependentassembly>
      <assemblyidentity name="Microsoft.VC90.CRT" processorarchitecture="x86" publickeytoken="1fc8b3b9a1e18e3b" type="win32" version="9.0.21022.8"></assemblyidentity>
    </dependentassembly>
  </dependency></assembly>

I have highlighted the version of  the CRT used.

What happens if you decide to install Visual Studio 2008 SP1? You would expect that you should have a different CRT version. That was the case for Visual Studio 2005, Visual Studio 2008 uses the RTM version also when you install SP1, as described in this Visual C++ Team Blog article.

The size of the Microsoft Visual C++ 2008 SP1 Redistributable Package changed as well, from 1.7MB to 4.0MB due to MFC Facelift... Feature Pack and C++ TR1 inclusion.

In order to have the new CRT referenced one needs to set the following preprocessor define _BIND_TO_CURRENT_CRT_VERSION. After doing so the CRT version becomes 9.0.30729.1.

After Visual Studio 2008 SP1 there was another CRT update due to ATL Security Update, which bumped the CRT version to 9.0.30729.4148. This version is referenced in "c:\Program Files\Microsoft Visual Studio 9.0\VC\redist\x86\Microsoft.VC90.CRT\Microsoft.VC90.CRT.manifest"

So we have build our hello application with version 9.0.30729.1 and we distribute the CRT with version 9.0.30729.4148, which of course doesn't work in practice.

Others have encountered this problem and they have taken drastic measures like reinstallation of Visual Studio 2008 and/or manually change the CRT version in "c:\Program Files\Microsoft Visual Studio 9.0\VC\include\crtassem.h" header file.

Since these manifest files are xml files why not just change them using the manifest tool?

The following two commands extract and repack the RT_MANIFEST resource:
mt.exe -inputresource:hello.exe;#1 -out:hello.manifest
mt.exe -outputresource:hello.exe;#1 -manifest hello.manifest

Now there is the issue of replacing "9.0.30729.1" to "9.0.30729.4148" in hello.manifest and since sed is not present on a normal Windows machine I have created a Javascript script which does this task, which is called like this:

cscript.exe replace_string.js hello.manifest "9.0.30729.1" "9.0.30729.4148"

This way we can update the CRT files without having to wait for Visual Studio to catch up.

They say Visual Studio 2010 has addressed the whole manifest nightmare and one can copy the CRT files alongside the executable file without any hacking.

by Cristian Adam (noreply@blogger.com) at January 19, 2011 12:50 AM

Cristian Adam : ActiveX <video> controls


The recently released OpenCodecs release, beside fixing bugs and updating audio and video codecs libraries adds support for "controls" attribute in the ActiveX <video> player.

Without further ado here is a video showing this feature:


The ActiveX <video> player uses DirectShow to render an URL. The DirectShow "File Source (URL)" filter downloads a file from beginning to the end and it doesn't support seeking until has downloaded the whole file. This behavior is unfortunate and I hope I can fix this by writing my own "File Source (URL)" filter.

The above video presents a Theora video, which at first it has 0:00 length and then it has the right duration. This is due the fact that the current DirectShow Ogg demuxer needs to have access to the whole file in order to build the seeking table, the ActiveX player knows when the download is completed and builds the seeking table at the end. WebM should not have this limitation.

The UI was borrowed from Firefox and I've used GDI+ to draw the controls. I hope to use Cairo in a future version, in order to have prettier controls and to share it with the Kate subtitles decoder.

This should make the ActiveX player a bit more usable. In the next version besides having a new URL source filter I plan to add scripting support in order to control the ActiveX player from JavaScript code.

by Cristian Adam (noreply@blogger.com) at January 13, 2011 12:21 AM