Friday, November 18, 2011



I put together a quickie portfolio site, which should make it easier to find the good stuff on this blog.

Tuesday, November 15, 2011




I've started a new blog about compressive structures. Bit of a niche thing, but an abiding fascination, so I thought it deserved its own dedicated corner of the internet.

Tuesday, November 08, 2011

The Kinect: A Camera that Gives Good Zed





Earlier this year, Monique Priestley interviewed me for my thoughts on the Kinect as part of a project for her Master of Communication in Digital Media program. She was interviewing developers who had produced notable Kinect demos. My replies to her questions are pretty well-trodden ground by now, but I thought I'd share them anyhow. For them what's innerested, you can see a video of the demo I wrote here.

What is the Kinect allowing you to do that you weren't able to before?

I usually say, the Kinect: finally a camera that gives good zed.

When you do computer vision, you always have to remind yourself at first that you only have two dimensions. Depth information is not available. Our own depth perception is so instinctual that you actually have to become habituated to not having it when you do computer vision.

But the Kinect changes all that.

[Later thought: For most of us in machine vision, the arrival of the Kinect was rather like this moment from the Simpsons.]

It bears mentioning that depth information could actually be had before the Kinect came along; it's just that the solutions tended to be expensive (both in money and CPU cycles) or a pain to set up, and they didn't work very well. There was dense stereo, various types of structured light, and then very nice but exceedingly expensive things like time-of-flight sensors.

As for what having fast, accurate depth information allows, we're all still figuring that out, and expanding our imaginations into the new dimension.

What uses do you see for the Kinect?

I suspect the technology will find many, many practical uses that none of us even suspect. I, unfortunately, appear to only be able to think of completely useless but hopefully amusing and surprising software toys.

Where is it the most groundbreaking?

Again, it's simply the fact that we have fast, accurate, inexpensive, and invisibly acquired depth information.

What would you change? Why?

I would like there to be a model that worked at closer range. The current Kinect doesn't work at distances closer than a meter or so. The designers apparently had full-body tracking in mind for their target applications. But it would be lovely to be able to get more detailed information up close.

What would you like to see in v2?

See my last comment, but finer detail would be sweet as well. You know: more, better, faster, cheaper.


How would you like MS to support you?

In general I aim to have as little to do with MS as possible, personally and professionally. I'm mostly a Linux guy, though I have a Windows box and a Mac as well.

The Kinect is the first cool MS product I've ever wanted to have anything to do with. MS's lurching and conflicted response to news of the hacking of the Kinect shows me that the company leadership didn't really grok what it was they had on their hands, and just as well: if they had known of its susceptibility to cracking, I have little doubt that Steve Ballmer would have insisted it be encrypted.

Indeed, that's one of the most intriguing things about the way this all came about: MS could trivially have made it more or less impossible to hack, and in fact, under the DMCA, a crime to even try. But for some reason, the technologists creating the Kinect declined to do so. I like to think it was a mischievous sense of rebellion on the part of the Kinect team, or perhaps it was just their instinct as geeks to share the coolness of what they did.

So I don't want libraries, SDKs, training videos, any of that nonsense. No ghastly "community" website featuring all those tone-deaf ideas of what the marketing department thinks hackers like. I would like them simply to decline to rescind the access we already enjoy.

Do you expect/need Microsoft to open up a set of developer tools to make tweaks/hacks easier to create?

My feeling is that nothing more is really required from them, at least as far as software is concerned. I certainly don't need anything else. I have the image and the depth map. The body-tracking code might be nice, but I think there is an open-source library that works fine.

They could make the next version twice the resolution, say, and I would be happy.

I would love to have the Kinect integrated into some PTAM/VSLAM stuff -- i.e., dense 3D reconstruction along with online bundle adjustment. That would be seriously kick-ass. But MS doesn't have to implement it. I have no doubt whatsoever that people are busily working to incorporate the Kinect into PTAM and similar systems.


What do you see as "next" after this?


I'm still grappling with the implications of what the current one makes possible. Hm, what's next....

What was the inspiration behind what you've tried?

I work full-time at Weta Digital, where we work long hours, so I had very little time to work on Kinect hacking. So my aim was to simply make something, anything, as quickly as possible, and get a video out there. There was a real gold-rush mentality for a few crazy weeks there, everybody doing exciting new things.

For a while I could only watch with my nose pressed to the glass, since it wasn't available for sale in New Zealand until well after it was released in North America. I bought mine on the first day they went on sale here.

Have you tried similar stuff with previous tech? If so, what made the Kinect different?

I've done a lot of experimentation with real-time interactive video over the past several years. I used to work in computer vision and vision-guided robotics, which gives me a lot of skills that are useful for interaction design. There are all kinds of nifty things you can do with current tools and techniques. Here's an overview of the stuff I did pre-Kinect:

http://methodart.blogspot.com/2009/06/engine-room-audition.html

Good depth information makes a huge difference, makes things that are hard easy, and makes things that weren't possible doable.

A specific example: one trick you frequently have to pull off when designing an interactive video installation is figuring out what is foreground and what is background. For instance, people walking in front of the camera are foreground, and the wall behind them is background.

There are a couple of ways to do this. One is to use background subtraction techniques, but they are computationally expensive, sensitive to changes in lighting, and overall, they just don't work that great. Another way is to blast the scene with an infrared light source, add a separate IR camera, and segment the foreground based on what is bright in the IR image.

But with the Kinect, you simply use the depth map. Anything closer than some threshold is foreground. Done. Easy-peasy.
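In code, the depth-threshold approach really is that short. Here is a minimal Python sketch, assuming the depth map arrives as rows of values in millimetres and that 0 marks a pixel with no depth reading; the function name and the 1500 mm default threshold are illustrative choices, not part of any Kinect SDK.

```python
def foreground_mask(depth_mm, threshold_mm=1500):
    """Return a mask that is True wherever a pixel is closer than threshold_mm.

    depth_mm is a list of rows of depth values in millimetres. A value of 0 is
    assumed to mean "no reading" and is treated as background.
    """
    return [[0 < d < threshold_mm for d in row] for row in depth_mm]


if __name__ == "__main__":
    # A person at about 1 m standing in front of a wall at 3 m,
    # with one missing depth reading (0) on the right.
    depth = [
        [3000, 1020, 1010, 3000],
        [3000, 1000, 990, 0],
        [3000, 1005, 1000, 3000],
    ]
    for row in foreground_mask(depth):
        print("".join("#" if fg else "." for fg in row))  # prints .##. three times
```

Unlike background subtraction, there is no background model to learn or keep up to date; the only tuning knob is the threshold.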

Do you expect to take what you've learned over to another platform/tech?

Unlikely. There just isn't any comparable tech out there, unless you count time-of-flight sensors, which are low-res and fabulously expensive.

What is your favorite part about developing different tweaks using the Kinect?

The greatest thing is just the sense that we have a whole new blank canvas to work on. Interactive video was starting to get a bit stale -- everyone had basically done everything.

What is the most frustrating part - biggest drawback?

It feels like a quibble, but the fact that the camera and the depth sensor are slightly offset from one another causes a "shadow" effect along one side of objects when you project the colour info onto the depth map. But that is a mere quibble.
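A toy one-row sketch shows where the shadow comes from. It assumes an idealised pair of pinhole cameras separated by a horizontal baseline, so that a point at depth z appears shifted by a disparity of f * b / z pixels in the other camera; the `fb` value (focal length in pixels times baseline in metres) is made up for illustration. Near points shift further than far ones, so the foreground slides over part of the background, and the strip beside it receives no depth sample at all.

```python
def reproject_row(depth_row, fb=10.0):
    """Shift one row of depth samples (metres) into a second, horizontally
    offset camera, using disparity = fb / z pixels.

    Returns the reprojected row; None marks pixels that no sample lands on.
    When two samples land on the same pixel, the nearer one wins (z-buffer).
    """
    width = len(depth_row)
    out = [None] * width
    for x, z in enumerate(depth_row):
        if z <= 0:
            continue  # no depth reading for this pixel
        x2 = x + round(fb / z)
        if 0 <= x2 < width and (out[x2] is None or z < out[x2]):
            out[x2] = z
    return out


if __name__ == "__main__":
    # Background wall at 2 m with a 5-pixel-wide object at 1 m.
    row = [2.0] * 10 + [1.0] * 5 + [2.0] * 15
    shifted = reproject_row(row)
    print("".join("." if z is None else ("F" if z < 1.5 else "b") for z in shifted))
    # prints .....bbbbbbbbbb.....FFFFFbbbbb
```

The five leading dots are just the image border, which nothing shifts into; the five dots immediately beside the object are the shadow. Its width is fb * (1/z_near - 1/z_far), so a wider baseline or a nearer object makes it wider.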

Thursday, October 20, 2011

Interpolative Dance




I wrote a paper about six months ago on, ahem, "RBF Interpolation as a General Architecture for Expressive Middleware". It was written in great haste -- I had 12 days to 1) write it, 2) write the software demoing the concept, 3) find a dancer to work with and 4) move house from New Zealand to Canada. Oh, and I had a bad cold as well. And all for nought: the paper was rejected -- a rookie effort, after all, and, as it turned out, I could only offer a tantalizing but null-ish result. Basically, if you have good mocap data, the approach works a treat; but the OpenNI pose tracking isn't quite ready for this type of application (there has been another release of the SDK since I wrote the paper, and it's probably worth taking another kick at the can). I was invited to resubmit it as a two-pager, but I ran out of time.

But whatever. I still think it's a very good idea, and I really think someone should make a system that works like this, and I think RBF interpolation is an intuitive and powerful tool for media arts applications. If anyone is interested in the source code, drop me a line. If I have time to clean it up a bit (not likely any time soon, since I'm up to my eyeballs in a very tricky consulting gig), I'll post it here on the blog.

So here's the paper. Let me know what you think. I'll write more later to explain the concept, which is simple and elegant. I can say so since it's not particularly original -- it just adapts J.P. Lewis' notion of pose space deformation to media arts applications. The pun in the title is mine, though. I'm not sure JP would want to take credit for that one.

I should point out what this approach is not: it is not simply mapping individual inputs to individual outputs. That is exactly the sort of crabbed, awkward, inflexible approach that I am proposing an elegant alternative to. Instead it is keyframing an arbitrary number of inputs to an arbitrary number of outputs all at once. Read the paper for details.
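The mechanics are easy to show. What follows is a minimal pure-Python sketch of RBF interpolation, not the code behind the paper: it assumes a Gaussian kernel with a hand-picked width, and maps a whole vector of inputs (for instance joint positions from a pose tracker) to a whole vector of outputs (for instance synthesis or rendering parameters) in one go. Each keyframe pairs an input pose with the output settings wanted at that pose; solving one linear system makes the mapping pass exactly through every keyframe and blend smoothly between them. The class name, kernel width and keyframe values are all hypothetical.

```python
import math


def gaussian(r, width):
    """Gaussian radial basis function of distance r."""
    return math.exp(-(r / width) ** 2)


def solve(a, b):
    """Solve A X = B by Gaussian elimination with partial pivoting.

    a is an n x n list of lists, b is n x m; returns X as n x m.
    """
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # [A | B], inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + m):
                aug[r][c] -= factor * aug[col][c]
    x = [[0.0] * m for _ in range(n)]
    for k in range(m):  # back substitution, one output channel at a time
        for i in range(n - 1, -1, -1):
            s = sum(aug[i][j] * x[j][k] for j in range(i + 1, n))
            x[i][k] = (aug[i][n + k] - s) / aug[i][i]
    return x


class RBFMapping:
    """Interpolates from input vectors to output vectors through keyframes."""

    def __init__(self, key_inputs, key_outputs, width=1.0):
        self.keys = key_inputs
        self.width = width
        # Kernel matrix between every pair of keyframe inputs.
        phi = [[gaussian(math.dist(p, q), width) for q in key_inputs]
               for p in key_inputs]
        # One weight per keyframe per output channel.
        self.weights = solve(phi, key_outputs)

    def __call__(self, x):
        basis = [gaussian(math.dist(x, k), self.width) for k in self.keys]
        channels = len(self.weights[0])
        return [sum(b * w[j] for b, w in zip(basis, self.weights))
                for j in range(channels)]


if __name__ == "__main__":
    # Three hypothetical keyframes: 2 tracked inputs -> 3 output parameters.
    keys_in = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    keys_out = [[0.0, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
    rbf = RBFMapping(keys_in, keys_out)
    print(rbf([1.0, 0.0]))  # reproduces the second keyframe, up to rounding
    print(rbf([0.5, 0.5]))  # a smooth blend of all three keyframes
```

Adding a keyframe just means appending one input/output pair and re-solving; the kernel width controls how far each keyframe's influence reaches.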

Friday, August 26, 2011


I'm working on a variety of tools right now. One is a vector field design tool, and another is a tool for doing region-growing/clustering on mesh faces and vertices. So now I can design an embedded field on a mesh, and then create a cluster error measure that is aware of that field to do field-aware clustering, so the resulting tiling flows very naturally along the field direction.
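The post doesn't say what the field-aware error measure looks like, so the following is only a guess at one plausible form, not the actual one: a hypothetical anisotropic distance in which an offset along the field direction is cheap and an offset across it is expensive, so clusters grown with it stretch out along the field. The function name, the `stretch` factor, and working with 2D points in a local tangent plane are all assumptions.

```python
import math


def field_aware_cost(face_centroid, seed_centroid, field_dir, stretch=3.0):
    """Hypothetical cost of assigning a face to the cluster grown from a seed.

    The offset between the two centroids is split into a component along the
    (normalised) field direction and one across it; the along-field part is
    divided by `stretch`, so clusters tend to elongate along the field.
    Points are 2D here, as if already expressed in a local tangent plane.
    """
    dx = face_centroid[0] - seed_centroid[0]
    dy = face_centroid[1] - seed_centroid[1]
    norm = math.hypot(field_dir[0], field_dir[1])
    ux, uy = field_dir[0] / norm, field_dir[1] / norm
    along = dx * ux + dy * uy     # projection onto the field direction
    across = -dx * uy + dy * ux   # projection onto the perpendicular
    return (along / stretch) ** 2 + across ** 2


if __name__ == "__main__":
    seed = (0.0, 0.0)
    field = (1.0, 0.0)  # field pointing along +x
    print(field_aware_cost((2.0, 0.0), seed, field))  # along the field: cheap
    print(field_aware_cost((0.0, 2.0), seed, field))  # across the field: costly
```

Because both components are squared, the cost ignores the sign of the field, which suits direction fields on meshes where a direction and its opposite are equivalent. In a real clustering loop this term would be combined with whatever else the error measure cares about, such as normal similarity.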

Nuther one, with denser clusters:



There's a field source at the center:



Slightly more elaborate field:


Tuesday, August 02, 2011



Another experiment with generating semi-regular planar tilings of surfaces. The previous post featured meshes which were convex, which makes things much easier. I wanted to extend the technique to surfaces that have negative curvature, which is a trickier problem. There is a lovely paper from 2007 ("Constrained Planar Remeshing for Architecture" by Barbara Cutler and Emily Whiting) that I read a while ago but couldn't track down again, and my recollection was that they do VSA, as I do, and then follow up with some kind of local tweaks of vertices to get the polygons to be planar. I decided instead to use nonlinear optimization to tweak all the vertices to make the polygons planar, while remaining as close as possible to the original surface. It's similar to what Yang Liu, Helmut Pottmann and the rest do in a paper from 2006 ("Geometric Modeling with Conical Meshes and Developable Surfaces"), except they are concerned with quad-dominant meshes.
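Here is a rough sketch of that kind of optimization, not the code used for the video: plain gradient descent on an energy with two terms, a planarity term plus a closeness term that pulls each vertex back toward its original position. The planarity measure is an assumption chosen for simplicity: for each face it sums the squared volumes det(v1 - v0, v2 - v0, vk - v0), which vanish exactly when every vertex lies in the plane of the first three (provided those three are not collinear). The measure is scale-dependent and the fixed step size only suits unit-scale input; a production version would want a better-scaled measure and a proper solver such as Gauss-Newton or L-BFGS.

```python
def sub(p, q):
    return [p[0] - q[0], p[1] - q[1], p[2] - q[2]]


def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]


def dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def energy_and_gradient(verts, faces, original, closeness):
    """Planarity energy plus a closeness penalty, and its gradient per vertex."""
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in verts]
    for face in faces:
        i0, i1, i2 = face[0], face[1], face[2]
        a = sub(verts[i1], verts[i0])
        b = sub(verts[i2], verts[i0])
        for ik in face[3:]:
            c = sub(verts[ik], verts[i0])
            det = dot(a, cross(b, c))  # signed volume, zero when coplanar
            energy += det * det
            # d(det)/da = b x c, d(det)/db = c x a, d(det)/dc = a x b
            ga, gb, gc = cross(b, c), cross(c, a), cross(a, b)
            for idx, g in ((i1, ga), (i2, gb), (ik, gc)):
                for d in range(3):
                    grad[idx][d] += 2.0 * det * g[d]
            for d in range(3):  # v0 appears with a minus sign in a, b and c
                grad[i0][d] -= 2.0 * det * (ga[d] + gb[d] + gc[d])
    for v, (p, q) in enumerate(zip(verts, original)):
        for d in range(3):
            diff = p[d] - q[d]
            energy += closeness * diff * diff
            grad[v][d] += 2.0 * closeness * diff
    return energy, grad


def planarize(verts, faces, closeness=0.1, step=0.05, iterations=500):
    """Gradient descent: flatten faces while staying near the input surface."""
    original = [list(p) for p in verts]
    current = [list(p) for p in verts]
    for _ in range(iterations):
        _, grad = energy_and_gradient(current, faces, original, closeness)
        for p, g in zip(current, grad):
            for d in range(3):
                p[d] -= step * g[d]
    return current


if __name__ == "__main__":
    # One quad with a corner lifted 0.2 out of the plane of the others.
    verts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.2], [0.0, 1.0, 0.0]]
    faces = [[0, 1, 2, 3]]
    before, _ = energy_and_gradient(verts, faces, verts, 0.0)
    flat = planarize(verts, faces)
    after, _ = energy_and_gradient(flat, faces, flat, 0.0)
    print(f"planarity energy: {before:.4g} -> {after:.4g}")
```

With the closeness weight at 0.1 the faces end up nearly, but not perfectly, flat; raising the weight keeps vertices closer to the input surface at the cost of planarity, which is exactly the trade-off the cost function has to balance.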

The video makes it appear as though the optimization is essentially instantaneous, but it actually has to crank away for about 20 seconds. It generally gives good results, though sometimes it results in a few nasty looking polygons with long slivery bits, or interpenetrating polygons. Fiddling with the cost function would probably fix that.

Negative curvature will result in concave polygons. I wasn't sure if I liked the look of them at first, but they've grown on me.

Monday, July 18, 2011



Some experiments using clustering on meshes to create semi-regular planar tilings of surfaces. I use a variant of a technique called Variational Shape Approximation (VSA), from one of the most cited papers published by the Caltech Applied Geometry group. The algorithm is simple, and moreover has the nice property that it's easy to retrofit with new error functions. The so-called L2,1 metric that they use in the paper, which is based on similarity of normals, resulted in too many long, skinny regions, and gave tilings which differed considerably from the clusters. After a bit of experimentation I came up with a cost function which favours compact regions, along with similarity in normals, and it works really well, resulting in planar regions which are very close indeed to the clusters.
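For readers unfamiliar with VSA, its core is a Lloyd-style loop: alternately assign faces to proxies (here a representative normal and centre per cluster) and refit each proxy to its faces. The sketch below is a simplified stand-in, not the code behind these tilings: real VSA grows regions from seed faces with a priority queue over mesh adjacency, whereas here each face is simply assigned to its cheapest proxy. The normal term is the L2,1 measure (face area times squared normal deviation); the compactness term, squared distance to the proxy centre scaled by a hypothetical `compactness` weight, is only a guess at the kind of cost described, since the actual cost function isn't given. Faces are represented as (area, unit normal, centroid) tuples.

```python
import math


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def face_cost(face, proxy, compactness):
    """L2,1 normal term plus a hypothetical compactness term."""
    area, normal, centroid = face
    p_normal, p_centre = proxy
    return area * (sq_dist(normal, p_normal) + compactness * sq_dist(centroid, p_centre))


def fit_proxy(faces):
    """Area-weighted mean normal (renormalised) and area-weighted centre."""
    total = sum(f[0] for f in faces)
    n = normalize([sum(f[0] * f[1][d] for f in faces) for d in range(3)])
    c = [sum(f[0] * f[2][d] for f in faces) / total for d in range(3)]
    return n, c


def best_proxy(face, proxies, compactness):
    return min(range(len(proxies)), key=lambda k: face_cost(face, proxies[k], compactness))


def cluster_faces(faces, seeds, compactness=1.0, iterations=10):
    """Lloyd-style clustering; returns a proxy index for each face."""
    proxies = [(faces[s][1], faces[s][2]) for s in seeds]
    labels = [0] * len(faces)
    for _ in range(iterations):
        labels = [best_proxy(f, proxies, compactness) for f in faces]
        for k in range(len(proxies)):
            members = [f for f, lab in zip(faces, labels) if lab == k]
            if members:  # keep the old proxy if a cluster empties out
                proxies[k] = fit_proxy(members)
    return labels


if __name__ == "__main__":
    up, side = [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]
    # A folded strip: three horizontal faces, then three vertical ones.
    faces = [(1.0, up, [0.0, 0.0, 0.0]), (1.0, up, [1.0, 0.0, 0.0]),
             (1.0, up, [2.0, 0.0, 0.0]), (1.0, side, [3.0, 0.0, 1.0]),
             (1.0, side, [3.0, 0.0, 2.0]), (1.0, side, [3.0, 0.0, 3.0])]
    print(cluster_faces(faces, seeds=[0, 5]))  # prints [0, 0, 0, 1, 1, 1]
```

Setting `compactness` to 0 recovers the pure normal-based assignment; raising it trades normal agreement for rounder clusters.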

It works well across a wide range of cluster counts. The above example uses 300; this one uses 50 (the first frame is mostly blank due to an ffmpeg glitch):