The Kinect: A Camera that Gives Good Zed
Earlier this year, Monique Priestley interviewed me for my thoughts on the Kinect as part of a project for her Master of Communication in Digital Media program. She was interviewing developers who had produced notable Kinect demos. My replies to her questions are pretty well-trodden ground by now, but I thought I'd share anyhow. For them what's innerested, you can see a video of the demo I wrote here.
What is the Kinect allowing you to do that you weren't able to before?
I usually say, the Kinect: finally a camera that gives good zed.
When you do computer vision, you constantly have to remind yourself that you only have two dimensions: depth information simply isn't available. Our own depth perception is so instinctual that you have to become habituated to doing without it.
But the Kinect changes all that.
[Later thought: For most of us in machine vision, the arrival of the Kinect was rather like this moment from the Simpsons.]
It bears mentioning that depth information actually could be had before the Kinect came along; it's just that the solutions tended to be expensive (both in money and CPU cycles) or a pain to set up, and they didn't work very well. There was dense stereo, various types of structured light, and then very nice but exceedingly expensive things like time-of-flight sensors.
As for what having fast, accurate depth information allows, we're all still figuring that out, and expanding our imaginations into the new dimension.
What uses do you see for the Kinect?
I suspect the technology will find many, many practical uses that none of us even suspect. I, unfortunately, appear to only be able to think of completely useless but hopefully amusing and surprising software toys.
Where is it the most groundbreaking?
Again, it's simply the fact that we have fast, accurate, inexpensive, and invisibly acquired depth information.
What would you change? Why?
I would like there to be a model that worked at closer range. The current Kinect doesn't work for distances closer than a meter or so. The designers apparently had full-body tracking in mind for their target applications. But it would be lovely to be able to get more detailed information up close.
What would you like to see in v2?
See my last comment, but finer detail would be sweet as well. You know: more, better, faster, cheaper.
How would you like MS to support you?
In general I aim to have as little to do with MS as possible, personally and professionally. I'm mostly a Linux guy, though I have a windows box and a mac as well.
The Kinect is the first cool MS product I've ever wanted to have anything to do with. MS's lurching and conflicted response to news of the hacking of the Kinect shows me that the company leadership didn't really grok what it was they had on their hands, and just as well: if they had known of its susceptibility to cracking, I have little doubt that Steve Ballmer would have insisted it be encrypted.
Indeed, that's one of the most intriguing things about the way this all came about: MS could trivially have made it more or less impossible to hack, and in fact, under the DMCA, a crime to even try. But for some reason, the technologists creating the Kinect declined to do so. I like to think it was a mischievous sense of rebellion on the part of the Kinect team, or perhaps it was just their instinct as geeks to share the coolness of what they did.
So I don't want libraries, SDKs, training videos, any of that nonsense. No ghastly "community" website featuring all those tone-deaf ideas of what the marketing department thinks hackers like. I would like them simply to decline to rescind the access we already enjoy.
Do you expect/need Microsoft to open up a set of developer tools to make tweaks/hacks easier to create?
My feeling is that nothing more is really required from them, at least as far as software is concerned. I certainly don't need anything else. I have the image and the depth map. The body-tracking code might be nice, but I think there is an open-source library that works fine.
They could make the next version twice the resolution, say, and I would be happy.
I would love to have the Kinect integrated into some PTAM/VSLAM stuff -- i.e., dense 3D reconstruction along with online bundle adjustment. That would be seriously kick-ass. But MS doesn't have to implement it. I have no doubt whatsoever that people are busily working to incorporate the Kinect into PTAM and similar.
What do you see as "next" after this?
I'm still grappling with the implications of what the current one makes possible. Hm, what's next....
What was the inspiration behind what you've tried?
I work full-time at Weta Digital, where we work long hours, so I had very little time to work on Kinect hacking. So my aim was to simply make something, anything, as quickly as possible, and get a video out there. There was a real gold-rush mentality for a few crazy weeks there, everybody doing exciting new things.
For a while I could only watch with my nose pressed to the glass, since it wasn't available for sale in New Zealand until well after it was released in North America. I bought mine on the first day they went on sale here.
Have you tried similar stuff with previous tech? If so, what made the Kinect different?
I've done a lot of experimentation with real-time interactive video over the past several years. I used to work in computer vision and vision-guided robotics, which gives me a lot of skills that are useful for interaction design. There are all kinds of nifty things you can do with current tools and techniques. Here's an overview of the stuff I did pre-Kinect:
Good depth information makes a huge difference: it makes hard things easy, and it makes things that weren't possible doable.
A specific example: one problem you frequently have to solve when designing an interactive video installation is figuring out what is foreground and what is background. For instance, people walking in front of the camera are foreground, and the wall behind them is background.
There are a couple of ways to do this. One is to use background-subtraction techniques, but they are computationally expensive, sensitive to changes in lighting, and overall they just don't work that well. Another way is to blast the scene with an infrared light source, add a separate IR camera, and segment the foreground based on what is bright in the IR image.
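To give a flavour of the first approach, here is a minimal sketch of background subtraction with a running-average background model, assuming grayscale frames arrive as numpy arrays (the function names and thresholds are my own, purely illustrative):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running average; lets the model track slow lighting drift."""
    return (1.0 - alpha) * bg + alpha * frame.astype(np.float64)

def segment_foreground(bg, frame, thresh=25.0):
    """Pixels differing from the background model by more than thresh are foreground."""
    return np.abs(frame.astype(np.float64) - bg) > thresh

# Toy scene: a flat grey wall, then a bright blob (a "person") enters the frame.
bg = np.full((4, 4), 100.0)        # learned background model
frame = np.full((4, 4), 100.0)
frame[1:3, 1:3] = 200.0            # 2x2 foreground blob
mask = segment_foreground(bg, frame)
print(mask.sum())                  # -> 4 pixels flagged as foreground
bg = update_background(bg, frame)  # model slowly absorbs the new frame
```

Even this toy version hints at the fragility: flip the room lights and every pixel jumps past the threshold at once.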
But with the Kinect, you simply use the depth map. Anything closer than some threshold is foreground. Done. Easy-peasy.
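The depth-threshold trick really is that short. A sketch, assuming the depth map comes in as a numpy array of millimetre readings (as a driver like libfreenect can provide; the exact values here are invented for illustration):

```python
import numpy as np

# Hypothetical depth map in millimetres: a wall at ~3 m, a person at ~1.2 m.
depth = np.array([
    [3000, 3000, 1300, 3000],
    [3000, 1250, 1200, 3000],
    [3000, 1270, 1220, 3000],
], dtype=np.uint16)

THRESHOLD_MM = 1500  # anything nearer than 1.5 m counts as foreground

# Zero means "no reading" (e.g. the shadowed regions the sensor can't see),
# so exclude those pixels from the foreground mask.
foreground = (depth > 0) & (depth < THRESHOLD_MM)
print(foreground.sum())  # -> 5 foreground pixels
```

No background model to learn, no sensitivity to lighting: the wall is far, the person is near, done.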
Do you expect to take what you've learned over to another platform/tech?
Unlikely. There just isn't any comparable tech out there, unless you count time-of-flight sensors, which are low-res and fabulously expensive.
What is your favorite part about developing different tweaks using the Kinect?
The greatest thing is just the sense that we have a whole new blank canvas to work on. Interactive video was starting to get a bit stale -- everyone had basically done everything.
What is the most frustrating part - biggest drawback?
The fact that the camera and the depth sensor are slightly offset from one another causes a "shadow" effect along one side of objects when you project the colour info onto the depth map. But that is a mere quibble.