Microsoft Kinect – Social Video Platform

August 30, 2012 Bozhidar Lenchov

This article is co-authored by Jason Chutko and Bozhidar Lenchov.

How humans physically interact with technology is currently a hot topic. Traditionally, the mouse and keyboard were the primary input methods. More recently, touchscreen input has risen in popularity through smartphone and tablet usage, which has pushed developers to explore alternative ways for users to interact with the world around them.

In the past few weeks, the Xtreme Labs R&D team had the opportunity to work with Microsoft’s Kinect for Windows. This version of Kinect differs from the original Xbox 360 version in that it contains API improvements and skeletal tracking controls. We challenged ourselves to explore this new platform by creating an application that lets human gestures control a social video player. Over the course of a few weeks, we extended the app to perform many tasks:

  • Display YouTube videos and channels, and perform playlist keyword search
  • Search for Facebook user videos
  • Keyword search Facebook public posts for YouTube videos
  • Local (machine-specific) playlist and automatic playback for content in search results or local playlists
  • Audio and gesture support for common commands

The application allows the user to control a stream of YouTube and Facebook videos from the public domain and/or the user’s social circle. Hand gestures for controlling video playback include support for selecting a video to play, maximizing and minimizing the player and controlling playback (pause/play/next/previous video). Voice commands for video searching are also supported, and accept multiple keyword searches.

The tools and frameworks used in the development process include:

  • Microsoft Blend (UI design), Microsoft Visual Studio with C# (gesture and backend logic) and the MVVM UI pattern for event-driven applications
  • Kinect Toolbox 1.1 and 1.2 (gesture support)
  • Facebook C# SDK 6.0 and Json.NET parser
  • YouTube native API player and VideoJS HTML5 player (for Facebook MP4 videos)

Here are some of the major lessons learned throughout the process and possible solutions left for future iterations:

1. Gesture calibration (for the controlling user) and movement stabilization are paramount for a great user experience

  • Simple setup, limited to two or three steps
  • Training should show up when a gesture is about to be used
  • Individually saved profiles with different gesture values
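
As an illustration of the stabilization and per-user profile ideas above, here is a minimal sketch in Python (not the C# used in the project). It assumes hand positions arrive as (x, y) pairs and smooths them with a simple exponential filter. The names `GestureProfile` and `HandSmoother`, and the default values, are hypothetical and are not taken from the application or from the Kinect SDK.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GestureProfile:
    """Hypothetical per-user calibration values (illustrative only)."""
    user: str
    smoothing: float = 0.5            # in (0, 1]; lower is steadier but laggier
    swipe_min_distance: float = 0.30  # assumed travel (metres) that counts as a swipe


class HandSmoother:
    """Exponentially smooths a stream of (x, y) hand positions."""

    def __init__(self, profile: GestureProfile) -> None:
        if not 0.0 < profile.smoothing <= 1.0:
            raise ValueError("smoothing must be in (0, 1]")
        self.alpha = profile.smoothing
        # Last smoothed position, or None before the first sample arrives.
        self.state: Optional[Tuple[float, float]] = None

    def update(self, x: float, y: float) -> Tuple[float, float]:
        """Feed one raw sample and return the smoothed position."""
        if self.state is None:
            self.state = (x, y)
        else:
            sx, sy = self.state
            # Move a fraction `alpha` of the way toward the new sample.
            self.state = (sx + self.alpha * (x - sx),
                          sy + self.alpha * (y - sy))
        return self.state
```

Each user would load their own `GestureProfile`, so a user with shakier movement could pick a lower `smoothing` value without affecting anyone else.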

2. The current version of Kinect does not support tracking individual fingers, which limits gesture complexity

  • More complex gestures also come at the cost of needing to teach the user how to perform the gesture, hence reducing its intuitiveness
  • Care should be taken in gesture design, so that users do not inadvertently transition into another gesture after just completing one (e.g. relaxing their hands after performing an action)
  • Gestures for engaging/disabling the gesture recognition system or passing control to another user are effective solutions for preventing gesture recognition from people in the surrounding background area
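
The last two points can be sketched as a small gate that sits between gesture recognition and command handling. This is an assumption-laden Python illustration, not the project's code: the gesture names `"engage"` and `"disengage"`, the class `GestureGate`, and the one-second cooldown are all hypothetical.

```python
from typing import Optional


class GestureGate:
    """Decides whether a recognised gesture should trigger its command."""

    def __init__(self, cooldown_s: float = 1.0) -> None:
        self.cooldown_s = cooldown_s  # quiet period after an accepted gesture
        self.engaged = False          # recognition is off until the user engages
        self._last_accept: Optional[float] = None

    def handle(self, gesture: str, now: float) -> bool:
        """Return True if `gesture`, seen at time `now` (seconds), should act."""
        if gesture == "engage":
            self.engaged = True
            return False
        if gesture == "disengage":
            self.engaged = False
            return False
        if not self.engaged:
            # Ignore movement from bystanders or from a user who opted out.
            return False
        if (self._last_accept is not None
                and now - self._last_accept < self.cooldown_s):
            # Likely the hand relaxing after the previous gesture.
            return False
        self._last_accept = now
        return True
```

The cooldown trades responsiveness for fewer accidental triggers, so in practice its length would be something to tune during calibration.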

3. Security concerns for virtual on-screen keyboard

  • Consider support for hardware registration on hosted server, where users can configure passwords for YouTube/Facebook accounts that the program can automatically retrieve based on a hardware signature
  • Be aware of issues around reselling devices, eavesdropping on transmissions, and account hijacking

4. Audio commands require the user to be close to the receiver, whereas gestures expect the user to be at least 1.0 m away from the receiver

  • Background noise is an issue
  • Users can wear portable microphones which feed into the Kinect hardware
  • Perform audio filtering for possible commands with AI/Machine Learning algorithms
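
As a much simpler stand-in for the AI/Machine Learning filtering suggested above, the sketch below rejects low-confidence or out-of-vocabulary phrases before they reach the player. It is written in Python for illustration and assumes the speech recognizer supplies a transcript and a confidence score between 0 and 1. The function name, the command vocabulary, the "search" prefix and the 0.7 threshold are all hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed playback vocabulary, based on the commands the article lists.
PLAYBACK_COMMANDS = {"play", "pause", "next", "previous"}


def parse_voice_command(
    text: str, confidence: float, min_confidence: float = 0.7
) -> Optional[Tuple[str, List[str]]]:
    """Return (command, keywords), or None if the phrase should be ignored."""
    if confidence < min_confidence:
        return None  # probably background noise or a misheard phrase
    words = text.lower().split()
    if not words:
        return None
    if words[0] == "search" and len(words) > 1:
        # Everything after "search" is treated as a search keyword.
        return ("search", words[1:])
    if len(words) == 1 and words[0] in PLAYBACK_COMMANDS:
        return (words[0], [])
    return None  # not a phrase the player knows
```

For example, `parse_voice_command("Search funny cats", 0.9)` yields a search with two keywords, while the same phrase at confidence 0.4 is dropped.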

5. Program controls should be independent of hand used (i.e. supporting both left and right controlling hand for gestures)

  • If used with calibration, the user can set a dominant hand and the appropriate UI hand element can be displayed
  • Screen movement could be represented by a visual element of both hands (not just one hand moving in a direction)
  • Hand commands should be equivalent for both hands (e.g. right hand moving right or left hand moving right should cause the same result, if possible)
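
One way to read the last point is that the command should depend only on the direction of movement, never on which hand produced it. The Python sketch below illustrates that idea under stated assumptions: x grows to the user's right, distances are in metres, and a rightward swipe maps to "next". The function name, the mapping and the 0.30 m threshold are hypothetical.

```python
from typing import Optional


def swipe_command(
    hand: str, start_x: float, end_x: float, min_distance: float = 0.30
) -> Optional[str]:
    """Map a horizontal hand movement to a playback command.

    `hand` is validated but deliberately not used in the decision, so the
    left and right hands behave identically.
    """
    if hand not in ("left", "right"):
        raise ValueError("hand must be 'left' or 'right'")
    dx = end_x - start_x
    if abs(dx) < min_distance:
        return None  # movement too small to count as a swipe
    return "next" if dx > 0 else "previous"
```

With this shape, `swipe_command("left", 0.0, 0.4)` and `swipe_command("right", 0.0, 0.4)` return the same command.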

We at Xtreme Labs R&D welcome the opportunity to explore these emerging design challenges in motion interaction. Our work with Kinect goes on, and we will continue experimenting with how this field intersects with social media.
