SEGD talk - Direct and Ambient Interaction
April 14th, 2011


Andrew Lovett-Barron
Interaction Designer, technologist and programmer

Initial interest in synthesizers led to interaction design and broader exploration
Sites like CreateDigitalMusic introduced me to greater things
Currently working in Processing/Java, Python, Arduino C, JavaScript/jQuery, and Max/MSP/Jitter. Learning Objective-C, C++, and the Django/GeoDjango framework for Python.


The great enabler of interaction, or at least that which makes it ubiquitous, is technology, and we're swimming in it. Unfortunately, on a day-to-day basis, we're really only interacting with the lowest common denominator, and as pretty as it is, your iPad falls into that category.

I'm going to talk about two core types of interaction and their technology, which I'd generally refer to as direct and ambient interaction design.

Ambient interfaces are not this. They're more subtle, and while they maintain a defined feedback loop, it's not the 1:1 loop of the direct interface. If anything, it is many-to-one or many-to-many in nature. I'll discuss this later.

First, direct interfaces.
We've gotten so used to the keyboard and the mouse that we tend to think of these as core to computer use, but from a mechanical perspective, all these really give us are on/off button states and delta values between arbitrary X/Y coordinates. To this section, I'd also add the standard array of buttons, potentiometers, sliders, and other physical interface devices which have been core to interacting with analog electronic devices. As mentioned before, the keyboard is just an array of on/off states, and the classic ball mouse is a pair of optical encoders.

These kinds of mechanical pieces are incredibly rich in the range of interactions that they allow us. The keyboard, for example, has seen its success not for its mechanical complexity, but for its conventional richness. We learn to type, and typing is standardized. QWERTY, Dvorak, whatever: you have an incredibly rich way of physically interacting with the computer for a certain number of tasks. WASD is another convention built on those, using the physical orientation of the keys as central, but describing it using the convention of their layout.
This issue of convention is one I'm going to return to, but let's go through some more of the technology and methods first.

Most physical interfaces like the keyboard, mouse, joystick, etc. are really just different physical sensors combined to form something a bit more complex. For example, a joystick is really just two potentiometers inside a physical housing, with springs and supports that apply the actual tension. That interaction is relatively complex and is used to control everything from robot arms to fighter planes, mapping effectively on top of our understanding of space and orientation.
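
To make the "two potentiometers" point concrete, here's a minimal sketch of turning two raw readings into one joystick vector. The 0 to 1023 range assumes a 10-bit analog input like the one on an Arduino, and the center value, deadzone size, and function names are my own illustrative choices rather than anything from a specific joystick.

```python
def normalize_axis(raw, center=512, max_raw=1023, deadzone=0.05):
    """Map a raw potentiometer reading (0..max_raw) to the range -1.0..1.0."""
    if raw >= center:
        value = (raw - center) / (max_raw - center)
    else:
        value = (raw - center) / center
    # Ignore small wobble around the spring-centered rest position.
    if abs(value) < deadzone:
        return 0.0
    return max(-1.0, min(1.0, value))


def read_joystick(raw_x, raw_y):
    """Combine two independent potentiometer readings into one (x, y) vector."""
    return normalize_axis(raw_x), normalize_axis(raw_y)
```

The sensing really is that simple. The springs and housing are what make it feel like a joystick.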

There are tons of different ways to go about it. The point is that we see an emergent complexity of use come out of well designed physical objects. Think about what MIGHT work away from the technology, then break down those motions and interactions and just give it a try. Don't be afraid if it's new, or unfamiliar. Just think simple and think "what if." At the very least someone will make an instrument of it.

Multitouch technology is broadly similar to mice and keyboards, though in this case it identifies absolute x/y coordinates on a plane along with on/off touch points. Multitouch is interesting because it's actually been around since the 80s, and can be accomplished in so many different technical fashions that it's probably one of the most versatile methods for interacting with a surface or plane that we have.

For example, frustrated total internal reflection (FTIR) is a method in which one's touch disrupts the reflective integrity of an internally reflective surface, such as acrylic. By saturating the internal area with diffused infrared light and then pointing an IR camera at the back of the surface, each touch shows up to the camera as a bright spot, and you can create a damn cheap and easy multitouch surface. Only problem is that you need a bit of space behind it.
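
On the software side, an FTIR surface mostly comes down to finding bright blobs in the IR camera image. Here's a rough sketch of that step, assuming the frame arrives as a plain 2D grid of brightness values from 0 to 255. The threshold, the minimum blob size, and the function name are made up for illustration, and a real setup would more likely lean on a computer vision library.

```python
def find_touches(frame, threshold=200, min_pixels=2):
    """Return (x, y) centroids of bright blobs in a 2D grid of IR brightness."""
    height, width = len(frame), len(frame[0])
    seen = [[False] * width for _ in range(height)]
    touches = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or frame[y][x] < threshold:
                continue
            # Flood fill to collect every bright pixel connected to this one.
            stack, pixels = [(x, y)], []
            seen[y][x] = True
            while stack:
                px, py = stack.pop()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py),
                               (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and frame[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            if len(pixels) >= min_pixels:  # drop single-pixel noise
                cx = sum(p[0] for p in pixels) / len(pixels)
                cy = sum(p[1] for p in pixels) / len(pixels)
                touches.append((cx, cy))
    return touches
```

Each centroid is one touch point. Tracking which blob is which finger from frame to frame is a separate problem that this sketch doesn't attempt.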

Or you could use capacitive touch, which is activated by the conductive properties of the human body. It lets you locate touches and even measure how close a finger is to a given spot: basically touch hovering. Apple has a patent on this, but considering it was invented in the 80s by a U of T team and expanded in countless projects since, I wouldn't worry about that.
The project I'm showing here is Stefan Powell's vagduino, a capacitive touch project. I'm actually filling in for him today, he's a Toronto based electronic and computational artist.

Other methods include audio-piezo based solutions, placing multiple vibration sensors around a surface and triangulating touch based on the vibration of the surface between the sensors, or other optical based solutions using computer vision techniques. An early Kinect hack, for example, simulates a multitouch surface on any surface by associating hand movement and a physical plane in the Kinect's view.
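
As a sketch of how the piezo approach might locate a tap: each sensor hears the vibration at a slightly different time, and the differences between those times narrow down where the tap happened. The brute-force grid search below is just one simple way to do it, and the sensor layout, wave speed, and function name are assumptions for illustration. Real surfaces are messier, since the speed of the vibration depends on the material.

```python
import math


def locate_tap(sensors, arrival_times, wave_speed, width, height, step=0.01):
    """Estimate where a tap landed from arrival times at several sensors.

    Only differences between arrival times are used, because the moment
    of the tap itself is unknown.
    """
    best, best_err = None, float("inf")
    nx, ny = int(round(width / step)), int(round(height / step))
    for i in range(nx + 1):
        for j in range(ny + 1):
            x, y = i * step, j * step
            dists = [math.hypot(x - sx, y - sy) for sx, sy in sensors]
            err = 0.0
            for k in range(1, len(sensors)):
                # Compare predicted and measured delay relative to sensor 0.
                predicted = (dists[k] - dists[0]) / wave_speed
                measured = arrival_times[k] - arrival_times[0]
                err += (predicted - measured) ** 2
            if err < best_err:
                best, best_err = (x, y), err
    return best
```

With four sensors in the corners of a panel, the candidate position whose predicted delays best match the measured ones is taken as the touch.
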
Long story short, there's a multitude of solutions for touch technology, and we're going to be seeing them more and more as the technology becomes cheaper and the expectation of touch ingrains itself into our minds. Already we're seeing conventions around touch interfaces spring up, where we develop a language of complex interactions around this idea of touch and touched patterns. What we haven't seen yet is a complexity to the interactions emerge like we have with keyboards and mice. The central thinking is there, for example the idea of augmenting central interactions with a modifier, like Command-C meaning copy. The BlackBerry PlayBook, for all its mind-numbing flaws, has a brilliant idea around enabling richer multitouch interaction: using the black bezel around the device as the equivalent of a multitouch shift key, allowing different actions to occur depending on whether you start a gesture on the screen or on the bezel. I'm really hoping this isn't locked into a patent vault, because it'll allow multitouch to begin competing with the kind of richness witnessed in keyboard/mouse based operating systems.
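
The bezel-as-shift-key idea is easy to sketch. Assuming the touch-sensitive area extends over the bezel, the same swipe gets a different label depending on where it starts. The bezel width, the label strings, and the function name here are hypothetical, and this is not how the PlayBook itself implements it.

```python
def classify_swipe(start, end, width, height, bezel=20):
    """Label a swipe, treating a start inside the bezel band as a modifier."""
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"  # screen y grows downward
    # A gesture "starts on the bezel" if it begins within the outer band.
    on_bezel = (x0 < bezel or x0 >= width - bezel or
                y0 < bezel or y0 >= height - bezel)
    return ("bezel+" if on_bezel else "") + direction
```

So an upward swipe from the middle of the screen and an upward swipe from the bottom bezel come out as "up" and "bezel+up", and an application can bind different actions to each.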

So on the topic of gesture, another interactive technology which has similarly been around since the 80s is that of camera driven gestural interaction, basically body-based interactions.

My favorite example is David Rokeby's Very Nervous System, which was originally begun in 1982 (PLAY MOVIE CLIP).
Gestural interactions are a sort of holy grail for a lot of interaction designers, and rightly so. Body language is an incredibly rich, incredibly meaningful way of communicating. It communicates more honestly than words and with greater complexity than touch, and it's something that we're all attuned to culturally and in some cases genetically.

Figures like John Underkoffler and his work through Oblong Industries, MIT, and the Minority Report movie came to the fore because of their work with gesture, and the way they displayed easy, intuitive movements as the future of human computer interaction. That said, if you think the standing desk is a weak answer to the sedentary 8 hour work day, you're going to enjoy the eight hour calisthenics of this vision of the future.

Gestural interaction is made possible using computer vision and spatial sensors. Some early examples of the spatial sensor variety are actually from Nintendo. The Nintendo Power Glove, for all its absurdity, is actually pretty impressive for being an early spatial sensing device released in the late 80s. This thing could recognize the X/Y/Z position of the glove via ultrasonic speakers and triangulation between them.

Likewise, the Nintendo Wiimote a decade and a half later was similarly impressive. It used a combination of accelerometer data and triangulation through an infrared camera WITHIN the Wiimote, so the Wiimote is actually measuring the apparent distance between the two infrared LEDs mounted on top of your TV. As a slight aside, a LOT of these techniques use infrared light, but in all cases as a point of reference. In the same way that mariners use lighthouses as way-finding tools, computers can apply various wave-form based inputs for their direction finding methods.
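
Here's a rough sketch of how the gap between the two LEDs turns into a distance, using a basic pinhole camera model: the further you are from the TV, the closer together the two dots appear. The LED spacing, image width, and field of view below are placeholder assumptions, not measured Wiimote specs, and the function name is made up.

```python
import math


def distance_to_sensor_bar(p1, p2, led_separation_m=0.20,
                           image_width_px=1024, horizontal_fov_deg=45.0):
    """Estimate camera-to-LED distance from the pixel gap between two LEDs."""
    pixel_gap = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    if pixel_gap == 0:
        raise ValueError("both LEDs map to the same pixel")
    # Pinhole model: focal length expressed in pixels.
    focal_px = (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    return focal_px * led_separation_m / pixel_gap
```

Halving the pixel gap doubles the estimated distance. The midpoint and tilt of the two dots give you pointing position and roll in much the same spirit.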

Anyway, jump up to about six months ago, with the release of the Microsoft Kinect and the explosion of interest in gestural tech. The Microsoft Kinect is a computer vision oriented camera system designed for the Xbox which was hacked for do-it-yourself computer vision projects within a couple of days, thanks to Johnny Lee, the wonderful Limor Fried (also known as Lady Ada in DIY circles) and Hector Martin, a PS3 firmware hacker and the first to release libraries for the Kinect.

The Kinect works by projecting infrared light in a known pattern, a technique known as "structured light," which is then captured by a camera with a visible-light filter, so it only sees the infrared pattern. The camera interprets the distortion of the known light pattern to create a map indicating the distance away from the point of capture. That's it. It's the same as a photograph, with the infrared beam as a flash, and with the same weaknesses. Actually, it has more because of the physical distance between the RGB camera and the infrared camera, creating a disparity in the angle of the images received. So why is it so valuable?
Mostly because it's cheap like nothing else, and really, really accessible. In the same way that the lesser known PlayStation 3 Eye swept through the DIY computer vision circuit before it, an accessible, hackable piece of well engineered gear makes all the difference in bringing certain project ideas together, especially when making that one piece of gear is either prohibitive or infinitely more time consuming than the project you had in mind. We innovate as opportunists.
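
To make the distance-map step a bit more concrete: the sideways shift of each dot in the pattern works like the disparity in a stereo pair, with the projector and IR camera as the two viewpoints. This is a simplified sketch of that triangulation, not the Kinect's actual firmware, and the focal length and baseline are ballpark assumptions.

```python
def depth_from_disparity(disparity_px, focal_px=580.0, baseline_m=0.075):
    """Triangulate depth in meters from how far a pattern dot has shifted."""
    if disparity_px <= 0:
        return None  # no match found, or the point is effectively at infinity
    return focal_px * baseline_m / disparity_px


def depth_map(disparities):
    """Convert a 2D grid of per-pixel disparities into a grid of depths."""
    return [[depth_from_disparity(d) for d in row] for row in disparities]
```

Nearby objects shift the pattern a lot and far ones barely at all, which is why depth resolution gets coarser with distance.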

The other great value of the Kinect is within what we can call the Region of Interest. By separating depth from the RGB camera image, we're able to use both to create a richer image than before. The biggest challenge of computer vision work lies in defining meaningful information and distinctions within a system. What the Kinect allows us to do is narrow that focus and remove a level of error when looking for these patterns: instead of just going through a flat image, it's instead only focusing on the weird round thing extruding above what could be a set of shoulders.
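
A minimal sketch of that narrowing: keep only the color pixels whose depth falls inside a band, and blank everything else before any pattern matching runs. This assumes the depth and RGB images are already aligned pixel for pixel, which on real hardware takes a calibration step. The function name and the depth band are illustrative.

```python
def region_of_interest(depth, rgb, near_m, far_m):
    """Keep RGB pixels whose depth lies in [near_m, far_m]; blank the rest."""
    masked = []
    for depth_row, rgb_row in zip(depth, rgb):
        masked.append([
            pixel if d is not None and near_m <= d <= far_m else (0, 0, 0)
            for d, pixel in zip(depth_row, rgb_row)
        ])
    return masked
```

Whatever detector runs afterwards only has to look at the person standing a meter or two away, not the bookshelf behind them.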

We then use tools like Haar-like feature sets and cascades to understand shape and distinguish between objects. This is generally how facial recognition and gesture recognition work: by training a computer to identify various visual patterns inherent in a physical object, the computer can then loop through the frames afterwards to see if it recognises the object's proportions.
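
For a feel of what a single Haar-like feature is: it's just the difference in brightness between two neighboring rectangles, computed quickly using a running-sum table called an integral image. The sketch below shows one two-rectangle feature only. A real cascade combines thousands of these with trained thresholds, and the function names are my own.

```python
def integral_image(img):
    """Build a table where ii[y][x] is the sum of all pixels above and left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in a rectangle using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Brightness of the bottom half of a window minus the top half."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top
```

A large positive value means "dark band above a bright band," which is roughly what eyes over cheeks look like in a grayscale face.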

Most computer vision, at least that I'm aware of, uses techniques like this to discern between objects, and the applications are massive: whether in the realm of facial recognition, hand recognition, gesture, what have you. It can all be trained in the computer, and that training can then be shared, expanded, and contextualized.
So that's a bit of an overview for the big three direct interaction methods and technologies we use now: inherently physical items like the keyboard and mouse, multitouch, and gestural. Grand, so let's talk about some stuff that might be coming.

Environmental or ambient interaction refers to interactions and data gathering which occur outside the direct, immediate physical interaction and goals of the user. These are interactions a step up: they're often initiated by the user through their activation and generate their output through an aggregate of user actions and movements. They don't reward any one action to be repeated time and time again, but instead focus on behavioral states or trajectories.

Let's use Angry Birds and jogging as an example.
Angry Birds is a 1:1 association. You aim for a desired trajectory using the X/Y position on a multitouch screen and basic dragging motions, and self-correct the association between your X/Y movement and the impact it has on the pigs. As a consequence of the direct action-based association and the visualization of the results, you're able to quickly and effectively map a cause-effect relationship between your actions and the completion of your goal: pig murder.

Now take up jogging. Go running for two months. Track your distance, routes, and timing. Notice that you're not doing too well on Tuesday nights, 'cause you tend to work longer those days. Adjust to running in the morning. Continue to track, continue to adjust. Find you can go longer on some days, change your route overall, have a friend comment that you've lost weight, despite not noticing yourself. Don't run for a month and notice that you've put on weight. Repeat. This is a longer, more self-reflective feedback loop. Many actions, decisions, and variables are associated with the goal you're setting off on, but the outcome is usually a singular thing, or combination of things under a single abstract goal, such as losing weight or getting in shape. So how do you make this more effective and how do you accomplish these kinds of goals when they can't be coupled with tight feedback loops?
This is a major concern of the "big data" focus in business: understanding trends and behavior from the aggregate value of many smaller actions, instances, or points of data.

Already, we're seeing products emerge which are little more than data loggers aimed at informing users about their own behavior and trends: for example, the Fitbit is a small clip-on object which is just a pedometer with heavy analytics behind it. You wear it 24 hours a day, seven days a week, and what you get is a cohesive image of your activities.
So, designing for this can be done using any number of technologies but I'm going to go over a couple of key ones which allow for a more direct "Interaction Design" element than simply data monitoring and information visualization.

First is the idea of ubiquitous sensor environments, then a look at two big things which I think are worth knowing about now: neural interfaces and eyetracking.
Sensor networks are not anything new. The grandfather of Ubiquitous Computing, Mark Weiser, proposed these (like many things) back in the 80s. In fact, they're very much our present.

So, high level overview: networked and ubiquitous sensor environments are made possible by cheap, semi-disposable sensors and the idea of shared network access, whether direct through small modems, self-healing as in mesh networks between devices like the XBee radio shown above, or through telecom networks using technology like that in our cellphones.
Whether in our home, the environment we walk in, attached to our pants as with the Fitbit, our bike, or our car, this technology is not only already here, but going to have MASSIVE implications for our behavior and understanding of what human computer interaction means. I'm going to stop myself here because I'll end up going on a long tangent, but here are a few things I'd like you to read if you're interested.

First is Shaping Things by Bruce Sterling. It's about objects and how the physical and digital worlds are rapidly becoming difficult to distinguish. It is short, well written, and describes the trajectory of change and its context all the way through.

The second is Adam Greenfield's Everyware, which is a high level overview and important for understanding both context and implication. He goes into theories and examples of both possibilities and peril, and gives a broader social context for these things.

Third is Mike Kuniavsky's Smart Things. He's a founder of Adaptive Path and of ThingM, a company which creates objects and building blocks for a DIY internet of things. This book is about the "how to" of ubiquitous computing user experience and interaction design, and goes into case studies of successes and failures we've seen so far. It's a great read and I feel it rounds out the previous two.

And I'll stop there, though grab me later if you want to chat.

Apart from the environmental, we have the very, very personal.
"Brain-computer interfaces" are still somewhat in the realm of science fiction, but moving up pretty quickly. As we speak, my little brother is probably in a lab at Columbia University shooting lasers at living rat neurons, attempting to understand how neural networks communicate. Likewise at Columbia, Mark Collins and Toru Hasegawa are working at Cloud Lab, a part of Columbia's architecture and design department exploring new technology, including brain-computer interfaces. (http://www.thecloudlab.org/bci.html) Some of the stuff they've done so far is fascinating. An example is mapping EEG readings on to built spaces as a measure of attention or restfulness associated with the surroundings.

The use of this technology for direct interaction though is, at best, a bit iffy. EEG technology is fantastic for understanding certain qualities and states about the human mind, but not that great for understanding goals, direction, or intention in an immediate sense.
There are a few commercial EEG companies developing tools and techniques for dealing with this kind of input.

I managed to get a beta version of the NeuroSky mind band recently, which is a single sensor EEG in a headband form factor which they consider "unobtrusive" and wearable in public space. I end up looking like what the lady in the background refers to as a "Future Jogger," so dunno about that.

There are a few applications that I'm playing around with, such as concentration based photo capture and video manipulation. A lot of the applications we've seen for direct interaction are a bit ridiculous: Toronto's InteraXon, for example, uses their NeuroSky-powered chairs to change the CN Tower based on EEG outputs, signal processed into values for concentration and lack of attention. Concentration is a big one, with other companies making levitation toys and similar. But none of this is really all that great.

My suspicion is that EEGs are going to be a valuable way of augmenting direct interaction and facilitating ambient interaction through focus, concentration, and awareness. Similar to what I mentioned before with physical interfaces, I suspect the value of brain-computer interfaces will emerge as a component of a more complex whole. An example might be a networked home that combines EEG, an eye tracking system, and your body language to dim or desaturate the lighting of your surroundings when you're REALLY focused on a writing project, because the system can tell that you actually are focused on such a project.
Brain-computer interfaces could, in a somewhat ironic way, permit a more natural approach to sleep, work, and health. The data from a continuously attached EEG could permit our actual physical patterns to become agents for interactivity. There's a massive amount of research and product design to be done before this becomes a reality, but it's happening and happening fast. Let's just make sure that as designers, we're not only ready, but active in the exploration.

Lastly, I want to chat about eyetracking:
Eyetracking based solutions are something that interaction designers are familiar with through user testing scenarios, but as an interaction method, I think it's poorly explored. Zach Lieberman, Evan Roth, and the EyeWriter team have been exploring the use of cheap eye tracking applications to help a friend of theirs, TEMPT. TEMPT is a graffiti artist who was paralyzed by ALS, and the team at F.A.T. Lab and elsewhere joined up to build the EyeWriter system, a cheap physical and digital interface for TEMPT to be able to continue working on his art when he didn't have the use of his limbs.
EyeWriter Initiative - Most of these images are from this source as well.

They've released the specs and instructions for building your own on Instructables, which I strongly recommend taking a look at. The general idea is this (copied from the Instructables):
The 2.0 system works by strobing 3 IR illuminators every frame. On even frames, it uses the center illuminator (located around the camera lens) and on odd frames it uses the 2 side illuminators. On even frames, the pupil appears bright, since the IR light is actually bouncing off the back of your eye, like red eye effect. On odd frames, your pupil appears dark. The difference between the two allows us to isolate and track the pupil in realtime. Additionally, the glints (reflections of the IR illuminators) of the dark frame are tracked, and these, plus the info on the pupil, is calibrated to screen position using a least squares fitting process for an equation that provides a mapping of glint/pupil position to screen position.
This kind of camera based approach is probably the most common method in eyetracking, though there are other methodologies as well.
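
Here's a small sketch of the two steps in that description: differencing a bright-pupil frame against a dark-pupil frame to find the pupil, and a least squares fit from eye position to screen position. The EyeWriter text doesn't spell out its mapping equation, so the simple linear form below (screen = a*ex + b*ey + c per axis) is my assumption, as are all the function names and the threshold. It is not the EyeWriter code.

```python
def pupil_center(bright, dark, threshold=50):
    """Centroid of pixels much brighter in the bright-pupil frame."""
    xs, ys = [], []
    for y, (brow, drow) in enumerate(zip(bright, dark)):
        for x, (b, d) in enumerate(zip(brow, drow)):
            if b - d > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def fit_axis(eye_points, screen_values):
    """Least squares fit of screen = a*ex + b*ey + c (normal equations)."""
    rows = [(ex, ey, 1.0) for ex, ey in eye_points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * s for r, s in zip(rows, screen_values)) for i in range(3)]
    return solve3(ata, atb)


def calibrate(eye_points, screen_points):
    """Return a function mapping an eye measurement to a screen position."""
    fx = fit_axis(eye_points, [p[0] for p in screen_points])
    fy = fit_axis(eye_points, [p[1] for p in screen_points])

    def to_screen(ex, ey):
        return (fx[0] * ex + fx[1] * ey + fx[2],
                fy[0] * ex + fy[1] * ey + fy[2])
    return to_screen
```

Calibration is the familiar "look at each dot on the screen" routine: every dot gives one eye measurement paired with one known screen position, and the fit needs at least three points that aren't in a straight line.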

Electrooculography is similar to EEG work, in that it measures the position and displacement of the eye by measuring the electrical potential between the front and back of the eye, which can then be mapped to movement and direction. I think the visual system works a bit better, but the fact that it can read eye movement behind closed eyes is pretty handy, depending on the task at hand.
So, the stuff with EyeWriter and TEMPT is incredible, but that's actually a fantastic example of direct interaction using a different, computer vision based method. My curiosity about eye tracking as a means of ambient interaction stems from the way we scan a room, objects, pages, etc. We use eye tracking to understand how users see a page, and then optimize or critique based on that kind of information. What I'm interested in investigating is using this kind of gaze tracking for embedded systems and making unobtrusive physical and digital objects.

An idea I've been sketching out and kinda imagining right now, for example, sees unobtrusive low light digital objects, say an array of LEDs behind a diffuser, glowing dimly, and then activating in response to gaze being drawn to them.
My imagined use would be for language learning. Imagine a dimly displayed Japanese character, almost like a highlight on the wall. When your gaze is drawn to it, it lights up, prompting you to say what it means. If you get it right, it fades: the object disappearing. Only to reappear elsewhere in the house. They'd be so cheap you'd have a few, and they'd just naturally talk to each other without any set up. They might be an entirely transparent piece of plastic that nonetheless is connected wirelessly to a database of words and a profile keeping track of your score and progress in language learning.
The mechanism might simply be an infrared LED and a photosensor trigger which activates when it catches the reflection from your pupils. It wouldn't need to be a camera. Simple, and certainly not a replacement for study, use and rote knowledge, but it would help you learn a language and help you maintain that language once it's learned: something I've felt personally in losing some of my language skills. And all made possible because the eye tracking becomes an ambient enabler of direct interaction.
There's a bit of a cautionary tale here too, by the way, in that this kind of thing can be just as easily applied to things like advertising. In a world where we're forced into captive audience status, who knows where we'll be forced to look.
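
Going back to the infrared LED and photosensor trigger for a second, here's a sketch of how little logic it might need. Two thresholds keep the object from flickering when the reflected reading hovers around a single level. The reading scale and both threshold values are invented for illustration, since I haven't built this yet.

```python
def gaze_events(readings, on_level=600, off_level=400):
    """Return (index, "on"/"off") events from a stream of photosensor readings.

    Using a higher level to switch on and a lower one to switch off
    (hysteresis) avoids flicker when a reading sits near one threshold.
    """
    active = False
    events = []
    for i, value in enumerate(readings):
        if not active and value >= on_level:
            active = True
            events.append((i, "on"))
        elif active and value <= off_level:
            active = False
            events.append((i, "off"))
    return events
```

An "on" event would light the character and an "off" event would let it dim again once your gaze moves away.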

We've gone through a series of different interaction methods and their enabling technologies, with both the more developed direct interactive technologies and the emerging ambient interactive technologies addressed. I really want you to go out and experiment with those. Buy an Arduino, learn a bit of Processing, cobble together projects from pieces of code others have written, and just get stuff out there as quickly as possible.

I want to close with something I mentioned briefly, that of convention, learned interactions, and the so called "Natural User Interface."
This idea of learning and convention is central to how we understand interface and direct interaction, and I think flies in the face of the idea of "natural user interfaces" described by the NUI group and others. Their core notion behind this is, to quote their site:
[the Natural User Interface as] an emerging paradigm shift in man machine interaction of computer interfaces to refer to a user interface that is effectively invisible, or becomes invisible with successive learned interactions, to its users.
The word natural is used because most computer interfaces use artificial control devices whose operation has to be learned. A NUI relies on a user being able to carry out relatively natural motions, movements or gestures that they quickly discover control the computer application or manipulate the on-screen content.