Tackling tech-created communication barriers.
In online dating, where relationships start, and are often largely conducted, via text message, the emotional stakes are high. Communicating via text often increases anxiety and misunderstanding because both parties are missing essential information. To combat this, the mobile app Crushh analyzes your text messages to gauge how much the person you are talking to actually likes you.
In text-based conversations, we lose the cues that facial expression, body language, and tone of voice give us about who people are and what they really mean, as opposed to what they say. The conversational etiquette is ambiguous. You may expect an instant answer to every message while your correspondent only checks messages a few times a day. Online dating is just one example of how technology can create as many communication barriers as it removes.
With the advent of bots and voice interfaces we are starting to have conversations not just via technology, but with technology. Similar issues will arise in these conversations. Alexa can tell a joke but doesn’t know whether you laughed at it. Siri can understand your question but not hear the frustration in your voice. Technologists are now starting to teach machines to detect — and express — those missing social cues.
Mark Curtis is the co-founder of the service design agency Fjord, whose services include chatbots and voice control. Curtis had been thinking about how to improve the conversations we have with machines when he noticed something about his teenage daughter: she didn’t seem to understand the conversational etiquette of phone calls.
“She actually doesn’t have the same set of assumptions about how to answer the phone or how to begin a phone call,” says Curtis. “That got me thinking if something fundamental is happening to the structure of the way in which we have conversations through technology.”
For most of human history, technology didn’t affect conversation at all. Then letters made asynchronous conversation possible for the first time and recorded those conversations permanently. Then the telephone introduced long-distance, synchronous conversations.
Now most of us use multiple methods of online communication, each of which has its own cadence and demands its own etiquette. That etiquette is often ambiguous: partly dictated by the expectations we bring to the conversation and partly by the communication technology itself.
An email, like a letter, has a clear beginning and end, but it’s not always obvious when an instant messaging conversation ends. “I see WhatsApp as being largely synchronous, and my wife clearly sees it as asynchronous,” says Curtis. “That causes trouble.”
As we start to have longer conversations with software like Amazon’s voice-activated assistant Alexa (Amazon offers a $1 million Alexa prize for any bot that can carry on a convincing conversation for 20 minutes), it will need to follow basic rules of etiquette and learn to adapt to the expectations of its interlocutor.
“All of that is a design issue,” says Curtis. “Success or failure could lie in the way in which you think about that etiquette.”
Things we don’t say
Much of our communication doesn’t even involve words. Paralanguage is what linguists call the non-verbal information conveyed by the volume, tempo, and tone of your voice or body language like posture, eye contact, and gestures. Paralanguage reveals much about the emotion and intention of a speaker or listener and can either reinforce or contradict the spoken words. A meaningful look or a sarcastic tone of voice can completely reverse the meaning of what is being said.
Like our spoken language, paralanguage is learned — so it varies widely between cultures. For example, “high-context” cultures built on close relationships and strict social hierarchies rely much more heavily on nonverbal communication than relatively low-context cultures like the US.
Most of these nuances are completely lost in text-based communication. Emojis, GIFs, and even creative uses of punctuation can be seen as imperfect attempts to re-introduce paralanguage into our electronic conversations.
Amazon has recognized the importance of paralanguage in humanizing Alexa by allowing developers to use Speech Synthesis Markup Language (SSML) tags to change the rate, pitch, and volume of Alexa’s speaking voice.
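The prosody tags the article mentions are part of standard SSML. As a rough illustration, here is a minimal sketch (the helper function and response shape are assumptions for demonstration, based on the Alexa Skills Kit JSON response format) of how a developer might wrap text in SSML to adjust Alexa’s delivery:

```python
def build_ssml_response(text, rate="slow", pitch="+10%", volume="loud"):
    """Wrap plain text in SSML prosody tags to adjust Alexa's delivery.

    The rate, pitch, and volume values shown here are examples of the
    attributes SSML's <prosody> element accepts.
    """
    ssml = (
        f'<speak>'
        f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
        f'{text}'
        f'</prosody>'
        f'</speak>'
    )
    # Return the speech wrapped in the standard skill response envelope.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": True,
        },
    }

response = build_ssml_response("I'm so excited you asked!")
```

Changing `rate` to `"fast"` or raising `pitch` gives the same words a noticeably different emotional color, which is exactly the paralinguistic layer plain text strips away.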
Our use of paralanguage is one of the main ways we express personality and emotion. Alexa has a personality team which scripts every possible conversation with Alexa to ensure that it reflects her required personality traits. Amazon designed Alexa to be smart, approachable, humble, enthusiastic, helpful, and friendly — but she is no pushover. She goes into disengagement mode and refuses to respond when insulted or asked inappropriate questions.
Alexa cannot yet detect the paralanguage used by her interlocutors, although Amazon is already working on software to detect emotion in the human voice. An article in Refinery29 pointed out that if a seven-year-old girl tells Alexa she’s pretty, the meaning is different from when an adult man says the same thing in a leering tone. Alexa’s knowledge already varies between countries, but since paralanguage varies between cultures, Alexa will eventually have to learn those cultural differences too.
Bringing back body language
Body language is another casualty of text-based conversations. “Body language is important because it’s really rich,” says Curtis. “The range of body language my dogs use to communicate is exceptional and they understand a lot of our body language. That’s why we think they’re so empathetic.” Curtis expects to see an explosion of cameras in our homes partly so that services like Alexa can analyze our body language.
Paul Kruszewski is the CEO of wrnch, a startup whose software detects a human’s position in 3D space from video footage. The software identifies 63 points of interest, such as eyes, ears, noses, and joints, and uses them to create a skeleton tracking each visible human. Using this information, wrnchAI’s deep learning engine can be used to train software to understand gestures, activities, and non-verbal cues, or to capture a real human’s pose and project it into a VR world.
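To make the skeleton idea concrete, here is an illustrative sketch (not wrnch’s actual API; the keypoint names, normalized coordinates, and raised-hand rule are all assumptions for demonstration) of how per-frame keypoints can feed a simple non-verbal-cue heuristic:

```python
# Image coordinates are normalized to [0, 1]; y grows downward,
# so "above" means a smaller y value.
RAISED_HAND_MARGIN = 0.05

def hand_raised(keypoints):
    """Return True if either wrist is clearly above its shoulder."""
    for side in ("left", "right"):
        wrist = keypoints.get(f"{side}_wrist")
        shoulder = keypoints.get(f"{side}_shoulder")
        if wrist and shoulder and wrist[1] < shoulder[1] - RAISED_HAND_MARGIN:
            return True
    return False

# One frame's worth of detected keypoints for a single person.
frame = {
    "left_shoulder": (0.40, 0.50),
    "left_wrist": (0.38, 0.30),   # wrist well above the shoulder
    "right_shoulder": (0.60, 0.50),
    "right_wrist": (0.62, 0.70),  # hanging at the side
}
```

Real gesture recognition learns such patterns from data rather than hand-coding them, but the pipeline is the same: keypoints in, interpreted cue out.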
Wrnch is already working on detecting non-verbal communication for autonomous vehicles. “A very concrete intention use case is knowing when someone is going to cross the street,” says Kruszewski. “Is that person going to step across? Is she going to cross the street, even though it’s a red light?”
A demo of Nvidia’s Isaac robot used wrnch’s software to detect which humans in a room are paying attention to him and to return that attention. If someone turns his back on Isaac, the robot will also ignore that person.
“I think understanding people’s moods, and understanding how people feel, is going to work its way into Alexa and Siri,” says Kruszewski. “Alexa will have eyes. It’s going to read your posture, and then you can correlate the data: the person looks sad, is in a sad posture, and is sounding sad.”
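The correlation Kruszewski describes can be sketched as a simple late-fusion step: independent per-channel estimates combined into one confidence score. This toy example is an assumption for illustration (the channels, weights, and scores are invented, and real systems learn the fusion rather than hand-weighting it):

```python
def fuse_mood_scores(face=0.0, posture=0.0, voice=0.0,
                     weights=(0.4, 0.3, 0.3)):
    """Combine per-channel sadness scores in [0, 1] via a weighted average.

    Each input is an independent estimate (e.g. from a facial-expression
    model, a pose model, and a voice-emotion model); the weights must
    sum to 1 so the fused score stays in [0, 1].
    """
    w_face, w_posture, w_voice = weights
    return w_face * face + w_posture * posture + w_voice * voice

# When all three channels agree, the fused estimate is high.
score = fuse_mood_scores(face=0.8, posture=0.7, voice=0.9)
```

The point of fusing channels is robustness: a sad face alone might be a misread, but a sad face, slumped posture, and flat voice together are much stronger evidence.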
Kruszewski thinks that robots will eventually use similar techniques to recognize social dynamics using non-verbal cues. “When you see two people, you can get a vibe if that’s a happy conversation or it’s a bad conversation,” he says. “I think it only makes sense that the robots that are going to be wandering around will do the same.”
Change is the only constant, so individuals, institutions, and businesses must be Built to Adapt. At Pivotal, we believe change should be expected, embraced and incorporated continuously through development and innovation, because good software is never finished.