New England Chapter
The Applied Voice I/O Society (AVIOS) is an international organization of speech technology professionals that seeks to build community and bridge the gap between basic research and applied product development.
Our meetings provide a forum in which technologists, students, developers, business managers and users can meet and exchange ideas and solutions.
10 September 2016 | Jibo: Getting Social Robotics Right
Presenter: Peter B. Krogh
In early 2014, Jibo Inc. was spun out of Cynthia Breazeal’s social robotics group at MIT. The “social robotics” market is still so nascent that it’s fair to say it doesn’t really exist yet. The vision of bringing home a little robot buddy who gets to know you, helps you, and possibly loves you is incredibly compelling for many people, but the obstacles to getting it right are enormous. Problems to consider and solve include far-field speech recognition on a moving device, how to craft a character that includes memory, humor, and emotional states, how to launch skills without burdening the user with too many rules, and how to load and transition from one dialogue state to another. While not all of our solutions will be presented, we’ll cover most of the challenges involved in bringing a social robot to market.
3 May 2016 | Spoken Language Understanding for Amazon Echo
Presenter: Alborz Geramifard, Research Scientist, Amazon
Echo is a high-performance, fully voice-controlled wireless speaker created by Amazon and designed for the home. It is very convenient: always plugged in and ready to use. The technology behind Echo lives on the device as well as in the cloud and represents some of the best in natural language processing (NLP) today. Echo is extensible and supports a broad range of functionality out of the box, such as music, Wikipedia, to-do and shopping lists, sports and weather information, generic question answering, and more. It features high-performance keyword spotting, automatic speech recognition, natural language understanding, question answering, and text-to-speech. This talk provides an overview of speech and natural language processing activities at Amazon around Echo and describes some of the core technologies and research challenges our teams are facing.
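The component chain the abstract lists (keyword spotting, then ASR, NLU, and question answering or skill routing) can be pictured as a simple sequential pipeline. The sketch below is purely illustrative: the function names, string matching, and intent labels are invented for this page, and the real Echo architecture and APIs are not described here.

```python
# Toy sketch of a wake-word-gated voice pipeline; every function here
# is a placeholder standing in for a real speech/NLP component.
def wake_word(audio: str) -> bool:
    # Keyword spotting runs on-device and gates the rest of the pipeline.
    return "alexa" in audio

def asr(audio: str) -> str:
    # Automatic speech recognition: audio -> text (here, a trivial stand-in).
    return audio.replace("alexa ", "")

def nlu(text: str) -> dict:
    # Natural language understanding: text -> structured intent.
    return {"intent": "weather"} if "weather" in text else {"intent": "unknown"}

def respond(intent: dict) -> str:
    # Question answering / skill routing produces the spoken reply.
    return "Here is the weather." if intent["intent"] == "weather" else "Sorry."

def pipeline(audio: str):
    # Nothing downstream runs until the wake word fires.
    if not wake_word(audio):
        return None
    return respond(nlu(asr(audio)))
```

The point of the sketch is the ordering: an always-on, low-cost detector gates the expensive cloud-side stages.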
Presenter: Julia Hirschberg, Columbia University
Host: Jim Glass and Victor Zue, MIT CSAIL
Clarification in spoken dialogue systems, such as those in mobile applications, often consists of simple requests to “Please repeat” or “Please rephrase” when the system fails to understand a word or phrase. However, human-human dialogues rarely include such questions. When humans ask for clarification of input such as “I want to travel on XXX”, they typically use targeted clarification questions, such as “When do you want to travel?” However, systems frequently make mistakes when they try to behave more like humans, sometimes asking inappropriate clarification questions. We present research on more human-like clarification behavior based on a series of crowdsourcing experiments whose results are implemented in a speech-to-speech translation system. We also describe strategies for detecting when our system has asked the ‘wrong’ question of a user, based on features of the user’s response.
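The contrast between generic and targeted clarification can be illustrated with a toy lookup: instead of a blanket “Please rephrase”, the system asks a question aimed at the specific slot it failed to hear. The slot names and question wordings below are hypothetical, not taken from the system described in the talk.

```python
# Hypothetical mapping from a misheard slot to a targeted question,
# replacing a generic "Please rephrase" fallback.
TARGETED_QUESTIONS = {
    "date": "When do you want to travel?",
    "destination": "Where do you want to travel to?",
    "name": "Who are you asking about?",
}

def clarify(misheard_slot: str) -> str:
    # Fall back to a generic request when no targeted question applies.
    return TARGETED_QUESTIONS.get(misheard_slot, "Could you rephrase that?")
```

A real system must also decide *which* slot was misrecognized, which is where the inappropriate-question errors the abstract mentions can arise.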
Presenter: Jordan Cohen
Speech Morphing (or Voice Morphing) is the process of changing one person’s voice to sound like another person, or like something else. There are many issues to be dealt with, including the size and shape of each speaker’s vocal apparatus, the pitch of each voice, the particular habits of the two speakers, their accents, and other linguistic elements.
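One of the pieces involved, pitch, can be illustrated with the crudest possible manipulation: resampling, which raises pitch by a factor but also shortens duration by the same factor. This is a minimal sketch under that assumption, not the morphing approach presented in the talk; real systems (e.g. PSOLA or vocoder-based methods) decouple pitch from duration.

```python
import math

def pitch_shift_resample(signal, factor):
    """Naive pitch shift by linear-interpolation resampling.
    Raises pitch by `factor` but shortens the signal by the same
    factor; illustrative only."""
    n_out = int(len(signal) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def zero_crossing_freq(x, sr):
    """Rough frequency estimate from the zero-crossing count."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings * sr / (2 * len(x))

sr = 8000
tone = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]  # 100 Hz, 1 s
shifted = pitch_shift_resample(tone, 1.5)  # perceived pitch ~150 Hz
```

The other factors the abstract lists (vocal-tract shape, speaker habits, accent) are exactly what such a one-line trick cannot capture.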
Presenter: Dr. Daryush Mehta
Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, termed vocal hyperfunction. Thus an ongoing goal in clinical voice assessment is the long-term monitoring of noninvasively derived measures to track hyperfunction. In this talk, I will provide an update on our group’s development of a smartphone-based voice health monitor that records the high-bandwidth accelerometer signal from the neck skin above the collarbone. Data collection is under way from patients with vocal hyperfunction and matched-control subjects to create a dataset designed to identify the best set of diagnostic measures for hyperfunctional patterns of vocal behavior. Vocal status is tracked from neck acceleration using previously-developed vocal dose measures and novel model-based features of glottal airflow estimates. Clinically, the treatment of hyperfunctional disorders would be greatly enhanced by the ability to unobtrusively monitor and quantify detrimental behaviors and, ultimately, to provide real-time biofeedback that could facilitate healthier voice use.
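The vocal dose measures mentioned can be illustrated in miniature: given per-frame fundamental-frequency estimates from the accelerometer signal, the time dose is total phonation time and the cycle dose is the accumulated number of vocal-fold vibration cycles. This is a toy sketch of those two standard measures, not the group’s implementation; the frame length and input format are assumptions.

```python
def vocal_doses(f0_track, frame_s=0.05):
    """f0_track: per-frame fundamental frequency in Hz (0 = unvoiced frame).
    Returns (time dose in seconds, cycle dose in vibration cycles)."""
    # Time dose: total duration of voiced frames.
    time_dose = sum(frame_s for f0 in f0_track if f0 > 0)
    # Cycle dose: cycles per second times seconds, summed over frames.
    cycle_dose = sum(f0 * frame_s for f0 in f0_track)
    return time_dose, cycle_dose
```

Tracking such accumulated quantities over a full day is what makes an unobtrusive neck-surface monitor attractive for studying hyperfunction.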
Presenter: Mike Phillips
A review of current, real-world applications of voice and language technologies across healthcare and an exploration of how these technologies are being leveraged by clinical staff and healthcare provider organizations to contribute to improved quality of care and efficiency. The session will provide insight into future technologies that will be driven by voice and natural human interaction and will outline various use cases in healthcare as the voice interface becomes increasingly important to clinical/machine interactions.
Presenter: Mike Phillips
In just five years, the state of the art in speech interfaces on mobile phones has gone from simple voice dialing, which nobody used, to Siri, which everyone is talking about. Vlingo has been one of the companies leading this transformation. In this presentation I will talk about the technologies that have made this happen, as well as where we think things will be heading over the next few years.
Presenter: Steve Springer
Think about “automated phone systems” or “trying to phone up a human at a company these days” and the next thing to come to mind is probably not “great user experience.” But that’s what Nuance Communications’ Professional Services group is tasked with every day: designing successful computer-human interactions over the phone, with voice as (most often) the only medium. While these systems never lack for enumerated requirements from the system or corporate perspective, understanding the callers and their context is too often given short shrift. In this talk, Steve will illustrate how a variety of prior experiences affect callers’ expectations before they call, and consider some of the implications for voice-only interfaces. We’ll discuss how the first few seconds of a call greatly affect a caller’s willingness to listen, to act, and to react. And we’ll consider what may go on in the caller’s head after they hang up, and how that may inform their next calling experience.
Panelists: Jeff Adams, Evandro Gouvea, Mike Phillips
Moderator: Marie Meteer
The goal of the meeting is to hear what’s currently happening in consumer- and mobile-targeted speech applications from companies at the forefront of this market, and to have a conversation about the challenges from a technology perspective.
Presenter: Daniel O’Sullivan
Using Adaptive Technology and Heuristic Models to personalize automated calls and improve the efficiency of Interactive Voice Response (IVR) systems
Dr. Wilde has applied a multi-disciplinary background to build conversational interfaces and assistive devices. She has a doctorate in Electrical Engineering and Computer Science from MIT. Dr. Wilde has taught at the MIT Media Laboratory and Boston University. She serves on the AVIOS executive board.
Marie Meteer - Associate Professor in Computer Science and Computational Linguistics, Brandeis University
Dr. Marie Meteer is Associate Professor in Computer Science and Computational Linguistics at Brandeis University, where she teaches courses on speech recognition, mobile application development, and discourse modeling. Dr. Meteer is also an independent consultant with over 25 years’ experience and deep technical expertise in speech recognition, dialog design, and text analytics.