
The Applied Voice Input Output Society

Section I: Our Proposed Solution
- 4-layer architecture
- Function of each layer
- IVR client: a.k.a. the VoiceXML (or HTML or other channel) browser.
- Application server (presentation layer): generates VoiceXML (or another channel's markup); this layer is channel-dependent.
- Dialog manager: a.k.a. the dialog-management expert (cross-channel), a.k.a. the interaction manager. Build the service logic once, regardless of the channel.
  * Supports simultaneous visual, voice, … interaction (a.k.a. "multimodal").
  * Has a channel-independent dialog manager.
  * Can seamlessly switch between contexts.
  * This layer is multimodal and channel-independent, and it maintains context.
  * Channel examples: voice with IVR, web, human agent, online chat, email response, interactive IPTV, multimodal cell/VoIP phones.
- Domain-specific manager: a.k.a. the domain-specific expert (diagnostics, etc.), a.k.a. the domain-specific management engine.
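To make the layer separation concrete, here is a minimal sketch (the class names, prompt text, and the single "account" slot are invented for illustration, not part of the proposal): the channel-independent dialog manager (layer 3) decides the next dialog move once, and channel-dependent presentation code (layer 2) renders that same move as VoiceXML for the IVR client or as HTML for the web.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialogMove:
    """A channel-independent dialog act chosen by the dialog manager."""
    prompt: str           # what to ask or tell the user
    field: Optional[str]  # the slot the user should fill next, if any


class DialogManager:
    """Layer 3: channel-independent service logic, written once."""

    def __init__(self) -> None:
        self.context: dict = {}  # maintained across turns and channels

    def next_move(self) -> DialogMove:
        if "account" not in self.context:
            return DialogMove("What is your account number?", "account")
        return DialogMove("Thank you, your account is on file.", None)


def render_voicexml(move: DialogMove) -> str:
    """Layer 2 (presentation) for the voice channel: emits VoiceXML."""
    if move.field:
        return (f'<form><field name="{move.field}">'
                f'<prompt>{move.prompt}</prompt></field></form>')
    return f'<block><prompt>{move.prompt}</prompt></block>'


def render_html(move: DialogMove) -> str:
    """Layer 2 for the web channel: the same move, different markup."""
    if move.field:
        return f'<p>{move.prompt}</p><input name="{move.field}">'
    return f'<p>{move.prompt}</p>'


dm = DialogManager()
move = dm.next_move()
voice_page = render_voicexml(move)  # served to the IVR client (layer 1)
web_page = render_html(move)        # served to a web browser instead
```

Note that only the two render functions know anything about a channel; the service logic in `DialogManager` would be written once and reused verbatim for every channel listed above.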
- Why do we need 4 layers?
- How this framework morphs to accommodate various AD methods.
- Distribution of dialog responsibility across servers.
- Illustrative examples (show architecture diagrams). We show that these fit the 4-layer architecture. These are only examples; there are more.
  * Trivial static VoiceXML application.
  * Trivial app server with an FSM generating VXML.
  * A more comprehensive example, such as case-based reasoning.
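The "trivial app server with an FSM generating VXML" example could be sketched as follows (the state names, prompts, and the `/fsm` URL are invented for illustration): each FSM state maps to one generated VoiceXML page, and the `<submit>` in each field posts the user's answer back so the server can advance to the next state.

```python
from typing import Optional

# Each FSM state maps to one generated VoiceXML page; the IVR client posts
# the recognized field value back, which advances the machine.
STATES = {
    "ask_origin":      {"prompt": "Where are you leaving from?", "next": "ask_destination"},
    "ask_destination": {"prompt": "Where are you going?",        "next": "confirm"},
    "confirm":         {"prompt": "Thank you, booking your trip.", "next": None},
}


def generate_vxml(state: str) -> str:
    """Render the VoiceXML document for a single FSM state."""
    info = STATES[state]
    if info["next"] is None:
        # Terminal state: just play the prompt, nothing left to collect.
        return f"<vxml><block><prompt>{info['prompt']}</prompt></block></vxml>"
    # <submit> posts the filled field back so the server can advance the FSM.
    nxt = info["next"]
    return (
        f'<vxml><form><field name="{state}">'
        f"<prompt>{info['prompt']}</prompt>"
        f'<filled><submit next="/fsm?state={nxt}"/></filled>'
        f"</field></form></vxml>"
    )


def transition(state: str) -> Optional[str]:
    """Advance the FSM once the field for `state` has been filled."""
    return STATES[state]["next"]
```

In the 4-layer terms above, the FSM plays the role of a (very simple) dialog manager, while `generate_vxml` is the channel-dependent presentation layer.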
Section II: Integrating with existing standards
- Mention existing standards: VoiceXML/V3, CCXML, SCXML, EMMA, MRCP, SRGS, SLAML, …
- Dialog control distribution: AJAX or client-side scripting.
- Multimodality and channel independence.
Section III: Open Questions
- How do we integrate AD dialogs and VXML?
- Is there a path forward for standards?
- How do we handle a variety of AD methods?
- How do we make AD applications easier to build?
- Performance issues (Jean-Francois).
- Is dialog control on the left or right?
- Active vs. passive?
- How do we build a "PhD-free" development tool?
- We still need to handle non-AD applications.
- Handling parallel, asynchronous dialog events.
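As one possible reading of that last point, a sketch (the channel names and event strings are invented for illustration) of a dialog manager that consumes parallel, asynchronous events from several channels through a single queue while keeping one shared context:

```python
import asyncio


async def channel(name: str, queue: asyncio.Queue, events: list) -> None:
    """Simulates one channel (voice, web, chat, ...) emitting events."""
    for ev in events:
        await asyncio.sleep(0)       # yield control so channels interleave
        await queue.put((name, ev))


async def dialog_manager(queue: asyncio.Queue, n_events: int) -> dict:
    """Consumes events from every channel against one shared context."""
    context: dict = {}
    for _ in range(n_events):
        name, ev = await queue.get()
        context.setdefault(name, []).append(ev)
    return context


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    results = await asyncio.gather(
        channel("voice", queue, ["hello", "balance"]),
        channel("chat", queue, ["hi"]),
        dialog_manager(queue, 3),
    )
    return results[2]


context = asyncio.run(main())
```

The single queue serializes events that arrive in parallel, so per-channel ordering is preserved while the dialog manager still sees one consistent context, which is the property the channel-independent layer above needs.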