
Speech, natural language understanding, and dialog resources

The resources below have been collected by members of the AVIOS board for those who would like to build speech applications or learn more about speech recognition. If you know of other resources that should be included, please contact board member Marie Meteer. Note that inclusion is not an endorsement by AVIOS.

Speech Recognition in the browser

Three Strategies

Speech manually captured and processed in the browser (Firefox and Chrome, using pocketsphinx.js)
Speech automatically captured by the browser and processed in the cloud (Chrome Web Speech API)
Speech manually captured by the browser and processed in the cloud (wit.ai, api.ai, IBM, Nuance NDEV)

pocketsphinx.js

local processing in the browser
based on CMU Pocketsphinx
available in browsers that support Web Audio
grammar only (JSGF format); no dictation
English and Mandarin
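
Because recognition here is grammar-only, the application has to supply a JSGF grammar that describes what users may say. The sketch below builds a small JSGF grammar as a string. The helper `buildJsgf`, the `JsgfRule` type, and the example rules are hypothetical names chosen for illustration; how the resulting text is handed to pocketsphinx.js is not shown and should be checked against that project's documentation.

```typescript
// One JSGF rule: a name plus the alternatives it may expand to.
interface JsgfRule {
  name: string;
  alternatives: string[];
  isPublic?: boolean; // public rules are the entry points of the grammar
}

// Assemble a JSGF grammar: a header line, a grammar name, then one line per rule.
function buildJsgf(grammarName: string, rules: JsgfRule[]): string {
  const lines = ["#JSGF V1.0;", "grammar " + grammarName + ";"];
  for (const rule of rules) {
    const prefix = rule.isPublic ? "public " : "";
    lines.push(prefix + "<" + rule.name + "> = " + rule.alternatives.join(" | ") + ";");
  }
  return lines.join("\n");
}

// Example: accepts phrases such as "open the door" or "close the window".
const exampleGrammar = buildJsgf("commands", [
  { name: "command", alternatives: ["<action> the <object>"], isPublic: true },
  { name: "action", alternatives: ["open", "close"] },
  { name: "object", alternatives: ["door", "window"] },
]);
```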

Web Speech API

Web Speech API in Chrome, since Chrome version 31
most Chrome platforms are supported now
dictation only (no grammars)
find out about support at Can I Use?
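
As a rough illustration of the dictation flow, the sketch below starts a single recognition request and passes the top transcript to a callback. It assumes the recognizer is exposed as `SpeechRecognition` or, in Chrome, `webkitSpeechRecognition`. The names `RecognizerLike`, `findRecognizerCtor`, and `startDictation` are hypothetical, and the global scope is passed in as a parameter so the function can also be tried outside a browser.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: any) => void) | null;
  onerror: ((event: any) => void) | null;
  start(): void;
}

// Look up the recognizer constructor; Chrome exposes it with a webkit prefix.
function findRecognizerCtor(scope: any): (new () => RecognizerLike) | null {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

// Start one dictation request and pass the top transcript to `onText`.
// Returns false when the API is not available in the given scope.
function startDictation(scope: any, lang: string, onText: (text: string) => void): boolean {
  const Ctor = findRecognizerCtor(scope);
  if (Ctor === null) return false;
  const rec = new Ctor();
  rec.lang = lang;
  rec.continuous = false;     // stop after one utterance
  rec.interimResults = false; // deliver final results only
  rec.onresult = (event: any) => onText(event.results[0][0].transcript);
  rec.onerror = (event: any) => console.error("recognition error:", event.error);
  rec.start();
  return true;
}

// In a page: startDictation(window, "en-US", (text) => console.log(text));
```

The browser captures the audio and contacts the recognition service itself, so the page never handles raw audio.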

Some resources:

English and Mandarin

Manual Server-based Speech Recognition

“Manual” means that the developer is responsible for capturing the audio and sending it to a server for processing.

To capture speech in the browser:

HTML5: getUserMedia()
WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs.
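
A minimal sketch of the manual approach follows: request the microphone with getUserMedia(), record a short clip, and POST it to a recognition server. Using `MediaRecorder` for the recording step is an assumption of this sketch, not something the services above require. The function name `recordAndSend` and the endpoint URL are hypothetical, and each service defines its own audio format, authentication, and response handling, all omitted here.

```typescript
// Record `millis` milliseconds of microphone audio and POST it to `url`.
// `win` is normally the browser's window object; it is passed in so the
// sketch does not depend on DOM typings.
function recordAndSend(win: any, url: string, millis: number): void {
  win.navigator.mediaDevices.getUserMedia({ audio: true }).then(
    (stream: any) => {
      const recorder = new win.MediaRecorder(stream);
      const chunks: any[] = [];
      recorder.ondataavailable = (event: any) => chunks.push(event.data);
      recorder.onstop = () => {
        // Release the microphone once recording has finished.
        stream.getTracks().forEach((track: any) => track.stop());
        // The container and codec are browser-dependent; the server must accept them.
        const body = new win.Blob(chunks, { type: recorder.mimeType });
        win.fetch(url, { method: "POST", body: body });
      };
      recorder.start();
      win.setTimeout(() => recorder.stop(), millis);
    },
    (err: any) => console.error("microphone unavailable:", err)
  );
}

// In a page (hypothetical endpoint):
// recordAndSend(window, "https://asr.example.invalid/recognize", 3000);
```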

Text-to-Speech (TTS)

TTS in the browser

SpeechSynthesis API
implemented in Chrome and Safari
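
For illustration, a minimal sketch of speaking a string with the SpeechSynthesis API is shown below. The function name `speak` is hypothetical, and the global scope is passed in as `win` (normally `window`) so that the availability check is explicit.

```typescript
// Speak `text` with the SpeechSynthesis API found on `win`.
// Returns false when the API is unavailable in that scope.
function speak(win: any, text: string, lang: string): boolean {
  if (!win.speechSynthesis || !win.SpeechSynthesisUtterance) return false;
  const utterance = new win.SpeechSynthesisUtterance(text);
  utterance.lang = lang; // BCP 47 language tag, e.g. "en-US"
  utterance.rate = 1.0;  // 1.0 is the default speaking rate
  win.speechSynthesis.speak(utterance);
  return true;
}

// In a page: speak(window, "Welcome to the resource list.", "en-US");
```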

Server-based TTS

Natural language understanding

wit.ai (Facebook)
api.ai
Linguasys
Wolfram Alpha
Language Understanding Intelligent Service (LUIS). Note: private beta, by invitation only.

Combinations

Speech recognition and natural language understanding
API.ai
wit.ai (Facebook) provides a JavaScript library

Native (non-browser) Mobile Speech Recognition and TTS

Many of the browser-based tools described above also include versions for native operating systems.

Dialog Processing

OpenDial

CMU Ravenclaw

Speech Recognition Development

Language Modeling Toolkits

SRI Toolkit

SRILM Toolkit
SRILM Tutorial
SRILM at Sixteen: Update and Outlook (Stolcke et al.)
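
For readers new to language modeling, the toy sketch below shows the basic idea behind an n-gram model: estimate the probability of a word given the previous word from counts in training text. This is an unsmoothed maximum-likelihood bigram estimate written for illustration only. It is not SRILM's implementation, and practical toolkits add smoothing and back-off. The function name `bigramModel` and the training sentences are made up.

```typescript
type Counts = { [key: string]: number };

// Train on whitespace-tokenized sentences and return P(word | prev).
function bigramModel(sentences: string[]): (prev: string, word: string) => number {
  // Object.create(null) avoids collisions with built-in property names.
  const historyCounts: Counts = Object.create(null);
  const bigramCounts: Counts = Object.create(null);
  for (const sentence of sentences) {
    const tokens = sentence.trim().split(/\s+/).filter((w) => w.length > 0);
    // <s> and </s> mark the sentence start and end.
    const words = ["<s>"].concat(tokens, ["</s>"]);
    for (let i = 0; i + 1 < words.length; i++) {
      const history = words[i];
      const key = history + " " + words[i + 1];
      historyCounts[history] = (historyCounts[history] || 0) + 1;
      bigramCounts[key] = (bigramCounts[key] || 0) + 1;
    }
  }
  return (prev: string, word: string): number => {
    const total = historyCounts[prev] || 0;
    // Unseen histories and unseen bigrams get probability 0 (no smoothing).
    return total === 0 ? 0 : (bigramCounts[prev + " " + word] || 0) / total;
  };
}

const toyModel = bigramModel(["open the door", "open the window", "close the door"]);
// toyModel("the", "door") is 2/3: "the" occurs three times, twice followed by "door".
```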

Open Source Speech Recognizers

KALDI
Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

CMU Sphinx
CMUSphinx represents over 20 years of CMU research, with state-of-the-art algorithms for efficient speech recognition. CMUSphinx tools are designed specifically for low-resource platforms.

Evaluating speech recognition

NIST SCLite
Asclite is a multi-dimensional extension of the Dynamic Programming solution to Levenshtein Edit Distance calculations, capable of evaluating STT and SASTT systems during periods of overlapping, simultaneous speech.
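
To make the underlying calculation concrete, the sketch below computes a word error rate (WER) from the word-level Levenshtein edit distance between a reference transcript and a recognizer hypothesis. It covers only the basic single-speaker case. It is not the NIST tools' implementation and does not model the overlapping-speech alignment described above. The function name `wordErrorRate` and the sample sentences are made up for illustration.

```typescript
// Split a transcript into words on whitespace.
function tokenize(text: string): string[] {
  return text.trim().split(/\s+/).filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / number of reference words,
// where the numerator is the word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  // prev[j]: distance between the reference words seen so far and the first j hypothesis words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,       // deletion: reference word missing from the hypothesis
        curr[j - 1] + 1,   // insertion: extra word in the hypothesis
        prev[j - 1] + cost // match or substitution
      ));
    }
    prev = curr;
  }
  if (ref.length === 0) return hyp.length === 0 ? 0 : Infinity; // undefined for an empty reference
  return prev[hyp.length] / ref.length;
}

// One substitution (sat -> sit) and one deletion (the) over six reference words.
const exampleWer = wordErrorRate("the cat sat on the mat", "the cat sit on mat");
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.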

The Applied Voice Input Output Society
