As I see it, the convergence of voice and speech technologies is at a tipping point: it is about to cross over into popular musical, educational, and entertainment applications.
- Research on language and reading emphasizes “Mothers, read to your children,” and at the earliest age.
- Listening to music on earphones is so prevalent in some places that users’ primary audio focus is electronic, not ambient.
- English captioning is now on or available for many, if not most, television shows.
- Captioning in multiple languages is now available for many television programs.
- Application users are more international now than ever before. They speak several major languages, and speaking English well is a common aspiration.
- Cars now have user-friendly-enough voice recognition systems for simple functions and a population of several million users.
- Phones now have simple but user-friendly-enough voice recognition systems for simple functions, with a population of several million users.
- Existing electronic dictionaries and phrase translators are in limited use in specialized areas such as military, medical and translation services.
Add these all up and you have a world that is immersed in interactive, electronic audio and is already talking and seeing through computers. So the training of speech follows.
This is an opportunity for speech technologies to cross into more popular cultural, musical, and edutainment apps. For example, language within certain television shows can be translated in real time. Technology can shape how speakers of all languages communicate when they use English, the language of popular culture and commerce. These apps can be built for adults or children as a mashup of APIs from a variety of vendors, who make it easy to leverage their code bases in exchange for access to users. This makes it affordable for voice learning systems to complete development of Conversation Space Version 1.