Synchronized speech sounds and conversations

People learning to speak another language learn best by seeing someone speak, listening, imitating, correcting, and repeating.   Reading along with text helps a lot.

Simultaneous speech, where the player speaks along with a model, integrated with rhythm and realtime responses to the learner’s expressive messages, adds a new and critically important component to current language instruction.

Language educators agree that immediate feedback about students’ spoken language is an essential component of language instruction.  Currently, none of the interactive devices provides students with immediate feedback about their spoken language.

Ideally, language training delivered by computer should enable students to learn by listening, repeating, and speaking, while providing evaluative and corrective feedback, without an instructor being present.  A system that can train speech and then help students use correct pronunciation in context would be very desirable.

Speech technology should free instructors from most of the repetitive correction of students’ pronunciation, grammar, syntax, and choice of vocabulary.  Integrated into existing training systems, this feature could significantly increase the rate and efficiency of English, foreign language, literacy, and many other types of instruction.

Students learning a new language obviously need to practice verbal expression.  Many diligent students of English have good reading and writing ability, but many have not acquired correct pronunciation of the English phonemes, or the skill of combining them accurately in conversation.

As a former teacher of English to Japanese students and business people here in Los Angeles, and through my video courseware, I have seen this visual phonetic system proven over two decades of use: anyone who follows the method can acquire and deliver correct pronunciation of the sounds of English in context.

Is there a Karaoke in the house?

What’s wrong with the f-word

The f-word, and others like it, is everywhere in public discourse: peppering our daily conversations, the dialogue on TV shows, the shouted lyrics in our music, and the scripts of our award-winning movies!  The f-word was even used, abbreviated, in a promotional campaign for a major brand of software.

Why are the f-word and others like it so popular?  The main reason is that the user releases emotions and frustrations by saying it.  It says, “I have strong emotions of anger and frustration right now, and I am not afraid to show it!”  Since so many people are angry and frustrated, the word helps to express those feelings.  It is also a shortcut for other statements: “I’m a bad boy/girl,” “This is a great experience,” “Go away, I don’t like you!”

But too frequent use of the f-word is a bad habit, and I am saying it must be reduced.  Let me tell you why.  Frequent use degrades the power of the words to repel and shock, which is their purpose.  Frequent use adds sharp and ugly audio notes to the messages that are otherwise supposed to be conveyed, reducing the chances they will be understood.  And frequent public use exposes young people to the words, well before they are of legal age and old enough to hear them.

Why not say it another way, that really says what you mean, and raise the tone of our conversation space?


Parts of Speech

To compose any original statement and say it well is not a simple matter.  Besides competence in the written grammar and vocabulary of a language, you need to deliver the speech parts of your expressive message: the sound of the voice, the words articulated and composed in a fluid and grammatical way, and with them the intent, the meaning.

You need the right sounds of voice, at the right volume for that place, delivered with the right timing.  Almost all successful people, from Actors to Zookeepers, are good at communicating with convincing speech.  Practice can make anyone a better communicator.

Starting a conversation

Starting a conversation will be more successful if you find a location that is not in a doorway with people moving through, and where noise and music are not very loud.

To begin a conversation with a person you do not know, you can start by saying “Hello, my name is,” then say your name and make eye contact.  If you wait a moment, in most cases a person will reply with the same.  Then it is your turn, so you should be ready with a simple comment about the location or event.   At such a moment, an introduction, someone may offer a handshake.  Use the fingers-down, business handshake with a new person, giving the hand a short but firm squeeze.  (Fingers-up handshakes can be used between friends or teammates. High-fives are for some kind of victory.)

In the U.S., the face-to-face distance should be around 18” if you are standing together.  If you are talking more softly and so leaning in together, it can be much closer.   You should lean in toward the other speakers to indicate that you are in the conversation and you are listening.

If you want someone to talk to you, you cannot be wearing earbuds connected to a music player.  Earbuds are a signal that says, “I am into my music here.”   If someone signals that they want or need to talk to you, or if you want someone to talk to you, remove the earbuds.

So how much pronunciation emphasis (diction) is too much, or not enough?  I say you can never hear too much of the right sounds, but it is so easy to get too much of the wrong ones!  On the phone, missing phonemes are a frequent casualty of low fidelity in the audio waveform or headset.  If that happens, you should ask about the word you could not understand.

Just as it is hard to make statements, it is equally hard to be a listener: to focus on hearing the speech against background noise, decode the input into expressive messages, and then understand the emotional implications.  By listening and replying with a statement that shows you understand, even if you don’t agree, a conversation can be extended beyond the opening moments.

To end a conversation you could lean back or in another direction, perhaps standing up or moving your body.  That cues the other speakers that you are ending the conversation.  Say something like “It was good talking to you today!” and move away.


Trending: voice and speech technologies ripe for crossover

As I see it, the next convergence of voice and speech technologies is at a tipping point: crossover into popular musical, educational, and entertainment applications.

Trending now:

  • Research on language and reading emphasizes “Mothers, read to your children,” and at the earliest age.
  • Listening to music on earphones is so prevalent in some places that users’ primary audio focus is electronic, not ambient.
  • English captioning for television programs is on or available for many, if not most, shows now.
  • Captioning for multiple languages available for many television programs now.
  • Application users are more international now than ever before, speaking several major languages, but with speaking English well as a common aspiration.
  • Cars now have user-friendly-enough voice recognition systems for simple functions, with a population of several million users.
  • Phones now have simple but user-friendly-enough voice recognition systems for simple functions, with a population of several million users.
  • Existing electronic dictionaries and phrase translators are in limited use in specialized areas such as military, medical and translation services.

Add these all up and you have a world that is immersed in interactive, electronic audio, and is already talking and seeing through computers. So the training of speech follows.

This is an opportunity for speech technologies to cross into more popular cultural, musical, and edutainment apps. For example, language within certain television shows can be translated in real time. Technology can impact the communications of speakers of all languages when they use the English of popular culture and commerce.  These apps can be built for adults or children with a mashup of APIs from a variety of vendors who make it easy to leverage their code bases in exchange for access to users. This makes it affordable for Voice Learning Systems to complete development of Conversation Space Version 1.

The Voices of Space

The Voices of Space are nationally known and accomplished singers and musicians. They agreed to come into the studio and try something new: to sing-song the speech tracks. These are like rap, but slowed down enough to follow. They recorded a few hours of simple phrases, spoken by singers in a rhythmic speech that really expresses meaning and intention, in natural, regular cadences.

Lindsey Harper, Steve Aguilar, Dani Armstrong, Sid Sheres.

express and affirm ideas through speech repetition

All speech is an expressive message. And there are many types of expressive messages; here are just a few: informational messages, corporate messages, esoteric messages, erotic messages, and mixed messages.

There is great enjoyment in interpersonal expression, including emotional expression, a very high-value type of expressive message. When a person learns the speech skills to express herself easily and forcefully, she is well on her way to getting personal satisfaction, and social and business success.

In the case of a Conversation Space, the interactive user participates by speaking.  With speech repetition, the interactive user is learning how to say phrases like the model. By speaking the phrase, the interactive user is affirming the message and becoming more sympathetic to the meaning.  A Conversation Space also enables the interactive user to design and share an interactive and personalized message with other people.

So besides the value to the interactive user of speaking better English, the application can be harnessed to contain and deliver specific expressive messages.  That is what we are doing with our conversation spaces.  For example, hearing and saying a prayer out loud certainly has value to many people.  That is what we have in a Prayer Space.

Corporations have certainly proven their interest in having the public repeat their product slogans and sing their product jingles. Patriotic songs and spoken oaths have always been a big factor in promoting patriotism or nationalism.  While our content is limited to a few hundred phrases right now, there is a lot of potential for custom vocabularies, whether created by our company, Voice Learning Systems, or in the future contributed by our users and customers.




The Voice Reading Ability Drill

Chatterbox was a friendly name for the Voice Reading Ability Drill, and the name of the character speaking.

If you had worked with as many diligent but dyslexic teens as I have, you too would have invented the VRAD!

When I was a special ed teacher, I taught people of all ages and exceptionalities.  In response to a need for more speech and reading practice for my dyslexic teens at Long’s Peak JHS in Colorado, I began a project to develop a speech-interactive application enabling vocalizing rehearsal in decoding words. At that time we built Chatterbox, the first speech-interactive educational software product. It actually prompted the user to read and speak the words on screen, and assessed the replies. It even included student progress tracking!

It contained a vocabulary of 800 words organized in consonant-vowel-consonant combinations, along with images. Advanced modules taught consonant blends and sight words.  We contracted with students at Denver’s Colorado Institute of Art, students in the first classes devoted to the new field of computer graphics. They created 1000 Apple-friendly images for Voice Reading and Voice Math.

It included the Voice Reading Ability Drill for dyslexics, Voice Math, and the Chatterbox Dictionary, running on the Echo and the Apple IIe. The Echo content had to be custom recorded, since speech synthesis was in its infancy, not to mention speech recognition.  CASP by RJ Cooper in 1985 was definitely the first phonetic recognizer / pong game ever programmed and sold!

Chatterbox was a featured software application at Apple’s booth at many educational trade shows in 1986 and ’87. And several excited young graduate students did master’s research about the then-emerging field of adaptive technology, championed by the Council for Exceptional Children, the AFB, and other forward-thinking educational technology advocates. With their feedback and encouragement, we developed Voice English, and then the Hummingbird Speech Method.  You can view historical videos of the 1986 Chatterbox and 1992 Hummingbird on our site.

The software was coded by Jay A Miller, then of TRW, to run in 64K of Apple memory, with the Echo speech synthesizer and the Voice Input Module.

At the time, at least two master’s theses included consideration and assessment of the system’s effects on bilingual and handicapped learners.

Active reading is also known to linguists as neural processing: the body (voice and mouth) and brain engage in a sensory feedback loop that better imprints word and sound patterns and links them to meaning.  So every word or phrase is simultaneously read, spoken, and heard.  This is the same technology we are enhancing for connected speech; seeing and saying is a proven technique for beginning English readers.

Conversation Space layers

Some of the spaces: lyrical, poetic, spiritual.

Visual Arts: The pages I imagine are built with fourth-dimensional audiovision, which is what I call our means of painting the space frames with imaginative perspective.

Our visual layers are ripe for expression and content expansion.  Many layers can be added to enhance the spaces.  (Some conversation visuals have green screens to provide richer options for the visual dimension.)

We have seeded our library with a few Masters portraits and some other amazing mystical artworks.  I would love to have a LOT more visuals going on here.

The music: Steve Aguilar is the lead keyboardist and supports our amazing spokesmodels, the Voices of Space, Dani Armstrong and Lindsey Harper.   VLS also has access to record conversations in sync with libraries of original music by Larry Russell of Muzikjakit in NYC.  A lot of popular music that contains clearly spoken English phrases will work.  (Think of that popular musical refrain, ‘I DON’T CARE!’)

Music forms the platform for personal prayers, lyrics, and mantras too. And saying ‘I DO CARE!’, we have Sada Sat Khalsa, who sings mantras at her Yoga Borgo Ashram in Italy.  Her beautiful mantras were recorded and found on Spirit Voyage Radio; we are integrating them with the Mantra Space lyrics.  (Shout-out: I need your help connecting this, Ellen from NYU Uptown /  Sada Sat ( *~*)

Put it all together, and the possibilities for Conversation Spaces are infinite.

language basis for speech I/O

Conversation Spaces are audiovisual layers of communication.

Beneath the live models and supporting images are the libraries of short, spoken phrases.  The phrase types can be statements, prompts, and replies. When phrases are presented as prompts or replies, they become conversations.

Rules of reply: Linguistics forms the basis for connecting phrases, in any language.

We have ‘chategorized’ phrases based on their places in a conversation. These communication roles for phrases are cross-referenced by a software engine featuring a Statement Response Hierarchy, plus rules of reply, which generate further phrases and responses.    So the man-machine interactivity comes from the exchange of phrases in various contexts, of various communication types. We don’t really even have to do speech recognition, but we can.  The interactive part is technically not that hard, as long as the algorithm knows what to expect.

In the Interplay space we are categorizing phrases as assertive or receptive.  And there are subtypes: assertive/greeting or receptive/apology.  (Hey NY, we also have assertive/pushy!)
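To make the idea concrete, here is a minimal sketch of how phrases tagged with roles and subtypes could be matched by rules of reply. The example phrases, the rule table, and the function name are illustrative assumptions, not the actual VLS engine.

```python
# Sketch only: phrase matching by role/subtype, as described above.
# Each phrase is (text, role, subtype); roles are "assertive" or "receptive".
PHRASES = [
    ("Hello, my name is Dani.",      "assertive", "greeting"),
    ("Nice to meet you!",            "receptive", "greeting"),
    ("This line is taking forever.", "assertive", "complaint"),
    ("Sorry about that.",            "receptive", "apology"),
]

# Rules of reply (hypothetical): which receptive subtypes may answer
# a given assertive subtype.
RULES_OF_REPLY = {
    "greeting":  ["greeting"],
    "complaint": ["apology"],
}

def replies_to(prompt):
    """Return the phrases that can legally follow the given prompt."""
    text, role, subtype = prompt
    if role != "assertive":
        return []  # only assertive phrases open an exchange here
    allowed = RULES_OF_REPLY.get(subtype, [])
    return [p for p in PHRASES if p[1] == "receptive" and p[2] in allowed]

prompt = PHRASES[0]                     # an assertive greeting
for reply, _, _ in replies_to(prompt):
    print(prompt[0], "->", reply)
```

Because the rule table tells the engine exactly which replies to expect at each point, the interactive exchange can work even without full speech recognition, just as the text above suggests.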

It’s amazing, there are so many combinations of phrases.  And the music makes it fun to speak along.

express yourself

So we are creating new spaces for conversational speech on the web. It’s about communicating a message using spoken words.

Exercising the speaking experience is the reason for the spaces. Think of it like speech aerobics.  Speaking in sync with the models provides the practice.

The purpose is to provide interactive training for American speech, with a focus on active expression of phonetic definition, rhythm, and tone.  We also organize and present spoken messages by message purpose, such as Greet, Praise, Affirm, Complain, Admire, Persuade, Reject, Inform.
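The purpose-based organization described above could be sketched as a simple index from message purpose to phrases, with a drill that presents each phrase for speak-along practice. The phrases and helper names here are assumptions for illustration only.

```python
from collections import defaultdict

# Sketch only: a phrase library indexed by message purpose.
PHRASES_BY_PURPOSE = defaultdict(list)

def add_phrase(purpose, text):
    """File a spoken phrase under its message purpose."""
    PHRASES_BY_PURPOSE[purpose].append(text)

add_phrase("Greet",    "Hello, my name is Lindsey.")
add_phrase("Praise",   "You did a wonderful job!")
add_phrase("Complain", "This line is taking forever.")

def drill(purpose):
    """Present each phrase in a purpose group for speak-along practice."""
    return [f"Say along: {text}" for text in PHRASES_BY_PURPOSE[purpose]]

for line in drill("Greet"):
    print(line)
```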

Besides this being a ton of fun, it can be the key to speech confidence for people.  Their “voice impression” will be greatly enhanced, and of course their speech will be even more effective in person.

If you can hear and say it with confidence, emphasis, and conviction, you can make it true. Then  you can be strong enough to communicate.