Voice is not a new concept. It has been around for decades, and now that it is enjoying a second wave of attention, looking back at how far it has come is a fascinating exercise. To all the super-hip, tech-savvy millennials: you might as well put on a poodle skirt for this.
1950s to the 1960s
In the history of speech recognition technology, this was the era of ‘baby talk’: only spoken digits could be understood. In 1952, Bell Laboratories built ‘Audrey’, which could recognize nothing but numbers. By 1962, IBM’s ‘Shoebox’ could understand 16 English words. Later, voice recognition was extended to handle 9 consonants and 4 vowels.
The U.S. Department of Defense contributed heavily to the development of speech recognition systems and, from 1971 to 1976, funded the DARPA SUR (Speech Understanding Research) program. One result was ‘Harpy’, developed at Carnegie Mellon, which could understand 1,011 words. It used a more efficient method of searching for logical sentences.
There were parallel advances as well, such as a device from Bell Laboratories that could understand more than one person’s voice.
A major breakthrough was the hidden Markov model, which uses statistics to estimate the probability that an unknown sound is a particular word. Instead of matching speech against fixed templates, it works with likelihoods. Many programs built on this approach made their way into industry and business applications.
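For the curious, here is a rough sketch of that statistical idea in Python. It is a toy under loud assumptions, not how any historical system was built: the two words, their phoneme-like states, the sound labels (‘glide’, ‘vowel’, ‘hiss’, ‘nasal’) and every probability are invented for illustration. Each word gets its own tiny hidden Markov model, the forward algorithm scores how likely each model is to have produced the sounds heard, and the best-scoring word wins.

```python
# Toy word recognizer: one tiny hidden Markov model per word.
# All states, sound labels, and probabilities are made up for illustration.

def forward_likelihood(observations, start, trans, emit):
    """Probability that this HMM produces the observed sounds (forward algorithm)."""
    states = list(start)
    # alpha[s]: probability of the sounds heard so far, ending in state s
    alpha = {s: start[s] * emit[s].get(observations[0], 0.0) for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p].get(s, 0.0) for p in states)
            * emit[s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())


WORD_MODELS = {
    "yes": {
        "start": {"y": 1.0, "eh": 0.0, "s": 0.0},
        "trans": {
            "y": {"y": 0.5, "eh": 0.5},
            "eh": {"eh": 0.5, "s": 0.5},
            "s": {"s": 1.0},
        },
        "emit": {
            "y": {"glide": 0.8, "vowel": 0.2},
            "eh": {"vowel": 0.9, "glide": 0.1},
            "s": {"hiss": 0.9, "vowel": 0.1},
        },
    },
    "no": {
        "start": {"n": 1.0, "ow": 0.0},
        "trans": {"n": {"n": 0.5, "ow": 0.5}, "ow": {"ow": 1.0}},
        "emit": {
            "n": {"nasal": 0.8, "glide": 0.2},
            "ow": {"vowel": 0.9, "glide": 0.1},
        },
    },
}


def recognize(observations):
    """Return the word whose model gives the observed sounds the highest probability."""
    return max(
        WORD_MODELS,
        key=lambda word: forward_likelihood(observations, **WORD_MODELS[word]),
    )


print(recognize(["glide", "vowel", "hiss"]))   # yes
print(recognize(["nasal", "vowel", "vowel"]))  # no
```

Notice that nothing here stores a recording of ‘yes’ to compare against. The models only say which sounds are likely and in what order, which is why this style of recognizer copes with variation better than a fixed template does.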
In 1987, a doll called ‘Julie’ went on sale; children could train it to respond to their speech. But the speech recognition systems of the 80s had one flaw: you had to pause after every spoken word.
With the introduction of faster microprocessors, speech software became feasible. In 1990, the company Dragon released ‘Dragon Dictate’, the world’s first speech recognition software for consumers. In 1997, it followed up with the improved ‘Dragon NaturallySpeaking’, which let you speak 100 words a minute.
In 1996, BellSouth launched the first voice-activated portal (VAL). However, the system was inaccurate, and it is still a nuisance for many people.
By 2001, speech recognition development had hit a plateau, and it stayed there until Google came along. Google built an iPhone application called ‘Google Voice Search’, which used data centers to handle the enormous amount of analysis needed to match user queries against actual examples of human speech.
In 2010, Google introduced personalized recognition on Android devices, which recorded different users’ voice queries to build an enhanced speech model. That model consists of 230 billion English words. Eventually Apple’s Siri arrived, also relying on cloud computing, and the result is a personal assistant that is not only intelligent but funny and witty too.
Looking back on the development of speech recognition technology is like watching a child grow up, progressing from the baby-talk level of recognizing single syllables, to building a vocabulary of thousands of words, to answering questions with quick, witty replies, as Apple’s supersmart virtual assistant Siri does.