Hannover, April 2, 1995
Spoken language is not only the most essential element of interpersonal communication but also the basis of all scientific endeavor. Human beings think in terms of language. It is thus not surprising, particularly in the modern age of multimedia communication and information processing, that speech processing is playing an increasingly important role.
Until recently, machines capable of engaging in human conversation were relegated to the realms of science fiction. But research in automatic speech recognition has advanced by leaps and bounds, to the point where spoken words can now be reliably recognized even under poor acoustic conditions and despite the uniqueness of each individual voice. Such speaker-independent recognition is crucial for the development of telephone information services that are not restricted to particular users.
While some telephone applications need only recognize a store of a few hundred words, systems involving direct user dialog with the computer must typically handle a vocabulary of several thousand. In addition, these words are spoken not as single self-contained units but rather in conversational strings. Speaker-independence, vocabulary range, and "robustness" of recognition are all requirements driving research to further improve recognition capability.
At Daimler-Benz, researchers have demonstrated the individual stages of word recognition. As is the case with human hearing, computer voice recognition begins by resolving oscillations into their individual frequencies and analyzing how they change over the course of a word. This process generates the spectrum of an individual word, which contains all the essential information present in the original signal in an extremely condensed form. The energy distribution of the different frequencies is graphically depicted in color on a personal computer. During voiced sounds, the bulk of the energy is concentrated in the low frequency range, while during unvoiced sounds the energy is concentrated in the high frequency range.
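The frequency analysis described here can be illustrated with a short-time spectrum. The following is only a minimal sketch in Python, not the Daimler-Benz implementation: the function names, the frame length, the Hann window, and the split frequency passed to low_band_fraction are all assumptions chosen for illustration, and a plain discrete Fourier transform is used for readability rather than speed.

```python
import cmath
import math


def frame_spectrum(frame):
    """Power spectrum of one frame, computed with a naive O(n^2) DFT."""
    n = len(frame)
    # A Hann window (an illustrative choice) reduces leakage between bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Keep only the non-negative frequencies: bins 0 .. n/2.
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len=64, hop=32):
    """Sequence of power spectra showing how frequencies change over time."""
    return [frame_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def low_band_fraction(power, sample_rate, split_hz, frame_len=64):
    """Share of a frame's energy that lies below split_hz (hypothetical helper)."""
    total = sum(power)
    if total == 0:
        return 0.0
    # Bin k corresponds to the frequency k * sample_rate / frame_len.
    low = sum(p for k, p in enumerate(power)
              if k * sample_rate / frame_len < split_hz)
    return low / total
```

Under these assumptions, a frame dominated by low frequencies, as the text describes for voiced sounds, should give a fraction close to 1, and a frame dominated by high frequencies a fraction close to 0.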
This spectrum is the basis of the pattern comparison that follows. The as yet unclassified spectrum is compared with model spectra for all the words to be recognized, and the similarities are displayed. The correct temporal assignment of the unclassified spectrum to the stored reference model is crucial to this process. The correspondence between the reference pattern and the unclassified pattern can be illustrated on the PC. The reference pattern most similar to the unclassified one is then the word "recognized."
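The text does not say which comparison method is used. One classical way to compare an unclassified pattern with stored references while allowing for differences in timing is dynamic time warping, sketched below in Python. This is an assumption made for illustration, not a description of the Daimler-Benz system; the function names and the Euclidean frame distance are hypothetical choices.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two spectral frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(unknown, reference):
    """Cost of the best temporal alignment between two frame sequences."""
    n, m = len(unknown), len(reference)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of `unknown`
    # with the first j frames of `reference`.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(unknown[i - 1], reference[j - 1])
            # Either sequence may linger on a frame, or both advance together.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(unknown, references):
    """Return the word whose reference pattern is most similar to `unknown`."""
    return min(references,
               key=lambda word: dtw_distance(unknown, references[word]))
```

Here references is assumed to be a dictionary mapping each vocabulary word to its stored sequence of spectral frames. Because the alignment may stretch or compress time, a slowly spoken word can still match a shorter reference of the same word.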
The multi-dimensional nature of this comparison process makes the overall recognition procedure extremely time-consuming. Conducting such processes in real time has only become possible in recent years thanks to advances in digital signal processing. This technology has made it possible to build voice recognition systems capable of carrying out the thousands of comparisons necessary to identify one word from a vocabulary of thousands.
© 1995 Daimler-Benz