Dynamic time warping (DTW)-based word recognition
Discussion 3
Introduction
Speech is the most efficient, common and basic method of communication between individuals. The introduction of computers has opened new avenues for speech. Rather than use common interfaces such as the mouse or keyboard, many people choose to use speech or word recognition systems. These systems help computers identify the words that individuals speak into a microphone and convert them into written text. This gives speech the potential to become a crucial mode of interaction between computers and human beings. Spoken language dominates human communication, so it is only natural for individuals to look for interfaces that can recognize speech and respond in their native language. This enables people without technical training to benefit from information technology. Several word recognition approaches can be used.
Dynamic time warping (DTW)-based word recognition
Dynamic time warping, or simply DTW, is an algorithm for measuring the similarity between two sequences that may vary in time or speed. It was historically used for word recognition, but it has been largely displaced by the hidden Markov model (HMM)-based approach, which is considered more successful. DTW has also been applied to graphics, audio, video and any other data that can be converted to a linear sequence.
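To make the idea of measuring similarity between sequences that vary in speed more concrete, here is a minimal sketch of the classic dynamic-programming form of DTW. It assumes each utterance has already been reduced to a plain list of numbers (for example one energy value per frame); the function name dtw_distance and the absolute-difference local cost are illustrative choices, not something the original text specifies.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D numeric sequences.

    Assumption: the local cost is the absolute difference between samples.
    Real recognizers usually compare multi-dimensional feature vectors.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest way to align the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: repeat an item of b, repeat an item of a,
            # or advance in both sequences.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# The same shape spoken "slower" (one sample repeated) still matches exactly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# A shifted shape accumulates some cost.
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

Because the alignment may repeat samples, a stretched version of a sequence scores as identical to the original, which is the property that made DTW attractive for comparing words spoken at different speeds.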
Statistical-Based Approach
This approach involves statistical modeling of variations in speech using automatic learning procedures. It represents the state of the art in word recognition. Its disadvantage is that the statistical models must make assumptions, which can be inaccurate and thereby hinder the performance of the system (Waibel & Lee 1990).
Template-Based Approaches
These approaches come with numerous techniques that have led to considerable advancement of the field in the last two decades. They involve comparing an unknown word against prerecorded words, or templates, to find the best match. The advantage of the method is that it uses word models that are perfectly accurate. Its limitation is that prerecorded templates are rigid, so variations in speech have to be modeled using numerous templates for every word. This becomes prohibitively expensive and impractical as the vocabulary grows (Waibel & Lee 1990).
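The matching step described above can be sketched as a nearest-template search. This is a toy illustration under stated assumptions: the templates are invented number sequences rather than real recordings, DTW is used as the comparison measure (one common choice, though the text does not prescribe it), and the names dtw_cost and recognize are hypothetical.

```python
def dtw_cost(a, b):
    """DTW cost between two 1-D sequences, keeping only one previous row."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def recognize(unknown, templates):
    """Return (word, cost) for the stored template closest to `unknown`.

    `templates` maps each word to a list of example sequences, since
    several templates per word are needed to cover speech variation.
    """
    best_word, best_cost = None, float("inf")
    for word, examples in templates.items():
        for example in examples:
            cost = dtw_cost(unknown, example)
            if cost < best_cost:
                best_word, best_cost = word, cost
    return best_word, best_cost


# Made-up templates standing in for prerecorded words.
templates = {
    "yes": [[1, 3, 4, 3, 1]],
    "no": [[5, 5, 2, 1], [5, 4, 2, 2, 1]],
}
print(recognize([1, 1, 3, 4, 4, 3, 1], templates))
```

The nested loop visits every template of every word, which shows why the cost of this approach grows with both the vocabulary size and the number of templates kept per word.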
Knowledge-Based Approaches
This approach involves hand-coding knowledge about speech variations into a system. The approach uses a number of features of speech, and the system is then trained to generate rules from the samples. Its advantage is that it models speech variations using expert knowledge. However, it is limited in practicality, since such expert knowledge is difficult to obtain.
These approaches come in handy for individuals with disabilities. Individuals who are deaf or have difficulty hearing can use them for the automatic generation of closed captions for conversations such as classroom lectures, conferences and religious services. They are also useful for individuals who have difficulty using their hands. These difficulties can range from mild repetitive stress injuries to more involved disabilities that prevent the use of conventional PC input devices. Word recognition approaches are also used in deaf telephony, including relay services, captioned telephone and voicemail to text.
Conclusion
Speech constitutes the most basic form of communication. However, there are instances where communication is hampered, especially when the individuals involved have disabilities. This necessitates the use of word recognition programs, which transform spoken words into written text and thereby enhance communication.
References
Waibel, A. & Lee, K.-F., 1990. Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann.