Voice recognition (VR) first emerged in the early 1950s at Bell Labs, where researchers developed the “Audrey” system, which could understand single digits spoken by one specific voice. All of the early VR systems had one major problem: they worked in principle but were hopeless in practice. They suffered from poor comprehension, noise interference and over-dependence on designs built around a specific voice or two.
At present, voice recognition software can be quite frustrating. Xbox, Samsung Voice and Siri are among the major voice recognition systems currently in use. The trouble is that they often misunderstand user input and return the wrong result. An example would be: “Call Tara”… “Did you mean Hill of Tara, the archaeological complex in County Meath, Ireland?”
Johan Schalkwyk, an engineer at Google, believes this is going to change in the next 1-2 years. He believes new machines will not only be able to properly comprehend human speech but also understand nuance and context. Siri engineers are currently working on technology that will engage in authentic conversation with users, which is the cutting edge of the research being carried out. Tim Tuttle, who set up Expect Labs in Silicon Valley, believes that speech recognition systems with human or better-than-human levels of accuracy will become commercialised in the not too distant future.
As an example of where the technology is heading, imagine you are in a car with complex on-board electronics that can talk to your smartphone. You are driving towards Belfast on the motorway and your fuel is running low. Your phone tells you that you are low on fuel and you ask where the nearest station is. The phone knows your direction of travel through GPS and the surrounding area through maps. It may also know that you have a preference for a specific station, and the VR software can react accordingly. As another example, you are sitting at home watching TV and you want to know how to get to Ennis. You will be able to ask your TV, and it will send the coordinates to the GPS system in your car. Basically, as all our everyday devices become smarter and more interconnected through the net, the usability and effectiveness of voice recognition software will become more apparent. It may not quite be at the level of Jarvis from the Iron Man movies, but it is certainly heading in that direction.
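The fuel station scenario can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how any real assistant works: the `Station` class, the `pick_station` function, the distance markers along the motorway and the `max_extra_km` tolerance are all hypothetical names and values made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    brand: str
    position_km: float  # hypothetical distance marker along the motorway


def pick_station(stations, car_km, heading, preferred_brand=None, max_extra_km=10.0):
    """Pick a station ahead of the car, favouring the driver's preferred brand.

    heading is +1 if the distance markers increase in the direction of
    travel and -1 if they decrease (an assumption of this sketch).
    """
    # Only stations that lie ahead of the car are useful.
    ahead = [s for s in stations if (s.position_km - car_km) * heading > 0]
    if not ahead:
        return None

    def distance(station):
        return abs(station.position_km - car_km)

    nearest = min(ahead, key=distance)
    if preferred_brand is not None:
        preferred = [s for s in ahead if s.brand == preferred_brand]
        if preferred:
            best = min(preferred, key=distance)
            # Accept the preferred brand only if the extra distance is small.
            if distance(best) - distance(nearest) <= max_extra_km:
                return best
    return nearest


# Made-up stations for illustration only.
stations = [
    Station("Station A", "Alpha", 12.0),
    Station("Station B", "Beta", 30.0),
    Station("Station C", "Alpha", 36.0),
]
choice = pick_station(stations, car_km=25.0, heading=+1, preferred_brand="Alpha")
```

Here the car is at marker 25 heading towards higher markers, so Station A is behind it and is ignored. Station B is nearest, but Station C is the preferred brand and only 6 km further on, so it is the one chosen.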
In cars and with smart TVs, a voice recognition interface is far preferable to the traditional input devices. In cars, using those devices while driving can be dangerous, and the input systems of TVs are often clunky and difficult to use. In the not too distant future I believe we will have smart houses that can be operated from every room using VR technology. Simple things like turning the lights on and off, turning on the oven and changing the temperature in the house will all be possible through voice commands.
Combining machine learning with VR software will allow the technology to keep growing. The accuracy and usability of the software will improve over time, and this is an area that engineers are putting a lot of effort into.
Voice recognition is probably most famous from science fiction: think of the crew talking to the ship's computer in Star Trek, or “Open the pod bay doors” in 2001: A Space Odyssey. We have finally reached a technological point where that has become, or at least is becoming, a reality. A lot of experts in the field believe that VR will upend current computer interfaces. In call centres, high-level VR is being used to handle customer requests and direct them to the right department. It has even come so far as being able to identify irate customers and hand them over to a human immediately. Nuance is a worldwide leader in voice recognition software and its CEO sees the technology growing fast in the coming years.
From a technological standpoint, Google have changed their approach to voice recognition from an older method called feed-forward neural networks to the more effective recurrent neural networks. A recurrent network carries information forward from earlier input, which allows the system to retain more context and process longer sequences of input from the user. Their goal is to keep the system as elegant as possible, since complexity can hinder the longevity and long-term growth of the technology.
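The difference between the two kinds of network can be shown with a toy example in Python. This is a minimal sketch and not Google's system: each "network" is a single unit, the weights `W_IN` and `W_HIDDEN` are arbitrary numbers chosen for illustration, and the two-number "frames" stand in for slices of audio.

```python
import math


def feed_forward_step(frame, w_in):
    """Score one frame with no memory of the frames before it."""
    return math.tanh(sum(w * x for w, x in zip(w_in, frame)))


def recurrent_step(frame, hidden, w_in, w_hidden):
    """Score one frame, mixing in the state left behind by earlier frames."""
    return math.tanh(sum(w * x for w, x in zip(w_in, frame)) + w_hidden * hidden)


def run_feed_forward(frames, w_in):
    return [feed_forward_step(frame, w_in) for frame in frames]


def run_recurrent(frames, w_in, w_hidden):
    hidden = 0.0  # nothing has been heard yet
    outputs = []
    for frame in frames:
        hidden = recurrent_step(frame, hidden, w_in, w_hidden)
        outputs.append(hidden)
    return outputs


# Arbitrary illustrative weights, not taken from any real model.
W_IN = [0.5, -0.3]
W_HIDDEN = 0.8

# Two sequences that end with the same frame but start differently.
seq_a = [[1.0, 0.0], [0.2, 0.4]]
seq_b = [[-1.0, 0.0], [0.2, 0.4]]

ff_a = run_feed_forward(seq_a, W_IN)
ff_b = run_feed_forward(seq_b, W_IN)
rnn_a = run_recurrent(seq_a, W_IN, W_HIDDEN)
rnn_b = run_recurrent(seq_b, W_IN, W_HIDDEN)
```

The feed-forward unit gives the final frame the same score in both sequences, because it never sees what came before. The recurrent unit scores the same final frame differently, because its hidden state still carries a trace of the first frame. That carried-over state is what lets a recurrent network make use of context across a longer utterance.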
Schalkwyk says that 2 years ago voice recognition software could understand 3 out of 4 words of user input, while today it is 12 out of 13 words, and he predicts that in the future we will live in a world without keyboards. This is quite a statement and one that I personally feel is excessive. Keyboards have many advantages over voice that will not change. A big one is privacy: if you are in work and complaining to your friend about your boss, it is quite unlikely that you will want to broadcast it to the whole office. Also, the way people construct sentences in text is very different from how they speak. I can see voice recognition becoming more commonplace due to the advances in technology and the ease-of-use factor, but I think at best it will be used in tandem with more traditional input methods. That's not to say that the advances in this technology are not exciting, or that they will not lead to a new way in which people use ICT.
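To put the accuracy figures Schalkwyk quotes into perspective, the short Python calculation below turns them into error rates. It uses a simplified definition (words misunderstood divided by words spoken), which is an assumption made for this example; the word error rate used in speech research also counts inserted and dropped words. The function name `word_error_rate` is just a label chosen here.

```python
from fractions import Fraction


def word_error_rate(understood, spoken):
    """Simplified error rate: the share of spoken words that were misunderstood."""
    return 1 - Fraction(understood, spoken)


then = word_error_rate(3, 4)     # 1/4, i.e. 25% of words wrong
now = word_error_rate(12, 13)    # 1/13, i.e. about 7.7% of words wrong

# How much of the earlier error has been eliminated.
relative_drop = 1 - now / then   # 9/13, roughly 69%
```

In other words, going from 3 in 4 to 12 in 13 means the software went from getting one word in four wrong to one word in thirteen, which removes roughly two thirds of the mistakes. That is a big improvement, but one wrong word in every thirteen is still a long way from replacing the keyboard.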