Rising speech recognition accuracy is advancing a new paradigm of human-computer interaction
According to the China Speech Industry Association, the global smart voice industry reached US$6.12 billion in 2015, up 34.2 percent year on year, and the total market is expected to exceed US$10 billion by 2017. Within that total, China's smart voice market reached 4.03 billion yuan in 2015, up 41% year on year and ahead of the global growth rate. Over the next two years the Chinese market is expected to maintain growth of around 60%, and its share of the global market is expected to rise further, which shows how much confidence research institutions have in the domestic voice market.
Over the last 20 years, speech recognition technology has made significant progress, but limited recognition accuracy has held back the further development of intelligent speech. As accuracy improves, the range of applications for speech recognition will widen and voice interaction will gradually become practical.
Speech recognition, also known as automatic speech recognition, is an interdisciplinary field. Its goal is to convert the lexical content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences. It differs from speaker recognition and speaker verification, which try to identify or verify the person speaking rather than the words spoken.
By the end of the twentieth century, speech recognition systems were already widely used in computer games and toys, instrument control, data collection, and dictation. Over the past two decades, thanks to the rapid development of artificial intelligence and machine learning, speech recognition technology has improved significantly, and voice control has become more practical and has begun to move from the laboratory to the market.
According to the Internet trend report, speech will be the new paradigm of human-computer interaction. Speech technology will free users' hands and eyes and let them access services at any time and at lower cost. Over the next 10 years, speech recognition technology is expected to enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. Future interaction will center on smart homes, wearable devices, and robots, and voice will be the best way to interact with them.
Intelligent voice technology spans multiple disciplines and has high technical barriers, long development cycles, and heavy investment requirements. Only vendors with outstanding overall strength can stand out, so the market has an oligopolistic structure. Since Apple launched its first intelligent voice assistant, Siri, in 2011, Google, Microsoft, Amazon, and Facebook have all joined the field, each building its assistant into smart mobile devices.
But recognition accuracy has been holding back the development of intelligent voice. In practice, speech recognition is currently found mostly in the smart home, in products such as smart appliances and smart speakers. This raises a question: when several family members speak at the same time, whose command should a smart appliance or smart speaker carry out? How can it pick out its owner's commands from so many voices? These are the problems that current voice recognition needs to solve. After all, what we usually call speech recognition involves more than just recognizing the content of speech.
In this regard, Microsoft has recently made new progress. Xuedong Huang of Microsoft's speech team explained: "In October last year, after our transcription system reached an error rate of 5.9%, other researchers conducted their own study using a much more involved transcription process, and the error rate came down to 5.1%. This is a new industry milestone that vastly exceeds the accuracy achieved last year."
According to Xuedong Huang, from a research perspective the improvement is highly significant. Even a gap of 0.1% takes enormous computation time and cost to close: "Do you know how much time it takes to close a gap of 0.1, 0.2, or 0.3? The error rate should be judged in relative terms. Going from 5.9 to 5.1 is a relative reduction of about 13%, and a relative reduction of more than 13% is statistically significant." Put simply, the Microsoft voice team significantly reduced the error rate by improving the neural-network-based acoustic and language models of Microsoft's speech recognition system.
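The relative-error arithmetic in the quote can be checked with a short calculation. The sketch below is only an illustration: the function name relative_error_reduction is hypothetical, and the only inputs taken from the article are the two error rates, 5.9% and 5.1%.

```python
def relative_error_reduction(old_rate: float, new_rate: float) -> float:
    """Return the reduction in error rate as a fraction of the old rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


# Error rates quoted in the article, in percent.
old_rate = 5.9
new_rate = 5.1

absolute = old_rate - new_rate                           # percentage points
relative = relative_error_reduction(old_rate, new_rate)  # fraction of old rate

print(f"absolute reduction: {absolute:.1f} percentage points")
print(f"relative reduction: {relative:.1%}")
```

An absolute drop of 0.8 percentage points looks small, but measured against the starting rate of 5.9% it is a relative reduction of roughly 13.6%, consistent with the "over 13%" figure in the quote.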
As accuracy improves, the range of applications for speech recognition will widen and voice interaction will gradually become practical. But as speech recognition goes through successive iterations, the coexistence of old and new technology is unavoidable. In the early chaos of a blue-ocean market, only those who see the development trend clearly can truly seize the opportunity and achieve new growth.