Speech technology is technology that enables computers to recognise and interpret human speech and to produce speech themselves. These capabilities are summarised as speech recognition, speech interpretation and speech synthesis (see chapter two in the report).
Nothing is more human than our speech. In our conversations, we express ourselves and develop customs. It is therefore important to get speech technology right. In addition, speech technology comes close to us: we install speech systems in our living rooms and in our offices. In the wrong hands, a speech computer is a surveillance tool that can unlock our secrets. Voices can even be cloned to put words in someone's mouth. Moreover, our autonomy is at stake. Speech technology increasingly functions as a guide that leads the user through the digital world. But this guide is made by companies that pursue their own interests, and these do not necessarily correspond to the interests and wishes of citizens.
The study therefore calls for the development of ethical speech technology that, among other things, is inclusive, respects our private lives and is offered on a healthy market. The study also calls for social dialogue and political debate. The emergence of speech technology raises questions that we must answer together. For example, do we want to be disciplined by a speech assistant? In the past, this question would have sounded fanciful, but today it is real. Computers have started to talk: time for a good conversation.
The market for voice technology is currently growing rapidly. For example, in 2018, 6% of Dutch households purchased a speaker that you can control with speech, and this percentage grew to 19% in 2019 (Multiscope, 2020). In America and China, developments are moving even faster (Kimmich, 2019). According to some analyses, smart speakers are being adopted there even faster than mobile phones were in their day, and mobile phones themselves can now increasingly be controlled by voice as well (Kinsella & Mutchler, 2018). (For the full source citations, see the publication.)
With the breakthrough of immersive technologies, digital society is entering a new phase. The physical and digital worlds are becoming more intertwined than ever. This raises urgent social and political questions. The Rathenau Institute has therefore published a manifesto with ten design requirements for the digital society of tomorrow.
During Dutch Design Week we organised an online talk show, Enriching Reality: Designing human-centered AR, VR and Voice applications. In this talk show, coordinator Rinie van Est and researcher Jurriën Hamer welcomed inspiring guests to discuss how AR, VR and Voice touch people's lives, and under what conditions they can enrich society. Watch the talk show on the website of the Dutch Design Week.
On 26 November 2020 (15.30-17.00 hrs), our annual Rathenau Live event took place. This year it was an online event entirely dedicated to Virtual Reality, Augmented Reality and Speech Technology. Together we discussed and experienced what these techniques do to our perception of ourselves, others and the world around us.
A computer system that can perform speech recognition, speech interpretation and/or speech synthesis is called a speech system. There are various types of speech systems available on the market. The most important is the speech assistant, a speech system that can usually perform a wide range of tasks. Well-known examples are Amazon's Alexa and Google's voice assistant. These assistants can be installed on all kinds of digital devices, such as a mobile phone, a desktop PC, or a smart speaker. The study also looks at other voice systems, such as transcription software and navigation systems.
Speech assistants are sometimes also called cognitive or virtual assistants. Those broader terms cover digital systems that can likewise perform tasks and can usually interpret text, but that do not have to be based on speech technology. Because this exploration focuses on systems equipped with speech technology, we use the term 'voice assistant', interchangeably with 'speech assistant'.