Speech technology is more than just an interface

The days of turning dials are long gone, and you can throw that remote control out of the window too. Speech technology may well be the interface of the future. Its rapid development raises questions that we as a society have hardly considered. High time we did, says Serf Doesborgh. He is one of the authors of the report 'Hoor wie het zegt', which is being published in English today as 'Look who's talking'.

29 January 2021

A woman speaks to a voice assistant. Cover photo of the report 'Hoor wie het zegt'. Photo: Frank Duenzl/ANP

Typing, clicking and swiping. For years, we have had to learn the language of our computer, and now, for the first time, our computer is learning our language. That is a promise often heard from developers and providers of speech technology. The GPS system in your car, the voice assistant on your phone, the smart speaker in your house. These are all forms of technology that promise to interact with us in a more intuitive way: through speech. Developers, therefore, present speech technology as the interface of the future.

'Speech technology turns our voice into a potential new data source for businesses'

How well speech technology understands what you say depends on how clearly you speak, on ambient noise, and on your accent and the language you speak. Those last two factors come down to how much data is available for a given language, dialect or accent. Speech technology is therefore by no means accessible to everyone.

What speech technology does make accessible is our voice, as a new data source for businesses. By making computers listen to us, we give them our voice. Voices contain a lot of information. In a telephone conversation you can quickly hear whether someone is a man or a woman, a child or an adult, cheerful or depressed, and whether or not they have been drinking. A voice also tells us who we are speaking with.

Scientists and companies are working to extract this kind of information from our voice. They are currently even trying to detect symptoms of COVID-19 in our voice. Because the disease affects the lungs and airways, it can cause changes in your voice that can be captured and analysed. Using our voice as a new data source poses privacy risks and raises the question: what am I giving away when I say something?

'In the US, children sometimes command their friends at school just like they command their smart speaker at home'

A voice is more than a means of communication. It influences our feelings and behaviour. In an experiment with speech technology, Dutch senior citizens described having a smartphone as 'having a boyfriend in their house'. In the United States, where a quarter of the population already owns a smart speaker, some children were found to command their classmates at school in the same way as they commanded their smart speaker at home. One baby's first word was not 'mum' or 'dad', but 'Alexa', the name of Amazon's voice assistant. The distinction between human and machine is becoming increasingly difficult to make. As early as 2018, Google demonstrated Duplex, a feature that lets its voice assistant book a table at a restaurant by phone. The assistant sounded so human that 'he' could not be told apart from a real person.

These examples raise ethical questions. To what extent do we find it acceptable, or even desirable, to attribute human qualities to machines? At what point does the confusion clearly go too far? And how can speech technology be used with due regard for important values such as inclusiveness, privacy, reliability and autonomy?

A voice for human and machine

In the report 'Look who's talking', the Rathenau Instituut calls for an ethical dialogue on speech technology. Agreements are needed on the right to human contact and on how to prevent speech technology from confusing people.

It is important that government, businesses and citizens work together to develop speech technology that enriches our society and social relations rather than impoverishing them. Speech technology can take over tasks such as transcribing or translating conversations, answering general questions, or verifying a user's identity through voice analysis. It can also make the digital world more accessible to larger groups of people.

In doing so, we must remain mindful of social and ethical issues. Only then will we be able to give both technology and people a voice.

This column by Serf Doesborgh previously appeared on iBestuur.nl.
