By Dr Norman Lewis, writer, speaker and consultant on innovation and technology. He was most recently a director at PricewaterhouseCoopers, where he set up and led its crowdsourced innovation service. Prior to this, he was director of technology research at Orange.
Projects to develop gender-neutral voice assistants to replace the historically dominant white and Chinese male voices are not solving a real problem – they are creating one.
As products like Alexa and Siri face mounting criticism that the technology behind them disproportionately misunderstands women and ethnic minorities, voice-software companies are developing gender-neutral voice systems to ensure that the voice-tech industry becomes more inclusive – both when it listens and when it talks.
This might sound positive, but it is absurd in the extreme. What precisely is the problem these companies are trying to solve?
Is it the complexity of developing voice user interfaces (VUIs), which could advance the humanisation of computational power?
Or is the problem that the human voice is biologically gendered and culturally determined – in which case, the challenge is not re-engineering VUIs, but social and political reality?
This might come as a bit of a shock to the woke AI class: gender-neutral voice systems do not exist in the real world, where most of us live. The human voice is gendered. This is not a choice or a prejudice. It is a biological reality. And yes, many voices have been marginalised throughout history – not just because of gender, but because of race and culture. These are social, political and cultural problems that should be addressed, but they cannot be solved through software.
Focus on solving the real problem – making AI understand us
Instead of virtue signalling, these software companies should be trying to solve the real problems that still inhibit the full potential of voice interfaces, such as background noise and regional accents.
Try speaking to Alexa or Siri with a cold.
The fact that computers are not on a par with humans in understanding the contextual relation of words and sentences is what truly causes misinterpretations of what the speaker meant to say or achieve – in whatever language or gendered voice. Speech-recognition systems simply lack humans' millennia of contextual experience.
Voice user interfaces should enhance us as human beings. Instead, many of us, on many occasions, are forced to slow down our speech and enunciate every vowel and consonant, as if we are speaking to a hard-of-hearing foreigner with a rudimentary grasp of English – all to avoid wasting time repeating ourselves, and to accomplish what we could probably have done on a keyboard in half the time in the first place.
This is sadly just another example of how science is being forced into the service of the culture war. Ask Siri what ‘he’ thinks of the culture war. ‘He’ doesn’t have an answer for that. But ‘his’ bosses think they do.