The Digital Human is one of my favourite podcasts – always thought-provoking about some element of our digital lives. I’ve just listened to the episode on voice, and one of the areas it covered was how distribution of voice online is far less prevalent than video, pictures or text. I have been aware of this for a while, as my first love is radio and in the early days of the internet I imagined a brave new world of radio broadcasts free of the constraints of radio stations’ demands. This didn’t happen, and I think the following are contributing factors.
1 – Music rights. When media distribution online was new, there was a brief period of a few years when it seemed music programmes, with a presenter, would have a life online. In reality, the rights restrictions around music made this impossible for the average amateur. A music show is the easiest way to do radio – some songs I love, and some chat about them. And it was impossible. Since then, Spotify and similar services have arrived that allow for curation of songs, but not the addition of voice. I long for informed music selection, combined with knowledgeable comments from a presenter. This is still the preserve of radio, but it shouldn’t be.
2 – iTunes. iTunes is the easiest way for most people to subscribe to audio content. And it doesn’t make discovery easy.
3 – Emotion. The Digital Human episode talks rightly about the emotional element of voice. It hits deeper than the visual. As such, the emotional risk of listening is greater, and you need to trust the audio you’re about to listen to. You make yourself vulnerable to voice.
4 – Time. A picture can be viewed in a split second, scanning a page. A video can be fast-forwarded through. The audio tools that are currently widely available offer no equivalent way to quickly scan the content.
5 – People don’t like the sound of their own voices. Because voice is so intimate, and because we don’t often hear our own voices, the experience of having your voice recorded is deeply uncomfortable for a lot of people. A recording is also harder than a picture to quickly review and then approve, delete or edit, so recording people casually feels far more invasive.
6 – Editing and quality. A picture can be quickly cropped or enhanced. Text can be easily edited. But good audio is hard to put together. People rarely produce coherent, complete sentences off the cuff that other people would be happy to listen to in a passive way. Think of the free-form podcasts of people chatting that you download once and never listen to again. Creating good audio requires a similar skill set to video (a point the programme makes), but for seemingly less ‘gain’ than you get by learning to edit video.
7 – Offices and public spaces. I can browse pictures, text and even some video in an open-plan office. Voice is more invasive of the space. And you really have to trust that you’re not about to blast something awful into a shared space.
Having said this, I have hope that at least some of these obstacles can be overcome and audio will play a greater role in our digital lives. I agree with a point made in the programme that audio has a great role as ‘background’ content – exactly how I use podcasts when walking or tidying the flat. I think this element is under-exploited and is where a great ‘gap’ for audio exists online.