Vocal presence

19 May 2026. Published by Benoît Labourdette.

8 min

Making a phone call, leaving a voice message, joining a video call, listening to a podcast, watching a live stream, talking with a voice AI: six ways of being present through the voice at a distance. I propose a typology of these regimes, distinguishing in each of them everyday use from artistic creation.

An experience that classical phenomenology has not really thought through

Calling someone on the phone is an experience that most humans alive today have had thousands of times, and yet few philosophical texts give an account of what happens in it. Classical phenomenology has thought through perception, the embodied encounter and the face-to-face, but it has not really thought through disembodied listening. Levinas centred ethics on the face of the other, and Merleau-Ponty thought through the body as perceived and perceiving; the voice heard at a distance, without the body that emits it, remains a philosophical object that has been little worked on.

This gap seems to me uncomfortable now that our lives unfold in large part through vocal devices at a distance. Phone calls, voice messages, video conferences, podcasts, live streams, conversations with a voice AI: these forms coexist, combine, and shape a significant part of what we today call being in connection. They are not equivalent: each one produces a particular quality of presence that deserves to be analysed for itself.

Talking on the phone without seeing each other

On the phone, two people talk in real time without seeing each other. The channel of communication is a single narrow band, the voice, and everything else (expressions, postures, gazes, environment) has to be inferred, imagined, or explicitly recounted.

This restriction produces, paradoxically, a particular quality of attention. When one cannot be distracted by the other’s face, one hears their voice better: one perceives the modulations, the hesitations, the breaths, the sounds around them. One builds an auditory presence that can be more precise, in certain dimensions, than face-to-face presence. Someone walking, sitting down, drinking, or crying without saying so conveys, through their voice and the surrounding sounds, information one might not have picked up otherwise.

This mode also produces a temporal asymmetry. The bare voice, without image, does not sustain long pauses well. A silence on the phone is more loaded than a silence face-to-face, because one does not know whether the other is thinking, is uncomfortable, has gone away, or has had a problem. The phone therefore demands a more continuous speech, or else an explicit naming of the silences ("I’m thinking", "I’m listening", "sorry, I was distracted"), and this need for explicit naming produces, in turn, speech that is more aware of itself than in face-to-face exchange.

The phone is therefore a device that deepens listening and demands awareness of the voice as an act. It can produce an intimacy that exceeds that of the embodied encounter, on condition that both persons accept this particular regime.

Recording a message that one can rework before sending it

The voice message was democratised by messaging applications in the early 2010s. It combines the expressive quality of the voice with the asynchrony of the written message: one person records, the other listens whenever they want.

This regime produces a particular posture of utterance. The person recording is addressing someone who is not there and who will listen at a moment they do not control. This absence of the addressee at the moment of utterance modifies the speech, which becomes more monological, more constructed, more composed. One can listen to oneself before sending, start over if it does not sound right, delete and rephrase. Asynchronous vocal speech can be edited like a text while keeping the flesh of the timbre.

This device serves well to say things that take time. A difficult apology, a declaration, a delicate remark, can be conveyed by voice message better than by phone, because the person who receives it can listen, listen again, and take the time to reply. Speech gains in precision without losing in flesh, but it loses the immediate adjustment to the other’s reaction.

When the image degrades the quality of listening

Video conferencing adds the image to the sound and is generally presented as an improvement over the phone. One sees as well as hears, so one is supposed to be more connected. This presentation is, in my view, misleading.

The image on a video call is framed, partial, often shown from the waist up, sometimes at face height. It does not give access to the full bodily presence of the other, but to a flat representation, behind a camera whose position and lighting one does not control. Gazes cannot really meet, because looking the other in the eye requires looking at the camera, and therefore no longer seeing their eyes. This structural asymmetry of gazes produces, in the long run, a particular fatigue, which psychologists have called "Zoom fatigue" since the 2020 lockdown.

The video call combines a degraded visual presence and an auditory presence that is also not full, because awareness of the image distracts from the voice. One looks partly at the other, partly at oneself in the small frame, partly at the environment visible behind the other, and attention fragments. One is paradoxically less present to the voice than one would be on the phone.

This observation does not disqualify video calls, which are useful in many professional contexts and maintain connections at a distance; it simply invites us not to consider them a mere improvement over the phone. The video call is another regime, which produces another quality of presence, and it is better analysed for what it is than compared to an embodied presence whose place it does not hold.

A voice that enters the ear like that of someone close

The podcast, which experienced a massive expansion in the 2010s and 2020s, offers yet another regime. A voice that speaks, without an interlocutor present, is listened to by a variable number of people at different times. The relation between voice and listener is radically asymmetrical.

This asymmetry nonetheless produces a particular intimacy. The listener hears the podcast in intimate contexts, earbuds in their ears, on the underground, while cooking, while walking. The voice enters their ear as if it were that of a very close person. The podcaster, for their part, often speaks to a single imagined person, not to a crowd, and this address to a single person, multiplied by thousands of listeners at the same time, produces the illusion of a direct bond between the voice and each one of them.

This illusion has powerful effects. It allows podcasters to build, with their regular listeners, a sense of closeness that exceeds that of reading a book, because the voice carries the flesh that writing does not carry. It also allows for drift, when this simulated intimacy is exploited to sell, to indoctrinate, or to build captive communities around a charismatic voice. The podcast can then become a political device comparable to what the radio was in the twentieth century, but individualised and deterritorialised.

A voice that reads in real time what is written back to it

More recent still is live streaming, on platforms such as Twitch, Instagram Live or TikTok Live, where a person speaks in video or audio before an audience that can react with written messages. The conversation becomes radically asymmetrical: on one side an embodied voice, on the other fragments of text scrolling by.

This device has something unprecedented in the history of human communication. For the first time, a voice that speaks can read in real time what its listeners reply and adjust its speech accordingly. Vocal speech becomes interactive, but the interaction takes place on two incommensurable registers: the voice on one side, writing on the other. The streamer can reply to a comment by speaking, or ignore the mass of comments and focus on a few.

This asymmetry produces a very particular regime of presence. The listeners are at the same time individuated, since each one can write and be read, and anonymous, since the mass of messages erases each one. The streamer is at the same time close, since their voice arrives in each one’s ear, and distant, since they do not really know those who are listening. A public intimacy is built, which has no real equivalent in earlier forms of communication.

This device has generated, in a few years, an attention economy whose mechanisms are still poorly understood. It has also produced new subjectivities. Those who stream live several hours a day before an audience they cannot see develop a vocal presence that demands specific skills, which have so far been little theorised.

Talking with a voice that has no subject behind it

Since 2023-2024, artificial intelligences capable of holding a vocal conversation in real time have been available to a wide public. ChatGPT in voice mode, Claude voice, Gemini Live and their equivalents open up a regime that is unprecedented in human history: talking aloud with an entity that is not a person, that can modulate its tone, hesitate, laugh and mark silences, but that possesses no subjectivity.

This regime would deserve its own analysis, which I do not carry out fully here. I note two salient facts. First, the experience of talking vocally with an AI is very different from that of writing to it; the voice immediately produces the illusion of a subjective presence that resists the knowledge one has of the absence of that subjectivity. Second, this illusion is useful for certain uses, dangerous for others, and its ethical framing largely remains to be built.

The concept of vocal presence finds in this regime its most radical formulation. The voice produces a presence, independently of the actual presence of someone behind it. This dissociation between voice and subject, which until very recently existed only in marginal devices, is becoming, within a few years, an everyday experience for hundreds of millions of people.

Artistic intention within each of these regimes

The six regimes I have just described are first of all ordinary uses: calling a relative, leaving a message for a colleague, having a professional video call, listening to a podcast on the underground, watching a live streamer in the evening, asking a voice AI what one might cook. These everyday uses engage the voice in order to make a connection, to convey a piece of information, or to organise an action.

But each of these regimes can be inhabited by an artistic intention, which changes its scope, and the stakes are not the same depending on whether one is in an everyday use or in a creation.

On the phone, everyday use largely dominates. The telephone voice has nonetheless also been a medium of creation, from telephone performances of the 1970s to the contemporary works of Sophie Calle, where the call device itself becomes the material of the work. The recorded, edited and broadcast telephone conversation, as a sound piece, is a form in its own right.

For voice messages, the passage to the artistic is rarer. The format is still recent, and has not given rise to a tradition of creation comparable to that of the phone. Some performances use it as material, without a genre having yet emerged.

Video conferencing, on the other hand, experienced during the 2020 lockdown a flowering of artistic proposals: plays adapted to Zoom, multi-screen performances, choreographic experiments at a distance. This brief period showed that the video device could carry a creation, on condition that one took its own constraints seriously rather than trying to imitate the embodied stage.

The podcast is probably the regime in which the distinction between the everyday and the artistic is most visible and most developed. On one side, news podcasts, conversation podcasts, chronicle podcasts, which extend classical radio formats in everyday use. On the other, a tradition of sound creation, particularly lively in France, around structures such as ARTE Radio, France Culture, or publishers such as Binge, which produces audio fiction, creative documentaries, audio essays. This tradition belongs to the lineage of the great radio documentaries of the 1960s to the 2000s, those of Yann Paranthoën, of René Farabet, of the Atelier de création radiophonique, and finds in the podcast format new conditions of distribution.

Live streaming on Twitch or Instagram Live serves massively for everyday use (chroniclers, gamers, hosts who talk about their day), but it also hosts artistic proposals: live concerts, performances that play with the device, public readings. The two uses coexist in the same tools.

Voice AI has not yet given rise to a consolidated artistic tradition. A few isolated performances use conversation with an AI as the material of a piece, but we are in the early days of these devices. Forms are likely to emerge in the coming years, and artists are likely to take the synthetic voice seriously as a material, as the pioneers of electroacoustic music did with the recorded voice.

This distinction between everyday use and artistic creation is not a hierarchy. A phone call to a loved one can produce more intensity than an elaborate sound performance. But artistic intention changes what is at stake in the device. It opens an attention to form, to the choice of silences, to the composition of meaning, which transforms the use into a shared proposal. The two regimes feed each other. Artists who work with a vocal device make perceptible what is at stake in its ordinary use, and everyday practices provide artists with the material out of which they build their works.

Why this list of six regimes is not closed

These six regimes do not form an exhaustive list. One could add classical radio, audio recordings on cassette or CD, and the conversations in virtual reality that are beginning to develop. The typology remains open; its purpose is to structure thought, not to enclose experience.

What it allows us to see is that vocal presence is not a single thing, but a family of phenomena that share a few common properties and are distinguished by others. All these forms rest on the voice as the main vehicle of presence, and all operate at physical distance. But they differ along several lines that the typology has brought out: whether or not they are synchronous, whether the speech is ephemeral or recorded, whether it addresses one person or an audience, and whether or not adjustment is possible during the exchange.

I therefore propose the concept of regime of vocal presence to designate each of these particular configurations in which the voice carries a presence at a distance, under specific technical and social conditions. Each regime produces a quality of listening and a use of speech that are its own. Thinking about these regimes in their own right is a way of giving ourselves the means for more conscious uses, and for more precise critiques when some regimes produce undesirable effects.

Choosing the right regime as an act of care

This typology has an immediate practical bearing.

It first invites us not to confuse the regimes. A couple’s difficulty is not resolved in the same register depending on whether one talks about it on the phone, on a video call or face-to-face. A professional meeting does not produce the same decisions depending on the device. A long voice message does not have the same effect as a text message. Choosing the right regime is part of the care one gives to the relationship.

It then invites us to recognise that modern Western culture has long disqualified the voice in favour of writing. Yet Plato, in the Phaedrus, had Socrates say that writing was a threat to memory and to living thought. Modern thought has largely reversed this hierarchy: writing has taken the lead as the medium of authority, of law, of knowledge, and the voice has been relegated to the register of the familiar, the fleeting, the less serious. This hierarchy no longer makes sense in a world in which the voice circulates through networks as writing does, and can be recorded, transcribed, archived, reprocessed.

It finally invites us to think politically about the devices that structure our vocal regimes. Who owns the podcast and live streaming platforms? Which voices have access to them? Which algorithmic rules amplify which discourses? Who controls the voice AIs that millions of people are going to consult daily, and according to which criteria? These questions are not technical; they are political questions that will partly decide what the common thought of the coming decades will be.

Recognising these differences between regimes of vocal presence, and practising each one for what it is, is now part of the work needed to critique those that degrade human bonds, and to inhabit the others with discernment.