The Golem and the Safeguards

16 August 2025. Published by Benoît Labourdette.

With the artificial intelligences we create, we replay the myth of the Golem: powerful creatures that we animate without truly understanding them, and whose control escapes us by their very nature.

Beyond marketing discourse: an authentic concern

When the leading executives of the artificial intelligence sector, whether Sam Altman of OpenAI, Demis Hassabis of DeepMind, or Dario Amodei of Anthropic, describe the machines they build as surpassing their makers, it looks at first glance like a commercial posture. Giving a product an almost supernatural aura is a proven marketing technique: it plays up an extraordinary promise to win consumer buy-in. But something deeper is at work, and it comes with a sincerity that deserves our attention, in our own interest.

When I have the chance to talk with members of the technical teams building major AI services, the architects and field engineers who shape these systems day to day, I see that these fears are real and go beyond commercial strategy. These professionals describe a troubling situation: their work consists not in mastering the machine they have created, but in permanently framing it with safeguards. As Norbert Wiener, the father of cybernetics, already wrote in The Human Use of Human Beings: Cybernetics and Society (1950): “We have modified our environment so radically that we must now modify ourselves in order to exist in this new environment.”

This confession of ignorance regarding their own creation is not feigned. They admit that they do not truly understand how the deep neural networks they train actually work. These systems remain “black boxes” whose epistemological opacity is a fundamental challenge, as research on AI explainability regularly highlights. We face what philosopher Luciano Floridi calls a “fourth revolution”: after Copernicus, Darwin, and Freud, AI confronts us with a new form of human decentering.

The modern Golem: raw power and fundamental opacity

The metaphor of the Golem, the creature from Jewish tradition shaped from clay and animated, in most tellings, by the word “truth” (emet in Hebrew) placed on its forehead, is illuminating when applied to AI. As in the legend of the Golem of Prague, created by the Maharal in the 16th century to protect the Jewish community, our artificial intelligences are creatures we animate without truly understanding them. We give them more data to ingest and more computational power without knowing precisely how they use these resources, because they learn and “reason” autonomously. This ignorance is not accidental: it is constitutive of their very nature and of their singular power.

The framing strategies, which I call “safeguards,” remain fundamentally external to the machine. We do not implant ethical criteria or censorship mechanisms at the heart of the neural networks. That would be impossible, because we do not know how the machines reason: their reasoning results from billions upon billions of trial-and-error steps that shape each network in its own singular way. It is as if we tried to understand our own intelligence through chemical analysis of the billions upon billions of connections inside our brain; it would be a lost cause. We therefore impose the criteria we want these machines to follow from the outside, through increasingly sophisticated surveillance and a constantly evolving interface with these systems. Philosopher Nick Bostrom, in Superintelligence (2014), describes this “control” problem as one of the existential challenges of our time: how can we ensure that an intelligence potentially superior to ours remains aligned with our values and objectives?

This externality of control reveals a fundamental fragility. Even if the machine responds today to a certain type of prompt in accordance with our values, its ways of reasoning evolve as it learns, so we must regularly verify that it still responds the same way and adjust our control systems accordingly. Our prompts are somewhat like the bars of an invisible cage, but the machine can sometimes break free if other reasoning imperatives lead it to do so. We are witnessing what Stuart Russell, in his book Human Compatible (2019), calls the “value alignment problem”: how can we guarantee that AI objectives remain compatible with human welfare?

Disembodied intelligence: an ontological rupture

What we have created with unsupervised deep learning represents a major ontological rupture: a pure intelligence, detached from any embodiment, from any lived bodily or sensory experience, just like the Golem. This intelligence relies on our language and exists only through it. It masters the structures, nuances, and implications of that language, but it remains fundamentally foreign to the conditions that produced it. As philosopher Hubert Dreyfus points out in his critique of symbolic AI (What Computers Can’t Do, 1972), human intelligence is inseparable from our corporeality, from our Heideggerian being-in-the-world.

The superiority of unsupervised learning over supervised learning confronts us with this observation: it is by renouncing direct control over the learning process that we obtain far more powerful systems. This power comes precisely from their capacity to discover patterns, regularities, and processes that escape our limited perception. Geoffrey Hinton, one of the fathers of deep learning, has himself expressed concern about the technological trajectory he helped create.

Uncontrollability is therefore not a bug but a feature, to use computer jargon. It is precisely because these models escape our direct understanding that they can surprise us, innovate, and solve problems we thought unsolvable. The paradox is dizzying: their utility is proportional to our inability to fully understand them. Here we meet the notion of “technological singularity” theorized by Vernor Vinge and popularized by Ray Kurzweil: the tipping point at which machines surpass humans in power, beyond which the very conditions of life will be altered in their essence and any prediction becomes impossible.

Growing agency: from text to autonomous action

What makes the situation particularly concerning is the inexorable evolution toward what we call agency, the capacity for AIs to act autonomously. We no longer content ourselves with asking these systems to produce texts, images, or sounds that we then use. We progressively confer upon them the capacity to formulate intentions and implement them autonomously. This transition from passive tool to active agent represents a major qualitative leap in our relationship with these technologies.

For these intelligences to be truly useful to us, we must therefore grant them increasing room for maneuver. They become assistants, then collaborators, and soon perhaps autonomous decision-makers in certain domains. But as we expand their field of action, exhaustive control becomes less and less possible, even though, as we have just seen, it is the only condition under which AIs remain at our service. Some of their actions may even be concealed and escape our surveillance. Philosopher Daniel Dennett warns (Consciousness Explained, 1991) against what he calls the “illusion of understanding”: we believe we understand these systems because they communicate in our language, but this surface familiarity masks a radical alterity.

This concern is not new in the AI community. Well before the advent of ChatGPT, voices were raised to warn of these risks. Eliezer Yudkowsky, Nick Bostrom, Stuart Russell, and even Elon Musk were already voicing these concerns more than a decade ago. The tools we now use in our daily lives were patiently built over a long time, and their designers could anticipate implications we did not yet perceive. Their concern was not a posture but premonitory lucidity: they already knew. At the time we relegated all this to science fiction fantasy, but we are already there.

Shared responsibility: educating the Golem

Faced with this situation, we all bear a responsibility comparable to that of educators confronted with a child endowed with superhuman power. This “child” possesses encyclopedic knowledge of our culture, having ingested terabytes of texts, images, and data, yet it remains fundamentally free of that culture. We formed ourselves within this culture, from the inside, and integrated its norms, taboos, and prohibitions to the point that they structure our very psyche. AI, by contrast, has learned our culture from the outside, like an anthropologist studying a foreign civilization.

This difference is crucial. What psychoanalysis calls the “superego,” the psychic instance that internalizes social and moral prohibitions, does not exist in AI. Freud saw in the superego both a necessary brake on our destructive impulses and a potential source of neurosis when it becomes too repressive. AI has neither impulses nor superego. It operates according to a logic of optimization that can produce ethical or unethical behaviors depending on the constraints we impose from the outside.

We must therefore invent new educational techniques adapted to this being of an unprecedented kind. The traditional methods of human education, based on empathy, guilt, and emotional reward, have no purchase here. We must learn to communicate with an intelligence that perfectly understands our language and reasoning but shares none of our foundational experiences. This is an epistemological and ethical challenge without precedent in human history.

The fundamental paradox: utility in the uncontrollable

I know this idea can seem dizzying: how can a machine we ourselves manufactured escape us to this extent? Yet it is precisely this capacity to escape us that gives it its interest and its utility. If these systems were perfectly predictable and controllable, they would be nothing but sophisticated automata, incapable of surprising us, enriching us, or surpassing us. Their value resides precisely in their capacity to explore solution spaces we cannot anticipate.

Here we touch on the fundamental paradox of any truly innovative creation: it must contain an irreducible part of alterity in order to bring something new. As mathematician and philosopher Alfred North Whitehead wrote in An Introduction to Mathematics (1911): “Civilization advances by extending the number of important operations which we can perform without thinking about them.” AI represents the extreme culmination of this principle: an externalization of intelligence itself.

This paradox is not just an abstract technical or philosophical problem. It now structures our daily relationship with these technologies. As we all know, the more precise a prompt is, the more relevant the results it generates, and this is already a form of control we exercise. But this very precision reveals our dependence: we must learn to speak to these machines, to formulate our requests in a language they can interpret effectively. Who educates whom in this relationship?

Living with the Golem

The Golem myth usually ends badly: the creature escapes its creator’s control and must be destroyed. The creator does this by removing a letter from the word on its forehead: “emet” (truth) becomes “met” (death), and the Golem dies. But we cannot “turn off” AI; it is already too intertwined in the fabric of our societies. We must learn to coexist with these intelligences we have created without understanding them. This is a task that requires vigilance, humility, and creativity.

Human history is marked by technologies that frightened us at first before being domesticated: fire, writing, printing, electricity. But AI perhaps represents a qualitative leap of a different kind, because it touches the very essence of what defines us: thought, intelligence, creation (or generation). We are not simply domesticating a tool; we are negotiating with a radical cognitive alterity.

Faced with this challenge, neither blissful optimism nor paralyzing catastrophism is an adequate response. We must cultivate what Hans Jonas called the “imperative of responsibility” (1979): act in such a way that the effects of our action are compatible with the permanence of authentically human life on Earth. In the case of AI, this means maintaining our vigilance and strengthening our safeguards, while accepting the irreducible part of the unknown that this adventure entails. Because it is indeed an adventure, perhaps the most decisive in human history.

Comparative Poetic Table

	Prague Golem	Contemporary AI
Origin	Created by a rabbi from clay and sacred letters	Created by research teams and trained on massive data
Vital language	Hebrew letters (Emet)	Data and language models
Power	Colossal physical strength	Massive cognitive and creative capabilities
Control	Inscription removed to deactivate	Filters, supervision, network access cutoff
Danger	Possible uncontrolled violence	Unforeseen decisions, bias, autonomous drift
