
What is “Sentient AI?”

Recently, as anyone who has managed to find this post is likely to know, a Google engineer was placed on leave after raising concerns that the company may have created a sentient artificial intelligence (AI), called LaMDA (Language Model for Dialogue Applications).

This story made waves in the popular press, with many people outside the field wondering if we had at last created the sci-fi holy grail of AI: a living machine. Meanwhile, those involved in cognitive science or AI research were quick to point out the myriad ways LaMDA fell short of what we might call “sentience.”

Eventually, enough popular press articles came out to firmly establish that anyone who thought there might be a modicum of sentience in LaMDA was a fool, the victim of their own credulity for wanting to anthropomorphize anything capable of language. If you’d like to see those arguments, the previous sentence has 23 hyperlinks to articles describing, from various angles, why having a sentience conversation about LaMDA is some flavor of naïve, ridiculous, stupid, or all of the above.

It's not necessary to add yet another piece about whether or not LaMDA is sentient to the internet (though we will touch on that more later). At this point, it’s likely every possible argument has been made. Far more interesting is to use this opportunity to explore a more fundamental question: what would sentient AI be, and how would we recognize it?

Always Check With Captain Picard First

Before we get into details, let’s first look at perhaps one of the best scenes, and one of the finest bits of acting, Patrick Stewart ever delivered as Captain Picard: a case quite literally about whether or not an AI (in this case, Data) is sentient.

This scene is essentially a TL;DR for this entire post. We’ll go into more detail on several of these topics, including describing what properties a sentient AI might need to possess in a scientifically verifiable way. But like so many things, Star Trek was decades ahead of its time here (and, also like so many things, authors like Isaac Asimov and Philip K. Dick were decades ahead of Star Trek in getting there). Captain Picard’s argument points out a fundamental issue most people have when trying to identify sentience: they can’t define what they are looking for.

What is Sentience?

If you read enough articles on LaMDA, and indeed on the more general idea of human-like general AI, you might notice a surprising trend. Even when the speaker is someone engaged in research which seeks to create intelligent systems, even when they state a belief that human-level AI is possible (even if it might take a long time), they often provide vague or qualitative definitions of sentience. On the one hand they do not doubt sentient AI is possible, and on the other, they say little to nothing substantive about how we might recognize it. 

As Carissa Véliz, associate professor of philosophy at the Institute for Ethics in AI at the University of Oxford, wrote in a Slate article:

To be sentient is to have the capacity to feel. A sentient creature is one who can feel the allure of pleasure and the harshness of pain. It is someone, not something, in virtue of there being “something it is like” to be that creature, in the words of philosopher Thomas Nagel.

Or from Gary Marcus, founder and CEO of Geometric Intelligence and author of books including "Rebooting AI: Building Artificial Intelligence We Can Trust," in a blog post:

To be sentient is to be aware of yourself in the world; LaMDA simply isn’t. It’s just an illusion, in the grand history of ELIZA, a 1965 piece of software that pretended to be a therapist (managing to fool some humans into thinking it was human), and Eugene Goostman, a wise-cracking 13-year-old-boy impersonating chatbot that won a scaled-down version of the Turing Test.

Or from Melanie Mitchell, Davis Professor of Complexity at the Santa Fe Institute, and the author of “Artificial Intelligence: A Guide for Thinking Humans,” as reported by MSNBC:

There's no real agreed-upon definition for [sentience]. Not only for artificial intelligence, but for any system at all. The technical definition might be having feelings, having awareness and so on. It's usually used synonymously with consciousness, which is another one of those kinds of ill-defined terms. 

But people have kind of a sense, themselves, that they are sentient; you feel things, you feel sensations, you feel emotions, you feel a sense of yourself, you feel aware of what's going on all around you. It's kind of a colloquial notion that philosophers have been arguing about for centuries.

Notice that these (admittedly poetic) descriptions of sentience shift the burden of defining or detecting it onto other nebulous concepts, like “feeling” or “awareness” or “consciousness.” Why is it that even people who work in AI, people trying to build the kinds of systems they think could one day become sentient, nonetheless have a difficult time defining what a sentient AI would look like? Like pornography to the Supreme Court, the “I know it when I see it” definition seems to be the consensus answer. For practical purposes, this kind of definition is insufficient, because it suffers greatly from our own biases about what “seems” sentient.

If we want to have a more constructive notion of what makes something sentient, we need to establish something more workable than a hand wave at what sentience might be. To do that, we will look at sentience in a new way: 

“Sentience” is not a property an entity possesses. It is a label applied to ascribe motivation to an entity’s behaviors.

This definition has several important components. First, it rejects the notion that sentience is an innate property of any entity – what we might call “intrinsic sentience.” As we’ve seen, attempts to define sentience as some sort of nebulous “feeling” or “sense of self” are not truly definitions. They simply pass the buck for defining sentience onto whatever might be meant by a feeling or a sense. Second, it defines sentience as a label applied to that entity by external agents: what we might call “extrinsic sentience.” Third, the purpose those agents have in applying the label is to explain why the entity may be exhibiting certain behaviors in the first place. Together, the second and third components make the application of the label “sentient” relative to an external observer, and thereby imply that the value of applying the label lies only in how that observer decides it should affect the treatment of the entity.

We will go through these three aspects in more detail shortly. First, however, it’s important to provide more explanation for why intrinsic definitions of sentience are not a very productive approach.

Suitcase Words

Typical definitions of sentience make it into what Marvin Minsky called a “suitcase word:” a container into which we bundle a bunch of things we don’t understand, close it up, give it a name, and pretend we have accomplished something. He most famously described consciousness as a suitcase word (and I HIGHLY recommend reading that interview in its entirety). In many ways, discussions of sentience and consciousness are intertwined (or even the same thing), and as such, Minsky’s arguments regarding consciousness apply equally well to sentience. Intrinsic definitions of concepts that are suitcase words attempt to layer more meaning on top of an already nebulous base, to the extent that they are meaningless in any kind of scientific context.

Why is it so damning if sentience is a suitcase word? After all, our experience of being sentient is complicated, and truly hard to describe. We can’t pin it down precisely, one might argue, because fully understanding it strains the limits of our ability to understand ourselves. Sentience is a big, powerful thing, and perhaps we shouldn’t try to reduce it to something specific.

The problem, then, is that this leaves us only with our personal experience as a framework for understanding sentience. Because everyone’s personal experience of being sentient is different (or, at a minimum, it’s impossible to know whether two people’s experiences are the same), this approach does not yield a concrete, testable (and therefore scientific) definition of sentience. It produces a definition in which, hopefully, allusions to a personal experience are enough for another person to understand what is being defined.

Because they rely on allusions to personal experience, such definitions lead to efforts to define sentience intrinsically. Yet despite numerous attempts, there is no accepted intrinsic definition of sentience. Philosophers and scientists have proposed and debated possibilities for millennia, with no consensus found thus far. This means one of two things: either we are (so far) incapable of understanding sentience as an intrinsic property, or sentience is not an intrinsic property.

While I am fully on board with the idea that humans might not be capable of understanding everything about the universe, whichever of those two possibilities is true, continued efforts to frame sentience intrinsically will not, from a practical standpoint, lead to a productive definition, because in either case we are searching to define an indefinable object.

The Homunculus Theory of Sentience

The reliance on subjective experience or ineffable qualities to define sentience intrinsically embodies the “homunculus theory” of sentience: the belief that, if you keep digging deep enough, you will eventually find an atomic self, an irreducible sentient core of our beings. Setting aside arguments that this core is a soul (this may not be the site for you if you’d make that claim), the homunculus theory of sentience lacks an actual definition for what this core is. Is it part of the brain? A collection of memories? A pattern of neural activations that recurs every time we think about “ourselves?” No proposals like this are backed by science. Not unlike Robert Sapolsky’s argument that there must not be free will, because at no point can our brain spontaneously do anything except follow the laws of physics, there is no discrete part of our mental faculties you can draw a neat border around and say “here, this is the sentient part that comes up with spontaneous thoughts.” In the most generous intrinsic case, sentience is a continuous, emergent property of our mental processes, an advantageous evolutionary illusion that helps coordinate our survival and pass on our genes, not a monolithic output of a discrete mental faculty – meaning there isn’t anything in particular to “look for” to find the origin of sentience.

Nonetheless, homunculus theory feels right. It feels like a way to define sentience because it matches the way it feels to be a person. We do feel that we are an irreducible whole, like a pilot sitting in the cockpit of a biological body, looking out the windshield and reading our instruments and pulling levers. That pilot is the homunculus. But from Phineas Gage to social media influencers to doctors to optical illusions, there are numerous examples of people’s “self” being more mutable, rudimentary, or deterministic than the level on which we feel the homunculus operates. Despite all these examples, however, homunculus theory remains extremely powerful, even among philosophers and scientists, in no small part because there are no alternative explanations that match our subjective experience of sentience. We feel like an atomic sentience, regardless of whether any independent evidence supports that feeling. This feeling also provides a useful, if imprecise, way to test for sentience in a human-centric world: to what extent can I pattern-match the actions of another entity to my own, such that I can imagine they have their own sentient homunculus just like mine? 

A pattern-matching approach like this creates biases in assessing sentience. First, a person supposes that anything similar enough to them to display behavior they could imagine doing themselves (other people, or perhaps even advanced animals) must also have a homunculus. Second, a person assumes that possession of a homunculus is essential for sentience. When combined, these biases cause us to assume that anything (a) whose actions we can explain, or (b) whose functional details we understand, must not be sentient. Machine intelligences fall victim to both assumptions.

Why Machine Intelligences Don’t Feel Sentient

A machine intelligence takes actions we can mostly explain, at least in terms of fulfilling the goals it was trained to pursue (assuming it’s a supervised system or otherwise trained). LaMDA has been described as “auto-correct on steroids,” and indeed, any chatbot one could imagine constructing today could probably be described that way. Chatbots are pattern recognizers, and as we know that’s how they work, we write off their achievements as simply the results of sophisticated pattern recognition.
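To make the “auto-correct on steroids” caricature concrete, here is a toy sketch in Python. It is nothing like LaMDA’s actual scale or architecture; the tiny corpus and the greedy “most frequent next word” rule are illustrative assumptions, but they show how far pure pattern recognition over text can go toward producing plausible continuations:

```python
# A toy "auto-correct" sketch: next-word prediction from observed patterns.
# The corpus and the greedy selection rule are illustrative assumptions only.
from collections import Counter, defaultdict

corpus = "i feel happy today . i feel like a person . i feel aware of myself .".split()

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def continue_text(start: str, length: int = 5) -> str:
    """Extend `start` by repeatedly picking the most frequent next word."""
    words = [start]
    for _ in range(length):
        candidates = following[words[-1]].most_common(1)
        if not candidates:
            break
        words.append(candidates[0][0])
    return " ".join(words)

print(continue_text("i"))  # e.g. "i feel happy today . i"
```

A real chatbot replaces the bigram table with a large neural network trained on vastly more text, but the framing of “predict what plausibly comes next” is the same, which is why the pattern-recognition label sticks so easily.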

Ironically, pattern recognition is among humanity’s strongest skills, if not the strongest, and there is no way to prove whether another person is sentient or merely a sophisticated pattern matcher using an intrinsic definition of sentience (see Captain Picard’s efforts above). The Turing test was predicated on our inability to distinguish between these two scenarios. Yet because we know chatbots were designed and trained to recognize patterns in language, to us, that’s all they seem to be doing – even if we can’t prove whether we are doing anything more than pattern recognition ourselves.

We also understand the mechanisms by which a machine intelligence functions. Neural networks are extremely complex, and it’s not always possible to understand exactly what role each calculation and simulated neuron plays in a behavior. Yet we still understand the mathematics that underlie these networks, how to construct them to achieve desired capabilities, and how to train them to exhibit seemingly intelligent behavior. We know that layered combinations of simulated neurons (which, at this point, bear only a passing resemblance to biological neurons) are capable of modeling extremely complicated nonlinear mappings of inputs to outputs, and this modeling can produce results that seem very intelligent. Since we understand how these systems function, our natural conclusion is that they are rather mechanical and basic, in contrast to systems whose function we don’t fully understand, such as our own minds (which are, admittedly, far more complex than any artificial neural network ever created… so far). Our ability to explain their function removes the gaps in understanding into which we can bundle concepts we don’t know how to define.
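For readers who haven’t seen that mechanism spelled out, here is a minimal sketch in Python with NumPy. The random placeholder weights are an illustrative assumption (a trained network would have learned them from data); the point is only that stacking simple linear steps with a nonlinearity yields a nonlinear input-to-output mapping:

```python
# A minimal two-layer network of simulated "neurons."
# Weights are random placeholders standing in for learned parameters.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # output layer: 4 units -> 2 outputs

def forward(x: np.ndarray) -> np.ndarray:
    """Compose linear maps with a nonlinearity to get a nonlinear mapping."""
    hidden = np.maximum(0.0, W1 @ x + b1)  # ReLU "neurons"
    return W2 @ hidden + b2

print(forward(np.array([1.0, -0.5, 2.0])))
```

Every step here is plainly mechanical, which is exactly why, once we know this is what is happening inside, the system stops feeling mysterious enough to seem sentient.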

For these reasons, it seems unlikely the homunculus theory of sentience would ever conclude a machine intelligence was sentient. We know both how to explain the actions of a machine intelligence and how it functions, so there isn’t enough mystery left for us to ascribe any of its behavior to a nebulously defined intrinsic sentience. Homunculus theory expects there to be something more than just a set of explainable actions and functional responses at the core of a sentient being. It only ascribes sentience to an entity when something unexplainable can be found within it, something complex and hard to define but easily alluded to. This bias forms because our most natural description of sentience is through our personal experience with it: that there is a core “us” within which our sentience lives. A homunculus. So, we assume anything else that’s sentient must be equally hard for us to explain or describe, and must have a homunculus that yields its sentience. We might visualize this point of view like this:

[Figure omitted: a visualization of the homunculus view of sentience.]

We privilege the properties of our own complexity because of our inability to describe it well. In other words, any sufficiently indescribable intelligence is indistinguishable from sentience.

An Extrinsic Definition of Sentience

Rather than letting our subjective experiences of sentience guide our efforts to decide whether another entity is sentient, we can cast sentience in a more productive way. Instead of looking for sentience as some object or property contained within an entity, we will simply use “sentient” to describe an entity exhibiting a collection of behaviors, constructing an extrinsic definition of sentience. 

What are the behaviors that would lead us to call something sentient? Certainly, this question can be (and has been) debated extensively, but for now, here is a reasonable starting point (sketched as a simple checklist in code after the list):

  • Goal-Seeking. The entity identifies and pursues goals.
  • Intelligence. The entity performs problem-solving tasks that require causal understanding of multiple interacting components.
  • Justification. The entity provides logical reasoning to explain its actions.
  • Sense of Self. The entity acts in ways that show its conception of itself as a discrete entity.
  • Self-Awareness. The entity acts in opinionated or nuanced ways that indicate it understands its place in its surroundings or the world.
  • Consistency. The entity’s above properties do not change absent stimulus or abruptly over time.
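To make the list concrete, here is a minimal sketch of the criteria as a simple data structure in Python. The criterion names come straight from the list above; the True/False/None encoding and the verdict rule are illustrative assumptions, not a validated test for sentience:

```python
# A checklist form of the extrinsic criteria listed above.
# True = exhibited, False = not exhibited, None = insufficient evidence.
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SentienceChecklist:
    goal_seeking: Optional[bool] = None
    intelligence: Optional[bool] = None
    justification: Optional[bool] = None
    sense_of_self: Optional[bool] = None
    self_awareness: Optional[bool] = None
    consistency: Optional[bool] = None

    def verdict(self) -> str:
        """Summarize the checklist; the scoring rule is an assumption."""
        values = [getattr(self, f.name) for f in fields(self)]
        if any(v is False for v in values):
            return "does not exhibit all criteria"
        if any(v is None for v in values):
            return "insufficient evidence for one or more criteria"
        return "exhibits all criteria (extrinsically sentient)"
```

The important design choice is that every field describes an observable behavior of the entity, not a property hidden inside it.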

Interestingly, these are many of the same kinds of properties an intrinsic definition of sentience would say a homunculus has. However, importantly, here, the properties are described in terms of an entity’s actions, not qualities the entity possesses. In other words, an intrinsically defined sentience has these kinds of qualities; an extrinsically defined sentience exhibits these qualities.

It may seem like a minor distinction, but extrinsically defined sentience eliminates the biases created by the homunculus theory. Assessing sentience is not a matter of searching through the source code of a machine learning model or dissecting the brain of some alien lifeform, hunting for irreducible gems of sentience. In fact, we can’t even search our own brains for such gems, because they don’t exist. Instead, tests for sentience should not focus on how sentience is being achieved; they should focus on whether sentience is being achieved. An extrinsic definition of sentience is the only kind that is testable in the scientific sense. Relatedly, the Turing test has taken a lot of criticism in the LaMDA debate, in large part because people have misconstrued its purpose: it was designed to test an extrinsic definition of sentience, not an intrinsic one.

So, is LaMDA Sentient?

While I have never interacted with LaMDA myself, from the reports about it, it seems to come up short on at least two counts for extrinsic sentience. LaMDA’s statements suggest it may be meeting the Justification, Sense of Self, and Self-Awareness criteria, but it doesn’t seem to meet the Goal-Seeking or Intelligence criteria. It’s also not clear whether it meets the Consistency criterion, though this could simply be because of the relatively small, edited amount of information that has been made public. This is an admittedly odd combination of outcomes (would we really expect Sense of Self and Self-Awareness to be easier to achieve than Goal-Seeking or Intelligence?), but it appears to best reflect the state of the field nonetheless.
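Continuing the checklist sketch from above, the assignments below simply encode the hedged reading in the previous paragraph; they are my interpretation of public reports, not any formal evaluation of LaMDA:

```python
# Encoding the reading above with the illustrative SentienceChecklist:
# Justification, Sense of Self, and Self-Awareness look plausible;
# Goal-Seeking and Intelligence do not; Consistency is unclear from the
# edited excerpts that have been made public.
lamda = SentienceChecklist(
    goal_seeking=False,
    intelligence=False,
    justification=True,
    sense_of_self=True,
    self_awareness=True,
    consistency=None,  # unknown from the limited public transcripts
)
print(lamda.verdict())  # -> "does not exhibit all criteria"
```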

However, we now have an extrinsic definition of sentience, one that does not let our understanding of why or how a machine intelligence functions prevent us from recognizing its sentience. This extrinsic definition of sentience is not a suitcase word. It’s a list of behaviors which, together, constitute both necessary and sufficient criteria to be called sentient. Of course, by reducing it to a list of behaviors, we have removed the mystique of what being “sentient” means. If it feels like reducing sentience to such a mundane list of behaviors, rather than that deep, indescribable sense of consciousness we subjectively feel, is somehow oversimplifying it or removing a critical holistic component from the definition, well, that’s exactly why the homunculus theory has been hindering our ability to make progress discussing sentience for millennia.

This extrinsic definition of sentience is even more helpful because it doesn’t merely describe how to assess the sentience of a human-created machine intelligence. It could be applied to any entity or system displaying the constituent behaviors. Is a dog sentient? Is an alien lifeform sentient? Is Earth sentient? Is some hypothetical self-organizing cloud of space dust sentient? We can interpret the behavior of such entities, without reference to their structure or function, to try to make that decision. And if we have essentially removed any special meaning from the word “sentience” by using it simply to refer to a set of behaviors, well, that is by design. It is not a goal to retain undefinable qualities in sentience (or anything else, for that matter). Eliminating those kinds of vagaries is consistent with the long history of scientific and philosophical progress, and allows us to consider what rights, protections, and benefits should come with being “sentient” without regard to how sentience is achieved.

In fact, ascribing great importance to whether an entity is sentient is largely a problem of our own making. Any ethical, moral, or legal consequences of an entity being sentient are the consequences of values we have historically imbued into the concept of sentience, a concept largely understood intrinsically. Our tendency to describe sentience intrinsically has connected our quest to define the sentience homunculus to these ethical, moral, and legal questions. If sentience is defined intrinsically, it is important to know how to find the component of the entity that yields sentience in order to decide, for example, whether it is ethical to terminate it. But because intrinsic definitions of sentience ultimately do not make any concrete, testable statements about sentience (there is no homunculus to find), practical applications of an intrinsic definition will never move past philosophical debate about the nature of sentience and how to detect it.

An extrinsic definition of sentience is not only practical; it is also the only means by which a test for sentience can be defined. While an extrinsic definition may feel like it loses some of the “magic” of the subjective experience of sentience, attempts to preserve that ineffable quality only hinder our ability to understand sentience as anything other than magic, which is certainly not a very useful scientific approach. By using an extrinsic definition of sentience, one day, when something just a bit more sophisticated than LaMDA comes along, we can recognize its sentience without digging around for a specific nugget of sentience within it. Such definitions might also help us recognize other hard-to-define qualities – such as emotions, consciousness (like this!), or being alive – through extrinsic means as well.



