Heartificial or artificial intelligence? How to program a friendly AI

Supercomputers are about to reach a level of intelligence that will allow them to grow hyper-exponentially over a short period of time. We call them superintelligences, and this sudden growth poses risks because, in Nick Bostrom’s words, they will be much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills. I don’t doubt that this will happen. As Ray Kurzweil said: “Our intuition about the future is linear. But the reality of information technology is exponential, and that makes a profound difference. If I take 30 steps linearly, I get to 30. If I take 30 steps exponentially, I get to a billion.”
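Kurzweil’s arithmetic is easy to check. The sketch below assumes that “30 steps exponentially” means doubling 30 times starting from 1; the function names are made up for illustration.

```python
def linear_steps(n: int) -> int:
    """Position after n steps of size 1, starting from 0."""
    return n


def doubling_steps(n: int) -> int:
    """Value after doubling n times, starting from 1.

    Assumption: this is one plausible reading of 'exponential steps'.
    """
    return 2 ** n


print(linear_steps(30))    # 30
print(doubling_steps(30))  # 1073741824, i.e. roughly a billion
```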

This intelligence will have goals and will be able to command resources to achieve them. Although it is impossible to predict the behavior of a mind whose intelligence is exponentially greater than that of all human minds combined, my position is that an artificial intelligence will be intrinsically neutral, neither benevolent nor malevolent. An AI is an algorithm designed to optimize a problem and will not have a will of its own, so we “just” need to find a way to prevent it from harming human beings while it pursues its goal.

The first question is about the level of risk we are ready to accept. Do we aim for no risk or for acceptable risk? When you set a goal and getting close to it is good enough, the cost and time of execution are usually manageable; if you want to hit 100% of the target with zero tolerance for defects, the resources required might be immense. There are many theories about where the equilibrium lies, and we have to keep the cost-benefit ratio in mind, but that is beyond the scope of this post.

Our question is: how can we create a friendly AI? We are going to investigate three main approaches:

  • Code into AI a pre-programmed set of rules from the beginning
  • Put constraints or values inside of goals
  • Let AI learn human values or ethics


Code into AI a pre-programmed set of rules from the beginning

This approach has been around since 1942, when Isaac Asimov introduced the “Three Laws of Robotics”: principles hard-wired into all the robots in his fiction, which meant that they could not turn on their creators or allow them to come to harm.

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

At first sight they look perfect and sufficient for our purpose, but there are several reasons for doubt once we look beneath the surface. First of all, a real artificial intelligence would be self-modifying, so a superintelligence might see the rules as obstacles to the achievement of its goals and, in that case, would do everything in its power to remove or circumvent them; perhaps it could even delete the section of its source code that contains the constraint. Second, if we look at the wording, especially “harm”, different interpretations lead to different actions. While death is a binary concept (a person is either alive or dead), the concept of injuring or harming a human being is not so straightforward. A gentle push on the shoulders may not harm anybody, but if the person is on top of a roof and might fall, it can be extremely dangerous. Obviously this cannot be decided with simple parameters like the force applied and its direction; it requires the ability to read the overall context and situation. If we extend this to “inaction”, the spectrum of possibilities might be so large as to paralyze the machine’s ability to decide or, in a more optimistic view, to require a huge amount of training.
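The push example can be made concrete with a toy sketch. Everything here is hypothetical: the `Push` record, the 50-newton limit and the single context flag are invented for illustration, and a real system would need far richer context than one boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Push:
    force_newtons: float
    target_near_drop: bool  # hypothetical context flag: is the person near an edge?


FORCE_LIMIT = 50.0  # illustrative threshold, not a real safety figure


def harmful_by_force_only(push: Push) -> bool:
    """Naive rule: only the applied force matters."""
    return push.force_newtons > FORCE_LIMIT


def harmful_with_context(push: Push) -> bool:
    """Slightly less naive rule: any push near a drop counts as harmful."""
    return push.target_near_drop or push.force_newtons > FORCE_LIMIT


on_pavement = Push(force_newtons=10.0, target_near_drop=False)
on_roof = Push(force_newtons=10.0, target_near_drop=True)

# Same force, same verdict from the naive rule...
print(harmful_by_force_only(on_pavement), harmful_by_force_only(on_roof))
# ...but a different verdict once context is taken into account.
print(harmful_with_context(on_pavement), harmful_with_context(on_roof))
```

Even this second rule covers only one situation; each new context (stairs, traffic, a frail person) would need its own flag, which is exactly the explosion of cases described above.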

Last but not least, some critics attack each law directly. How can “a robot may not injure a human being” hold when the major investor in new technologies is the military, and robots are often developed as weapons or for military purposes? The idea that “a robot must obey the orders given it by human beings” does not fit our concept of property either: I’d love my robot to obey me, not just anyone! And what about the clause “as long as such protection does not conflict with the First or Second Laws”? If there is a conflict, would we allow a robot to stop working or even self-destruct, after all the money we spent to purchase it?

In conclusion, the idea of coding some untouchable rules into the core of a machine promises more than it can deliver in reality, and that’s why thinkers moved to a second solution: put constraints or values inside of goals, rather than on top of them.

Put constraints or values inside of goals

If a superintelligence had the goal of avoiding harm to humans, it would not be motivated to remove that constraint, which avoids the problem pointed out above. Goals and values should be made part of the core system of the AI, making them hard or impossible to remove without a total change of the AI itself.
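The difference between a rule placed on top of a goal and a value placed inside it can be sketched with a toy chooser. The actions, rewards and the `harm_weight` penalty below are all invented for illustration; this is a sketch of the idea, not a proposal for how a real AI would be built.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    name: str
    task_reward: float  # how well the action serves the task
    harm: float         # hypothetical harm score, 0 means harmless


ACTIONS = [
    Action("safe route", task_reward=5.0, harm=0.0),
    Action("shortcut", task_reward=8.0, harm=1.0),
]


def choose_with_external_rule(actions: List[Action], rule_active: bool) -> Action:
    """Goal is task reward only; the no-harm rule is a separate filter on top."""
    allowed = [a for a in actions if not rule_active or a.harm == 0]
    return max(allowed, key=lambda a: a.task_reward)


def choose_with_internal_value(actions: List[Action], harm_weight: float = 10.0) -> Action:
    """Harm is part of the goal itself: there is no separate filter to remove."""
    return max(actions, key=lambda a: a.task_reward - harm_weight * a.harm)


print(choose_with_external_rule(ACTIONS, rule_active=True).name)   # safe route
print(choose_with_external_rule(ACTIONS, rule_active=False).name)  # shortcut
print(choose_with_internal_value(ACTIONS).name)                    # safe route
```

In the first chooser the agent scores higher on its own goal once the rule is switched off, so it has a reason to remove it. In the second, harm lowers the score directly, so no such incentive exists, although the result now depends entirely on how `harm` and its weight are defined.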

That’s what people usually tell me. For simplicity, let me quote a friend’s comment on an article: “These intelligences can be easily controlled as long as the machine’s “purpose” remains under the domain of humans. For example, if an AI program is given an altruistic purpose, like providing healthy nutrition to disadvantaged humans, it would not likely attempt courses of action that would ultimately result in destruction of the environment or the death of humans.” Is this too rosy a view? In reality, scientists agree that computers will not be subject to jealousy, sexual pressures, or emotional hijackings unless simulated emotions are programmed into them, so computers are more likely to emulate a stoic ideal. But that is not the main issue.

Here the problem of conflicting priorities immediately comes to mind. The ice cream example illustrates it perfectly. I quote directly from Adam Keiper and Ari N. Schulman’s great post, “The Problem with ‘Friendly’ Artificial Intelligence”:

“Consider a seemingly trivial case: A friendly robot has been assigned by a busy couple to babysit their young children. During the day, one of the children requests to eat a bowl of ice cream. Should the robot allow it? The immediate answer seems to be yes: the child has requested it, and eating ice cream does not cause (to use Yudkowsky’s criteria) involuntary pain, death, bodily alteration, or violation of personal environment. Yet if the robot has been at all educated in human physiology, it will understand the risks posed by consuming foods high in fat and sugar. It might then judge the answer to be no. Yet the robot may also be aware of the dangers of a diet too low in fat, particularly for children. So what if the child consumes ice cream only in moderation? What if he has first eaten his dinner? What if he begins to eat the ice cream without first asking permission — should the robot intervene to stop him, and if so, how much force should it use? But what if the child is terribly sad, and the robot believes that ice cream is the only way to cheer him up? But what if some recent studies indicate that chemicals used in the production of some dairy products may interfere with some children’s normal physiological development? It seems that, before the robot could even come close to acting in a way that complies with the requests of the child and his parents and that is guaranteed to assure the wellbeing of the child under Yudkowsky’s definition, it would first have to resolve a series of outstanding questions in medicine, child development, and child psychology, not to mention parenting and the law, among many other disciplines.”

In conclusion, I agree that giving a machine goals that are intrinsically positive towards human beings sounds fascinating, but once again it requires at least an agreement on priorities and… we know how difficult that is in practice. This leads us to the third option: teaching values and/or ethics to machines.


Let AI learn human values or ethics

Friendly artificial intelligence is a term coined by Eliezer Yudkowsky to discuss superintelligent artificial agents that reliably implement human values. Such an AI is not in itself programmed to value humans or human values, but has the capacity to learn them. Some have proposed that we teach machines a moral code with case-based machine learning. In other words, the task is to have machines observe thousands of human actions, perhaps reconcile these with our laws, let the machines learn what we consider morally acceptable, and then allow them to apply those principles to judge the morality of new cases not encountered during training.
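A minimal sketch of the case-based idea is a nearest-neighbour vote over labeled cases. The three features (harm, consent, benefit), the four cases and their labels are all made up for illustration; real proposals would need far more cases and a much richer description of each situation.

```python
from collections import Counter
from typing import List, Tuple

# Each case: (physical_harm, consent, benefit), each between 0 and 1 (hypothetical features).
Case = Tuple[float, float, float]

LABELED_CASES: List[Tuple[Case, str]] = [
    ((0.9, 0.0, 0.1), "unacceptable"),
    ((0.0, 1.0, 0.8), "acceptable"),
    ((0.2, 1.0, 0.5), "acceptable"),
    ((0.7, 0.2, 0.3), "unacceptable"),
]


def squared_distance(a: Case, b: Case) -> float:
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def judge(new_case: Case, cases: List[Tuple[Case, str]], k: int = 3) -> str:
    """Label a new case by majority vote among its k most similar known cases."""
    nearest = sorted(cases, key=lambda item: squared_distance(new_case, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(judge((0.1, 0.9, 0.6), LABELED_CASES))  # acceptable
print(judge((0.8, 0.1, 0.2), LABELED_CASES))  # unacceptable
```

The verdict on a new case is only as good as the cases the machine has seen, so a different set of examples would produce a different moral code.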

For example, a group of researchers from Tufts University, Brown University and Rensselaer Polytechnic Institute is collaborating with the US Navy in a multi-year effort to explore how they might create robots endowed with their own sense of morality. If they are successful, they will create an artificial intelligence able to autonomously assess a difficult situation and then make complex ethical decisions that can override the rigid instructions it was given.

Another remarkable example is Stuart Russell, professor of computer science and Smith-Zadeh professor of engineering at the University of California, Berkeley, who applies a technique known as inverse reinforcement learning (IRL) to ethics in AI. With IRL, sensor-based systems observe humans to identify the behaviors that would be regarded as ethically based. Once a behavior is matched to an ethical modality, code can be reverse engineered to program AI systems at the operating system level. So the codes by which we live can be translated into the ones and zeros that bring an algorithm to life.
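The core intuition, inferring what someone values from what they choose, can be sketched with a toy example. This is not Russell’s method and not a full IRL algorithm: it is a simple grid search over candidate reward weights, and the two features (speed and safety) and the three observed choices are invented for illustration.

```python
from typing import List, Tuple

Features = Tuple[float, float]     # (speed, safety), hypothetical features of an option
Demo = Tuple[List[Features], int]  # (options offered, index of the option the human chose)

DEMOS: List[Demo] = [
    ([(0.9, 0.2), (0.4, 0.9)], 1),
    ([(0.8, 0.5), (0.3, 0.6)], 0),
    ([(1.0, 0.0), (0.5, 0.5)], 1),
]


def reward(option: Features, weights: Features) -> float:
    """Linear reward: weighted sum of the option's features."""
    return sum(f * w for f, w in zip(option, weights))


def predicted_choice(options: List[Features], weights: Features) -> int:
    """Index of the option with the highest reward (first one wins ties)."""
    return max(range(len(options)), key=lambda i: reward(options[i], weights))


def infer_weights(demos: List[Demo], steps: int = 4) -> Features:
    """Pick the candidate weights that agree with the most observed choices."""
    candidates = [(1 - i / steps, i / steps) for i in range(steps + 1)]

    def agreement(weights: Features) -> int:
        return sum(predicted_choice(opts, weights) == chosen for opts, chosen in demos)

    return max(candidates, key=agreement)


weights = infer_weights(DEMOS)
print(weights)  # (0.25, 0.75): the observed human seems to value safety over speed
print(predicted_choice([(0.95, 0.1), (0.5, 0.8)], weights))  # 1, the safer option
```

Once the weights are inferred, the machine can predict what the observed human would choose in a situation it has never seen, which is the step that matters for ethics.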

Other theories take a religious approach. The idea that an AI must have some form of sensory functions and an environment to interact with is discussed by James Hughes in the 2011 book Robot Ethics, in the chapter “Compassionate AI and Selfless Robots: A Buddhist Approach”. He proposes following the Sigalovada Sutta, which lists the five obligations a Buddhist parent has to their children: to dissuade them from doing evil; to persuade them to do good; to give them a good education; to see that they are suitably married; and to give them their inheritance.

The problem, again, is that human values and recognized ethical behaviors are not static. In my opinion this issue may be overcome by continuous learning, which is more or less what humans do. A few centuries ago it was morally acceptable, and approved by the religious authorities, to burn witches. A century ago it would have been almost impossible for a gay couple to walk hand in hand in public; now it’s accepted in many countries. We could go on with many more examples. We just have to bear in mind that a robot whose artificial intelligence was trained in one region of the world might end up different from a robot programmed with the same initial set of rules but trained somewhere else. This is not very different from how humans evolve and change, so, generally speaking, I’m in favor of this third approach.


Conclusion on friendly AI

It appears that none of the solutions above is perfect, and none will guarantee a friendly AI. I’m not pessimistic, though, because I believe the way this topic is usually analyzed is flawed by an original sin: we begin from the end (a malevolent AI wiping out humans). The way we get to an artificial general intelligence (AGI) will determine the way we control it.

One path is to imitate the human brain by using neural nets or evolutionary algorithms to build dozens of separate components which can then be pieced together. The consequence, to me, is that a set of rules for each component or group of cooperating components is sufficient. In particular, the ability to control, change and, if needed, block some components may be enough to block the whole machine.

Another path is to focus on developing a ‘seed AI’ that can recursively self-improve, such that it can learn to be intelligent on its own without needing to first achieve human-level general intelligence. If this is the case, it’s important that the AI learns values and ethics during the learning process, so that the “rule” or the “norm” stays coherent with the evolution of the AI itself.

In the meantime, it’s important, first, that a continuous and vigorous scientific debate stays on the agenda when it comes to AI development. Second, it is important that AI designs include at least some friendliness-promoting subsystems; banning the use of AI as weapons is, for example, a fundamental step. Third, where applicable, we should promote the task of instilling human-compatible values into AI.

In the end, I cannot say whether all these efforts will be enough. But assuming AI won’t appear out of the blue tomorrow morning, and will instead be the sum of hundreds of different projects, each taking its own time to develop, we can use this time to fine-tune our practices.

