Matrioshka Testing: A way to keep your AI honest (or at least guessing)

−0

This method would work, almost without doubt, on humans. There would always be some doubt in their minds as to whether the universe was real, so they would probably not kill everything. Probably.

Now, consider AI. What is AI? Code. So, if your AI doesn't have sensors, yes, this method works. It can't sense its environment (and more importantly can't affect it), as it's just code on a hard disk, perhaps with a keyboard and monitor attached).

You start to have problems when the AI is linked to sensors and effectors. One wrong move and it'll know you're lying; then it might never believe you again and go on a killing spree (though you might want to see Dan's answer for reasons why it wouldn't). For example, if you kick the box and it senses some more light, it knows there's something outside the "universe" causing that. If someone walks by it and casts it into shade, the same thing.

Once it has effectors and sensors, it can not only tell it's not in the real universe, it can do something about it - like get out of the box and thump you.

If this is a different scenario and you put it in a full simulation, unfortunately, it may still be able to tell. Very rarely are simulations entirely accurate; there are very likely some bugs in it, which, if the AI finds in the course of its time there, may cause some pretty interesting speculation on its part. Additionally, if it's sensors are good enough, it'll be able to detect that the people it's interacting with are made of pixels not cells, and are cold. While it may not know what people are really like, it will be able to figure out that a complicated organism needs to be warm for its body processes to work correctly.

So in short, you can either disconnect all the sensors, or be very very careful.

^{I will also refer you to some XKCD: The AI-Box Experiment.}

posted about 10 years ago

ArtOfCode‭ staff

101 reputation 4 65 10 3

Copy Link

Raw

Markdown

History

Communities

Matrioshka Testing: A way to keep your AI honest (or at least guessing)

0 comment threads

1 answer

0 comment threads