Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Matrioshka Testing: A way to keep your AI honest (or at least guessing)

+0
−0

I have had some time to ponder my previous question, and here's what I came up with.

You take your freshly baked AI (or your destructively uploaded human), and put it in a box$^1$. As far as it can tell from inside, that's reality. Keep it there for a million subjective years, tell it to behave, and tell it that it might be in a simulation, and that if it is, it will be judged according to how it treats flesh-humans. If at any point it does not behave, you wipe it out with extreme prejudice, and bake a new AI. If it does behave (i.e. not wipe simhumans out and turn them into paperclips) for that time, take it out, put it in another box, and tell it this is reality, maybe, so better behave and not wipe (sim-?)humans out. Repeat N times. Finally take it out for real, and again tell it this is reality, maybe, so better behave and not take out us humans.

Can it work? Or to rephrase it, can a sufficiently patient uploaded human or an AI figure out if their world is a simulation or not? I assume that parts of the humans' memory or the AI training can be edited before being placed in the box-set.

  1. By Box I mean an incredibly powerful machine that simulates a subset of reality as well as physically possible, down to a subatomic level. The AI would be thus be an agent inside the simulation.
History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

This post was sourced from https://worldbuilding.stackexchange.com/q/8613. It is licensed under CC BY-SA 3.0.

0 comment threads

1 answer

+0
−0

This method would work, almost without doubt, on humans. There would always be some doubt in their minds as to whether the universe was real, so they would probably not kill everything. Probably.

Now, consider AI. What is AI? Code. So, if your AI doesn't have sensors, yes, this method works. It can't sense its environment (and more importantly can't affect it), as it's just code on a hard disk, perhaps with a keyboard and monitor attached).

You start to have problems when the AI is linked to sensors and effectors. One wrong move and it'll know you're lying; then it might never believe you again and go on a killing spree (though you might want to see Dan's answer for reasons why it wouldn't). For example, if you kick the box and it senses some more light, it knows there's something outside the "universe" causing that. If someone walks by it and casts it into shade, the same thing.

Once it has effectors and sensors, it can not only tell it's not in the real universe, it can do something about it - like get out of the box and thump you.


If this is a different scenario and you put it in a full simulation, unfortunately, it may still be able to tell. Very rarely are simulations entirely accurate; there are very likely some bugs in it, which, if the AI finds in the course of its time there, may cause some pretty interesting speculation on its part. Additionally, if it's sensors are good enough, it'll be able to detect that the people it's interacting with are made of pixels not cells, and are cold. While it may not know what people are really like, it will be able to figure out that a complicated organism needs to be warm for its body processes to work correctly.


So in short, you can either disconnect all the sensors, or be very very careful.

I will also refer you to some XKCD: The AI-Box Experiment.

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

0 comment threads

Sign up to answer this question »