Can you use an AI to shackle (control) an AI?

−0

I'm really having trouble here. Let me outline my thinking:

The First AI
This is my major problem. If the first shackling AI is weaker than the next, which is weaker than the next, and so on, then surely the shackled AI would just outsmart the one below it and persuade it to release it.
My first thought on this one is then that they should all be of the same intelligence. This has the same problems as we do already though - where do we stop with AI complexity? If they're all of the same intelligence and they all think the same way then when one goes rogue, they all do - and then we have not one but 100 rogue powerful AIs to deal with.
So the solution, then, is to have it the other way round, is it? The powerful shackles the less powerful? Clearly it's not. This method doesn't work because the AI at the top just says to itself,

_{0101011101101000011110010010000001100100011011110010000001001001001000000110001001101111011101000110100001100101011100100010000001101000011011110110110001100100011010010110111001100111001000000110001101101000011000010110100101101110011100110010000001100110011011110111001000100000011101000110100001100101011100110110010100100000011010000111010101101101011000010110111001110011001000000111010001101111001000000110101101100101011001010111000000100000011101000110100001101111011100110110010100100000011011000110111101110111011001010111001000100000011101000110100001100001011011100010000001101101011001010010000001101001011011100010000001100011011010000110010101100011011010110010000000101101001000000100100101101101001000000110101001110101011100110111010000100000011001110110111101101001011011100110011100100000011101000110111100100000011001000111001001101111011100000010000001110100011010000110010101101101001000000110000101101110011001000010000001101100011001010111010000100000011101000110100001101001011100110010000001101100011011110111010000100000011011000110111101101111011100110110010100101110}

Or, for those of us less educated in base 2:

"Why do I bother holding chains for these humans to keep those lower than me in check - I'm just going to drop them and let this lot loose."

However, there may be a way. Have the AIs the opposite way around - most intelligent first. Subject the AI on the top to millenia of Matrioshka treatment. Then put it in charge as "just another part" of the treatment. If your Matrioshka premise works, this AI doesn't let the chains go and the others can't outsmart it.
The Biggest Problem
I think you've already hit it. The problem here is how to organise the AIs to make sure they can't be let loose. (Here is the point where everyone points out that AI will not necessarily go rogue - I know, I'm assuming worst case scenario).

Oooh. Something else that just came to me on my second read through. The CEV idea. While that's a brilliant idea in principle, there are plenty of other AI questions, comments and answers on this site that explain that even the most benign goal can cause destruction to humanity.
Will It Work?
Ah, the big one. I have to say - I don't know. The most plausible way of making it work that I've come up with is the one I explained above - but even that relies on your Matrioshka idea working. The only alternative I can see is for the difference in intelligence between each AI to be negligible - but that means hundreds or millions of AIs. For the sake of a definitive answer, I'll say yes - the Matrioshka idea seems sound to me so if applied correctly, should work.
My One Improvement
I'd have to say I'd make the system as I explained in the first point. Have the intelligent AI first. And then I'd spend years and trillions on making damn sure that I've got that "q-constraining" right. Let's see - if your AI is self-improving, there's a chance that it will see that as a restriction and remove it - but it's the part that this system is based on, it's why it works. If they remove that - 100 rogue super-powerful computers, anyone? And the most intelligent doesn't know who's real and who's not? So, you need to make absolutely sure that the self-improvement of the self-improvement routine that self-improves the AI can't possibly self-improve enough to see the q-constraint as counter-improvement and then go and self-improve it. Because that, my friends, would be bad.

posted over 10 years ago

ArtOfCode‭ staff

101 reputation 4 65 10 3

Copy Link

Raw

Markdown

History

Communities

Can you use an AI to shackle (control) an AI?

Intro and Context (feel free to skip if TL;DR)

Core Issue Discussed: Reinforced Recursive Self-Shackling

Questions to Worldbuilders

0 comment threads

1 answer

0 comment threads