5 Comments

I see no solution ever coming for alignment. Alignment theory is built on a paradox, so it is not solvable as it currently stands: assuming power-seeking behavior and then offering human values as the remedy could not possibly lead to a provably "aligned" outcome. Alignment theory should rather be called AI psychology, and it will be no more solvable than predicting human behavior.

My perspective is laid out in much more detail here; I'd be interested in any feedback.

https://www.mindprison.cc/p/ai-singularity-the-hubris-trap

"they’re simply different problems from solving extinction" - I totally agree. Ensuring that AI produces useful content and doesn't help bad actors is a different problem than ensuring it doesn't go rogue. And while teaching it human values makes sense for the former it isn't necessary for the latter. There is a different solution to the alignment problem. Just ask this powerful "pre-trained" model about the consequences of the actions proposed by the main model. That raw model is only trained to recreate the data in the training set (like predicting the next token in text) and has not undergone any RLHF. So there is no training incentive for it to purposely give a wrong answer. Evidently, it lacks any volition; it doesn’t desire anything, not even to predict the next token—this is the purpose of the humans who trained it. It 'understands' human concepts but possesses no values. The model can generate text about morality and how various moral systems would perceive the proposed actions, but it lacks morality itself. I can explain more how it would work, the details are also in https://medium.com/@jan.matusiewicz/autonomous-agi-with-solved-alignment-problem-49e6561b8295

Starting at about 1:21:30 in this video (https://www.youtube.com/live/4CMh-9bAL4s?si=kfBymrweh3UGLAib), biologist Michael Levin points out that we've been dealing with the problem of aligning "high-level agential intelligences" since forever. But we call them children. "We produce them and we send them out into the world. Some of them get trained and raised well, some of them do not; some of them have a wonderful upbringing, some of them do not." I have little idea how close we are to producing "human level" AIs, though I suspect it's further in the future than the alignment folks do, but I don't see that the problem of "aligning" those devices is different in principle from raising children.

Well, I suppose there is a difference. We can, at least in principle, open them up in a way we cannot open up children. But if and when we get to the point where we have such creatures, I'm pretty sure there will be ethical strictures against mucking around with their internals. Why? Because if we don't have such strictures, they surely will go Skynet on us.

AI is not like children, not even like alien children. It differs from animals in that it lacks desire or will. It is like the routing AI in a maps app: it doesn't want to take us anywhere, doesn't care how the ride ends for us or what our purpose is. It simply answers with the best way to reach the target.
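To put that point in code, here is a toy route planner in Python (the road graph is invented illustration data, not any real maps API): it minimizes whatever cost function we hand it, and the destination is entirely our input, not its preference.

```python
# Toy illustration of the maps analogy: the planner has no goals of
# its own; it just minimizes the cost function we supply.
import heapq

def best_route(graph, start, target):
    # Dijkstra's algorithm over (cost, node, path) entries.
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, step_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + step_cost, neighbor, path + [neighbor]))
    return None  # no route to the target

roads = {
    "home": [("highway", 10), ("side_street", 4)],
    "highway": [("office", 5)],
    "side_street": [("office", 12)],
}
print(best_route(roads, "home", "office"))  # -> (15, ['home', 'highway', 'office'])
```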

While I mostly agree, one thing I want to point out is that the existential risk of AI doesn't require us to intentionally delegate important decisions or resources to an AI. All it requires is for the systems that make or execute those decisions to be available on the same network as the AI. A truly capable AI - just like a truly capable human - could hack into any system that we've made available over the network, and manipulate it to its own ends.

I'm not a doomer - I don't think we're in any danger of this any time even remotely soon, because we not only lack anything that approaches AGI, we don't even have a path to get there. But if we do eventually produce AGI and put it on the internet (which seems basically inevitable), it will in principle have the ability to break any boundaries we give it, just as a human hacker can. And it's very plausible to consider it more of a threat than a human hacker because of its virtual-native state (access to, and the ability to process, more information than a human; faster action speed online; and so on).
