The Artificiality of Alignment
On the stakes of AI progress and claims about AI's existential risk.
Credulous, breathless coverage of “AI existential risk” (abbreviated “x-risk”) has reached the mainstream. Who could have foreseen that the smallcaps onomatopoeia “ꜰᴏᴏᴍ” — both evocative of and directly derived from children’s cartoons — might show up uncritically in the New Yorker? More than ever, the public discourse about AI and its risks, and about what can or should be done about those risks, is horrendously muddled, conflating speculative future danger with real present-day harms, and, on the technical front, confusing large, “intelligence-approximating” models with algorithmic and statistical decision-making systems.
I see no solution ever coming for alignment. Alignment theory is built on a paradox, so it is not solvable as it currently stands: assuming power-seeking behavior while treating human values as the solution could never lead to a provably "aligned" outcome. Alignment theory would be better called AI psychology, and it will be no more solvable than predicting human behavior.
My perspective is laid out in much more detail here; I'd be interested in any feedback.
https://www.mindprison.cc/p/ai-singularity-the-hubris-trap
"they’re simply different problems from solving extinction" - I totally agree. Ensuring that AI produces useful content and doesn't help bad actors is a different problem than ensuring it doesn't go rogue. And while teaching it human values makes sense for the former it isn't necessary for the latter. There is a different solution to the alignment problem. Just ask this powerful "pre-trained" model about the consequences of the actions proposed by the main model. That raw model is only trained to recreate the data in the training set (like predicting the next token in text) and has not undergone any RLHF. So there is no training incentive for it to purposely give a wrong answer. Evidently, it lacks any volition; it doesn’t desire anything, not even to predict the next token—this is the purpose of the humans who trained it. It 'understands' human concepts but possesses no values. The model can generate text about morality and how various moral systems would perceive the proposed actions, but it lacks morality itself. I can explain more how it would work, the details are also in https://medium.com/@jan.matusiewicz/autonomous-agi-with-solved-alignment-problem-49e6561b8295