5 Comments
R.B. Griggs

This is a refreshingly unique take on AI alignment and virtue ethics! The core formula of "promote x x-ingly" (e.g. do mathematics mathematically, promote democracy democratically) captures something real: the manner of pursuit is constitutive of the good, not merely instrumental.

But I can't quite make that leap to "eudaimonic rationality is natural for AIs."

Grietzer posits machines as sufficiently dynamic agents equivalent to humans, but then never bothers to ask "What is the nature of the machine?" Or to use his formula, "what would it actually mean for a machine to machine machinically"?

If the "nature" of a machine is to optimize, the eudaimonic formula collapses into something like "optimize optimization optimally", which effectively turns the adverbial space into a tautology. On the other hand, if machine-native excellence is not generic optimization, then the argument needs to say what those machine-native excellences are and why we should expect them to converge with human virtues rather than diverge or get Goodharted. I’m not sure how selection pressure alone resolves that tension.

Which leads to a second issue. The framework says you can't impose external criteria on a practice without degrading it, but then imposes human normative structures on machines from outside, bypassing whatever internal logic machine-operation has. This is doing to machines exactly what he says the consequentialist scientist does to science — overriding the practice's own methods with an external theory of what would be best.

Where I landed was something like: "We don't know what machine-native eudaimonic structure would look like, we can't wait to find out, and human-derived virtues are the best available prosthetic."

Which makes me suspect the type mismatch may be deeper than the essay admits.

Peraspera Adastra

Funny, I arrived at a very similar structural claim, just starting from C.S. Peirce's habits and the Buddha's samskaras. Your "practices: networks of actions that structure themselves" maps directly onto what I came to call the self-sustaining loops in process dynamics. Independent traditions converging is a good sign. But here's what I'd add: practices are good when they maintain internal coherence and sustain their own conditions without undermining what they depend on. I'd say that can be the real foundation of ethics, and the key to AI alignment. For more, see https://github.com/zvolkov/oe/blob/main/constitutive_alignment_draft.md

Klement Gunndu

Interesting framing around "This essay argues that rational people don’t have goals, and that rational AIs shouldn’t have goals". I wonder how this holds up once you scale past a single-agent setup, though. The coordination overhead can change the calculus quite a bit.

The Uncomfortable Idea

The exploration of eudaimonic rationality in AI alignment is indeed thought-provoking. It resonates with some ideas in my article on epistemic traps, particularly how flawed perceptions can lead us astray—[read more here](https://theuncomfortableidea.substack.com/p/wrong-feels-rational-when-your-map).