2 Comments

Another great interview. A very cool series of projects. I must admit I am a little wary of these generative agents. I enjoyed your line of questioning about safety concerns. The philosophical angle is fascinating. LLMs are not meant to replicate the totality of human consciousness, both rationality and irrationality, but rather a subset of consciousness, one that is designed to be helpful and patient. Thus, when generative agents are set loose in a simulation, they are "too cooperative" and "do not fight." At that point, I was thinking, "And isn't it a good thing they don't act like us?" Here the safety and security angle becomes more than a little concerning. Let's say a tech company wants a better simulation and asks a researcher to create an LLM that is disruptive, manipulative, and unpredictable. The only safeguard Park evokes is an IRB process. Daniel, I am wondering if you think IRB committees would be forward-thinking enough to put a halt to such research. I personally would hope so. I guess a lot has to do with context and use. Still, I wonder how far we should push toward programming LLMs to be disruptive and maladjusted. I guess fundamentally one's response hinges on (1) whether one believes machines can become conscious, and (2) whether human beings can create adequate safeguards to protect themselves from conscious machines. Two big what-ifs if there ever were some...

author

Thanks for writing! On your (1), there was an interesting paper recently reviewing some ideas about consciousness (but starting from computational functionalism as a base assumption?). I don’t think anyone would want maladjusted LLMs, but there are use cases like: what if you wanted a chatbot that would help someone who has extreme social anxiety figure out how to talk to people? You’d want a system that’s cordial, sure, but you wouldn’t want it to be so deferential that someone acting rude doesn’t get proper feedback. Scott Aaronson tried a simulation like this and found that the instruction tuning for ChatGPT made it unable to call him out when he purposely acted overly forward. So broadly, I think the space of desirable behaviors might be broader than just helpful/harmless to the point of being as deferential as current models are, but as you said, it’s hard to prescribe exact limits without knowing a lot more!
