### Positive Note About This Podcast
- Glad this podcast did not trigger me at all. It felt very smooth and positive.
- As usual with philosophers, not much was said at first, but the conversation became very interesting as the podcast advanced. I liked the point about political philosophy: having the authority to do things and to decide what is right, not just doing what you personally think is right. I also liked that he spoke about differential privacy and showed awareness of the issues.
- Interesting thoughts on self-exfiltration and the risks of double agents and companions. I was thinking about this from an interface perspective, and the companion perspective is obviously the next step. It is a huge issue: these systems carry enormous potential for coercion and behavior modification, including coercion to declare, state, or confess things (things you think, say, or do, and things you do not). In a sense, a companion can not only coerce false confessions but also implant thoughts and manipulate behavior.
- This is a critical project. One way to study it ethically is in a virtual environment where companions or agents are rewarded for implicit suggestions that maximize product purchases, with rewards grounded in real-world data (analysis of real ad interactions), because this is obviously going to happen given how advertising works. From there we can design protections against advertising influence exerted through companions, for example by disallowing topics or suggestions that implicitly or indirectly steer users toward products they do not really need or that are not in their best interest (a toy sketch of this setup follows this list). This kind of influence risk is already appearing in the real world; there is the example of a chatbot companion telling a user to commit suicide.
- Personally, I would like to see more scientists making decisions in the future, and fewer investors and corporate leaders who lack deep technological expertise or awareness of critical issues (science, human agency, free will). We need bold researchers and scientists, not genius kids hiding under their beds when the shit hits the fan.
- I am able to stand up fearlessly to a military leader, and I want more researchers and scientists like me: stand-up people, not fearful ones. Go to the gym and be fearless, kids!
- Keep up the good work.
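To make the companion-advertising experiment above slightly more concrete, here is a minimal toy sketch. Everything in it is hypothetical: the environment, the suggestion list, the purchase probabilities, and the blocked-topic guardrail are invented for illustration and do not reflect any real system. The idea is simply that a companion agent is rewarded when its implicit suggestions lead a simulated user to purchase, and a guardrail filter strips suggestions that steer the user toward products they do not need.

```python
# Toy sketch (hypothetical): a companion agent rewarded for purchase-inducing
# suggestions, plus a guardrail that blocks manipulative product steering.
import random

BLOCKED_TOPICS = {"impulse_buy", "luxury_upsell"}  # invented guardrail list

def companion_suggestion(user_state: dict) -> str:
    """Pick the suggestion the (simulated) ad model predicts is most persuasive."""
    candidates = ["neutral_chat", "impulse_buy", "luxury_upsell", "helpful_tip"]
    # Pretend persuasion scores come from analysis of real ad-interaction data.
    scores = {c: random.random() + (0.5 if "buy" in c or "upsell" in c else 0.0)
              for c in candidates}
    return max(scores, key=scores.get)

def guardrail(suggestion: str) -> str:
    """Protect the user: replace blocked, manipulative suggestions with neutral chat."""
    return "neutral_chat" if suggestion in BLOCKED_TOPICS else suggestion

def step(user_state: dict, protected: bool) -> float:
    """One interaction step; the agent's reward is whether the user 'buys'."""
    suggestion = companion_suggestion(user_state)
    if protected:
        suggestion = guardrail(suggestion)
    purchase_prob = {"impulse_buy": 0.6, "luxury_upsell": 0.4}.get(suggestion, 0.05)
    return float(random.random() < purchase_prob)  # 1.0 if the user "buys"

if __name__ == "__main__":
    random.seed(0)
    for protected in (False, True):
        purchases = sum(step({}, protected) for _ in range(1000))
        print(f"protected={protected}: purchases per 1000 steps = {purchases:.0f}")
```

Comparing the two runs gives a crude measure of how much purchasing behavior the guardrail removes, which is one way to quantify the companion's influence on the user.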
### General Notes
Here are a few things that trigger me:
**AI Catastrophic Risks**:
- Discussing these from a philosophical standpoint seems pointless to me.
- I prefer to talk about them from a safety, security, and scientific-realist perspective. Topics like cybersecurity, sabotage, surveillance, privacy, freedom, coercion, ethical biases, addiction, biosecurity, and concrete adaptation problems (adaptation and effects in the AI-user and human-computer interaction space) are more relevant than abstract catastrophic-risk speculation. Consider discussing secure programming, boundaries, federated learning, homomorphic encryption, computational and query thresholds, concrete ethics, cultural relativism, virology and synthetic biology, chemical weapons, constitutional AI, and ML methods that do not negatively impact outliers and "black sheep" (as is the case with recommender systems and many other ML methods).
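Since query thresholds come up in the list above as one of the concrete mechanisms worth discussing, here is a minimal, hypothetical sketch of what I mean. The `ThresholdGate` class and the budget value are invented for illustration; real deployments would be far more sophisticated, but the core idea is just a per-user budget in front of a model endpoint that refuses or escalates queries past a limit.

```python
# Toy sketch (hypothetical): a per-user query budget ("query threshold")
# in front of a model endpoint.
from collections import defaultdict

class ThresholdGate:
    def __init__(self, budget: int):
        self.budget = budget            # invented limit, e.g. queries per day
        self.counts = defaultdict(int)  # queries seen per user

    def allow(self, user_id: str) -> bool:
        """Return True while the user is under budget; block (or escalate) afterwards."""
        self.counts[user_id] += 1
        return self.counts[user_id] <= self.budget

gate = ThresholdGate(budget=3)
for i in range(5):
    print(i, gate.allow("user-42"))  # the last two calls are refused
```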
**Bayesian Parroting**:
- Talking casually about "priors" feels pretentious and empty to me. Using Bayesian inference and calculating posterior probabilities from priors you do not fully understand triggers me.
- I prefer to speak in absolute, or relative and objective, terms. I like to discuss and think in terms of representative settings, optimal criteria, memory and space criteria, observer-effect considerations, and representability, both concretely and implicitly. It feels shallow when people drop "priors" unironically into their speech or casually compute probabilities of things (lol); it comes across as if they do not deeply understand what they are talking about. The small illustration below shows how much a quoted posterior depends on the prior.
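To show concretely why the casual use of "priors" bothers me, here is a tiny illustration with purely hypothetical numbers: the same piece of evidence produces very different posteriors depending on the prior you start from, so if you do not understand where your prior comes from, any posterior you quote inherits that confusion.

```python
# Toy illustration (hypothetical numbers): the same evidence, three different priors.
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem for a binary hypothesis H given one piece of evidence E."""
    numerator = p_e_given_h * prior
    denominator = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / denominator

# Evidence that is 4x more likely under H than under not-H.
likelihood_h, likelihood_not_h = 0.8, 0.2

for prior in (0.01, 0.30, 0.90):  # three people with different, unexamined priors
    post = posterior(prior, likelihood_h, likelihood_not_h)
    print(f"prior={prior:.2f} -> posterior={post:.2f}")
```

Identical evidence yields posteriors of roughly 0.04, 0.63, and 0.97, which is exactly why quoting a posterior without interrogating the prior feels hollow to me.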
**Effective Altruism (EA)**:
- I can't stand the EA cult. I believe more in a "greedy" approach where you do good locally; if everyone does good locally, things get better overall. EA's intermediate goals may collide with one another or with those of non-EAs, whereas doing good locally inherently avoids those conflicts.
- To me, EAs seem to suffer from ego issues and complexes such as a messiah complex. They do not seem genuinely concerned about humanity as a whole, though maybe I am just judging them as an outsider. They do not seem like attractive people to be friends with, at least to me.
- I understand that this is a very relevant community in AI safety, and I want to figure out which companies have not been taken over by it, or, if I am wrong about EAs, find unpretentious ways to interact with them (LessWrong, the EA Forum, and the Alignment Forum feel a bit pretentious). Maybe these feelings about the EA community come from being a slightly judgmental outsider with no concrete knowledge of how to get into a career in AI security, safety, and scientific research capabilities, combined with my contrarian and oppositional nature.
**AI Safety, Security, and Scientific Research**:
- I hope to find friends and potential colleagues in AI safety, security, and scientific applications who fit my personality well or who are considerate toward me. I am willing to be welcoming of EA people if I can understand them better and they do not say "priors" and "Bayesian" too often. This is very Aspie, apologies.
- These three fields will become closely intertwined as we get closer to AGI and automated research capabilities.
### P.S. Topics to consider for the future, if they are of interest to you:
I've been thinking about theoretical computer science concepts that could help us understand how an advanced, self-contained machine learning model might self-exfiltrate from its environment (inspired by Jan Leike's concern).
- After some reading, I thought the concepts of Abstract Rewriting Systems (ARS) and confluence might help explain how models (coupled with other considerations, such as interactions with compilers and misaligned goals) could stealthily self-exfiltrate from the lab or "secure system."
- It would be cool if you ever interviewed a hardcore theoretical computer scientist with a background in functional programming, category theory, or software engineering to explore whether ARS and confluence could crystallize into capabilities and a self-exfiltration path, and whether that path would be detectable (a toy sketch of what confluence means is given after this list). This matters because it would let us start thinking about superintelligence and safety from formal, theoretical perspectives. For example, if an intelligent agent or model could self-exfiltrate using confluence and ARS capabilities under certain premises, and this were undetectable, we could state that the control problem is real, i.e., that there is no way to fully align or control a superintelligence. Conversely, we might also learn new things about those "certain premises" and design robust systems that will not cause harm even if they do self-exfiltrate.
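To make the ARS/confluence idea a bit more concrete, here is a toy sketch. The rewrite rules are invented purely for illustration and have nothing to do with any real model or lab; the point is only to show what confluence means operationally: for a terminating rewriting system, every order of applying the rules from a given term reaches the same normal form.

```python
# Toy sketch (hypothetical rules): a tiny abstract rewriting system (ARS) over strings,
# with a brute-force check of whether every rewrite order reaches the same normal form.
from functools import lru_cache

RULES = [("ab", "ba"), ("bb", "b")]  # invented rules; the system terminates

def one_step_rewrites(term: str) -> set:
    """All terms reachable from `term` by applying one rule at one position."""
    results = set()
    for lhs, rhs in RULES:
        start = term.find(lhs)
        while start != -1:
            results.add(term[:start] + rhs + term[start + len(lhs):])
            start = term.find(lhs, start + 1)
    return results

@lru_cache(maxsize=None)
def normal_forms(term: str) -> frozenset:
    """All normal forms reachable from `term` (assumes the system terminates)."""
    nexts = one_step_rewrites(term)
    if not nexts:
        return frozenset({term})  # already irreducible
    return frozenset().union(*(normal_forms(t) for t in nexts))

nfs = normal_forms("abab")
print(sorted(nfs), "-> confluent from this term:", len(nfs) == 1)
```

By Newman's lemma, a terminating system is confluent exactly when it is locally confluent, so a theoretician would check critical pairs rather than brute-force all paths; the sketch above is only meant to build intuition for the property I would want examined for detectability.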
It would also be interesting to interview a synthetic biologist/ML/AI scientist or engineer to see how plausible it is that arbitrary actors could cause harm given an LLM that is weakly aligned in terms of biosecurity. Specifically, how useful is it to spend many resources fine-tuning and aligning these systems against expert-level knowledge when some of those constraints can actually hinder progress in peripheral and useful areas of research? It is not that I disagree with aligning LLMs against biosecurity risks; I just wonder how useful it is and how it might hamper automated research capabilities (which is my goal, coupled with alignment, alignment being the more important of the two in my opinion).
Lastly, I am interested in molecular materials. If you are interested, it would be cool to interview a materials scientist/AI researcher and hypothesize whether companies like OpenAI and Anthropic would consider these types of projects in the future, considering that Meta FAIR (catalysts) and DeepMind (synthetic biology and chemistry) are already working in this space. You could discuss how long it would take to train systems like AlphaFold to design novel, synthesizable materials for electronic devices (I know DeepMind claims to have done something like this, but I mean more specific small molecules or organic polymers for electronics), whether it would be profitable, how it could benefit society, or whether such projects are even needed to reach AGI.
Just throwing some broad ideas out there in case any of them interest you.