(Music: “No One Is Perfect” by HoliznaCC0)
Anne Brice (intro): This is Berkeley Talks, a UC Berkeley News podcast from the Office of Communications and Public Affairs that features lectures and conversations at Berkeley. You can follow Berkeley Talks wherever you listen to your podcasts. We’re also on YouTube @BerkeleyNews. New episodes come out every other Friday. You can find all of our podcast episodes, with transcripts and photos, on UC Berkeley News at news.berkeley.edu/podcasts.
(Music fades out)
Holly Elmore: I’m here, and I’m going to start with a little disambiguation. So I’m going to talk about the deep worldview behind PauseAI and the Theory of Change behind PauseAI. So I’m Holly Elmore, and I have a Ph.D. in evolutionary biology, so that’s my intellectual background.
I also want to mention that I didn’t go into academia after I graduated. I worked at a think tank for three years on the topic of wild animal welfare, how do wild animals feel, maybe could we make them feel better? And then it was from there that I left that job to start PauseAI US. So that’s the legal entity that I am the executive director of.
I’m also a co-founder of the PauseAI Movement, and the big other co-founder is Joep Meindertsma, who’s in the Netherlands, and he runs the organization now called PauseAI Global, which runs more of the digital resources. And the big Discord is the PauseAI Global Discord.
And was there something else I wanted to say about that? I had a lot of caveats to front load. Right, right. So just bear in mind that there’s PauseAI, the movement, which spiritually includes everyone who wants to pause AI. Then there’s PauseAI, the legal entities. And in PauseAI US, there’s a couple levels of membership.
So there’s a volunteer agreement you have to sign before you do a protest with us or before you run a local org or participate in our events. So that’s a level. I’d call those volunteers. And then there’s the paid staff of the org, which is quite small. It’s me and two other people right now. So there might be a number of times when it would be tempting to think of PauseAI as one thing. It’s probably not that big a deal for this talk, but I just want to let you know.
Should I be able to increment this? OK. So what it means to be in PauseAI is only these stipulations. One, we don’t know what we’re doing with frontier AI, and this could be catastrophically dangerous. Also, even if it’s not exactly catastrophically dangerous, people don’t want their lives radically disrupted in lots of ways, so the consent of the people matters on this.
Two, the default should be pausing, and we should have the possibility of never unpausing from AI development, frontier AI development.
And three, it’s feasible to pause through international agreement. So this is all you have to agree with to join PauseAI, if that feels right to you.
But for how we get there, I’m going to, again, make some fine distinctions at the beginning. For people who aren’t familiar with Theory of Change language, there’s a difference between the vision, which is the world that we want, and your mission. And I’m listing two missions here: PauseAI the movement’s mission, and then PauseAI US’s mission.
So the vision of PauseAI is a world where there’s been a global treaty to pause frontier AI and society is thriving. But we don’t think that our actions alone are going to create that world. This is just the end state that we want.
PauseAI, the movement’s mission is grassroots activities and education to move the Overton Window to support this treaty.
And then PauseAI US operates in the U.S. So our mission is to influence U.S. politicians toward actions supporting a treaty and domestic safety measures, so better-than-nothing safety stuff just within the U.S., via grassroots action and education.
So why is this what we want? Why is the world that we want one without dangerous frontier AI, with the treaty, and why are we pursuing it through grassroots political action and education? Now I’m going to get into the worldview section. One big thing, and I should also say this, the reason I broke out those three things, that that’s all there is to being in PauseAI, is that lots of people could want to pause AI for lots of reasons. And as long as you agree to not do violence and not break the law, whatever reason you have for wanting to pause AI is good and you’re on the team. So it’s a big tent movement.
So to fill in why you would want to do things this way, I’m going to mostly draw from my own worldview and experience and why I made the org this way. But it’s also something that’s pretty common among people who think deeply about this. And especially since you guys are more familiar with AI risk thinking and the various intellectual strains behind it, it’s what distinguishes us from the other groups in AI safety.
So our world is fragile and species go extinct as a rule. This is something that I definitely, definitely understood after doing a Ph.D. in biology. My goodness. So I just picked an image to show this. This is extinction since 1500 due to us, basically. So this is the rate of extinction of species in these different groups since 1500. And then even this background rate here, even though it seems pretty low, it’s enough to ensure that more than 99% of all species that ever existed are extinct now.
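For a rough sense of why that 99% figure follows from ordinary turnover, here is a back-of-the-envelope sketch in Python. The numbers (an average species lifespan of a couple million years, several hundred million years of abundant multicellular life) are illustrative round figures I’ve chosen, not numbers from the talk, and the model crudely assumes roughly constant diversity.

```python
# A crude back-of-the-envelope model: assume diversity stays roughly constant
# and every species is eventually replaced, so the species alive today are
# only the latest "generation" of many complete turnovers.

species_lifespan_my = 2     # assumed average species duration, in millions of years
history_my = 600            # assumed span of abundant multicellular life, in millions of years

turnovers = history_my / species_lifespan_my     # ~300 complete replacements of the biota
fraction_alive_today = 1 / turnovers             # living species as a share of all species ever

print(f"share of all species ever that are alive today: ~{fraction_alive_today:.1%}")
# prints ~0.3%, i.e. well over 99% of species that ever existed are already extinct
```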
So rejecting the idea that we’re imperturbable, that of course nothing can be so bad that it ruins the things we can count on, like humans being here, is a pretty deep part of the worldview. A lot of people who are into x-risk have the fragile worldview. That’s Nick Bostrom. You may recognize his vulnerable world hypothesis. That’s one thing he lists as a crux in this.
And then I found this cool little demo of competitive exclusion, just one of the many things that … So the idea of competitive exclusion is that if two species are trying to occupy exactly the same niche, the more fit one will outcompete the other and drive it to extinction. So there are just so many ways, other than habitat destruction or anything like that, in which it’s very normal for a species to not last forever. It’s hard. The unusual thing is a species persisting.
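For readers who want to play with a demo like the one described, here is a minimal sketch of competitive exclusion using the standard Lotka-Volterra competition model. This is not the demo from the talk; the growth rate, carrying capacities, and populations are made-up values chosen so that two species with full niche overlap and a slight fitness difference illustrate the effect.

```python
# A minimal sketch of competitive exclusion via the Lotka-Volterra competition
# model: two species with full niche overlap, one marginally more fit.
# All parameter values below are made up for illustration.

def simulate(years=500.0, dt=0.01):
    n1, n2 = 10.0, 10.0        # starting population sizes
    r = 0.5                    # shared intrinsic growth rate
    K1, K2 = 1000.0, 950.0     # carrying capacities: species 1 is slightly more fit
    a = 1.0                    # competition coefficient of 1 = identical niche
    for _ in range(int(years / dt)):
        dn1 = r * n1 * (1 - (n1 + a * n2) / K1)
        dn2 = r * n2 * (1 - (n2 + a * n1) / K2)
        n1 = max(n1 + dn1 * dt, 0.0)
        n2 = max(n2 + dn2 * dt, 0.0)
    return n1, n2

n1, n2 = simulate()
print(f"species 1: {n1:.0f}, species 2: {n2:.0f}")
# species 1 settles near its carrying capacity; species 2 is driven toward zero
```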
So that leads into this idea that there are lots of equilibria that we depend on, that a more powerful intelligence could disrupt, and we wouldn’t even know. Like these animals that go extinct in these categories because of mostly human development, so habitat loss and then affecting the climate. They didn’t know, and they’re not trying to maintain certain equilibria. They just live in a world where there used to be something dependable and now there isn’t. So with environmental equilibria, there are things we did not even know we were doing until we’d already done them.
There might be social world equilibria. So perhaps it feels like you’re getting companionship from a chatbot, but you’re missing some vital nutrient we don’t know about. At first, people thought it was cool to eat radium. Have you guys ever seen the Radium Girls? Their reasoning was, it’s like the sun, it’s like power, it’s good, it’s bright. And they thought it was really neat. The Radium Girls, who lost their jaws eventually, it was considered a benefit of that job that they got to lick the radium paintbrush to sharpen it, because they got to eat the radium. It wasn’t even a side effect. It was the point.
So we don’t know things about how the world works. Your body assimilates radium because it thinks it’s calcium, and then it decays and your bones decay. There’s so much like this. And as a scientist, especially a biologist, you definitely develop a deep appreciation for how much you don’t realize about the empirical world.
And then societal equilibria. I think a great example of this is the institution of jobs. Is it possible to have a better equilibrium than working to live? I think so, probably. But if you blow up the institution of jobs without having an idea of how to replace it, it could be pretty horrible. So right now we actually have a pretty solid equilibrium where people need each other based on their abilities, for something like being themselves. If that went away, that could be very bad.
So overall, it’s not that better equilibria are impossible. It’s just that we need to know how to reach them, and it could be very, very, very hard. The way that we are doing development now is like clearing a minefield on foot, taking little steps at a time. When you hit a mine, what do you expect to happen? You blow up. That’s how you learn, and you can’t afford that. I’m saying we can’t afford it with technology as powerful as AI.
So the deadass expectation of many people in AI safety for many years has been that when we got to this point, the AI, once it was aligned, would figure out the answers for us. This is not good enough. This is not going to happen, because we need to be the source of truth about what is good for us. You might think the AI is aligned and not know. Really, there’s only one source of truth, and that’s being the entity whose experience you’re trying to protect. And as I say, in general, experiments are costly, costly in the sense that experimenting with where the mines are in a minefield on foot is costly.
But accidents, even accidents that aren’t big enough to destroy the world, become more likely the more we increase AI capabilities, and each one reduces our capacity to do better and respond better to accidents in the future. And then, of course, with capabilities high enough, one day an accident could one-shot us.
So the scale of the danger really could cripple civilization or cause extinction, and the possibility of this alone is reason enough to pursue pausing frontier AI development. This is frequently a difference between us and other people in the x-risk world. X-risk doesn’t have to be these crazy high numbers. There was a vogue a couple of years ago where everybody had a P(doom) in the 90s, and it does not have to be that high to justify not rolling the dice on extinction. Just don’t roll them. A 5% chance of extinction is extremely high. It’s really intolerably high.
And also there’s this burden of proof issue, where people will come at you and ask, “Well, what exactly is going to happen? I don’t get it.” And if they’re just asking how it could happen so they can understand it, all good. Give them an example. Otherwise, there’s this sense that, well, if you don’t know exactly how we’re going to die, down to shot-for-shot, then why should we listen to you? I think I need to know how we’re going to live, shot-for-shot, from you before we make this technology. And PauseAI, as we’ll talk about in the Theory of Change section, is about reestablishing the proper burden of proof. We already know we’re vulnerable because the world is fragile, because we know there are lots of equilibria we don’t know about, and because we know that developing AI is bringing us into higher territory of capability and intelligence.
And then finally, the people of the world do not consent to have frontier AI forced on them. So this is just one of dozens of polls I could have selected, but this is showing 80% agreement this year in April and May that the people’s priorities are safety, even if it means going slower to develop AI.
So our worldview includes, and this is sometimes a bit of a matter of debate among x-risk people, the idea that it matters whether people want this technology. I would love to talk about this in question time if you guys are interested, but I didn’t put it in this talk. I think there’s a lot of worldview stuff that sees it as: you can’t tell somebody not to do something. Unless you can prove that it’s the same as violence or something, you can’t tell somebody not to do something. And they also tend to believe that everybody else thinks that. But actually, people don’t generally have a problem with saying, “Don’t do things that could hurt me.” So the worldview PauseAI is trying to bring is that people don’t want this, and they have a right to advocate for what they don’t want.
Oops. I think that should say worldview, sorry.
So what about alignment? The PauseAI position is different from my personal position. The PauseAI position is agnostic: alignment could be possible or it couldn’t. We just know that we need the time and the governance to be able to pursue it and see if alignment is possible. And having a pause in place protects us either way, because if it’s not possible, we don’t unpause. Good. Phew. And if we have the pause in place and it is possible, then we get the time to do the proper kind of research and development, and we get quality alignment. Good.
My personal view, and I anticipate this could be a big question time thing, is that the idea of alignment of AI is philosophically confused. There’s no state of being aligned that isn’t constantly contingent on external factors in order to persist.
So yeah, how much? I really tried hard to constrain what I talk about in this part of the talk because I could say a lot. And then I further think that alignment between entities of vastly different capabilities may be impossible. So it may be that anything that seems like alignment that we’re familiar with today is contingent on roughly equal levels of capabilities. I have never done rigorous work to try to prove this or go after it, but this is my worldview, and I’d love to talk about it if people are interested in this part of it.
And a reason I think this is, a lot of it, because of my background in evolutionary theory. I did a great deal of thinking about a form of genetic misalignment called meiotic drive. A very, very quick background on meiosis: I would also love to get into this more, and on the next page there’s something you can follow to find a long blog post I did making this comparison.
But meiosis is the process that governs how our genes are passed on and makes the likelihood of getting into the next generation a fair process. And long story short, it allows natural selection to work, because it means that genes have to work together to make an organism, and that’s the only way they get to reproduce. Genes could actually reproduce themselves through other means, and if they could do that successfully, it would harm the integration of the organism. And in the world … Oh, you can’t see this at all. OK, sorry.
In just the world of organisms, this axis is cooperation, more cooperation to less cooperation. And this is more conflict to less conflict. And this is about how well-associated the pieces of a whole are into … So this paper, I love this paper. It’s Queller and Strassmann 2009, “Beyond society: the evolution of organismality.” I would highly recommend this paper. But the idea is that you get different levels of cooperation and different levels of integration.
Our cells, pretty integrated. But if you look at other animals, you will see … So what’s a great example of this? This is maybe getting a little in the weeds. Sorry, I don’t want to get too confusing. We usually think of our genes as having the job of being in the genome, making the organism. But actually, no, they’re one level of organization of this whole. And our genes are fairly tight, but there are some configurations of genes, like alliances of genes, which are not very tight. Amoebas can come together and work together sometimes, and then they separate. Our cells can’t separate and be their own thing.
So genes can also do this. And this is a very mind trippy thing to learn, and it’s tough. I’d love to say more if people are confused, but I also don’t want to confuse you. The upshot is that when we think of alignment, we’re thinking about unitariness that probably doesn’t really exist, even in things that we think are aligned. Like a person with their own interests, that sort of thing.
And I do also want to say, from the extinction point, this is a tree that is one of 28 individuals left in its species, and it’s rare to catch an extinction that you know is caused by genetic misalignment. But this species is going to go extinct when these individuals die, basically, because it does this thing where there’s conflict between partitions of the genome. The paternal genome kicks out the maternal genome early in development. And because of that, it’s accumulating mutations and stuff, and that’s going to cause this species to go extinct.
That’s just to give you a flavor of what can happen even at the gene level, and of how much our ideas about alignment that are being applied, like aligning AI with humans or with a human, are very simplistic and very broad-brush about what individuals are and things like that, topics that I think are enough to make any seemingly adequate model break down.
So this is on my Substack. I have a blog post to compare meiotic drive to gradient hacking. I don’t know who’s all into all of those pieces, but I did write this. And then the point I want to make here quickly is that evolutionary theory gives uncommon insights on this sort of stuff, like thinking about misalignment and technical safety. So almost always misalignment is talked about in terms of agents being misaligned. But based on meiotic drive and based on a looser understanding of what makes something an organism, how integrated it is, I think that circuits not getting with the program is a relatively neglected possibility.
So I probably won’t get into that too deep, but the circuits can … The simplest example would be: based on all of my thinking about meiotic drive, I then came to looking at ML systems for safety reasons, and the first thing I thought about was little cabals of circuits that resist updating. So the simplest example of meiotic drive is two genes that both want to be inherited in the next generation more often than the rule allows, the rule being that they can’t go over 50%.
And one way to do it is: so in mouse sperm, there’s this thing called the t-haplotype, which makes a poison. And if you have the whole T locus, you also have the antidote. But if that gets broken up by recombination, then the gametes, the sperm that don’t have both, will die. So it can make a poison that kills everybody but the ones that have both. And that way, even though the organism is less fit, because now it has fewer sperm overall, some having been killed, the relative fitness of that genotype is 100% or close to 100%, so that’s rewarded by natural selection.
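To make the arithmetic of that poison-and-antidote trick concrete, here is a toy calculation. The sperm counts and kill fraction are invented round numbers, not measurements, and the biology is heavily simplified; it just shows how a driver can push its transmission far above the fair Mendelian 50% even while cutting the male’s total sperm supply.

```python
# A toy version of the t-haplotype arithmetic. A heterozygous male starts with
# a fair 50/50 split of gametes; the driver's poison kills most of the sperm
# that lack the antidote. The counts and kill fraction are invented.

def transmission(total_sperm=100, kill_fraction=0.95):
    t_sperm = total_sperm / 2                        # carry the antidote, survive the poison
    plus_sperm = total_sperm / 2                     # lack the antidote, mostly killed
    plus_survivors = plus_sperm * (1 - kill_fraction)
    functional = t_sperm + plus_survivors
    return t_sperm / functional, functional

share, functional = transmission()
print(f"t-haplotype share of functional sperm: {share:.0%}")   # ~95%, far above the fair 50%
print(f"functional sperm left out of 100: {functional:.1f}")   # fewer sperm overall, yet the driver wins
```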
So I mean, I haven’t gone further into empirically looking into these things, but it seems to me very likely that there are circuits that just resist updating by means like this. Whatever is done to disturb them increases the loss, and so they just don’t get updated.
Will this matter or not? There are things about … I shouldn’t get into it. You should read this blog post if you’re interested in how that could possibly matter or not. But I feel like there’s just a lot left on the table currently with where the field is. Evolutionary knowledge about the world can really help out.
A deeper worldview thing is that I believe a lot in ecological validity, and that’s something that almost never comes up in x-risk discussions. So in biology, you can have a model that works and is internally consistent, and that’s one good thing, but in order for people to really care, it has to also be ecologically valid. There are many things that could exist and make sense that are not ecologically valid. It has to be true according to empirical measurements in the real world.
I have put so many confusing terms in here. I’m sorry. I’m going to explain what I mean by … Most of these I made up. So deep Goodharting. Everybody knows what Goodharting is, right? No, OK. Goodharting is making a measure into a target. You have a real goal, and then you say, well, in order to approximate my goal of having a good marriage, I’m going to go on one date a week. And if it gets to the point where you’re going on a date at the expense of a good marriage, you’ve lost sight of what you were trying to optimize for. That’s Goodharting. Reward hacking is similar; it’s like a form of Goodharting.
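Here is a minimal toy sketch of that dating example, with a made-up “marriage quality” curve standing in for the real goal. It only illustrates the shape of the problem: the proxy keeps improving after the point where pushing it further starts hurting the thing it was supposed to measure.

```python
# A toy Goodharting curve: the proxy ("dates per week") rises without limit,
# while the hypothetical true goal ("marriage quality") peaks and then falls.
# The quadratic below is made up purely to give the curve that shape.

def true_goal(dates_per_week):
    return dates_per_week * (4 - dates_per_week)   # helpful at first, harmful past a point

def proxy(dates_per_week):
    return dates_per_week                          # "more dates = better", the measured target

for d in range(7):
    print(f"dates/week={d}  proxy={proxy(d)}  true goal={true_goal(d)}")
# the proxy keeps climbing while the true goal peaks at 2 and then goes negative:
# optimizing the measure has come apart from the thing it was meant to track
```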
So this is probably even too confusing. I almost want to cover it up while I say this one, because I don’t want to distract you guys. Sorry. So I have an appreciation that things can really seem true, that the models and the abstract things we come up with can seem true. And it’s really important that we do try to find abstract models that make sense, because that’s good for prediction.
But with ourselves, we need to be careful not to do that. We are the source of truth on what we actually like, what is good for us, our thriving. Our experiences are the source of truth. And so if you have an idea that some have about AI, that, well, the AI would know how to be better, and so I would like to be changed and made better: I think what makes you happy is probably an extremely complex utility function. And if we think we’ve captured the utility function, we probably haven’t.
So, proposals for alignment: generally there have been different proposals over the years. You go back to the ’90s and the proposals are about programming this AI correctly, an analytic symbolic AI, and giving it the right utility function: finding the von Neumann utility function of a human and making sure the AI has the same one, and then it’s like they’re the same entity at that point. So it’s all about finding the utility function. I’m very skeptical about getting it at all in the first place, so I don’t think you can give up. And then of course, in that scenario, once you give the AI the utility function and let it go, then it’s in charge. It’s more powerful always.
Even today’s alignment proposals, like scalable oversight and superalignment, are more ecologically valid. They keep more data. They’re not trying to nail down what the utility function is; they have no idea. But they’re taking humans out of the loop, so the process isn’t synced up with what our utility function is. It’s more complicated … It’s not trying to simplify what the utility function is, what gets rewarded, but it’s still not connected to the source of truth.
So the term I have put on this is ecological validity for utility functions. And the thing I’m contrasting it with I call deep Goodharting: the idea that, well, I know what my terminal values are, so I’m just going to change myself in that direction, or I’m going to allow myself to be changed in that direction, or I’m going to make an AI that is a version of me in that direction. I think that is very, very likely losing sight of what our true utility function is and the thing we’re actually trying to preserve and optimize.
So long story short on here, I don’t think that through AI right now, we’re very likely to get what we want out of alignment. So alignment feels like a fantasy. To me, most days I think alignment is a fantasy. The level of alignment or the way that alignment has been talked about in the past is probably a fantasy.
But the org is officially agnostic. You can have a different opinion on this. The point is, no matter what, we should be pausing now, figuring this out during the pause and not unpausing until it’s safe enough.
OK, so now to the Theory of Change section. So what are we doing about this worldview? Why have we picked the means that we have, the mission that we have? OK, so I wish I had covered this.
So the Overton Window. First of all, what is the Overton Window? It is the window of thinkable sentiments. And I liked this graphic; this is just the one on Wikipedia. If you’re smack in the middle of the Overton Window, then it’s obvious: of course, this is how things are. The middle enforces how things are. And then as you move away, things seem less and less thinkable. Finally, when something is outside of the Overton Window, it’s unthinkable.
So our belief, the PauseAI Theory of Change, is that the people actually want a pause. They don’t all know it yet. They don’t know enough about this issue to know that’s the name of what they want. But according to polls, we see their priority is safety. They don’t want catastrophic problems. Even if they don’t know about the possibility of catastrophic x-risks, they know that they don’t want a harmful disruption to society.
So when they understand, through education, what’s going on, they’re going to tell their representatives, they’re going to exert their power, they’re going to tell the people around them, and the Overton Window will shift, and it’s going to compound. It’s going to make it so that now the people who were one step over are exposed and so on and so on.
And this is the basis of our Theory of Change: we have this untapped will, but the way people are thinking about it, they don’t know enough, or there are certain emotional hangups, or there’s pressure, obviously from industry and motivated actors, to stop them from realizing this. But if we educate them, and we also present the pause and use techniques to push the Overton Window, then we can get what we want, which I think is … Nope. Then we can move on to the next idea, which is …
So one thing that’s keeping pause outside the center of the Overton Window, outside of becoming policy, is that it’s very difficult emotionally to think about a lot of the issues involved in x-risk. And I’m sure I’m probably not telling you guys that for the first time. You’ve probably experienced it. So a lot of it is holding space so that it’s safe for people to think about it long enough to learn about it, to think, to decide, is this what I think?
And it’s difficult for a number of reasons. It’s difficult because you have to think about the possibility of being in a lot of danger, which of course we don’t want to be true. It’s difficult because there’s a lot of pressure from people around you not to pull them into something scary. Or, probably more in our circles, there’s a lot of pressure not to push against a lot of … I personally know many people who work at AI labs. It’s very difficult for me to tell them, “You are doing a bad thing. I think you’re doing a bad thing.” People don’t want to do that with their friends.
But if this issue, or if pause, were more in the middle of the Overton Window, then it would be the other way around: they would be the ones who felt like, “It’s not really OK that I’m working at the AI lab, but do you accept me anyway?” That’s the effect of the Overton Window: just what everybody else around us is thinking, what’s OK to think.
And then I have a short blog post that defines everything I just said, but by contrast, through this concept of rhetorical answers, which is another coinage of mine … So the example I give there is people saying, “Well, humanity deserves to die if that’s true.” They don’t believe that. And you wouldn’t be able to get away with saying that about something that was more central in the Overton Window. You wouldn’t be able to get away with saying, “Well, murder should be OK. It’s really hard to deal with.”
But because pause and AI danger and handling AI danger through governance is on the edges of the Overton Window, people can, instead of having to go through the hard work of thinking the scary things, they can go … wave it off and say something like, “Well, we deserve to die,” or say something like, “Well, AIs can’t make new knowledge, so there you go, so nothing will ever happen.” So I have a list of that kind of rhetorical answer on AI.
And I think probably the single biggest rhetorical answer I hear is something along the lines of, we’re cooked, it’s over, it’s too late, it’s inevitable. And generally, if you answer a few questions for the person, they don’t really think that. Or you dig a little deeper, it’s not that they really think that. It’s that they don’t know what to do next. They don’t know what the next step is. They feel like they would be unpopular. They feel like they would be missing out on the cool AI stuff. If they even entertain that possibly it’s bad, they want to be free to just think it’s cool and keep playing with their friends. There’s a lot going on.
So part of our Theory of Change is simply to hold space. And the way we hold space is by having compassionate education, education that is not fearmongering or especially too … It is an emotional issue, but we try to keep a level head, to set a tone that allows people the psychological safety to consider what we’re saying.
And then also, just making the pause position more popular is a way to hold space for people, because the more they have heard of this position before, know people who hold it, or just know that if they hold the pause position other people will recognize it as one of the positions, the easier it is for them to deal with the difficulty you have to go through to really understand it or decide if you believe that this is what to do.
I also want to cram in the concept of inside and outside game real fast, and then I’ll tell you what rebalancing the center is once I’ve done that. So inside game is working within industry, within … Generally, it’s like working within a system that you want to change for the purpose of trying to change it from the inside. Outside game is putting external pressure on that system from the outside for the purpose of trying to get it to change. And you can get a beautiful synergy between those two. What the people do outside affects what the people do … It makes whatever’s inside look a lot more moderate, and so you can really play off each other.
This is taken from a bigger talk where these colors mean something, but the ball is supposed to represent the Overton Window and where it is. And just by the way, in AI safety, for some reason, for the last 10 years pretty much all of the AI safety work has been inside game, which is weird; it’s very unusual for a cause to evolve this way. Most social movements or issues start with people on the outside saying, “Hey, this is bad,” or trying to raise awareness about it. And that’s what the public understands more. Pretty much any member of the public thinks, well, if I thought something like AI danger was happening, then I would, of course, tell everybody. I would scream it from the mountaintops.
There are historical reasons it ended up this way, which, if you like talking about it, we could talk about in question time. But because of that, it was really valuable to start doing more outside game stuff. This is an example of Overton Window pushing. So this is one way you push the Overton Window: it goes from there to there just because you put a heavyweight far on the outside game side.
So PauseAI is talking directly to the public, trying to be understandable. It doesn’t do stuff within industry. We’re not trying to be diplomats. We are saying: you’re doing the wrong thing, this is dangerous, and you need to stop, in a way that’s really legible from the outside. That does a lot of good things. For the entire system, it makes it more thinkable that there should be external regulation of the industry than if all of the people trying to do something good are within the industry, along with other dynamics there. So this is what’s called rebalancing the center by having a radical flank.
I put this in quotes because I think PauseAI is supremely moderate. We’re totally nonviolent. We don’t even do anything illegal. Literally, I assign a volunteer at demonstrations to make sure we don’t block the sidewalk, because that would be illegal. We’re scrupulously law-abiding, and our line is just that we shouldn’t make dangerous AI. So in a more objective sense, I think PauseAI is very moderate, but because of how loaded AI safety was toward academics and the industry, it has a big effect to even just be moderate and be more outside.
Geez, I thought I made this short. So, going on with the Theory of Change, we’re trying to shift the burden of proof back onto AI developers to prove that it’s safe to proceed, rather than on the person saying, “Hey, this could be dangerous,” to prove that it would definitely drive everyone extinct. What I really want to emphasize about this burden of proof shifting is that it is not a technical discussion. This is a trick that people pull all the time. They’ll be like, “Well, where is your ML Ph.D.? How are you qualified to say what’s going to happen?”
And that is not the discussion at all. The discussion is about what level of risk is acceptable, and who gets to decide. There’s no answer like, because you’re in ML, you know what the right level of safety is. And who gets to decide should be the people at risk. Yes, there are things that scientists understand that the public doesn’t, so you could be wrong in what you tell your representative you want, or it could be unnecessary. But this is mostly not a technical discussion. This is about what risk is acceptable.
And shifting the Overton Window towards these more conservative safety standards is how, ideally, the pause would be imposed externally on the industry to shift … The risk tolerance is just crazy now. We’re just totally frog-boiled on it. Elon Musk says the risk is 20%. His P(doom) is 20%, and people are like, “Oh, it’s low. That’s why he’s taking all these risks, because it’s low.” That’s one in five. That’s worse than Russian roulette odds. We don’t have to do this, and we don’t have to listen to them just because they want to make it. OK, it’s our lives. So this is about taking our power back and not being rhetorically tricked into feeling helpless, or like there’s nothing we can do.
Another thing is, who’s heard of warning shots? Should I explain what that is? OK, I’ll explain, because not everyone has. So there’s this idea that … It’s a funny story. When I first started doing PauseAI organizing, I went to some big names in AI safety at the time, which I won’t name, like funders. And they told me, “Oh, why don’t you just wait until there have been warning shots, because then the people will just rise up.” They think there’s nothing to organizing. So, ha.
But it’s been part of the AI safety worldview for a long time that there will be these smaller catastrophes. They hope for it; you can read more about this in my blog post about it. They’re hoping that there will be small catastrophes that just show everybody, basically, we are right and you should do what we say, and that’ll make it easy. So it’s not uncommon to hear people in the AI safety world saying, “Well, we just have to hope for warning shots.” And I think, one, that’s just the wrong headspace, to be hoping for a disaster at all. We should always be trying to stop them.
But two, they’re not just going to happen by themselves. Even if there are these disasters, which could well happen, people aren’t going to know what they mean unless they’ve been told, unless they have the education, unless they have the ability to interpret the events themselves. It’s not just obvious what it means. So part of our education, the reason that we educate people, is so that they will be able to interpret events as they happen. Maybe you’re not convinced now exactly by my story, but I tell you what to expect and you get a deep enough understanding. And then when the moment happens, when the thing you were waiting for to answer your question happens, you have a, whoa, OK, I know it’s true.
And I had one of these with ChatGPT. So I had known about AI danger, or the possibility, for a long time, but I had, I don’t know, mildly negative feelings toward it. I didn’t like the way that people talked about it, but I knew enough about it. And I had this background in neuroscience and animal minds and things like that. And then ChatGPT came along. I had especially thought that I might never see a machine be able to speak in natural language. I knew a lot about linguistics. I knew a lot about the Chomsky position and his debates with people.
And so when I saw ChatGPT talking like a human, I was like, wow. Oh my God, so much my … And I have these images. It’s like setting up for people … For a warning shot to work, you have to have a lot of dominoes set up, and I had my dominoes set up really well. I got a lot of dominoes set up. And when that warning shot happened, it was just like boop, boop, boop, this means this, means this. And I thought … and then six months later I started PauseAI US. A lot changed. I went from being like, I don’t love this field to quitting my job and doing it.
But if people don’t have that background, then … So an example here is somebody learns that a chatbot can help a person assemble a bioweapon. Well, if they don’t have any educational background that makes this land, then they’re just like, well, then it’s not … I mean, so what? It all seems like that person’s fault. I don’t understand. It’s not like the tool made them evil.
And so then, when an actual bioweapons-powered attack happens, they’re like, “Yeah, I heard about this before,” and they think it’s not connected. So the whole point of warning shots, or of hoping for warning shots, was supposed to be that people would start to act as if the real thing were happening and start to get prepared, but it can go the other way too, I fear that … You know. But when you educate them, you set up this person’s dominoes so that they do know what it means when the warning shot happens, and then maybe they get it.
So with our education, we’re thinking … Whoop, sorry. Just in general with education, it would be great if we could just say, “We predict this,” and then it would happen, and then everybody would know we were right. If I did know what to predict, I would probably be trying to stop it more directly than that. But if I did know, I would do that. I would tell people, “Hey, look out your window on this day and you’re going to see this thing, and you’re going to know I’m right.” That would be great.
But we don’t know what’s going to happen with it. This is really the nature of the danger of AI: it’s intelligence. It’s creative decision-making. It’s finding ways to get what it wants that you didn’t think of, and having the ability to do powerful stuff that causes lots of bad side effects.
So the education strategy has to be general. It has to be about understanding why we’re already worried and what we fear coming up. And then people have to have their own ideas, that they arrived at themselves, about what they would expect, what they predict, so that they can have that experience of, “Yep, here it is. It was right.”
So warning shots are and aren’t a part of our Theory of Change, because we’re not … I think the best thing we can do, both to take advantage of warning shots and in case there aren’t any warning shots, is just to prepare and educate people straightforwardly.
Education is not the only thing we do. Cut for time. I would love to tell you about everything else we do. But what can you do? What can academics and students do to help? On research, just a note: I’m increasingly black-pilled on this, which is that pretty much every time we get knowledge about these systems, it is dual use. There’s really no knowledge about these systems that is only good.
And we don’t always know. So, cautionary tale: mechanistic interpretability. For many, many years, this was always the good one, the thing that would definitely … If we just had mechanistic interpretability, then it would be fine. I wish I had thought about this more deeply at the time, because, for one, who’s reading it and who are they using it for? And two, can’t it be used for recursive self-improvement? Probably cheaper than doing more training runs, right? Especially if AI can interpret things that we can’t interpret and it knows how to make changes, and then that’s out of our hands, and the opportunities there for explosive growth are really scary.
So even the thing that everybody thought was the most benign is actually dangerous; now it looks like it will be dangerous. I mean, here’s a prediction from me: you will see interpretability being used in recursive self-improvement, I think probably soon. I think that’s why Dario wrote that essay about interpretability and, along with that, helped lead a round for an interpretability group. I think he sees that as a potential future use.
So be careful is all I’m saying. I would love to talk more about this if people have specific questions, but fundamentally, all of this stuff is dual use, and we don’t know where things are going. What I think will make research safe is governance: external authority and oversight and accountability to what’s good for people. And just like we can make useful nuclear technologies because we have that governance, I think we will be able to have useful AI in a world with the proper governance, but not before.
I really, really strongly caution against trying to use technical means to shortcut that process. I mean, there’s a lot of temptation to do that in AI safety, especially because the people in it are researchers, and I really think there’s no shortchanging this process. To really have safe AI, we do have to have good governance no matter what.
But directly, what can you do? One thing, which I think is pretty much great all the time, is joining PauseAI Bay Area, and I’ll have info for the organizer below. Also, if you live somewhere else, I can help you find other places too. We have 30 groups now in the U.S. You can help to start a group, especially a university group. We’re really working on our pipeline for getting people started with university groups. So we have some stuff to offer you, but you would also get a chance to help us learn the process of doing that.
Writing op-eds with the authority of “I’m a student of this,” “I’m a professor of this,” that’s often a great place to start with op-eds. And then there’s the local angle. We have general guidance for op-eds, and we can definitely help you if you are interested in doing that. It’s really nice to have a lot of different people, like multiple voices from multiple places.
And then, just talking to people around you about AI danger. And if you do believe in pause, talking about pause, because every time you do that, every time people hear this from someone they respect, that shifts the Overton Window in that direction.
OK, this is just something you can take a picture of if you want to do … These are, in increasing difficulty, slacktivist actions where you don’t have to be in the group or be a volunteer or anything like that. And then the hardest one listed here is starting a local group, and you can see where our listing would be. You can also talk to me if you’re interested in that. We have an application, and you would go through an application and onboarding process.
Everybody get that? Great.
Speaker 2 (off camera): [Inaudible]
Holly Elmore: Oh, yes. You can also have just the slides, if you want. And then here is our next event coming up on Thursday. That is a happy hour. OK, so the … That is not right. I’m sorry, CPM. I hope you guys don’t make that mistake yourselves. OK, so it’s definitely just at Gmail. So Alvaro Cuba is the local organizer for the Bay Area, and this one is going to be in SF on Thursday. And there are about 30 people signed up currently. He does an amazing job. He’s better than I ever was at driving … That’s already bigger than any happy hour I ever ran, so.
And then, oh yes, the lowest commitment thing is just getting on our Discord and talking. There’s our join link that never expires. And to donate, this is the one you should use if you just want to make a small donation. If you want to make bigger donations, please feel free to do so. Talk to me.
And then I want to open it up to questions, but I first want to say what my question to you is, which is, what resources do you want from me, particularly thinking teaching resources? What would you want to put on a syllabus? I’ve been asking people that question lately. So what do you want from me to help you advocate for PauseAI should you be interested in doing that?
(Music: “No One Is Perfect” by HoliznaCC0)
Anne Brice (outro): You’ve been listening to Berkeley Talks, a UC Berkeley News podcast from the Office of Communications and Public Affairs that features lectures and conversations at Berkeley. Follow us wherever you listen to your podcasts. You can find all of our podcast episodes, with transcripts and photos, on UC Berkeley News at news.berkeley.edu/podcasts.
(Music fades out)