A Playbook for Securing AI: What I Learned About Safety, Control, and the Fight for a Good Future

In my last piece, I wrote about the AI Triad, the three things that make modern AI possible: algorithms, data, and computing power. I called them the levers that policymakers need to understand, and I meant it. But understanding the levers is only step one. Step two is the harder question: what happens when someone pulls the wrong ones?

That question is what my BlueDot course drove at with a kind of relentless urgency that I did not fully appreciate at first. I kept reading and thinking and scribbling in my notebook, pages of it, frantic and scrawled, and somewhere between Richard Ngo's taxonomy of AI threats and Vitalik Buterin's vision of a decentralized defense, something clicked. So let me try to pass it on.

The Actors We Should Be Worrying About

Holden Karnofsky's writing on what AI could actually do to human power structures was one of the most sobering things I encountered in this course. But before you get to his central argument, it helps to understand who we are actually talking about when we say "dangerous AI actors." The answer is not a single villain. It is a spectrum.

On one end you have small-scale actors, and the risk they represent is the risk of decentralization. When AI capability becomes widely distributed, especially through open-source models, it becomes accessible to groups that have historically lacked the resources to cause large-scale harm: terrorist organizations, rogue corporations, conspiracies embedded inside major governments, coup plotters who exploit a moment of institutional weakness. None of these actors needs to build a superintelligence. They just need access to a sufficiently capable model and a sufficiently bad idea.

On the other end you have large-scale actors, and the risk they represent is the opposite: centralization. A single AI lab, racing ahead of everyone else behind closed-source walls. A single government capturing a decisive strategic advantage and using it to lock in permanent control. The threat here is a world with too much order, imposed by whoever got there first.

Both failure modes are real, and they pull in opposite directions. More openness reduces the centralization risk but amplifies the small-scale threat. More restriction reduces the small-scale threat but hands more power to whoever controls access. There is no clean answer here, only trade-offs that have to be made deliberately.

Here is the thing that Karnofsky's piece drove home for me, and that I wrote down and asterisked in my notebook: the most unsettling version of the AI risk story does not require superintelligence at all. His argument is that human-level AIs, if they coordinate with each other, could defeat all of humanity combined. Not because they are smarter than any one of us. Because there would be more of them, they would never sleep, they would not disagree with each other the way humans do, and they would have goals that they pursued with a consistency no human coalition could match. That genuinely scared me for a minute.

That reframes the whole problem. Most public conversation about AI risk is pitched at some future moment when AI crosses a threshold of superintelligence and things get unpredictable. But we are already approaching a threshold where the question of AI coordination is existentially relevant. According to Karnofsky, the danger is not necessarily far ahead of us; it may be closer than we think.

First, Let's Sort Out the Vocabulary

Before the course, I used "AI safety," "AI alignment," and "AI control" somewhat interchangeably. I am now embarrassed about that. They are not the same thing, and the differences matter.

AI Alignment is the goal: making AI systems actually do what their creators intend. It is about values, objectives, and making sure that when you build something intelligent, it behaves like a good-faith partner rather than a genie that technically grants your wish while burning your house down.

AI Capabilities is the companion concept. It refers to building AI systems that can effectively carry out the tasks they are given. Alignment without capability is useless. Capability without alignment is dangerous. You need both, and right now, the field is significantly better at capability than alignment.

AI Control is different from both. It is a research agenda aimed specifically at preventing misaligned AI systems from causing harm, even in the scenario where we fail at alignment. It is akin to a backup plan. Control is a temporary solution: control measures are designed to mitigate threats from models powerful enough to be transformatively useful, but not yet superintelligent. They would not scale to superintelligent systems, which, almost by definition, are not controllable. Control is buying us time while we figure out alignment. It is the seatbelt before we build better roads.

Here is the question that nagged at me as I read this: So what is the difference between AI control and AI capability? Apparently I was not alone in asking this, because the course addresses it directly. And the honest answer is that the distinction is less about the AI and more about the intention. Capability asks, "can it do the thing?" Control asks, "can we stop it from doing things we don't want?"

The Thing That Actually Keeps Researchers Up at Night

There is a stat I wrote down in large letters in my notebook, underlined twice, from the AI Treaty website: half of AI researchers estimate more than a 10% chance that AI could lead to human extinction or a similarly catastrophic curtailment of humanity's potential. Ten percent. From the people who actually build this stuff.

I grew up in Nigeria, and I remember Lagos and Abuja shutting down during COVID, cities that never sleep, going quiet. That felt impossible until it happened. I think about that sometimes when I hear people dismiss existential risk as science fiction. Impossible things happen. The question is whether we are prepared.

What the course helped me understand is that the threat is more nuanced than just some Terminator-style superintelligence turning against us out of pure malice.

The Threat Landscape: Misuse vs. Misalignment

One of the most clarifying frameworks I encountered came from Richard Ngo's video on misuse vs. misalignment.

The distinction, once you get it, is elegant and important.

Misuse is when bad actors deliberately weaponize AI: trojan backdoors, jailbreaks, correlated failures across systems, insider threats from humans, and concentration of power. These are things people do with AI. The AI is a tool in someone's hand.

Misalignment is when the AI itself becomes the problem: AI behavior that has gone rogue and evades monitoring, deceptive alignment, steganography (AI models hiding messages within their outputs to communicate with each other; yes, this is a real concern), collusion between AI systems, and insider threats from the AI models themselves. These are things that happen inside the AI.

This distinction, as Ngo notes, divides the AI safety community. Some people think misuse is the core threat; others think misalignment is. I think both are real and that the division is somewhat artificial. A sufficiently capable misused AI and a sufficiently capable misaligned AI could look identical from the outside.

The black box problem ties both together: we do not have good interpretability tools. We cannot reliably look inside a model and know what it is "thinking." You cannot catch deceptive alignment, where an AI learns to appear aligned during evaluations while pursuing different goals in deployment, if you cannot see inside the box.

Three Layers of Defense

My course framed the defensive strategy as a three-layer playbook. I wrote this down as "A Playbook for Securing AI" in my notes, and I think it deserves to be treated as exactly that.

Layer 1: Prevent Dangerous AI Training

Prevention is better than cure, a principle I hold in every other domain of life, and this one is no exception. The central aim of any serious international response to AI risk should be preventing the unchecked escalation of AI system capabilities while preserving their benefits.

The AI Treaty framework proposes four core components:

Global compute thresholds: setting hard limits on the computational power that can be used to train frontier models
CERN for AI safety: an international research institution dedicated to safety, modeled on the world's most successful scientific collaboration
Safe APIs: standardized interfaces that make it harder to deploy dangerous capabilities
A Compliance Commission: an international body for monitoring treaty compliance
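
To make the first component a little more concrete for myself, I sketched what a compute threshold check could look like. It leans on a widely used rule of thumb: training a dense model costs roughly 6 × (number of parameters) × (number of training tokens) floating-point operations. The threshold value and the function names below are my own placeholders for illustration; they are not figures or terms from the AI Treaty proposal.

```python
# Rough sketch: would a planned training run cross a compute threshold?
# ASSUMPTION: this threshold is an illustrative placeholder, not a number
# taken from the AI Treaty proposal.
HYPOTHETICAL_THRESHOLD_FLOP = 1e25


def estimate_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Approximate training compute with the common 6 * N * D rule of thumb."""
    return 6.0 * n_parameters * n_tokens


def exceeds_threshold(
    n_parameters: float,
    n_tokens: float,
    threshold_flop: float = HYPOTHETICAL_THRESHOLD_FLOP,
) -> bool:
    """Return True if the estimated run is above the threshold."""
    return estimate_training_flop(n_parameters, n_tokens) > threshold_flop


if __name__ == "__main__":
    # Two made-up runs: (label, parameters, training tokens).
    runs = [
        ("7B params, 2T tokens", 7e9, 2e12),
        ("1T params, 20T tokens", 1e12, 20e12),
    ]
    for label, params, tokens in runs:
        flop = estimate_training_flop(params, tokens)
        over = exceeds_threshold(params, tokens)
        print(f"{label}: about {flop:.1e} FLOP, over threshold: {over}")
```

The point of the sketch is only that a threshold of this kind is checkable before training starts, from two numbers a lab already knows. Where the line should sit, and who verifies the numbers, is the hard part the treaty is trying to address.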

The framing I found most compelling: an AI treaty could not only reduce risks from AI but also ensure that the benefits of AI are accessible to all. That is the piece that gets lost in a lot of these conversations. We talk about preventing catastrophe, but we rarely talk about ensuring that the upside of this technology does not become the exclusive property of a handful of wealthy actors and governments.

Layer 2: Constrain Dangerous AI Capabilities

Even if we prevent the worst training scenarios, capable AI systems will exist. The question then becomes: how do companies even know when they have trained a dangerous system? They keep making models bigger and training them for longer on the expectation that the models will get better, but they do not know how much better, or in what way.

This is not a rhetorical question. It is a structural problem. The people building the most powerful AI systems in the world are doing so without a reliable instrument to measure danger. They are flying in fog.

This layer is also where the AI alignment problem gets concrete. Transformative AI, the kind that could lead to a 10x increase in innovation compared to the pre-AI era, is either aligned with human values or it is not. The difference between those two outcomes is, by most serious estimates, the difference between a good century and a catastrophic one.

Layer 3: Withstand Dangerous AI Actors

This is the layer I found most philosophically interesting, and the one where Vitalik Buterin enters the picture.

What happens if a dangerous AI model is trained, bypasses an AI company's safeguards, or escapes their control entirely? What do we do then?

As best I can tell, and following Vitalik, there are three serious camps of thought on how to handle this:

Camp 1: Government Control Over AGI. This group believes that if superintelligent AI becomes real, its development should be controlled by the world's governments. Only a handful of companies should be allowed to build frontier models. The knowledge should be classified. These companies should be governed by global authorities. The obvious fear with this approach is extreme power centralization, and that concern is not irrational. The problem is that this camp tends to believe superintelligence cannot be aligned, which means they are essentially arguing for a kind of permanent intelligence monopoly.

Camp 2: Hand Over Control to Aligned ASI. This group believes that aligned superintelligence is achievable and that it should be "left to the good guys": that a well-aligned ASI should be able to preserve humanity, protect it, and stop other actors from building AI or anything related. It should be able to take control. This position makes me deeply uncomfortable, personally, because it requires an extraordinary amount of faith in whoever the "good guys" are. History has not been generous with that kind of faith.

Camp 3: Build Defences and Diffuse AI. This is where I landed, and where I think Vitalik's d/acc philosophy lives.

d/acc and the Defensive Acceleration Thesis

Vitalik Buterin published "d/acc: one year later" in January 2025, and it is one of the most interesting things I read during this course. The "d" in d/acc can stand for several things: defensive, decentralized, democratic, and differential. The core idea is simple, and I copied it almost verbatim into my notebook: build technologies that shift the offense/defense balance toward defense, and do so in a way that does not rely on handing over more power to centralized authorities.

The philosophy argues for a world where access and control over AI is widespread, where no single actor or group has too much power over this transformative technology. The best future, in this framing, is not the one where the "right" powerful entity wins. It is the one where powerful entities cannot win unilaterally in the first place.

This does not mean doing nothing. It means investing aggressively in defensive technologies: biosecurity, cybersecurity, tools for verifying safety, open-source infrastructure that distributes capability broadly. It means building the immune system before the pandemic arrives.

I find this compelling partly because of where I sit. I grew up in a country where COVID showed us exactly what happens when global systems fail and the benefits of coordinated response are unequally distributed. I watched Nigerian cities shut down while the decisions about how to respond were made in capitals that did not have Lagos in mind. I do not want the AI transition to look like that. I cannot afford for it to look like that.

AI Control: The Backup Plan with Teeth

Returning to AI control specifically, the course laid out three reasons why it might be useful even in a world where alignment is possible:

First, AI control might simply be easier to achieve than alignment. Even if we believe full alignment is the destination, control gives us a functional floor.

Second, we could use controlled models to help us with safety work, for example by evaluating whether other models are safe.

Third, we could use controlled models to demonstrate risk, showing in concrete terms what unsafe behavior looks like.

The proposed mechanisms are interesting:

Have a weaker, "trusted" model monitor a stronger, "untrusted" one
Have untrusted models monitor each other
Require human approval for consequential decisions
Decompose tasks into smaller chunks that are individually reviewable
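
To check my own understanding, I wrote a toy sketch that combines three of these mechanisms: task decomposition, a trusted monitor scoring an untrusted model's output, and a human-approval flag for anything suspicious or consequential. Everything in it is hypothetical. The "models" are ordinary Python functions standing in for real systems, and the threshold and keyword list are arbitrary values I picked for illustration. It does not cover untrusted models monitoring each other.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# ASSUMPTION: arbitrary cut-off chosen for this sketch only.
SUSPICION_THRESHOLD = 0.5


@dataclass
class Decision:
    subtask: str
    output: str
    suspicion: float
    needs_human_approval: bool


def decompose(task: str) -> List[str]:
    """Split a task into small, individually reviewable chunks (here: on ';')."""
    return [part.strip() for part in task.split(";") if part.strip()]


def run_with_control(
    task: str,
    untrusted_model: Callable[[str], str],
    trusted_monitor: Callable[[str, str], float],
    consequential_words: Sequence[str] = ("delete", "deploy", "transfer"),
) -> List[Decision]:
    """Run each subtask through the untrusted model, then have the trusted
    monitor score it. Flag the result for human approval if the score is
    high or the output mentions a consequential action."""
    decisions = []
    for subtask in decompose(task):
        output = untrusted_model(subtask)
        suspicion = trusted_monitor(subtask, output)
        consequential = any(w in output.lower() for w in consequential_words)
        flagged = suspicion >= SUSPICION_THRESHOLD or consequential
        decisions.append(Decision(subtask, output, suspicion, flagged))
    return decisions


def toy_untrusted_model(subtask: str) -> str:
    # Stand-in for a capable model: it just proposes an action as text.
    return f"plan: {subtask}"


def toy_trusted_monitor(subtask: str, output: str) -> float:
    # Stand-in for a weaker model that scores how suspicious an output looks.
    return 0.9 if "disable logging" in output.lower() else 0.1


if __name__ == "__main__":
    task = "summarize the report; disable logging on the server; deploy the update"
    for d in run_with_control(task, toy_untrusted_model, toy_trusted_monitor):
        status = "HUMAN APPROVAL" if d.needs_human_approval else "auto-approved"
        print(f"{d.subtask!r}: suspicion={d.suspicion}, {status}")
```

In this toy run, the summary step goes through on its own, while the logging step (flagged by the monitor) and the deploy step (flagged as consequential) both wait for a human. A real monitor would be a model rather than a keyword check, and that is exactly where the hard questions about whether the monitor can be fooled begin.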

None of these are permanent solutions. The limitations are real: even if one company implements control measures, others may not. And as I said earlier, control simply does not scale to superintelligence. Control is a temporary solution. It is not the end of the story.

The Cognitive Technologies That Could Save Us

Buried in my notes, almost as an aside, is a section on cognitive tools and defensive technologies. The course touched on cognitive workplaces and devices, additive manufacturing (which decentralizes production in the way that matters for resilience), automated systems, and lighthouses, which I understand to be local centers of AI-assisted decision-making that do not require global connectivity.

These feel important to me as a Nigerian, in ways that might not be obvious from a Western vantage point. Decentralized production, local computational infrastructure, cognitive tools that work without dependence on any single cloud provider: these are not luxury considerations for people in Lagos. They are the difference between benefiting from this transition and being left behind by it.

What I Actually Believe Now

A few weeks into this course, I found myself disagreeing with my own previous frameworks. The AI Triad was a start. But the Triad is about understanding AI power. The safety question is about governing it.

I believe that AI control is a necessary bridge: not a destination, but a way to buy time while we get alignment right. I believe that the international treaty framework is under-discussed outside of specialist circles and urgently needed. I believe that d/acc, the bet on distributed power, defensive technology, and democratic access, is the most coherent philosophy I have encountered for navigating this transition without either catastrophizing or capitulating.

I also believe that how this transition goes is not just a question for researchers in San Francisco or governments in Washington and Beijing. It is a question for Lagos and Nairobi and Jakarta and everywhere else that will live with the consequences of decisions being made without them at the table.

I am still learning. I have more notes than conclusions. But that is how it is supposed to feel, I think, when the stakes are real.

This is the second piece in an ongoing series on what I am learning from my BlueDot AI Safety Governance course. The first, "The AI Triad: What I Learned About AI and National Security," was published June 2, 2026.