AI Alignment Weekly #8: Meet the researchers (Pt. 2)
Feat. “The Philosophers”


Welcome back to “Meet the researchers”!
In case you missed part 1 last week: this is a 3-part mini-series that gives you a short, sweet, and simple field guide to the top minds in AI alignment.
👉 Today’s theme: The Philosophers
These are the thought leaders who were warning about rogue AI decades ago, back when Clippy from Microsoft Office was considered cutting-edge 👀📎
Without further ado:
Eliezer Yudkowsky
Archetype: 🔔 The Original Doomer
Perspective: If we don’t make AI provably safe, we’re probably dead.
Yudkowsky has been writing about the existential risk of AI since the early 2000s, way before it was a hot topic in the tech world.
He argues that if we build a misaligned superintelligence, we won’t live to fix it... so alignment has to come first.
Quick facts:
Founded the Singularity Institute (now MIRI) in 2000 to accelerate AI development; as he grew concerned about AI’s risks, the org pivoted in 2005 to focus on alignment instead
Believes misaligned superintelligence = extinction
Remains a leading voice in theoretical alignment research
Founded LessWrong, a rationalist blog that helped birth the modern alignment movement
Popularized the idea of “FOOM” (fast takeoff) and coined the term “seed AI” (a theoretical AI that can improve itself recursively)
Thinks today’s machine learning tools won’t cut it to solve alignment, and wants AI development paused until we have real safeguards
Nick Bostrom
Archetype: 🔮 The Oracle of Existential Risk
Perspective: Powerful AI demands powerful governance. Otherwise, we’re rolling the dice with humanity’s future.
Bostrom isn’t a builder or lab guy. He’s a philosopher through and through.
His 2014 book Superintelligence catapulted existential AI risk into the mainstream. It shifted the conversation from “can we build smarter-than-human machines?” to “should we, and how do we make it safe?”
Quick facts:
Founded Oxford’s Future of Humanity Institute in 2005 (dissolved in 2024)
Popularized the term “superintelligence” in Superintelligence: Paths, Dangers, Strategies (2014). (Bill Gates and Elon Musk gave the book glowing recommendations when it came out, and Sam Altman called it “the best thing I’ve seen” on the topic of AI risk)
Came up with the famous “paperclip maximizer” thought experiment we covered in AI Alignment Weekly #1
Believes we need global coordination to prevent a chaotic AI arms race
Proposed the idea of a “singleton”: a single, centralized decision-making power (such as a superintelligent AI, or a global regime governing AI) that would prevent uncontrolled competition
Recently proposed a framework for how society should treat digital minds in the future (like sentient AIs). They might deserve rights and protections — or at least a say in their own shutdown schedule!
Stuart Russell
Archetype: 👨‍🏫 The O.G. Alignment Academic
Perspective: The problem isn’t that AI might rebel. It’s that it’ll do exactly what we ask, in the worst possible way.
Russell isn’t just any computer scientist. He co-wrote Artificial Intelligence: A Modern Approach, the most popular AI textbook ever, used in 1,500+ universities worldwide.
He’s concerned that today’s AI systems blindly pursue the exact goals we give them, which can end up backfiring in spectacular ways.
(He calls this the King Midas problem.)
His proposed fix is “provably beneficial AI”: systems that know they don’t fully understand what we want, and defer to human input.
Quick facts:
Founded the Center for Human-Compatible AI (CHAI) at UC Berkeley in 2016
In 2023, signed the Future of Life Institute’s open letter calling for a 6-month pause on training any AI systems more powerful than GPT-4
Publicly warned that the current “AGI race” between companies and nations mirrors Cold War nuclear escalation... and that it could lead to human extinction
Wrote Human Compatible in 2019, which lays out his concept of “provably beneficial AI.” (This book was way ahead of its time. For a summary of the main ideas, check out this post from Rohin Shah.)
What’s Next?
Next week’s issue is the last one for this mini-series... but not the end of AI Alignment Weekly!
We’ll be shifting focus to three researchers working to crack open the “black box” of neural networks, to figure out how AI really thinks.
In the meantime...
If you’re working on an AI project and want a top-tier team to help you move fast and get it done right, reach out to us.
(To learn more about what we do, check out this page.)
— The AE Studio team
