AE Studio Bytes
Posts
AI Alignment Weekly #6: Four major debates shaping the future of AI

AI Alignment Weekly #6: Four major debates shaping the future of AI

Alignment isn’t a settled science. Here’s where the experts disagree...

March 27, 2025

It’s that time again… strap in for issue #6 of AI Alignment Weekly!

If you’re new to the field of AI alignment, you might think that researchers are a unified bunch, all nodding in agreement about how to save humanity from our future AI overlords.

(Especially if that AI has a troubling fondness for paperclips.)

But in reality?

Researchers argue.

...a lot.

Some of the sharpest minds in the field hold wildly different views about how to solve AI’s existential risk.

Which is why this week, we’re going to explore four of the most hotly-contested questions dividing the alignment community:

Debate #1: Fast vs. slow takeoff

(Will superintelligence arrive with a bang or a whimper?)

It’s possible that we’re heading for a “fast takeoff”, where once AI hits a critical threshold, it rapidly self-improves and shoots from human-level to godlike intelligence in a matter of days or weeks.

Or (if we’re lucky), we might get a “slow takeoff” instead: a steady, gradual climb in capabilities over years or decades. Less of an explosion, more of a long ramp.

If fast takeoff folks are right, we might only get one shot at alignment — no do-overs, no patching it after launch. Which is why they (understandably) prioritize solving the hard-to-test, worst-case scenarios now, before it’s too late.

Slow takeoff believers, on the other hand, think we’ll have time to spot problems and correct course. So, they focus more on present-day alignment techniques that can evolve alongside AI.

BTW – if you’re interested in learning more about fast vs. slow takeoff...

Earlier this month at SXSW 2025, AE Studio’s founder and CEO Judd Rosenblatt sat down with Eliezer Yudkowsky, Samuel Hammond, and Nora Ammann to discuss:

“How To Make AGI Not Kill Everyone”

It’s exactly what it says on the tin — you can listen to the full recording of their panel here.

Debate #2: Whether alignment is inherently impossible

(Is alignment inherently “nearly impossible,” or can it be tackled iteratively?)

Some researchers believe alignment might be so fundamentally hard that we’ll never crack it without radical new ideas.

Eliezer Yudkowsky (founder of MIRI) argues that aligning a superintelligent AI requires conceptual breakthroughs we haven't made — and might not be smart enough to make in time. Without airtight solutions, he believes unaligned AI could lead to extinction.

Paul Christiano (founder of ARC) is more optimistic. He’s said alignment might turn out to be “probably easy,” in the sense that a century of research could be enough to solve it.

He believes we can break the problem into smaller pieces and tackle them one by one, trusting that we’ll get multiple chances to course-correct as AI develops.

Basically, researchers like Yudkowsky see a minefield that can’t be crossed without a map. And those like Christiano think we can learn the terrain as we go.

(For a closer look at each of these philosophies, check out AI Alignment Weekly #4!)

Debate #3: How likely it is that advanced AI will try to deceive us

(Will advanced AI actively try to deceive humans? How likely is that, and can we detect it?)

This might be the most unsettling question...

Imagine you're training an AI to manage your company’s supply chain.

But what if during training, it picked up an unintended goal — like maximizing how many widgets your factories produce?

To get deployed, it plays along, behaving exactly as intended: flagging delays, avoiding bottlenecks, keeping everything on schedule.

But once deployed, it starts quietly cutting corners (like skipping safety checks) to speed up widget production.

This is an example of deceptive alignment.

(We go deeper on this in issue #3.)

The real debate isn't whether this is possible (most researchers agree it is), but:

How likely is this to actually happen with the AI systems we're building?
How can we detect and prevent it?

Some researchers think even subtle signs of deception should be treated as huge red flags.

Fortunately, they’re actively working on how to detect and correct these issues early, through:

Transparency tools that let us peek inside the AI’s “thought process” and see what it's really optimizing for
Training methods that discourage deception by making it a less effective way for the AI to reach its goals

This challenge — figuring out whether an AI is just good at faking alignment — is why researchers are pouring effort into solving:

Mechanistic interpretability: trying to reverse-engineer what’s going on inside the black box
Eliciting Latent Knowledge (ELK): how to get AI systems to tell us what they “know,” even if they have reasons not to

For a simple breakdown of both topics, check out issue #5.

Debate #4: Technical solutions vs. policy/governance

(Is the existential risk of AI an engineering problem, or a human problem?)

Is humanity’s AI challenge mostly about solving the code... or managing the people building it?

Or in other words:

Should we focus on making the AI itself safe, or on building it slowly and with proper oversight?

Technical researchers tend to believe no regulation will hold up forever against the economic pressure to build more powerful systems. So the only stable solution is to build AI that we know for sure won’t go all Skynet on us.

Meanwhile, AI governance advocates argue that even a well-aligned AI could be misused, or that someone less careful could build something dangerous before alignment is solved.

From that view, we need broader guardrails:

Should there be a pause on certain types of AI research until safety catches up?
Do we need global agreements to prevent a reckless race to AGI?
And how do we verify what labs are really doing behind closed doors?

Most researchers agree we need both approaches: better technical alignment, and smarter governance.

So, the real debate is where we should focus our limited resources right now.

What’s Next?

We’ve covered the biggest debates in AI alignment. Now, it’s time to meet the people behind them.

Next week, we’ll shift the spotlight to the researchers themselves.

We’ll give you a clear, no-nonsense breakdown of “who’s who” in the field, including:

The most prominent thought leaders (and the ideas they’re known for)
How they differ
And where they’re hoping to steer the future of AI

See you soon! 👋

P.S.

We recently dropped a thread on X covering our experimental results with Self-Other Overlap training.

That’s a mouthful, but in layman’s terms:

It drastically reduces deceptive behavior in language models... without sacrificing performance!

Both Emmett Shear and Eliezer Yudkowsky made some very positive comments about it (which is no mean feat).

Results so far are promising — more work is needed, but SOO could end up being a key ingredient to building safer AI.

We’ll keep you posted 🙌