AI Alignment Weekly #4: The Great Divide

There are two main approaches to preventing misaligned AI — but which one is right?

Welcome back to AI Alignment Weekly!

So far in this series, we’ve covered:

Now, as promised, we’re going to explore a massive philosophical divide in the field of AI alignment.

The Great AI Alignment Divide

If you've been following this series, then you know that aligning advanced AI with human values is a really tough problem.

In fact, it’s so hard that the smartest minds on the planet can’t agree on how to solve it.

So the field has essentially split into two major schools of thought:

Prosaic alignment vs. theoretical alignment.

This debate has shaped the field's research priorities, funding decisions, and overall direction since its earliest days.

Prosaic Alignment: Learning By Doing

Researchers in this camp take a hands-on approach.

They believe we might be able to achieve alignment through current machine learning paradigms, without needing exotic new AI designs or breakthroughs in theoretical computer science.

Paul Christiano (who previously led the language model alignment team at OpenAI and founded the Alignment Research Center) coined the term “prosaic alignment.”

Organizations like OpenAI, Anthropic, and DeepMind generally lean this way.

The big idea is that by working with current AI systems — training ever more capable models and trying to align them — we can learn how to solve alignment incrementally.

In other words:

“Learning by doing”: gathering empirical data through iteration, rather than relying on theory alone.

This philosophy gained traction in the 2010s, as deep learning drove a dramatic acceleration in AI progress.

On the other side, we have...

Theoretical Alignment: We Need New Frameworks

These researchers believe that aligning a superintelligent AI might require fundamentally new scientific insights or mathematical frameworks.

Organizations like the Machine Intelligence Research Institute (MIRI) champion this approach.

(It’s worth noting that MIRI in particular is an old-school pioneer in AI alignment, founded all the way back in 2000!)

In a nutshell:

  • Current machine learning techniques aren't sufficient to guarantee safety in a superintelligent AI

  • We need rigorous, formal proofs and models before we build extremely powerful systems

  • Without solid theoretical foundations, we might end up with a catastrophically misaligned AI on our hands

Eliezer Yudkowsky, MIRI's founder, has voiced doubts that aligning something smarter than us can be done without a huge conceptual breakthrough.

His position is essentially: “We need to solve the deepest, most difficult parts of the alignment problem first, before we build something we can't control.”

Where they diverge

The prosaic and theoretical camps disagree on fundamental questions like:

How urgent is our timeline?

  • Prosaic: “AGI (Artificial General Intelligence) might arrive sooner than we think, so we need practical solutions now.”

  • Theoretical: “We need years of fundamental research before we can safely build AGI.”

Can we learn enough from current systems?

  • Prosaic: “Working with real AI systems reveals practical insights we'd never get from thought experiments alone.”

  • Theoretical: “Today's deep learning systems are too different from future superintelligence. The problems we're solving today won't prepare us for tomorrow's challenges.”

Is alignment a technical or philosophical problem?

  • Prosaic: “Let's focus on concrete engineering challenges we can tackle right now.”

  • Theoretical: “We need to solve deep philosophical questions about values and agency before we can build safe AI.”

How transparent does AI need to be?

  • Prosaic: “Perfect understanding may be impossible, but we can still build safeguards that work.”

  • Theoretical: “We need to fully understand how an AI reasons before we can trust it with significant power.”

Where they overlap

Despite their differences, there's still a ton of common ground:

  • Both camps agree that AI alignment is crucial for humanity's future

  • Both recognize that many different research directions are needed

  • Most researchers acknowledge value in both theoretical and practical work

Hybrid approaches are emerging as well, with researchers using theory to inform experiments and vice versa.

But wait—there’s more to the debate...

Beyond the prosaic vs. theoretical split, there are other alignment research agendas that take a different approach.

For example:

Brain-based AGI research (e.g., Steve Byrnes’ work) explores whether reverse-engineering human cognition could provide a pathway to aligning AI. Instead of training black-box models, this approach aims to develop AGI architectures that closely mirror the brain's structure and learning processes.

Prosocial alignment work (such as some of AE Studio’s efforts) focuses on building AI systems that are designed from the ground up for cooperation with humans, rather than relying on technical safety constraints alone.

TurnTrout’s “shard theory” suggests that instead of treating alignment as an all-or-nothing challenge, we should investigate how AI systems learn values in a more granular way, much like how humans develop moral intuitions through experience.

These ideas offer alternative perspectives that don't fit neatly into the prosaic/theoretical divide but still shape the landscape of alignment research.

So, who's right?

Nobody knows yet!

There’s no easy answer: the field of AI alignment is still young, and we're facing questions humanity has never encountered before.

As a result, most organizations working on AI alignment sit somewhere on a spectrum between these two approaches.

Survey says...

One final tidbit for you:

In May 2024, AE Studio conducted a survey of alignment researchers.

Here’s one key finding that jumped out at us: most alignment researchers don't believe current research is on track to solve the alignment problem.

In other words, the debate is far from settled... and perhaps all sides still have a lot to learn from each other.

What's Next

In next week’s newsletter, we're going to look at some of the most recent breakthroughs in AI alignment, plus where the research is headed next.

We’ll talk about:

  • What researchers are doing to try to open up the “black box” of large neural networks...

  • Cracking the code on inner alignment...

  • The problem of transparency — and whether it’s possible to develop a “truth serum” for misaligned AI

More to come soon!