AI Alignment Weekly #7: Meet the researchers (Pt. 1)

Feat. “The Pragmatists”

Would you look at that — we’re already at AI Alignment Weekly #7!

(Time flies when you’re trying to prevent a robot uprising, eh?)

Today, we’re kicking off a fun little field guide to the most influential researchers in the field of AI alignment.

There are way too many brilliant people in this space to squeeze into just one email.

So this’ll be a 3-part mini-series, rolling out over the next few weeks.

Each email will be a themed “starter pack” cheat sheet of a few names you should know — the ones you’ll hear again and again in articles, podcasts, and panel discussions.

Each entry comes with an archetype (their role in the ecosystem), their perspective, and a “quick facts” rundown of what they’re known for.

👉 Today’s theme: The Pragmatists

These researchers are trying to solve alignment through sheer engineering grit, with less focus on grand philosophical theories.

Let’s crack into it:

Paul Christiano

Archetype: 🔧 The Optimistic Engineer
Perspective: If we just keep iterating, we might actually pull this off!

Before founding ARC, Christiano helped steer OpenAI’s early safety work.

Compared to the usual doom-and-gloom you’ll find in some parts of the AI safety community, he’s fairly optimistic about the future.

He argues we can solve alignment through iteration and elbow grease — without having to overhaul the foundations of today’s AI.

Quick facts:

  • Currently the Head of AI Safety at the U.S. AI Safety Institute (joined in 2024)

  • Led the language model alignment team at OpenAI from 2019 to 2021

  • Founded the Alignment Research Center (ARC) in 2021 (a nonprofit tackling core problems in AI alignment)

  • Believes there’s roughly a 1-in-3 chance we can solve AI alignment “on paper” in advance

  • Coined the term “prosaic alignment” (aligning AI built from today’s techniques, without fundamentally new ideas)

  • Helped pioneer RLHF (reinforcement learning from human feedback), a feedback-based training method now standard across most LLMs (e.g. when ChatGPT asks whether you prefer “response 1” vs. “response 2”); see the sketch after this list

  • One of the minds behind ELK (Eliciting Latent Knowledge), a research problem that could lead to creating a “truth serum” for deceptive AI (more on that in issue #5)

  • Sees alignment as a series of tough (but solvable) engineering challenges — not an unsolvable philosophical mystery
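
For the technically curious: here’s a tiny, illustrative sketch of the preference-comparison idea at the heart of RLHF’s reward-modeling step. Everything in it (function name, scores) is made up for illustration; real systems train a neural reward model on huge numbers of such comparisons and then fine-tune the LLM against it with reinforcement learning.

```python
# Toy sketch of the pairwise preference loss used in RLHF reward modeling.
# All names and numbers are illustrative, not from any real system.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: small when the reward model scores the
    human-preferred response above the rejected one, large otherwise."""
    # Probability the model assigns to the human's preference being correct
    prob_chosen = 1.0 / (1.0 + math.exp(-(score_chosen - score_rejected)))
    return -math.log(prob_chosen)

# A labeler preferred "response 1" over "response 2"; the scores below are
# hypothetical reward-model outputs for each response.
print(preference_loss(score_chosen=1.3, score_rejected=0.4))  # low loss: agrees with the label
print(preference_loss(score_chosen=0.2, score_rejected=1.1))  # high loss: disagrees, so training nudges the scores
```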

Jan Leike

Archetype: 🪜 The Alignment Stair-Stepper
Perspective: Forget trying to align superintelligence. We should focus on aligning the next generation of AI.

Before joining Anthropic, Leike spent years leading safety efforts at OpenAI.

His current focus: how to train AI systems to follow human intent on tasks that are hard for humans to evaluate directly.

He believes that aligning superintelligence outright might be out of reach for humans today... but maybe we can align the next generation of AI, and let each aligned generation help align the one after it, all the way up.

Quick facts:

  • Co-leads the Alignment Science team at Anthropic (joined in 2024)

  • Previously Head of Alignment at OpenAI, where he co-led the now-defunct Superalignment team (alongside Ilya Sutskever)

  • Was involved in the development of ChatGPT and InstructGPT, and the alignment of GPT-4

  • Co-authored OpenAI’s published research agenda, “Our approach to alignment research”

  • Resigned from OpenAI in May 2024, citing a shift in focus from safety to “shiny products”

  • Worked on early RLHF prototypes at DeepMind

Ilya Sutskever

Archetype: 🧠 The Superintelligence True Believer
Perspective: Superintelligence is coming, and alignment has to move faster.

Sutskever has been at the forefront of AI for over a decade, and is widely considered one of the most respected minds in the field.

After leaving OpenAI in 2024, he co-founded Safe Superintelligence Inc., a startup laser-focused on building — wait for it — a safe superintelligence.

He calls this “the most important technical problem of our time.”

Quick facts:

  • Co-founded OpenAI in 2015 and served as Chief Scientist until his departure in 2024

  • Co-led OpenAI’s Superalignment team (with Jan Leike) before resigning in 2024, following internal turmoil and growing concerns about safety being sidelined

  • Now leads Safe Superintelligence Inc.

  • Trained under AI legend Geoffrey Hinton, often called the “Godfather of AI”

  • Co-invented AlexNet in 2012 (with Geoffrey Hinton and Alex Krizhevsky), a neural network that shocked the AI world by crushing the ImageNet image-recognition challenge and kickstarted the modern AI boom

  • Publicly warned (alongside Jan Leike) that superintelligence could arrive within the decade

What’s Next?

In the next issue, we’ll zoom out from the engineering trenches and look at the big-picture philosophers who helped put AI alignment on the map in the first place.

See you then,

— The AE Studio team

P.S.

Need help building an AI product for your organization? Book a consultation with us here, and let’s talk.

(For more info on our work and what we do, check out this page.)