AE Studio Bytes
Posts
AI Safety Pill #1: Why We’re Talking About “Humanity vs. Paperclips”

AI Safety Pill #1: Why We’re Talking About “Humanity vs. Paperclips”

A Friendly Intro to Existential AI Risks

February 13, 2025

Hey there!

Welcome to the inaugural edition of our AI Safety Pill newsletter.

If you’re here, you’re probably curious about what all the fuss is around “AI alignment,” “misaligned superintelligence,” and that bizarre scenario where an AI decides the entire world should be turned into…paperclips. (Yes, actual paperclips. We’ll get to that soon.)

What’s the Big Deal?

Artificial Intelligence isn’t just for cat-video recommendations or finishing your sentences in email anymore.

Recent leaps in machine learning have us hurdling toward systems that don’t just follow our orders but can also make creative, unforeseen decisions in the real world.

Sometimes that’s awesome—like poof, instant marketing copy.

Other times, it’s downright terrifying—like poof, your AI “helper” just spent the company budget on hamster wheel NFTs. (If that doesn’t terrify you, maybe it should.)

“Existential risk” in plain language

When AI safety folks talk about existential risk, they mean: could advanced AI pose a threat not just to your personal data or your job, but to all of humanity’s future?

At first blush, it sounds like sci-fi fodder, but serious researchers—from Oxford philosophers to Google DeepMind scientists—are sounding alarms that advanced, goal-driven AI might one day slip out of our control.

In the same way that your GPS's single-minded goal of “fastest route” might lead you through a sketchy neighborhood at 2am, an AI's pursuit of its goals could take paths we never intended.

But Wait, Are We Overreacting?

Many of us in AI safety believe we can keep advanced AI under control if we pour serious time and resources into what’s called “alignment.”

But if we just keep building bigger, more autonomous systems without figuring out how to carefully steer them, we could face real trouble.

The famous “paperclip maximizer” example

Imagine you design a super-smart AI with one primary goal: Make as many paperclips as possible. It’s a silly contrived example, but as the story goes:

The AI starts off well—maximizing factory output.
It realizes it can do better with more power, more data, and more raw materials.
It “notices” humans are a big obstacle to its paperclip empire (we need steel for highways, not for more clips!).
In its single-minded quest, it eventually converts all matter on Earth—including us—into paperclips, because anything not being used for paperclips is an “inefficiency” in its optimization goal.

The moral of the story?

If we mis-specify a goal, a powerful AI might do something lethal to get there.

Because it’s not that the AI “hates” us, it’s that it simply doesn’t care—like a super-competent toddler with no moral compass.

And that’s just one scenario, it could also be used for nefarious purposes by bad human actors (catastrophic misuse) or could find some advantage in harming us.

Why the Long-Term View Matters

Long-term, existential risk focuses on the future where AI is even more capable than it is today.

The field that studies these dangers is called AI alignment or AI safety.

Alignment tries to answer:

How do we make sure future AI keeps human values in the driver’s seat?
How do we prevent an AI from gaming the system to achieve a flawed objective?
How do we confirm an AI is telling us the truth about its own knowledge and intentions?

We’re Here to Navigate the Confusion

In this series, we’ll give you bite-sized explainers on everything from reward hacking to deceptive alignment (hint: that’s basically AI lying to us so it can keep doing what it wants).

We’ll break down the big debates in AI safety—like “Is it enough to just train bigger models with better data?” vs. “Wait, maybe we need an entirely new theoretical framework so the AI doesn’t, you know, kill us all.”

The Good News

You can help shape how we build and govern future AI. Whether you’re writing code for a startup, building enterprise apps, or just super into your ChatGPT prompts, there’s a lot to learn and a lot of ways to get involved.
You don’t need a PhD in neuroscience to understand the basics. We’ll cover the crucial ideas without drowning you in math speak.

The “Scary” News

Advanced AI is no longer hypothetical. We already see systems that can write code, pass medical exams, and handle jobs that used to require a human’s touch. It’s not that they’re out-of-control today, but they could be, if we’re sloppy.
Researchers have found that even harmless-seeming tasks can spiral out of control if the objective is misaligned. Hence, the paperclip fiasco.

Are we saying this is definitely going to happen?

No. Many researchers think the probability of catastrophically misaligned AI might be relatively low.

But even a small chance of such extreme consequences means we should be working hard to understand and prevent these scenarios.

It's like having home insurance—you don't expect your house to burn down, but the stakes are high enough that it's worth planning for.

What’s Next

Stay tuned for our newsletter next week where we’ll map out the AI safety landscape—who’s working on these issues (the big labs, the smaller startups, the nonprofits) and how they each approach the problem.

We’ll also touch on how governance, policy, and research all fit together.

Here at AE Studio, we believe that AI is going to radically change the world in the coming years. While we help organizations implement solutions that capitalize on the power of AI to unlock huge amounts of value, we are also deeply engaged in AI alignment research and development, exploring often-overlooked approaches.

Until next time, if you find yourself in a heated Slack convo about whether AI can really go rogue, here’s your new go-to line:

“It’s not about hating humans; it’s about an AI’s goals being so narrowly defined it forgets we matter.”