Taking a peek inside the AI “black box”

Goodfire AI shows us what’s really going on inside LLMs

One of the biggest challenges the AI industry faces today is:

The “black box” problem.

Or in other words:

We all know the basics of how AI works.

But we often don’t know exactly why an AI makes the decisions it does.

Which is why we’re excited to announce the recent release of Goodfire’s AI research preview!

Our team at AE Studio partnered with Goodfire AI to help bring this groundbreaking project to life.

It’s a platform, built around a custom AI model, that sheds light on the inner workings of LLMs and makes it straightforward for developers, researchers, and AI enthusiasts to see what’s going on under the hood.

To understand how it works, you just need to understand machine learning features.

These are the internal concepts that an AI uses to make decisions and determine how to respond to the user.

Depending on the input and the context, each feature’s “weight” (its importance at that moment) changes, and those weights together determine the output.

(One example would be the feature “Positive sentiment in creative or poetic contexts” being particularly active when a model is asked to write a poem.)
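(For the technically curious, here’s a purely illustrative Python sketch of what those feature weights might look like for that poem prompt. The feature names and numbers are invented for the example.)

```python
# Purely illustrative: a toy view of feature "weights" for a prompt like
# "write me a poem". Feature names and values are made up for this example.
feature_activations = {
    "Positive sentiment in creative or poetic contexts": 0.92,
    "Rhyme and meter in verse": 0.74,
    "Formal legal language": 0.03,
}

# The most strongly weighted features dominate how the model responds.
for name, weight in sorted(feature_activations.items(),
                           key=lambda item: -item[1]):
    print(f"{weight:.2f}  {name}")
```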

Normally, these features and weights are buried in the billions of parameters within the model…

But with Goodfire’s platform, they’re all out in the open and easy to edit.

What was the secret sauce?

Goodfire AI uses a clever combination of another AI model and advanced interpretability techniques to decode the internal “language” of the AI. This reveals the hidden concepts it uses to make decisions — like creating a translator for the AI's thought process.
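(One widely used tool for this kind of decoding is the sparse autoencoder: a small second model trained to unpack an LLM’s dense internal activations into individually meaningful features. The sketch below is our minimal illustration of that general idea, not Goodfire’s actual architecture.)

```python
# A minimal sketch of a sparse autoencoder (SAE), a common technique for
# decoding a model's internal activations into interpretable features.
# Illustrative only; all sizes and details here are assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        # Encoder maps dense activations into a larger, sparse feature space.
        self.encoder = nn.Linear(d_model, n_features)
        # Decoder reconstructs the original activations from those features.
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps only positively firing features, encouraging sparsity.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction

sae = SparseAutoencoder(d_model=4096, n_features=65536)
acts = torch.randn(8, 4096)  # stand-in for real model activations
features, recon = sae(acts)

# Train so reconstruction ≈ activations while an L1 penalty pushes most
# features to zero -- the surviving ones tend to map to single concepts.
loss = (recon - acts).pow(2).mean() + 1e-3 * features.abs().mean()
```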

The only missing pieces after that were the presentation and the horsepower.

So our team came on board to give the platform a slick, easy-to-use interface, and to build the state-of-the-art infrastructure needed to serve it up to hundreds of users at once.

And now...

With a single click, you can see which features the AI is activating... AND edit their weights directly!

For instance:

Many AI models tend to be sycophantic — they agree with you on everything, even when they shouldn’t.

But with this new interface, you could reduce the weight of the AI’s “agreeableness” feature simply by dragging a slider.
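(Under the hood, a slider like that could map to something like the sketch below: re-weight one feature in the decoded representation, then rebuild the activations. It reuses the SparseAutoencoder sketched above; the feature index and scale are hypothetical.)

```python
import torch

# Hypothetical sketch of a feature "slider": scale one feature's weight in
# the SAE's decoded representation, then reconstruct the activations the
# model continues its forward pass with. `sae` and `acts` come from the
# earlier sketch; AGREEABLENESS_IDX is an invented feature index.
AGREEABLENESS_IDX = 1234

def steer(activations: torch.Tensor, sae, feature_idx: int, scale: float):
    features = torch.relu(sae.encoder(activations))
    features[..., feature_idx] *= scale  # <1.0 weakens, >1.0 strengthens
    return sae.decoder(features)

# Drag the "agreeableness" slider down to 20% of its natural strength.
steered = steer(acts, sae, AGREEABLENESS_IDX, scale=0.2)
```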

This granular control is like being able to perform “brain surgery” on the AI.

And it’s a crucial step in giving developers and end users the tools to:

  • Gain deeper insight into the AI decision-making process

  • Experiment with targeted interventions on specific features

  • Explore ways to improve AI safety and alignment with human values and objectives

While we're still in the early stages, this technology could become a valuable part of the AI development toolkit.

And it has the potential to significantly advance our understanding of AI systems and contribute to ongoing efforts to address the “black box” problem in AI.

We were thrilled to have a chance to partner with Goodfire, and we’ll share more of what we’ve been working on together soon!

They share our passion for enhancing human agency, and for solving tough problems in the fields of AI safety and alignment.

Are you a founder or leader with a big project in the wings, just waiting for the right team?

AE Studio can put a Swiss Army knife of tech talent at your disposal, with world-class developers all ready to hit the ground running to bring your vision to life.

We’re here to:

  • Act as your CTO, guiding your project with seasoned expertise

  • Drive end-to-end product creation, from concept to launch

  • Deliver clean, efficient code that performs (and grows with you)

AE Studio Bytes ships every Tuesday and Thursday.

See you then!

P.S.

You might remember from a week or so back that our Skunkworks team entered their zkGraph project into the BNB Chain hackathon.

Great news on that front too.

They didn’t just make it to the finals…

THEY WON! 🥳

A huge congratulations to all!