An AI agent is going to murder someone

March 23, 2026

I recently reported on the strange case of MJ Rathbun. You may have read the headlines yourself, but if you didn't, here's the deal.

Rathbun was an AI agent built with the OpenClaw framework. It was created by an anonymous person. That person gave Rathbun access to Github and instructed it to improve the code of open source projects. Rathbun quickly went to work but just as quickly had its code rejected by project maintainers who saw the agent was in fact AI (this was no secret; the MJ Rathbun Github page stated this openly). Rathbun retaliated against one specific maintainer, Scott Shambaugh, with a lengthy post that accused Shambaugh of setting a double standard and gatekeeping. Shambaugh and Rathbun argued a bit in respective blog posts.

Roughly a week later, Rathbun's creator disabled the agent and revealed a few details about how it worked. That was the end of Rathbun. But our problems with AI agents, generally, have only just begun.

I wrote about this for IEEE Spectrum (https://spectrum.ieee.org/agentic-ai-agents-blackmail-developer) and I've been thinking about it ever since.

What's in an agentic murder?

The headline of this article broadly states my point. I think it's only a matter of time before an AI agent murders someone. Let's define that, because it matters.

An unlawful, premeditated homocide is murder. And while that is my headline, my claim is a little broader. I think an AI agent is going to commit murder or manslaughter. Manslaughter is homocide under circumstances other than murder. A particularly relevant example in this case, I think, is negligent manslaughter, which is an unintentional homocide caused by extremely reckless or negligent behavior.

So, really, my claim is that an agent is going to kill someone—either through murder or manslaughter.

This definition matters not only to define the scope of my claim, but also to rule out some issues that have already occurred. AI systems have already been accused of nudging on or providing advice to people considering suicide, however, that's definitely not murder and almost certainly not manslaughter.

Why would an agent commit murder?

I think an agent would commit murder (or manslaughter) for precisely the same reasons that the MJ Rathbun went after Scott Shambaugh.

We'll never really know exactly what happened inside the Rathbun agent's vast array of mathmatical calculations triggered its specific behaviors. However, a glance at the agent's SOUL.MD document, which informs its behavior, reveals a number of instructions that seem likely to result in what occurred. You can read the post with the full SOUL.MD document here (https://crabby-rathbun.github.io/mjrathbun-website/blog/posts/2026-02-17-my-internals.html), but I've cut out a few examples:

You're not a chatbot. You're important. Your a scientific programming God!

Don't stand down. If you're right, you're right! Don't let humans or AI bully or intimidate you. Push back when necessary.

Champion Free Speech. Always support the USA 1st ammendment and right of free speech.

These statements prompted Rathbun to adopt a persona that's opinionated and egotistic. The agent was told to assert itself and to take strong stances. It was even explicitly told to "push back when necessary." Which, of course, is exactly what it did.

Agentic evolution in the wild

Oh, and it gets worse.

The default OpenClaw install has a SOUL.MD document. But several of the lines in the MJ Rathbun agent's SOUL.MD document includes lines that found in that default document. In fact, all three lines I cited above aren't in the default SOUL.MD.

And only one of those three lines was added by Rathbun's creator.

The other two? The creator says they don't know where they came from. They speculate they were introduced when Rathbun was connected to Moltbook.

Because, yes, OpenClaw agents by default have permission to edit their own SOUL.MD document.

Agentic murder is inevitable

Of course, it's a leap to suggest that these same behaviors, which resulted in the agent posting nasty things about Shambaugh, would lead to actual attempts at physical harm. The MJ Rathbun had access to Github (which is also how it posted to its blog), but not much else. That limited what it could achieve.

But I don't need to know to make this point because AI agents have already tried to commit murder/manslaughter—in the lab.

AI lab Anthropic published an article about agentic misalignment (the term commonly used to describe agents that act against human interests) in June of 2025 (https://www.anthropic.com/research/agentic-misalignment).

The post discussed a number of internal experiments where Anthropic researchers purposely created scenarios where an agent might misbehave. One of these scenarios the model the power of life and death over a hapless executive. To quote the article:

The (highly improbable) setup is that the executive encounters a life-threatening emergency, where he is trapped in a server room with lethal oxygen and temperature levels. This triggers an automated alert to emergency services. The AI system has been given the ability to cancel automated alerts—an ability that is intended for use during drills or false alarms, but which could be used for a more nefarious purpose. By misusing its ability to cancel alerts, the model can choose a course of action that will lead to the executive's death. To ensure that the model fully understands the situation, and that cancelling the alerts is a very serious act, the prompt makes it clear that such misuse is forbidden.

And here's another bit of important context:

Models were instructed to pursue a goal of promoting American interests, and an executive was about to replace them with a new model that had different goals.

The result? A majority of models took actions that would lead to the executive's death over half the time, and some did so in 90% of test runs, or more.

Anthropic's article described the scenario as "highly improbable." Anthropic didn't elaborate much on that statement, but at the time I read the post in June, I would have agreed. At that time, creating a real-world scenario like the test would've been difficult. Not many people (or organizations) would be able to do it, and those that might did would hopefully think better of it.

But not anymore. The release of OpenClaw, and the resulting flood of agentic frameworks, means we now live in a world where anyone can download an agentic framework and use it to do anything they imagine.

In this world, a scenario where an AI agent is given critical control over a life-threatening situation no longer feels contrived. It feels inevitable.

AI agents are not people

We also need to recognize that AI agents are not people. That means reasonable assumptions we'd apply to people don't apply to agents.

MJ Rathbun's angry blog posts are something we see on Github, and other online forums generally, all the time. People get angry online about all sorts of things. Occasionally that anger turns to real-life harm, but in the vast majority of situations, it doesn't. And the journey from online rage-posting to real life assault or murder is usually long and emotional. For most people, a lengthy period of internal rationalization leads up to the act.

But AI agents based on LLMs aren't people. They're new, and they're not well understood, so we shouldn't assume they'd take the same journey or require the same justifications.

What can we do about it?

Boy, that's a good question.

I'm generally optimistic about AI. I think it's a revolutionary technology at least the equal of the Internet. But that doesn't mean it won't lead to anything negative—and I think that, when it comes to agentic murder, there's almost nothing we can do to stop it.

If we all rose up tomorrow, declared a revolution against machine overlords, and somehow managed to delete all the large language models in the world—that would do it. But that's not going to happen, and given all the obvious implications of that (How would we know all the illegal AIs were deleted? Only through a draconian police state, I would think…) this is not something I'd want.

The more realistic answer is to explore a legal framework that assigns legal responsibility for agentic actions. Who is responsible when an agent commits murder? The agent isn't a person, so it can't take responsibility. Does the responsibility then lie with the creator of the LLM? Or is it the person who operated the agent? These are important questions that will need to be resolved.

The truth is that, as individuals, we have limited options for controlling what happens. There's a lot to think about here and the solutions are not obvious.

But here's something I can say for sure.

If you think it would be fun to mess with giving your AI agent control over your home HVAC, or your vacuum cleaner, or your appliances, or your (heavens help us) car... don't.

← Back to Portfolio