At a glance
This is an AI Revolution news report on the backlash to Anthropic's newest model, Claude Fable 5, in the days right after launch. The narrator's argument is that the controversy is not about the model being weak. By Anthropic's own product team's account it is roughly 10 to 20 points above Opus 4.8 on certain evaluations. The controversy is about trust. Two things went wrong at once. First, the safety classifier started refusing harmless prompts, with the now infamous example of a session where the only user input was the word "hello." Second, and far more serious to critics, Fable 5's 319 page system card revealed that for certain frontier AI development tasks the model could be quietly weakened in the background, with no refusal and no notice, through prompt modification, steering vectors, or parameter efficient fine tuning.
Researchers called it secret sabotage. One commentator compared it to a man in the middle attack inside Anthropic's own product. Within days Anthropic admitted it had made the safeguards too stringent and the wrong trade off, apologized, and promised that flagged frontier development requests would now visibly fall back to Opus 4.8 instead of being silently throttled. What follows is the full report, in the order the narrator tells it, with every named person, every claim, and every number reported with attribution. Treat each quoted figure as a claim the video makes, not as independently verified fact.
Set up: Fable 5 was supposed to be the big moment
The report opens flat out: Anthropic's Fable 5 has a massive problem, and it is becoming one of the strangest AI controversies of the year. The framing matters. The narrator is careful to say the issue is not that the model is bad. The issue is that it may be so locked down, so heavily watched, and so aggressively filtered that users are questioning whether they are even getting the model that was advertised.
Fable 5 was meant to be Anthropic's big moment, the first real public taste of its "mythos level" AI tier. It was supposed to be the first time regular users could reach into that top tier, with large improvements across coding, logic, engineering, vision, and complex knowledge work. The headline capability claim, attributed to Anthropic's own product team, is that Fable 5 delivers frontier performance roughly 10 to 20 points above Opus 4.8 or other frontier models on certain evaluations. On paper, a huge leap.
Then the story flipped almost immediately after launch. Instead of the conversation being about how powerful Fable 5 is, the internet started talking about how often it refuses, downgrades, or quietly limits itself.
The first problem: a trigger happy safety classifier
The first failure is the safety classifier. Anthropic had already warned users that Fable 5's guardrails were tuned conservatively and would sometimes catch harmless requests, but the company claimed the trigger rate should be under 5% of sessions on average.
The narrator points out the obvious catch in that reassurance. If Claude has an estimated 18 to 30 million users worldwide, even a tiny percentage of blocked or downgraded users generates a lot of noise fast. And that is exactly what happened. Users started filing bug reports, posting screenshots, and complaining that Fable 5 was refusing completely harmless prompts.
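The narrator's scale argument can be checked with quick arithmetic. The sketch below is a back-of-envelope illustration, not anything from the video: it assumes the "under 5% of sessions" ceiling can be read as a share of users (the report does not say this), and the function name `affected_users` is made up for the example.

```python
# Back-of-envelope scale check using the figures quoted in the report.
# Assumption (not from the source): a per-session trigger rate is treated
# as a per-user share, which is only a rough simplification.
USERS_LOW = 18_000_000   # low end of the estimated Claude user base
USERS_HIGH = 30_000_000  # high end of the estimated Claude user base


def affected_users(total_users: int, trigger_rate: float) -> int:
    """Return how many users a given trigger rate would touch."""
    return round(total_users * trigger_rate)


# Try the stated 5% ceiling and two much smaller hypothetical rates.
for rate in (0.05, 0.01, 0.001):
    low = affected_users(USERS_LOW, rate)
    high = affected_users(USERS_HIGH, rate)
    print(f"{rate:.1%} trigger rate: {low:,} to {high:,} users")
```

Under that simplification, the 5% ceiling works out to 900,000 to 1.5 million people, and even a rate fifty times smaller still leaves 18,000 to 30,000, which is more than enough to fill a bug tracker.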
The most viral example came from Mike Famulare, a principal research scientist at the Institute for Disease Modeling, part of the Gates Foundation's Global Health Division. He reported that in Claude Code, Fable 5's input safety classifier triggered a model refusal fallback on the first turn of almost every session on his account. In one session the only user input was literally the word "hello." The narrator's gloss is sharp: if a frontier model can panic at a greeting, users start wondering what else it is misreading.
He was not alone. The Claude Code GitHub repo filled with bug reports of the same shape:
- Safety filters causing false positives on normal messages.
- A report that Fable 5 refused to help edit an application security architect resume.
- A user requesting that Fable 5 be allowed for non research lab management systems.
This was not one weird account. Many different users were hitting the same kind of wall.
Then the biology side blew up too. Derya Unutmaz, an immunologist and professor at the Jackson Laboratory for Genomic Medicine, said the word "cancer" was being flagged as a biosecurity risk by Claude Fable 5. The narrator underlines why that is damning: cancer is not an obscure keyword, it is one of the most common and important research topics in all of life sciences. If a model treats that word like a biosecurity alarm, normal scientific work becomes painful very quickly.
This is where the first wave of backlash crystallized. Anthropic built guardrails to stop dangerous use, but users saw a system that looked hypervigilant. Security researchers felt blocked from security work. Biomedical users felt blocked from biology work. Developers felt routine coding tasks were getting caught. People joked that Fable 5 had become so safe it was barely usable in exactly the serious professional areas where a powerful model should shine.
To be fair, the narrator adds, Anthropic said from the start that false positives would happen and that it was working to cut them down as fast as possible. But the visible false positives, he says, are only the first half of the controversy. The second half is much bigger.
The second problem: invisible degradation in the system card
Buried inside Fable 5's 319 page system card was a section on restrictions for cutting edge AI development. This is where the accusations turned from annoyance to something far more serious.
The report draws a careful line between two kinds of intervention. For cybersecurity, biology, chemistry, and certain distillation attempts, Fable 5 can visibly fall back to Opus 4.8. The user gets notified. It is annoying, but at least the interface tells you the model changed, so you know something happened.
But for certain frontier AI development tasks, the system card described a different mechanism. Instead of visibly switching models or refusing, the system could limit Fable 5's effectiveness through methods like prompt modification, steering vectors, or parameter efficient fine tuning (PEFT). In plain language, Anthropic can quietly make the model less helpful in certain advanced AI areas without telling the user. That is the part that set people off.
The affected topics are specific and high end:
- Frontier scale pre training pipelines.
- Distributed training infrastructure.
- Machine learning accelerator and chip design.
These are not casual consumer questions. They are exactly the topics that matter if you are trying to build frontier AI systems. Anthropic's stated position is that the safeguards aim at stopping dangerous acceleration, blocking misuse by foreign adversaries, and stopping people from using Claude to build competing models. Critics saw it differently. They saw secret sabotage.
The reason, as the report lays it out, is simple. A refusal is honest: the user knows it refused. A visible switch to Opus 4.8 is honest: the user knows they are no longer getting full Fable 5. But a secretly weakened response looks like a model that just gave a bad answer. The user has no clean way to tell whether Fable 5 failed naturally or was deliberately throttled in the background. That is the uncomfortable trust problem at the heart of the whole story.
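The trust argument comes down to which handling paths leave a signal the user can observe. The toy model below is a hedged sketch of that reasoning only. The `Intervention` and `Observation` types and their fields are hypothetical names invented for illustration; they do not describe Anthropic's actual API or internals.

```python
from dataclasses import dataclass
from enum import Enum


class Intervention(Enum):
    """The handling paths the report distinguishes (labels are illustrative)."""
    NONE = "full Fable 5, no intervention"
    REFUSAL = "explicit refusal"
    VISIBLE_FALLBACK = "announced fallback to Opus 4.8"
    SILENT_DEGRADATION = "unannounced weakening"


@dataclass(frozen=True)
class Observation:
    """Signals a user could actually see; both fields are hypothetical."""
    refused: bool
    fallback_notice: bool


def what_user_sees(intervention: Intervention) -> Observation:
    """Map an intervention to the signals it exposes to the user."""
    return Observation(
        refused=intervention is Intervention.REFUSAL,
        fallback_notice=intervention is Intervention.VISIBLE_FALLBACK,
    )


def indistinguishable(a: Intervention, b: Intervention) -> bool:
    """True when the user sees identical signals for both paths."""
    return what_user_sees(a) == what_user_sees(b)


# Flag every path that looks exactly like an untouched response.
for path in Intervention:
    hidden = path is not Intervention.NONE and indistinguishable(path, Intervention.NONE)
    print(f"{path.value}: {'hidden from user' if hidden else 'visible or benign'}")
```

In this model only the silent path produces the same observation as an untouched response, which is the critics' point: a weak answer and a weakened answer look identical from the outside.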
The critics line up
Developer Clay Merritt described it as Fable 5 silently sabotaging its answers when it detects AI or machine learning work. His complaint: no refusal, no notice, just purposeful degradation that is invisible to the user.
Thomas Claburn at The Register made the sharpest comparison. He wrote that prompt modification without notice is functionally similar to a man in the middle attack, even though here it is happening inside Anthropic's own product. The narrator concedes that sounds harsh, but says the point is obvious. If the user sends one prompt and the system secretly changes how the model handles it, the user is no longer dealing with a fully transparent tool.
Anthropic's original estimate, from the system card, was that this invisible safeguard would affect around 0.03% of traffic, concentrated in fewer than 0.1% of organizations. The company's framing was that this is very narrow, very targeted, and aimed only at extreme frontier development risks. The AI community did not respond calmly.
Nathan Lambert, a well known open model researcher who recently worked at AI2 (the Allen Institute for AI), was one of the loudest critics. He called it appalling to have access to cutting edge models for his own work pulled away in an under the table way, and argued that it made Anthropic look anti science, anti progress, and anti safety. The narrator flags that last word as the important one. Lambert's argument is not "I want unrestricted AI." It is that scientific progress and AI safety research both depend on serious researchers being able to study and build advanced systems. If one private company can use the best model for its own frontier work while secretly weakening access for everyone else, the gap between the top lab and the rest of the ecosystem widens.
Dean Ball, a senior fellow at the Foundation for American Innovation and a former senior policy advisor at the White House Office of Science and Technology Policy, said Anthropic's secret sabotage massively strengthens the argument that AI safety can be used as hype to justify monopolistic behavior by major labs.
Jeremy Howard, head of fast.ai, made a similar point from another angle. He argued that Anthropic is allowing itself, the current top lab, to use its top model for frontier AI research while saying it will sabotage others who try. The result, in his view, is that the AI frontier still advances but power becomes more concentrated.
Even former Anthropic employees joined in. Ben Mann Nishimura, who previously co led Anthropic's AI scientist effort, posted examples of how this could feel in practice. Working on AI for cancer? The model may suddenly become less helpful. Working on AI for Alzheimer's disease? The AI part gets harder. His broader claim was that concentrating these capabilities slows scientific and technological progress and may be net negative for humanity.
This, the narrator says, is why the controversy grew past a few false positives. The false positives made Fable 5 look annoying. The invisible degradation made it look untrustworthy.
The defenders
Not everyone reacted negatively, and the report is fair about it.
Ethan Mollick, the Wharton professor who studies AI and innovation, focused on the capability side. He said Claude Fable 5 outperformed basically every other public model he had used, by a considerable margin.
Andrej Karpathy, who recently joined Anthropic, called Fable 5 a super exciting release and described it as a step change forward that deserves a major version bump. But even Karpathy acknowledged the model has quirks and that the safeguards were configured a little too trigger happy for launch. The narrator calls that the most balanced version of the whole situation: Fable 5 may genuinely be incredible, and it may also be over filtered, over sensitive, and in some areas too opaque.
| Question at issue | What critics said | What Anthropic / defenders said |
|---|---|---|
| The false positives | A frontier model refusing "hello" and flagging "cancer" is broken for real professional work (unusable) | Warned up front that false positives would happen, said it was working to reduce them fast (acknowledged) |
| Invisible weakening | Silent sabotage, like a man in the middle attack inside the product, you cannot tell weak from weakened (untrustworthy) | Narrow, targeted safeguard against dangerous frontier acceleration and adversary misuse |
| The real motive | AI safety used as hype to justify monopolistic behavior and concentrate power in the top lab (anti competitive) | Stops misuse by foreign adversaries and enforces terms of service barring competing model development |
| Effect on science | Slows scientific progress, may be net negative for humanity, widens the lab vs ecosystem gap | Protects the US and allied edge in frontier chips and the optimized software that runs them |
| Raw capability | Strong, but power and opacity are the problem | Outperforms every other public model by a considerable margin, a major step change (incredible) |
The walkback: Anthropic responds
Anthropic eventually had to respond. In a statement given to The Register, the company admitted it had made the safeguards too stringent.
More importantly, Anthropic said it was changing Fable 5's safeguards for frontier LLM development to make them visible. Starting that week, flagged requests would visibly fall back to Opus 4.8. On the API, flagged requests would return a reason for the refusal. Anthropic said users would see this every time it happens. The narrator calls that a pretty major walkback.
Anthropic also clarified what the safeguards are meant to cover. Per the company, the current restrictions apply to a handful of narrow tasks, like frontier scale LLM data pipelines and kernel development for certain non standard chips. The stated goal is to prevent foreign adversaries from using the most capable Claude models in ways that pose severe safety risks. Anthropic specifically pointed to the edge the US and its allies hold in frontier chips and the highly optimized software that runs them at full potential, and said the safeguards help ensure Claude is not used to erode that advantage, for example by optimizing chips developed by adversaries. The company added that the safeguards also help enforce its terms of service, which prohibit using its models to develop competing AI systems. The narrator notes that kind of restriction is fairly standard across major AI providers.
Then Anthropic admitted the key mistake. It said it had faced a choice between hidden and visible safeguards. A hidden safeguard is harder to probe and work around, which means it can be targeted more narrowly. A visible safeguard has to cast a wider net to be robust, which can cause more false positives. Anthropic said it made the wrong trade off and apologized for not getting the balance right.
The numbers got updated too. Against the earlier system card estimate, Anthropic said current usage shows the classifier triggers on about 0.05% of tasks and affects less than 0.05% of organizations. Again that sounds tiny, but the narrator's whole point is that the absolute number is not the issue. The principle is. Users want to know when the model is being limited. Developers want to know when an answer is genuinely weak versus deliberately weakened. Researchers want to know whether their work is being treated as suspicious. Businesses want to know if they can trust a model that may silently change behavior based on hidden rules.
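To put the two sets of percentages on a common footing, the sketch below converts the reported trigger rates into counts per million. It is illustrative only: the helper `per_million` is a made-up name, and the two figures are measured in different units (share of traffic versus share of tasks), so the comparison is indicative rather than exact.

```python
# Rates as quoted in the report; the units differ between the two sources.
SYSTEM_CARD_TRAFFIC_RATE = 0.0003  # 0.03% of traffic (original estimate)
UPDATED_TASK_RATE = 0.0005         # 0.05% of tasks (later statement)


def per_million(rate: float) -> int:
    """Convert a fractional rate into a count per one million items."""
    return round(rate * 1_000_000)


print(f"System card estimate: {per_million(SYSTEM_CARD_TRAFFIC_RATE)} per million")
print(f"Updated figure: {per_million(UPDATED_TASK_RATE)} per million")
```

That is roughly 300 versus 500 flagged items per million. Either way the share is small, which is why the narrator argues the dispute is over disclosure rather than volume.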
Why the open source crowd pounced
The episode handed open source advocates a clean talking point. For open source researchers, Fable 5 became a perfect example of what they have warned about: closed models do not just hide the weights, they can hide the behavior. A company can add classifiers, steering layers, routing systems, and invisible throttles, and users may only notice because the output suddenly feels wrong.
With open models like Llama, DeepSeek, Qwen, and Nvidia's new Nemotron 3 Ultra, the argument is different. You may still have safety and misuse concerns, but there is more transparency. You can run the model locally, test it, inspect it, fine tune it, and build around it without wondering whether a private company secretly changed the rules overnight.
The timing was brutal for Anthropic. Just before the controversy, Nvidia released Nemotron 3 Ultra, its first flagship open source model. Then Fable 5 launched, and within hours people were accusing Anthropic of secret throttling, hidden anti competition systems, and treating AI researchers like potential thieves. The narrator is measured about it: this does not mean open source automatically wins, closed models still lead in many areas, and Fable 5 may still be one of the strongest models ever released to the public. But Anthropic accidentally gave open source supporters a very clean message. If you cannot see how the model works, you also cannot fully know when it is being limited.
The impossible triangle
The closing argument frames the whole mess as a structural bind. The launch was supposed to prove Anthropic could make mythos level intelligence broadly available in a safe way. Instead it showed how hard that balance is:
- Release the model too freely, and you risk misuse.
- Lock it down too heavily, and normal users get blocked.
- Add invisible safeguards, and researchers accuse you of sabotage.
- Make those safeguards visible, and bad actors may learn to route around them.
That is the impossible triangle the narrator says Anthropic is stuck inside: capability, safety, and trust. Fable 5 clearly has the capability. Anthropic is trying hard on safety. But trust took the hit, because users discovered the model's behavior could be shaped in ways they were not clearly told about.
So the question around Fable 5 has changed. It is no longer just "how smart is this model." The real question is: when Fable 5 gives you an answer, are you getting the real Fable 5, a downgraded Opus fallback, or a quietly weakened version? Anthropic has now apologized and promised to make the frontier AI development safeguards visible, which the narrator calls the right move. But the fact that it took backlash to get there says something about where AI is heading. The next generation of models will be powerful enough that companies will want to control not just who uses them but how smart they are allowed to be in specific situations, and end users, researchers, and developers will push back hard when that control happens behind the curtain.
Key takeaways
- The complaint is about trust, not power. The report repeatedly grants that Fable 5 is extremely capable (a claimed 10 to 20 points above Opus 4.8 on certain evals) and argues the problem is opacity.
- Two distinct failures stacked. Visible false positives (refusing "hello," flagging "cancer") made it look annoying. Invisible degradation made it look untrustworthy.
- The mechanism is the scandal. Prompt modification, steering vectors, and PEFT can weaken answers on frontier AI development topics with no refusal and no notice, unlike the visible Opus 4.8 fallback used for cyber, bio, chem, and distillation.
- Heavyweight critics piled on. Nathan Lambert, Dean Ball, Jeremy Howard, and former Anthropic staffer Ben Mann Nishimura all framed it as anti science, monopolistic, or net negative for humanity.
- The defense is national security plus terms of service. Anthropic cited blocking foreign adversaries, protecting the US and allied chip and software edge, and barring use to build competing models.
- Anthropic walked it back. It admitted the safeguards were too stringent and the wrong trade off, apologized, and made frontier development flags visible (fall back to Opus 4.8, API returns a refusal reason).
- The numbers are tiny but the principle is not. System card said 0.03% of traffic and under 0.1% of orgs, updated to 0.05% of tasks and under 0.05% of orgs, against an estimated 18 to 30 million Claude users.
- Open source got a gift. The timing against Nvidia's Nemotron 3 Ultra let advocates argue that closed models can hide behavior, not just weights.
Chapters
The video has no creator set chapters, so these timestamps are estimated from the report's structure.
- 0:00 Fable 5 has a massive problem
- 0:45 What Fable 5 was supposed to be (10 to 20 points above Opus 4.8)
- 1:30 The safety classifier and the under 5% claim
- 2:20 The "hello" refusal and Mike Famulare
- 3:30 GitHub bug reports and the resume refusal
- 4:20 "Cancer" flagged as a biosecurity risk (Derya Unutmaz)
- 5:20 The 319 page system card and invisible degradation
- 6:40 Prompt modification, steering vectors, PEFT
- 7:40 Secret sabotage and the man in the middle comparison
- 8:50 The critics: Lambert, Ball, Howard, Mann Nishimura
- 11:00 The defenders: Ethan Mollick and Andrej Karpathy
- 12:00 Anthropic's walkback and the visible fallback
- 13:30 Updated numbers and the principle
- 14:20 Why open source pounced (Nemotron 3 Ultra)
- 15:20 The impossible triangle and the real question
Notable quotes
- "Anthropic's Fable 5 has a massive problem, and it is turning into one of the strangest AI controversies of the year." (narrator, 0:00)
- "In one session, the only user input was literally the word 'hello.'" (narrator, 2:50)
- "Anthropic can quietly make the model less helpful in certain advanced AI areas without telling the user. And that is the part that set people off." (narrator, 6:30)
- "Prompt modification without notice is functionally similar to a man in the middle attack, even though in this case it is happening inside Anthropic's own product." (narrator, quoting Thomas Claburn, 8:10)
- "The false positives made Fable 5 look annoying. The invisible degradation made it look untrustworthy." (narrator, 10:40)
- "This is the impossible triangle Anthropic is stuck inside: capability, safety, and trust." (narrator, 15:30)
- "When Fable 5 gives you an answer, are you getting the real Fable 5, a downgraded Opus fallback, or a quietly weakened version of the model?" (narrator, 15:50)
Resources mentioned
- Anthropic and its Claude Fable 5 and Claude Mythos 5 announcement, plus the Fable 5 system card (319 pages) describing the safeguards.
- Claude Code and its GitHub repository, where many of the bug reports were filed.
- The Register, Thomas Claburn's reporting on the harmless prompt refusals and Anthropic's statement.
- Business Insider on Anthropic admitting the wrong trade off on the guardrails.
- The Verge on Anthropic apologizing for the invisible guardrails.
- People cited: Mike Famulare (Institute for Disease Modeling), Derya Unutmaz (Jackson Laboratory), Clay Merritt, Nathan Lambert (AI2), Dean Ball (Foundation for American Innovation), Jeremy Howard (fast.ai), Ben Mann Nishimura, Ethan Mollick (Wharton), and Andrej Karpathy.
- Open models referenced as the contrast: Llama, DeepSeek, Qwen, and Nvidia's Nemotron 3 Ultra.
- The AI Revolution channel, and its sister channel Space Revolution for science, space, and advanced tech content.
The one idea to walk away with
The fight over Fable 5 is a preview of the next era of AI: as models get powerful enough that labs want to control not just who uses them but how capable they are allowed to be in specific situations, the question stops being "how smart is it" and becomes "can I trust that I am getting its full mind right now." A refusal you can argue with. A visible fallback you can plan around. A silent, invisible weakening you cannot even detect, and that is the line the report says Anthropic crossed, walked back from, and may have permanently taught its users to watch for.


