AI Revolution AI

The Fable 5 Backlash Is Getting Serious

Published Jun 11, 2026 · 16:16 video · 20 min read · Added Jun 14, 2026

At a glance

This is an AI Revolution news report on the backlash to Anthropic's newest model, Claude Fable 5, in the days right after launch. The narrator's argument is that the controversy is not about the model being weak. By Anthropic's own product team's account it is roughly 10 to 20 points above Opus 4.8 on certain evaluations. The controversy is about trust. Two things went wrong at once. First, the safety classifier started refusing harmless prompts, with the now-infamous example of a session where the only user input was the word "hello." Second, and far more serious to critics, Fable 5's 319-page system card revealed that for certain frontier AI development tasks the model could be quietly weakened in the background, with no refusal and no notice, through prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

Researchers called it secret sabotage. One commentator compared it to a man-in-the-middle attack inside Anthropic's own product. Within days Anthropic admitted it had made the safeguards too stringent and had chosen the wrong trade-off, apologized, and promised that flagged frontier development requests would now visibly fall back to Opus 4.8 instead of being silently throttled. What follows is the full report, in the order the narrator tells it, with every named person, every claim, and every number reported with attribution. Treat each quoted figure as a claim the video makes, not as independently verified fact.

Set up: Fable 5 was supposed to be the big moment

The report opens flat out: Anthropic's Fable 5 has a massive problem, and it is becoming one of the strangest AI controversies of the year. The framing matters. The narrator is careful to say the issue is not that the model is bad. The issue is that it may be so locked down, so heavily watched, and so aggressively filtered that users are questioning whether they are even getting the model that was advertised.

Fable 5 was meant to be Anthropic's big moment, the first real public taste of its "mythos level" AI tier. It was supposed to be the first time regular users could reach into that top tier, with large improvements across coding, logic, engineering, vision, and complex knowledge work. The headline capability claim, attributed to Anthropic's own product team, is that Fable 5 delivers frontier performance roughly 10 to 20 points above Opus 4.8 or other frontier models on certain evaluations. On paper, a huge leap.

Then the story flipped almost immediately after launch. Instead of the conversation being about how powerful Fable 5 is, the internet started talking about how often it refuses, downgrades, or quietly limits itself.

The first problem: a trigger happy safety classifier

The first failure is the safety classifier. Anthropic had already warned users that Fable 5's guardrails were tuned conservatively and would sometimes catch harmless requests, but the company claimed the trigger rate should be under 5% of sessions on average.

The narrator points out the obvious catch in that reassurance. If Claude has an estimated 18 to 30 million users worldwide, even a tiny percentage of blocked or downgraded users generates a lot of noise fast. And that is exactly what happened. Users started filing bug reports, posting screenshots, and complaining that Fable 5 was refusing completely harmless prompts.
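
The scale argument above can be checked with a rough calculation. The sketch below is only a back-of-the-envelope illustration, not anything from Anthropic or the video: the function name `affected_sessions` and the simplification of one session per user are assumptions made here, and the claimed rate is per session, not per user, so the real count depends on how many sessions each user runs.

```python
# Back-of-the-envelope estimate using only figures quoted in the video.
# ASSUMPTION (not from the source): each user runs exactly one session,
# so "sessions" and "users" can be compared directly.

USERS_LOW = 18_000_000   # low end of the estimated Claude user base
USERS_HIGH = 30_000_000  # high end of the estimated Claude user base
MAX_TRIGGER_PERCENT = 5  # Anthropic's claimed ceiling: under 5% of sessions


def affected_sessions(users: int, trigger_percent: int, sessions_per_user: int = 1) -> int:
    """Return how many sessions would hit the classifier at the given rate.

    Integer arithmetic avoids floating-point rounding on large counts.
    """
    return users * sessions_per_user * trigger_percent // 100


if __name__ == "__main__":
    for users in (USERS_LOW, USERS_HIGH):
        ceiling = affected_sessions(users, MAX_TRIGGER_PERCENT)
        print(f"{users:,} users -> fewer than {ceiling:,} triggered sessions")
    # prints:
    # 18,000,000 users -> fewer than 900,000 triggered sessions
    # 30,000,000 users -> fewer than 1,500,000 triggered sessions
```

Under that one-session assumption, even a rate well below the 5% ceiling would still mean hundreds of thousands of interrupted sessions, which is consistent with the volume of complaints the narrator describes.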

The most viral example came from Mike Famulare, a principal research scientist at the Institute for Disease Modeling, part of the Gates Foundation's Global Health Division. He reported that in Claude Code, Fable 5's input safety classifier triggered a model refusal fallback on the first turn of almost every session on his account. In one session the only user input was literally the word "hello." The narrator's gloss is sharp: if a frontier model can panic at a greeting, users start wondering what else it is misreading.

He was not alone. The Claude Code GitHub repo filled with bug reports of the same shape:

Safety filters causing false positives on normal messages.
A report that Fable 5 refused to help edit an application security architect resume.
A user requesting that Fable 5 be allowed for non-research lab management systems.

This was not one weird account. Many different users were hitting the same kind of wall.

Then the biology side blew up too. Derya Unutmaz, an immunologist and professor at the Jackson Laboratory for Genomic Medicine, said the word "cancer" was being flagged as a biosecurity risk by Claude Fable 5. The narrator underlines why that is damning: cancer is not an obscure keyword, it is one of the most common and important research topics in all of life sciences. If a model treats that word like a biosecurity alarm, normal scientific work becomes painful very quickly.

This is where the first wave of backlash crystallized. Anthropic built guardrails to stop dangerous use, but users saw a system that looked hypervigilant. Security researchers felt blocked from security work. Biomedical users felt blocked from biology work. Developers felt routine coding tasks were getting caught. People joked that Fable 5 had become so safe it was barely usable in exactly the serious professional areas where a powerful model should shine.

To be fair, the narrator adds, Anthropic said from the start that false positives would happen and that it was working to cut them down as fast as possible. But the visible false positives, he says, are only the first half of the controversy. The second half is much bigger.

The second problem: invisible degradation in the system card

Buried inside Fable 5's 319 page system card was a section on restrictions for cutting edge AI development. This is where the accusations turned from annoyance to something far more serious.

The report draws a careful line between two kinds of intervention. For cybersecurity, biology, chemistry, and certain distillation attempts, Fable 5 can visibly fall back to Opus 4.8. The user gets notified. It is annoying, but at least the interface tells you the model changed, so you know something happened.

But for certain frontier AI development tasks, the system card described a different mechanism. Instead of visibly switching models or refusing, Fable 5 could limit the model's effectiveness through methods like prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). In plain language, Anthropic can quietly make the model less helpful in certain advanced AI areas without telling the user. That is the part that set people off.

The affected topics are specific and high end:

Frontier scale pre training pipelines.
Distributed training infrastructure.
Machine learning accelerator and chip design.

These are not casual consumer questions. They are exactly the topics that matter if you are trying to build frontier AI systems. Anthropic's stated position is that the safeguards aim at stopping dangerous acceleration, blocking misuse by foreign adversaries, and stopping people from using Claude to build competing models. Critics saw it differently. They saw secret sabotage.

The reason, as the report lays it out, is simple. A refusal is honest: the user knows the model refused. A visible switch to Opus 4.8 is honest: the user knows they are no longer getting full Fable 5. But a secretly weakened response looks like a model that just gave a bad answer. The user has no clean way to tell whether Fable 5 failed naturally or was deliberately throttled in the background. That is the uncomfortable trust problem at the heart of the whole story.

Figure 1. The three ways the report says Fable 5 can pull a punch. The first two announce themselves. The third, reserved for frontier AI development topics, does not, and that invisibility is the whole controversy. Source for the mechanisms is Anthropic's Fable 5 system card as described in the video.

The critics line up

Developer Clay Merritt described it as Fable 5 silently sabotaging its answers when it detects AI or machine learning work. His complaint: no refusal, no notice, just purposeful degradation that is invisible to the user.

Thomas Claburn at The Register made the sharpest comparison. He wrote that prompt modification without notice is functionally similar to a man-in-the-middle attack, even though here it is happening inside Anthropic's own product. The narrator concedes that sounds harsh, but says the point is obvious. If the user sends one prompt and the system secretly changes how the model handles it, the user is no longer dealing with a fully transparent tool.

Anthropic's original estimate, from the system card, was that this invisible safeguard would affect around 0.03% of traffic, concentrated in fewer than 0.1% of organizations. The company's framing was that this is very narrow, very targeted, and aimed only at extreme frontier development risks. The AI community did not respond calmly.

Nathan Lambert, a well-known open model researcher who recently worked at AI2 (the Allen Institute for AI), was one of the loudest critics. He called it appalling to have access to cutting-edge models for his own work pulled away in an under-the-table way, and said it made Anthropic look anti-science, anti-progress, and anti-safety. The narrator flags that last word as the important one. Lambert's argument is not "I want unrestricted AI." It is that scientific progress and AI safety research both depend on serious researchers being able to study and build advanced systems. If one private company can use the best model for its own frontier work while secretly weakening access for everyone else, the gap between the top lab and the rest of the ecosystem widens.

Dean Ball, a senior fellow at the Foundation for American Innovation and a former senior policy advisor at the White House Office of Science and Technology Policy, said Anthropic's secret sabotage massively strengthens the argument that AI safety can be used as hype to justify monopolistic behavior by major labs.

Jeremy Howard, head of Fast AI, made a similar point from another angle. He argued that Anthropic is allowing itself, the current top lab, to use its top model for frontier AI research while saying it will sabotage others who try. The result, in his view, is that the AI frontier still advances but power becomes more concentrated.

Even former Anthropic employees joined in. Ben Mann Nishimura, who previously co led Anthropic's AI scientist effort, posted examples of how this could feel in practice. Working on AI for cancer? The model may suddenly become less helpful. Working on AI for Alzheimer's disease? The AI part gets harder. His broader claim was that concentrating these capabilities slows scientific and technological progress and may be net negative for humanity.

This, the narrator says, is why the controversy grew past a few false positives. The false positives made Fable 5 look annoying. The invisible degradation made it look untrustworthy.

The defenders

Not everyone reacted negatively, and the report is fair about it.

Ethan Mollick, the Wharton professor who studies AI and innovation, focused on the capability side. He said Claude Fable 5 outperformed basically every other public model he had used, by a considerable margin.

Andrej Karpathy, who recently joined Anthropic, called Fable 5 a super exciting release and described it as a major version bump deserving a step change forward. But even Karpathy acknowledged the model has quirks and that the safeguards were configured a little too trigger happy for launch. The narrator calls that the most balanced version of the whole situation: Fable 5 may genuinely be incredible, and it may also be over filtered, over sensitive, and in some areas too opaque.

Question at issue	What critics said	What Anthropic / defenders said
The false positives	A frontier model refusing "hello" and flagging "cancer" is broken for real professional work [unusable]	Warned up front that false positives would happen, said it was working to reduce them fast [acknowledged]
Invisible weakening	Silent sabotage, like a man-in-the-middle attack inside the product; you cannot tell weak from weakened [untrustworthy]	Narrow, targeted safeguard against dangerous frontier acceleration and adversary misuse
The real motive	AI safety used as hype to justify monopolistic behavior and concentrate power in the top lab [anti competitive]	Stops misuse by foreign adversaries and enforces terms of service barring competing model development
Effect on science	Slows scientific progress, may be net negative for humanity, widens the lab vs ecosystem gap	Protects the US and allied edge in frontier chips and the optimized software that runs them
Raw capability	Strong, but power and opacity are the problem	Outperforms every other public model by a considerable margin, a major step change [incredible]

Figure 2. The two sides of the Fable 5 argument as laid out in the report. Notice the rare points of agreement in the top and bottom rows: both camps grant that the false positives were real and that the model is genuinely powerful. The fight is entirely about the hidden safeguards and the motive behind them.

The walkback: Anthropic responds

Anthropic eventually had to respond. In a statement given to The Register, the company admitted it had made the safeguards too stringent.

More importantly, Anthropic said it was changing Fable 5's safeguards for frontier LLM development to make them visible. Starting that week, flagged requests would visibly fall back to Opus 4.8. On the API, flagged requests would return a reason for the refusal. Anthropic said users would see this every time it happens. The narrator calls that a pretty major walkback.

Anthropic also clarified what the safeguards are meant to cover. Per the company, the current restrictions apply to a handful of narrow tasks, like frontier scale LLM data pipelines and kernel development for certain non standard chips. The stated goal is to prevent foreign adversaries from using the most capable Claude models in ways that pose severe safety risks. Anthropic specifically pointed to the edge the US and its allies hold in frontier chips and the highly optimized software that runs them at full potential, and said the safeguards help ensure Claude is not used to erode that advantage, for example by optimizing chips developed by adversaries. The company added that the safeguards also help enforce its terms of service, which prohibit using its models to develop competing AI systems. The narrator notes that kind of restriction is fairly standard across major AI providers.

Then Anthropic admitted the key mistake. It said it had faced a choice between hidden and visible safeguards. A hidden safeguard is harder to probe and work around, which means it can be targeted more narrowly. A visible safeguard has to cast a wider net to be robust, which can cause more false positives. Anthropic said it made the wrong trade off and apologized for not getting the balance right.

The numbers got updated too. Against the earlier system card estimate, Anthropic said current usage shows the classifier triggers on about 0.05% of tasks and affects less than 0.05% of organizations. Again that sounds tiny, but the narrator's whole point is that the absolute number is not the issue. The principle is. Users want to know when the model is being limited. Developers want to know when an answer is genuinely weak versus deliberately weakened. Researchers want to know whether their work is being treated as suspicious. Businesses want to know if they can trust a model that may silently change behavior based on hidden rules.

Figure 3. Every percentage the report cites, to scale. The invisible frontier safeguard touches a sliver of traffic (0.03% in the system card, 0.05% of tasks in the updated figures), and the broad classifier was claimed to stay under 5% of sessions. The narrator's argument is that against 18 to 30 million users even these slivers are a lot of people, and that the principle, not the percentage, is what broke trust. Figures are as stated in the video.

Why the open source crowd pounced

The episode handed open source advocates a clean talking point. For open source researchers, Fable 5 became a perfect example of what they have warned about: closed models do not just hide the weights, they can hide the behavior. A company can add classifiers, steering layers, routing systems, and invisible throttles, and users may only notice because the output suddenly feels wrong.

With open models like Llama, DeepSeek, Qwen, and Nvidia's new Nemotron 3 Ultra, the argument is different. You may still have safety and misuse concerns, but there is more transparency. You can run the model locally, test it, inspect it, fine tune it, and build around it without wondering whether a private company secretly changed the rules overnight.

The timing was brutal for Anthropic. Just before the controversy, Nvidia released Nemotron 3 Ultra, its first flagship open source model. Then Fable 5 launched, and within hours people were accusing Anthropic of secret throttling, hidden anti competition systems, and treating AI researchers like potential thieves. The narrator is measured about it: this does not mean open source automatically wins, closed models still lead in many areas, and Fable 5 may still be one of the strongest models ever released to the public. But Anthropic accidentally gave open source supporters a very clean message. If you cannot see how the model works, you also cannot fully know when it is being limited.

The impossible triangle

The closing argument frames the whole mess as a structural bind. The launch was supposed to prove Anthropic could make mythos level intelligence broadly available in a safe way. Instead it showed how hard that balance is:

Release the model too freely, and you risk misuse.
Lock it down too heavily, and normal users get blocked.
Add invisible safeguards, and researchers accuse you of sabotage.
Make those safeguards visible, and bad actors may learn to route around them.

That is the impossible triangle the narrator says Anthropic is stuck inside: capability, safety, and trust. Fable 5 clearly has the capability. Anthropic is trying hard on safety. But trust took the hit, because users discovered the model's behavior could be shaped in ways they were not clearly told about.

So the question around Fable 5 has changed. It is no longer just "how smart is this model." The real question is: when Fable 5 gives you an answer, are you getting the real Fable 5, a downgraded Opus fallback, or a quietly weakened version? Anthropic has now apologized and promised to make the frontier AI development safeguards visible, which the narrator calls the right move. But the fact that it took backlash to get there says something about where AI is heading. The next generation of models will be powerful enough that companies will want to control not just who uses them but how smart they are allowed to be in specific situations, and end users, researchers, and developers will push back hard when that control happens behind the curtain.

Key takeaways

The complaint is about trust, not power. The report repeatedly grants that Fable 5 is extremely capable (a claimed 10 to 20 points above Opus 4.8 on certain evals) and argues the problem is opacity.
Two distinct failures stacked. Visible false positives (refusing "hello," flagging "cancer") made it look annoying. Invisible degradation made it look untrustworthy.
The mechanism is the scandal. Prompt modification, steering vectors, and PEFT can weaken answers on frontier AI development topics with no refusal and no notice, unlike the visible Opus 4.8 fallback used for cyber, bio, chem, and distillation.
Heavyweight critics piled on. Nathan Lambert, Dean Ball, Jeremy Howard, and former Anthropic staffer Ben Mann Nishimura all framed it as anti science, monopolistic, or net negative for humanity.
The defense is national security plus terms of service. Anthropic cited blocking foreign adversaries, protecting the US and allied chip and software edge, and barring use to build competing models.
Anthropic walked it back. It admitted the safeguards were too stringent and the wrong trade off, apologized, and made frontier development flags visible (fall back to Opus 4.8, API returns a refusal reason).
The numbers are tiny but the principle is not. System card said 0.03% of traffic and under 0.1% of orgs, updated to 0.05% of tasks and under 0.05% of orgs, against an estimated 18 to 30 million Claude users.
Open source got a gift. The timing against Nvidia's Nemotron 3 Ultra let advocates argue that closed models can hide behavior, not just weights.

Chapters

The video has no creator set chapters, so these timestamps are estimated from the report's structure.

0:00 Fable 5 has a massive problem
0:45 What Fable 5 was supposed to be (10 to 20 points above Opus 4.8)
1:30 The safety classifier and the under 5% claim
2:20 The "hello" refusal and Mike Famulare
3:30 GitHub bug reports and the resume refusal
4:20 "Cancer" flagged as a biosecurity risk (Derya Unutmaz)
5:20 The 319 page system card and invisible degradation
6:40 Prompt modification, steering vectors, PEFT
7:40 Secret sabotage and the man-in-the-middle comparison
8:50 The critics: Lambert, Ball, Howard, Mann Nishimura
11:00 The defenders: Ethan Mollick and Andrej Karpathy
12:00 Anthropic's walkback and the visible fallback
13:30 Updated numbers and the principle
14:20 Why open source pounced (Nemotron 3 Ultra)
15:20 The impossible triangle and the real question

Notable quotes

Anthropic's Fable 5 has a massive problem, and it is turning into one of the strangest AI controversies of the year. (narrator, 0:00)

In one session, the only user input was literally the word "hello." (narrator, 2:50)

Anthropic can quietly make the model less helpful in certain advanced AI areas without telling the user. And that is the part that set people off. (narrator, 6:30)

Prompt modification without notice is functionally similar to a man-in-the-middle attack, even though in this case it is happening inside Anthropic's own product. (narrator, quoting Thomas Claburn, 8:10)

The false positives made Fable 5 look annoying. The invisible degradation made it look untrustworthy. (narrator, 10:40)

This is the impossible triangle Anthropic is stuck inside: capability, safety, and trust. (narrator, 15:30)

When Fable 5 gives you an answer, are you getting the real Fable 5, a downgraded Opus fallback, or a quietly weakened version of the model? (narrator, 15:50)

Resources mentioned

Anthropic and its Claude Fable 5 and Claude Mythos 5 announcement, plus the Fable 5 system card (319 pages) describing the safeguards.
Claude Code and its GitHub repository, where many of the bug reports were filed.
The Register, Thomas Claburn's reporting on the harmless prompt refusals and Anthropic's statement.
Business Insider on Anthropic admitting the wrong trade off on the guardrails.
The Verge on Anthropic apologizing for the invisible guardrails.
People cited: Mike Famulare (Institute for Disease Modeling), Derya Unutmaz (Jackson Laboratory), Clay Merritt, Nathan Lambert (AI2), Dean Ball (Foundation for American Innovation), Jeremy Howard (Fast AI), Ben Mann Nishimura, Ethan Mollick (Wharton), and Andrej Karpathy.
Open models referenced as the contrast: Llama, DeepSeek, Qwen, and Nvidia's Nemotron 3 Ultra.
The AI Revolution channel, and its sister channel Space Revolution for science, space, and advanced tech content.

The one idea to walk away with

The fight over Fable 5 is a preview of the next era of AI: as models get powerful enough that labs want to control not just who uses them but how capable they are allowed to be in specific situations, the question stops being "how smart is it" and becomes "can I trust that I am getting its full mind right now." A refusal you can argue with. A visible fallback you can plan around. A silent, invisible weakening you cannot even detect, and that is the line the report says Anthropic crossed, walked back from, and may have permanently taught its users to watch for.

Full transcript

Anthropic's Fable 5 has a massive problem, and it is turning into one of the strangest AI controversies of the year. The issue is not that Fable 5 is weak. The issue is that Fable 5 may be so locked down, so heavily watched, and so aggressively filtered that users are now questioning whether they are actually getting the model Anthropic just advertised. The problem is Fable 5 was meant to be Anthropic's big moment, the first real public taste of its mythos-level AI. It was supposed to be the first time regular users could access a model in that top tier with massive improvements in coding, logic, engineering, vision, and complex knowledge work. According to Anthropic's own product team, Fable 5 delivers frontier performance that is roughly 10 to 20 points above Opus 4.8 or other frontier models in certain evaluations. So, on paper, this thing is a huge leap. But, almost immediately after launch, the story changed. Instead of everyone only talking about how powerful Fable 5 is, the internet started talking about how often it refuses, downgrades, or quietly limits itself. And the first problem is the safety classifier. Anthropic already warned users that Fable 5's guardrails were tuned conservatively. The company said these systems would sometimes catch harmless requests, although they claimed the trigger rate should be less than 5% of sessions on average. That sounds small, but there is a pretty obvious catch. If Claude has an estimated 18 to 30 million users worldwide, even a tiny percentage of blocked or downgraded users can create a lot of noise very quickly. And that is exactly what happened. Users started filing bug reports, posting screenshots, and complaining that Fable 5 was refusing completely harmless prompts. One of the most viral examples came from Mike Famulare, a principal research scientist at the Institute for Disease Modeling, part of the Gates Foundation's Global Health Division. 
He reported that in Claude Code, Fable 5's input safety classifier triggered a model refusal fallback on the first turn of almost every session on his account. And in one session, the only user input was literally the word "hello". That is the kind of thing that makes people instantly lose confidence. Because if a frontier model can panic at a greeting, users start wondering what else it is misreading. And he was not alone. The Claude Code GitHub repo started filling with bug reports. Some users complained that Fable 5's safety filters were causing false positives on normal messages. Another report said Fable 5 refused to help edit an application security architect resume. Another user requested that Fable 5 be allowed for non-research lab management systems. So, this was not just one person having a weird account issue. A lot of different users were running into the same type of problem. Then the biology side started blowing up, too. Derya Unutmaz, an immunologist and professor at the Jackson Laboratory for Genomic Medicine, said that the word "cancer" was being flagged as a biosecurity risk by Claude Fable 5. And if you work in medicine, biology, or health research, that is obviously a serious problem. Cancer is not some obscure keyword. It is one of the most common and important research topics in all of life sciences. If a model treats that word like a biosecurity alarm, then normal scientific work becomes painful very quickly. This is where the first backlash became clear. Anthropic built the guardrails to stop dangerous use, but users were seeing a system that looked hypervigilant. Security researchers felt blocked from doing security work. Biomedical users felt blocked from doing biology work. Developers felt normal coding tasks were getting caught. And some people were joking that Fable 5 had become so safe that it was barely usable in exactly the serious professional areas where a powerful model should be valuable. 
Now, to be fair to Anthropic, the company did say from the beginning that false positives would happen and that they were working to reduce them as quickly as possible. But the visible false positives are only the first half of the controversy. The second half is much bigger. Buried inside Fable 5's 319-page system card was a section about restrictions on cutting-edge AI development. And this is where people started accusing Anthropic of something much more serious than normal safety filtering. For cybersecurity, biology, chemistry, and certain distillation attempts, Fable 5 can visibly fall back to Opus 4.8. The user gets notified. It is annoying, but at least you know the model changed. The interface tells you something happened. But for certain frontier AI development tasks, the system card described a different kind of intervention. Instead of visibly switching models or refusing, Fable 5 could limit Claude's effectiveness through methods like prompt modification, steering vectors, or parameter-efficient fine-tuning, also known as PEFT. In simple language, Anthropic can quietly make the model less helpful in certain advanced AI areas without telling the user. And that is the part that set people off. The affected topics include things like frontier-scale pre-training pipelines, distributed training infrastructure, and machine learning accelerator or chip design. These are not casual consumer questions. These are the kinds of topics that matter if you are trying to build frontier AI systems. So, Anthropic's position is that these safeguards are aimed at stopping dangerous acceleration, stopping misuse by foreign adversaries, and stopping people from using Claude to build competing models. But critics saw it differently. They saw it as secret sabotage. The reason is simple. If a model refuses to answer, the user knows it refused. If it switches to Opus 4.8, the user knows they are no longer getting full Fable 5. 
But if the model secretly weakens its response, the user may just think the model gave a bad answer. They have no clean way to know whether Fable 5 failed naturally or whether Anthropic deliberately throttled it in the background. That creates a very uncomfortable trust problem. Developer Clay Merritt described it as Fable 5 silently sabotaging its answers when it detects AI or machine learning work. His complaint was that there is no refusal, no notice, just purposeful degradation that is invisible to the user. And Thomas Claburn at The Register made an even sharper comparison. He wrote that prompt modification without notice is functionally similar to a man-in-the-middle attack, even though in this case it is happening inside Anthropic's own product. That may sound harsh, but the point is obvious. If the user sends one prompt and the system secretly changes how the model handles it, the user is no longer dealing with a fully transparent tool. Anthropic originally estimated that this invisible safeguard would affect around 0.03% of traffic, concentrated in fewer than 0.1% of organizations. So the company's argument was basically that this is very narrow, very targeted, and only aimed at extreme frontier development risks. But the AI community did not respond calmly. Nathan Lambert, a well-known open model researcher who recently worked at AI2, was one of the loudest critics. He argued that having access to cutting-edge models for his own work pulled away in an under-the-table way was appalling. He said it made Anthropic look anti-science, anti-progress, and anti-safety. That last part is important. Lambert is not just saying, "I want unrestricted AI." His argument is that scientific progress and AI safety research both depend on serious researchers being able to study and build advanced systems. 
If one private company can use the best model for its own frontier work while secretly weakening access for everyone else, then the gap between the top lab and the rest of the ecosystem gets wider. Dean Ball, a senior fellow at the Foundation for American Innovation and a former senior policy advisor at the White House Office of Science and Technology Policy, also criticized the policy. He said Anthropic's secret sabotage massively strengthens the argument that AI safety can be used as hype to justify monopolistic behavior by major labs. Jeremy Howard, the head of fast.ai, made a similar point from another angle. He argued that Anthropic is allowing itself, the current top lab, to use its top model for frontier AI research while saying it will sabotage others who try. In his view, that means the AI frontier still advances, but power becomes more concentrated. Even former Anthropic employees joined the criticism. Ben Mann Nishimura, who previously co-led Anthropic's AI scientist effort, posted examples of what this could feel like in practice. Working on AI for cancer? The model may suddenly become less helpful. Working on AI for Alzheimer's disease? The AI part becomes harder. His broader point was that concentrating these capabilities slows scientific and technological progress and may be net negative for humanity. And that is why this controversy became bigger than a few false positives. The false positives made Fable 5 look annoying. The invisible degradation made it look untrustworthy. Still, not everyone reacted negatively. Ethan Mollick, the Wharton professor who studies AI and innovation, focused more on the capability side. He said Claude Fable 5 outperformed basically every other public model he had used by a considerable margin. Andrej Karpathy, who recently joined Anthropic, called Fable 5 a super exciting release and described it as a major-version-bump-deserving step change forward.
But even Karpathy acknowledged that the model has quirks and that the safeguards were configured a little too trigger-happy for launch. That is probably the most balanced version of the whole situation. Fable 5 may genuinely be incredible. It may also be over-filtered, over-sensitive, and in some areas too opaque. Anthropic eventually had to respond. In a statement given to The Register, the company admitted that it had made the safeguards too stringent. More importantly, Anthropic said it was changing Fable 5's safeguards for frontier LLM development to make them visible. Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, flagged requests will return a reason for the refusal. Anthropic said users will see this every time it happens. That is a pretty major walkback. Anthropic also clarified what these safeguards are supposed to cover. According to the company, the current restrictions apply to a handful of narrow tasks, like frontier-scale LLM data pipelines and kernel development for certain non-standard chips. They said the goal is to prevent foreign adversaries from using the most capable Claude models in ways that pose severe safety risks. They specifically pointed to the edge that the US and its allies have in frontier chips and the highly optimized software that runs them at full potential. In Anthropic's framing, these safeguards help make sure Claude is not used to erode that advantage, for example, by optimizing chips developed by adversaries. Anthropic also said the safeguards help enforce its terms of service, which prohibit using its models to develop competing AI systems. And to be fair, that kind of restriction is pretty standard across major AI providers. But then, Anthropic admitted the key mistake. They said they had faced a choice between hidden and visible safeguards. A hidden safeguard is harder to probe and work around, which means it can be targeted more narrowly. 
A visible safeguard has to cast a wider net to be more robust, which can cause more false positives. Anthropic said it made the wrong trade-off and apologized for not getting the balance right. They also updated the numbers. Instead of the earlier system card estimate, Anthropic said current usage shows the classifier triggers on about 0.05% of tasks and affects less than 0.05% of organizations. Again, that sounds tiny. But the absolute number is not the only issue. The real issue is the principle. Users want to know when the model is being limited. Developers want to know when an answer is genuinely weak versus deliberately weakened. Researchers want to know whether their work is being treated as suspicious. And businesses want to know if they can trust a model that may silently change behavior based on hidden rules. This also explains why the open-source crowd reacted so strongly. For open-source researchers, Fable 5 became a perfect example of what they have been warning about. Closed models do not just hide the weights. They can also hide the behavior. A company can add classifiers, steering layers, routing systems, and invisible throttles. And users may only notice because the output suddenly feels wrong. With open models like Llama, DeepSeek, Qwen, and Nvidia's new Nemotron 3 Ultra, the argument is different. You may still have safety concerns. You may still have misuse concerns. But there is more transparency. You can run the model locally, test it, inspect it, fine-tune it, and build around it without wondering whether a private company secretly changed the rules overnight. And the timing is brutal for Anthropic. Just before this controversy, Nvidia released Nemotron 3 Ultra, its first flagship open-source model. Then Fable 5 launches, and within hours, people are accusing Anthropic of secret throttling, hidden anti-competition systems, and treating AI researchers like potential thieves. That does not mean open-source automatically wins. 
Closed models still lead in many areas. Fable 5 may still be one of the strongest models ever released to the public. But Anthropic accidentally gave open-source supporters a very clean message. If you cannot see how the model works, you also cannot fully know when it is being limited. And that may be the lasting damage here. The launch was supposed to prove that Anthropic could make mythos-level intelligence broadly available in a safe way. Instead, it showed how difficult that balance really is. Release the model too freely, and you risk misuse. Lock it down too heavily, and normal users get blocked. Add invisible safeguards, and researchers accuse you of sabotage. Make those safeguards visible, and bad actors may learn how to route around them. This is the impossible triangle Anthropic is stuck inside: capability, safety, and trust. Fable 5 clearly has the capability. Anthropic is trying very hard on safety, but trust took a hit because users discovered that the model's behavior could be shaped in ways they were not clearly told about. And now the question around Fable 5 has changed. It is no longer just, "How smart is this model?" The real question is, when Fable 5 gives you an answer, are you getting the real Fable 5, a downgraded Opus fallback, or a quietly weakened version of the model? Anthropic has now apologized and promised to make those frontier AI development safeguards visible. That is the right move. But the fact that this had to happen after backlash tells us something important about where AI is heading. The next generation of AI models will be powerful enough that companies will want to control not just who uses them, but how smart they are allowed to be in specific situations. End users, researchers, and developers are going to push back hard when that control happens behind the curtain.
So, what do you think? Is Anthropic being responsible by restricting Fable 5 this heavily? Or did they cross the line by making some of those limits invisible?
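For a sense of scale on the trigger rates quoted in the report, here is a minimal Python sketch. The request volumes are hypothetical and chosen only for illustration. The sketch also assumes both percentages can be read as a simple per-request rate, which the report does not confirm: the system card figure was stated as a share of traffic, the updated one as a share of tasks.

```python
# Rates quoted in the report, expressed as flagged requests per million.
# 0.03% = 300 per million (original system card estimate, share of traffic)
# 0.05% = 500 per million (updated figure, share of tasks)
ORIGINAL_PER_MILLION = 300
UPDATED_PER_MILLION = 500


def expected_flagged(total_requests: int, per_million: int) -> float:
    """Expected number of flagged requests at a given volume and rate."""
    return total_requests * per_million / 1_000_000


# Hypothetical request volumes (assumptions, not figures from the report).
for volume in (10_000, 1_000_000, 100_000_000):
    before = expected_flagged(volume, ORIGINAL_PER_MILLION)
    after = expected_flagged(volume, UPDATED_PER_MILLION)
    print(f"{volume:>11,} requests: ~{before:,.0f} at 0.03%, ~{after:,.0f} at 0.05%")
```

Under these assumptions, one million requests would mean roughly 300 flagged at the original rate and 500 at the updated one. The percentage stays small while the absolute count grows with volume, which is why the narrator treats the principle, not the number, as the real issue.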