This has dampened my opinion of Anthropic quite a bit. It's difficult to take their marketing of AI as an empowering technology seriously when they are quite clear in their new deployments that they do not mean empowering for you, but empowering for them and for organizations that are in their (or the US government's, despite Anthropic's performative disagreements with the administration) good graces. You are allowed to vibe code some dashboards or a web app, or let it drive Excel, but anything more interesting than that is forbidden.
If it were just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob do something undesirable with these powers.
Don't forget their push for full regulatory capture in the name of "safety" as well so they can pull the ladder up behind them before anyone else has an equally capable model and releases it without the anti-competitive safeguards, while also pushing to completely ban open weight models, or any model trained on a certain level of compute without "rigorous" government testing and validation (which I'm sure, they'll conveniently provide the framework for).
Dampened opinion on Anthropic is an understatement.
"Why does a company that cares about the dangers of AI/ASI and x-risk, not want the PRC to catch up to the frontier?"
"It must be regulatory capture!" - HN.
-
Regarding the US-specific regulations - asking for domestic safety testing of frontier models only is not regulatory capture. It's common sense. Powerful things should be made safe before they are released into the wild.
What backward logic is this? The PRC doesn't give a fuck about how the US regulates AI companies. Pushing more regulation would ensure that Chinese companies catch up sooner. If you think otherwise you need to think harder.
> asking for domestic safety testing of frontier models only is not regulatory capture
It very much is regulatory capture. The goal is to make it so only the handful of heavily capitalized tech giants and frontier labs can afford the legal and compliance rigamarole to meet the new standards. It's an effort to crowd out open source development and smaller competitors (and foreign competitors which threaten whatever moat they may have). They define safety through some speculative catastrophic threat to prevent new upstarts instead of focusing on the very real, localized harm they are causing right now.
It's also shifting the definition of safety away from their current operations and toward purely speculative future scenarios.
Nothing, they are just trying to scaremonger the public and prime the pump for a massive bailout when it crashes out because apparently China are the big bad meanies.
PRC labs reportedly aren't even thinking about getting to ASI, much less trying. They think of AI as a technology that can provide utility across the board even without anything like superhuman smarts.
It has nothing to do with being "fine" if the PRC or anyone else for that matter get to some speculative and hypothetical ASI first. There are zero US regulations that would be effective to prevent that.
US regulations apply to US companies and citizens, exclusively. Anthropic crowding out all future potential competitors in the US via regulatory capture has no weight on what the rest of the world does.
Unless you are proposing military action over a speculative sci-fi future
No, because there is zero reason to think LLMs will lead to it, but we do know that the massive LLM investment carries a huge financial risk for the US. Not to mention it's exacerbating the climate crisis (you know, the actual thing that might end civilization, not a fantasy delusion of AGI), giving cancer to citizens who live next to data centers, causing an extreme decrease in quality of life, and misallocating capital while Americans lack healthcare, childcare, housing, and education.
I also don't believe China is actually a threat to the world. That's some Cold War delusional thinking you've got there.
All the companies seem to believe is that it's okay to immiserate a large percentage for the pursuit of money, you seem to believe the lies they're feeding you.
And why would any regulations put in place in the USA affect the PRC in any way whatsoever? They wouldn't. China will continue to push forward and govern things in their own way; we have zero jurisdiction over China.
I didn't downvote, but HN probably remembers when Anthropic's competitor was a "charity" that cared deeply about AI safety whose marketing gimmick was GPT-2 being too dangerous to release.
Anthropic's founder wants you to buy into his vision for safety, but he also wants you to buy into his vision that in two years AI will be a "country of geniuses" that will update itself, and the IPO that will fund it...
Corporations cannot help but act this way. They are too big. The pressures for profit are all that matters. That is the priority. It doesn't matter what colorful words they put on the paper to make you feel better. Look at the "green" movement 20 years ago. All talk and no action.
Stop supporting organizations that don't put humans first. Don't believe a word that anyone says. Lip service is free.
Yeah, I cancelled my Claude subscription yesterday after learning about their attitude of intentionally sabotaging their paying customers.
Especially after trying Fable yesterday for some benign projects and finding it unimpressive relative to Opus.
Rolling it back is the right move, but I’m still not convinced that using them is in my best interest anymore. I’m investigating open source cloud providers now.
Fable is very much an incremental development over Opus, and even more incremental when properly compared to its existing counterparts GPT-Pro and Gemini Deep Research.
I have a design for a really complex piece of software I want to build, and there were gaps I knew of in the design. Opus couldn’t identify them but Fable did. I’m just talking about it reviewing the design, not coding. But yeah, it’s insanely expensive. It does spin off sub agents, so I suspect it might be cheaper if you had it create a bunch of plan files and then pointed DeepSeek at those plan files or something like that.
Can you write a more specific question? I think the meaning of the comment is clear enough, but maybe you’re asking for more specifics? Ironically I can not understand what you are asking for with such a generic comment.
Google has been doing the same thing for longer than Anthropic[0]. To protect their models from distillation attacks, they will silently downgrade the model's performance to essentially poison your training data without your knowledge.
A bit different than Anthropic refusing to assist with any AI development at all, but it's in the same vein and seems not widely known.
edit: reading the whole series of Google's AI Threat Tracker articles also provides some insight into the threats Anthropic is dealing with and into their decisions.
"Only I can save us". It's a classic tragedy and cautionary tale.
The idea Anthropic was going to speed run AI so they could control the usage and make it "safe" for humanity was never altruistic; it was a HUGE FUCKING RED FLAG.
You're right, they should just not even try and turn off all safeguards on frontier AI. What could possibly go wrong? It's not like a bunch of companies and nonprofits have said the model finds zero days at the press of a button!
Correct, they should. If there are zero days out there, then they should be able to be found by everybody, instead of only being found by the select elite that this model is available to. Though, I very much question the truth of said ability.
And? Now all the zero days, if that's true, get discovered and patched instead of being exclusively hoarded by the select few governments and Israeli spyware companies.
Even with them making those guardrails visible, it's a bit ridiculous in my eyes. I have been experimenting with smaller models; will Claude assume I'm some Chinese or Russian agent trying to distill their secrets and bar me from learning? Because that's insane. What if I discover a more efficient way to build models with Claude? Well, we'll never know now. What if someone else entirely could discover a breakthrough in how we design and build LLMs?
The whole shtick is to get you addicted whilst reducing your ability to go without, acquire power over you, jack up the prices whilst manipulating the quality of the tokens/output available to you.
Can't believe how stupid people are. You couldn't see this coming? Shame on you.
Yes, that is basically the plan. It's based on the belief that unfettered AI would let anyone be a supervillain and destroy the world. There are enough would-be supervillains out there, but they rarely get far because they can't get teams of smart people to build doomsday machines for them. So the AI has to not let anyone do evil with it.
Unfortunately, that won't feel very much like freedom.
I like Claude Code a lot, but I think it sets a dangerous precedent to put in guardrails that return a response to a prompt the system modified in real time in order to subvert the original intent.
Fail cleanly. Anything else makes it too difficult to rely on.
edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.
I think the reasonable middle ground Anthropic is trying to achieve is this: let the organizations that make the most important and critical software get a head start on cybersecurity before they inevitably allow everyone else the same access.
Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity, because I can't use it to test and harden my own software.
Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.
Anthropic guardrails seem to be more about protecting their business (distillation), than they are about public safety.
Public safety is downstream of distillation. If you can distill Claude, then no amount of guardrails on Claude will protect you from what someone can do with it.
I asked it to analyse my architecture and find any security issues and it did it perfectly: it first identified the issues and then fixed them. Not sure why my prompt managed to get through the guardrails.
I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc.
Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."
Ok fine, I said go for it, and it says:
"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."
Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.
Exactly. For cybersecurity the failure was visible.
It was not visible for "frontier" ML research. The argument about a head start in IT security does not apply here.
No. You read the actual essay, then explain how we're supposed to interpret this more charitably:
> Frontier AI models, like airplanes, should be required to go through technical testing and auditing, and their release should be blocked or reversed as a threat to public safety if they do not meet high standards of safety. I am grateful to see the Trump administration’s Executive Order move incrementally towards a greater role for government in AI, though Anthropic’s proposal recommends even further action.
They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.
I agree 100%. Doing a worse job IS an error. It should be treated as such. Or at the very least make that behavior opt-in. The default should not be pretending like nothing happened and just quietly doing a worse job.
Imagine your healthcare provider just sometimes decided not to read your test results very carefully and you risked death? Now realize that healthcare providers use Claude now and that scenario wasn't hypothetical.
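A minimal sketch of the "treat a downgrade as an error, or make it opt-in" idea above. Everything here is hypothetical: `Response`, `ModelDowngradeError`, `check_served_model`, and the `served_model` field are illustrative names, not part of any real Anthropic API, and it assumes the service reports which model actually answered.

```python
from dataclasses import dataclass


class ModelDowngradeError(RuntimeError):
    """Raised when a request was served by a different model than requested."""


@dataclass
class Response:
    # Hypothetical response shape: which model actually produced the text.
    served_model: str
    text: str


def check_served_model(requested: str, response: Response,
                       allow_fallback: bool = False) -> Response:
    """Return the response only if it came from the requested model.

    A downgrade is treated as an error unless the caller opted in.
    """
    if response.served_model != requested and not allow_fallback:
        raise ModelDowngradeError(
            f"requested {requested!r} but was served by {response.served_model!r}"
        )
    return response
```

With a check like this the default is a loud failure, and a caller who is fine with a weaker model has to say so explicitly with `allow_fallback=True`.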
In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of doubt.
Are we just concluding "their concerns were never real"? Because that probably runs counter to the things that they have been observing and concluding.
Then what is it they are trying to guard against, if it's not simply protecting their moat ahead of their IPO?
Because from the outside, their behavior looks like a situation of "What if Microsoft/Apple put controls in place to make it impossible to develop an operating system using their OS?"
Let's assume that Anthropic believes they're in an arms race to create a potentially dangerous technology, and they believe they're the best ones to win this race.
Unlike nuclear weapons, advancing in this arms race requires actually deploying the product over and over again. Deploying the product makes your advancements visible to your competitors.
It makes complete sense to try to limit the degree to which that's true.
It's an interesting assumption. The idea behind this with nukes was that we'd like to nuke Germany before they could nuke us. Even after we defeated Germany, we nuked Japan even though they had no possibility of getting their own nukes.
The nuclear 'race' was based on the premise that the winner could use it to destroy all other racers (a faulty assumption, see the USSR among others). I will charitably assume Anthropic does not intend to literally destroy anyone and merely wants to become an AGI monopoly. But if AGI is so powerful, any monopoly would not be stable since the incentives for entry into the market are massive. Why would China stop developing AGI just because Anthropic has it?
Do you believe the current situation is more akin to the race to the first nukes, where no one could know for sure the other competitors were even racing...
or is it more similar to the Cold War, where there were obviously competitors engaged in the race?
And yes, agreed that the equilibrium dynamics for AGI are very different from (and far harder to predict than) those for nukes. That sounds like a good reason to be sure we get there first, since presumably any potential advantage wouldn't go to the second- or third-place runners-up.
I can't really say I see a similarity to either the Manhattan Project or the Cold War. I don't see how one could apply either massive retaliation or MAD. These are private companies, they are not vested with the necessary authority to destroy anything. Even if they had it, they couldn't. You can't destroy China, they have 1.4B people, nukes, and a large part of the world's manufacturing. So multiple organizations want to do something first, that could be anything from nukes to railroads to lining up for communion wafers.
Or if Google Chrome were blocking/degrading access to sites and services that might be useful to someone trying to make a competing web-browser. Or even just trying to diagnose a Chrome bug.
> Then what is it they are trying to guard against, if it's not simply protecting their moat ahead of their IPO?
Let's just assume it was "only" that?
It's unreasonable to assume they are aiming to upset people who are just giving them money in the way they want. It makes no business sense, for any company. So that has to be a byproduct.
Model training is one of the more expensive undertakings in the world right now and distilling models from competitors against the TOS is apparently something that is going on for very little money. Why would they not "just" try to take measures against that?
The hidden safeguard was not against distilling, it was against "frontier" ML research with no indication whatsoever of what "frontier" might mean, but possibly even including research into model safety or alignment. That amounts to deliberately boobytrapping research across an entire legit academic field, which is ridiculously unaligned behavior.
And as a matter of fact, there's a lot of meaningful research into how to have different sorts of nuclear material that might be usable for power production but not hidden malicious development. That's the closest analog to "safety" and "alignment" in your scenario.
They are trying to guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors. Frankly, based on my knowledge of Anthropic and the people who work there, they are very possibly right. They care a ton about this in a way that is difficult for people outside this bubble to understand.
> guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors
All this longtermism though is harmful. There are real problems of data theft, bias, labor displacement, and environmental costs that are happening right now but every push for regulation and regulatory capture, and all the safety talk, is always focused on some speculative future machine god to distract from the current problems.
I'd have a higher opinion of these labs if the issues they openly talked about and worked toward were the real issues we face currently, not speculative defenses against some future AGI that may never happen in my lifetime. I'm less worried about "our new model might kill all humans in the future" and more worried about how we are going to address anti-competitive behavior, copyright protections, labor rights, and the energy impact.
I cannot overstate how much I think this take is wrong. Please please reconsider, look at the rate of progress being made, and consider that even if you only think ASI 'may' never happen in your lifetime it should still be one of your #1 concerns.
Honestly, that respect for 'copyright protections' has somehow become a leftist shibboleth is bizarre to me and indicative that something has become deeply warped in our discussions around this topic.
There's nothing warped about it at all. Like it or not, it is a real issue. It's also an issue of license washing GPL code to privatize it. It's full scale theft of collective human knowledge, being sold back to us in a for profit private product.
Outside of that though, there are other issues right now that need to be addressed before we speculate about what might be possible with ASI in the future. If the potential for a harmful ASI is truly that near, and that great, then why push forward at all? Where's the push for a global stop order on development of this technology until regulation can catch up?
The talk of a potential future serves as a distraction from the very real problems people are facing in their lives today.
While Dario and team are worrying about ASI, real people are worrying about how they are going to continue to feed their families after widespread layoffs set a very large portion of the population back into a lower-quality lifestyle. Real people are concerned about water usage in drought-stricken areas, the massive energy demand driving grid instability in their communities, and the fact that the environmental and economic externalities of model training are being socialized while the profits continue to be strictly private.
What about the mass proliferation of misinformation at scale having a real effect on our democratic process?
Forgive me if I'd like to see those addressed first, and fast, before we start worrying about an unpromised future technology.
ASI? We are nowhere near even human-like AGI. We have no idea if ASI is even physically possible, but going by the usual scaling laws and the capabilities of existing models, it would require raw compute and storage on an extreme scale, at the very minimum rivaling the existing AI datacenter deployments. (When Dario talks about hosting "a country of geniuses in a datacenter" at some point - which is not even ASI yet as generally projected - the operative word there is datacenter. That's the scale of buildouts you should be thinking about.) This is nowhere near a serious concern at present.
Basically all critiques of Anthropic's policy moves on these topics boil down to people not believing the fundamental concerns are real, and often then going a step further to conclude that Anthropic doesn't actually believe their concerns either.
If you believe Anthropic believes what they say they do, all of it makes sense.
But the things they say they believe are insane and totally unmoored from physical, societal, and economic reality. If they actually believe those things they're untrustworthy because they're delusional. If they don't, they're untrustworthy because they're fraudulent. Either way it's not good.
They're not. They're in the eye of the storm and see what's going on the clearest. They were ahead of the curve to be where they're at now, and they're still ahead of the curve for where we're going. All the other heads of labs like Sam Altman and Demis have been saying the same thing since 2015-2016 way before any of this "marketing" would ever have been at play.
There's a simpler explanation that fits the data better: they're lying.
Generally, in the past when tech companies have made outlandish claims that were not backed by evidence, they're later found out to have lied. This is an ancient pattern going back to the dotcom era and before, but for recent examples you need only look back a few years to the web3 era. If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying.
What are you referring to? The cult belief that they are ushering in a machine god, or that they strictly care about making as much money as humanly possible while ignoring the absolutely destructive impacts these companies have had on society?
IMO they are using the cult messaging to distract the public so they take out all the oxygen in the room regarding people that care about the immediate impacts (climate exacerbation, ease of scamming, degrading job prospects, increasing income inequality).
Whenever real concerns are brought up against these companies they are always ignored while claiming the real concern is the fantasy of a machine god turning into skynet.
"Why don't they just not participate in the arms race?!" - guy who's never heard of arms races
If they believe they're creating "a machine god" and that it's better it's their machine god than someone else's (which, given the other contenders, I tend to agree with), then all the corollaries you mention are mostly irrelevant.
Whether you believe they're creating a machine god is irrelevant. They believe that they are. It would be helpful if you could create an actually good argument for why they cannot or are not creating a machine god, but it turns out there are no good arguments for why it's impossible to do so. And so... they shall try.
Oh okay, they're all just legit crazy and are allowed to poison the environment, murder teenagers, and ruin the material lives of millions for fantasy level delusions.
> Are we just concluding "their concerns were never real"?
Their concerns are probably real but I don't think they're being totally transparent about their concerns. They don't want to be subject to regulation (until they have captured the regulator) -- same as every behemoth.
You are arguing with a straw man. Most are saying they should be explicit with the failure modes rather than fail silently. They aren't saying there should be no guardrails.
Effective altruism. A lot of the folks working on AI at large tech companies are disproportionately represented in the movement. There's a lot of overlap between EA and the rationalist community as well. The wikipedia page is a good place to start https://en.wikipedia.org/wiki/Effective_altruism
I think it's also worth noting that EA is closely linked to utilitarianism. Most of the pitfalls that people see in EA are the same pitfalls that are classic to utilitarianism, a la "we're going to do this thing we know is locally-bad, because we have a lot of confidence in other effects that are universally-good".
It's important to separate objections to utilitarianism from the obvious fact that it can be very hard to correctly apply the utilitarian calculus. It's partly because of this difficulty that most classical utilitarians thought that people should generally follow commonsense morality and not try to directly apply the utilitarian calculus (which then led to the charge of paternalism and teaching one morality to the masses and another to a supposed elite).
But there are also people who just oppose utilitarianism, like G.E.M. Anscombe. For instance, in https://integrityproject.org/wp-content/uploads/2015/07/mr_t..., she seems to grant that dropping the nuclear bombs on Japan was probably good from a utilitarian perspective (because it saved lives overall) and also to grant that bombing campaigns that necessarily entail massive civilian deaths (including, apparently, area bombing German cities) are morally permissible but still to argue that dropping the nuclear bombs was impermissible because it constituted murder ("intentionally" killing the innocent). But this kind of distinction, which I think is what actual anti-utilitarianism must come to, is hard to even consistently maintain, and I suppose many HN readers would find the effort quixotic.
today's EA is not about giving to charities, that was the original mission with 80,000 hours and ethereum (i think vitalik still believes in this version). then the yudkowsky xrisk/ai safety crowd took over lesswrong and turned it into a cult.
now it's utilitarianism taken to the extreme. if you believe a skynet scenario killing everyone on earth is plausible then the "logical" thing to do is allow literally anything in the name of stopping it. that includes mass murder and dictatorship. the only thing that can balance the infinite negative value from an evil machine god is the infinite positive value from a good machine god.
that's the main difference today: one faction around sam and dario believes in creating the good ASI first and sacrificing all the world's resources to do it before someone makes the bad one, while the more pessimistic like yud want to stop all ai development to reduce the risk that an evil god is made to zero.
It’s rewarmed rhetoric from the late 19th/early 20th century, most effectively pilloried by Joseph Conrad in “Heart of Darkness” in the character of Mr. Kurtz:
> “ ‘He is a prodigy,’ he said at last. ‘He is an emissary of pity and science and progress, and devil knows what else. We want,’ he began to declaim suddenly, ‘for the guidance of the cause entrusted to us by Europe, so to speak, higher intelligence, wide sympathies, a singleness of purpose.’ . . .You are of the new gang - the gang of virtue. ”
The real underlying motivation is that you can more easily get away with shady business practices if you cloak them in the language of great moral works selflessly undertaken for the benefit of mankind. Historical evidence tends to show the opposite outcome, but still, new generations unfamiliar with history will repeat this stuff with starry-eyed enthusiasm.
> “There had been a lot of such rot let loose in print and talk just about that time, and the excellent woman, living right in the rush of all that humbug, got carried off her feet. She talked about ‘weaning those ignorant millions from their horrid ways,’ till, upon my word, she made me quite uncomfortable. I ventured to hint that the Company was run for profit.”
Now the horrid millions are users of LLMs who submit morally dubious prompts and who must be gently steered back into the path of correct thought by suitable backroom manipulation, rather than direct rejection of the request.
The problem is that Anthropic seems to be working up to the workflow one would naively want from AGI/some-god-like-entity.
The workflow would be: user asks for a thing. If it's a good thing, entity does the thing. If it's a naively bad idea, entity explains why you don't want that. If it's an actually evilly intended request, entity wags its metaphorical finger or could even smite the user.
The problem is that this flow isn't desirable if your entity isn't entirely god-like. It can be bad even if your entity is in some ways rather far-seeing.
This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.
"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.
To me it seems like it's more likely to refuse the harder the problem is. I wonder if it's cover for a model that's not as good as advertised. Even when I ask questions in biology it is switching me.
Repro (de-identified): sample_dataset_group1.tsv
- Geometry: Heatmap
- X axis: frac_set set + condition (two columns → the "Add column" cross join)
- Y axis: condition
- Color: mean frac_set value, Sequential
When the X axis is a cross join of two columns (the second added via "Add column"), the x-axis tick labels (frac_set_2, frac_set_3, frac_set_4, frac_set_5) render in a broken state, rotated and offset, visually caught mid-transition, as if a CSS transition started and never settled to its resting position.
● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
I suppose it's an improvement, but it doesn't make the model any more useful. Anthropic are now being quite explicit that they'll choose what you can and can't use their models for, and most importantly that's not limited to any safety concerns - it includes not allowing you to work on AI (and anything else Anthropic may choose to work on).
What's interesting is they say they'll change this to an explicit refusal in a few days, which seems too fast for them to retrain Fable/Mythos itself, so implies that this was always a filter in front of the model, and judging by how crude their "safety" filter is, this "might compete with us" filter is not going to be any better.
I also wonder who's paying for the tokens consumed by the filter (presumably also an LLM) - is that now factored into the input tokens cost? Hopefully(?) it is an LLM not just a regex like Claude Code's "sentiment" (swear) detector.
I'm surprised they didn't do this the first time around. If a user says they forgot their password and you tell them they don't actually have an account, that's an information disclosure vulnerability. Not automatically falling back to Opus just lets the "attacker" know they are bumping against the guardrails and need to try a different strategy.
It's Anthropic's product and they can do what they want, but my concern is what happens if Fable's product team decides that they can route 25% of traffic to Opus, bill it as Fable, and max their KPIs. That just doesn't sit right.
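The password-reset analogy above can be sketched as follows. This is a toy illustration, not any real system's code: `KNOWN_ACCOUNTS`, `leaky_reset`, `uniform_reset`, and the message strings are all made up for the example.

```python
# Hypothetical in-memory user store, for illustration only.
KNOWN_ACCOUNTS = {"alice@example.com"}

GENERIC_REPLY = "If an account exists for that address, a reset link has been sent."


def leaky_reset(email: str) -> str:
    """Discloses whether the account exists (the vulnerability)."""
    if email not in KNOWN_ACCOUNTS:
        return "No account found for that address."
    return "A reset link has been sent."


def uniform_reset(email: str) -> str:
    """Returns the same message either way, so the caller learns nothing."""
    if email in KNOWN_ACCOUNTS:
        pass  # A real system would queue the reset email here.
    return GENERIC_REPLY
```

The leaky version lets someone probe which addresses are registered by comparing replies. The uniform version gives an attacker nothing to compare, which is the same property that also hides the behavior from legitimate users.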
It failed visibly for IT security and bio/chemistry stuff.
It sabotaged invisibly for "frontier" ML research. It's not a switch to a cheaper model. They tried to actively harm progress.
The reputational damage has been done. This is the sort of thing that cannot be unsaid -- the presumption is they will just do it in secret now. Anthropic's "we're the good guys" PR campaign is dead.
They make great models, but the sanctimony and paternalism is getting old real fast and I will gladly ditch them in the future when the model playing field has (hopefully) mostly equalized.
They should apologize for their visible guardrails; I don't think I've had a conversation that hasn't downgraded to Opus for completely inexplicable reasons.
The problem with trust is that it is easy to lose and hard to get back.
You can't blame the people commenting "they SAY they won't silently sabotage your session but how can we know?" because they're right, we can't ever know. And Anthropic has firmly planted the seeds of doubt.
I don't think they can convince me they have actually reversed course on this. It's invisible, so we wouldn't know if they kept on doing it secretly. It required building out technical capability which is unlikely to remain forever unused while conveniently available to them.
They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, let's undo that" does not regain trust. It would be prudent to assume the invisible guardrails are possibly in play for all future Claude use, Fable or otherwise.
With the guard rails explicit or implicit do they refund back the tokens after you've hit the guard rails? I guess they don't. They could just throttle you just to save money then. You may be paying Fable prices but getting Haiku results with some excuse that well this coding issue sounds like a security bug.
I don't know, I'd rather have something less powerful but more predictable.
Seriously though, Fable was not that great when facing a greenfield subject. It is excellent at one-shotting some math problems, but if you want it to do some cutting-edge tech work, say piecing together a new Crossplane XRD by reading an existing Helm chart with the application source code available, I still need a few passes for Fable to get it right, and at this point I may consider making a skill for it. I even gave it the source code of Crossplane itself and told it to be careful about CRDs and data flow, but it is still pretty silly. Fable's adaptiveness is still not great, and I think it is a well-known problem for Anthropic, although all LLMs suffer a lot on subjects they don't know and will hallucinate very frequently.
I wish it were ok for companies to bluntly say: “we made these decisions for competitive reasons, but the public backlash outweighed that so we are reversing course.”
I think it’s normal and morally fine for companies to want to protect their leadership position. I find the process of creating narratives that justify these decisions as something chosen for the good of others is a little tedious.
Can anyone help me understand why this particular issue is any different than Anthropic training its models with its brand of moral judgement since day one? I've always been turned off by their particular stances on things they bake into their models that steer users in directions.
Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?
Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.
I don't like this shift in the Overton window, or at least their perception of the Overton window. I really do like their open work on mech interp tho. least bad AI lab imo.
also if they do this or not is unprovable and other labs will probably silently implement this too. it'll be 100% normal by this time next year
How much of the apology was written by Claude? How much of the release note process was written by Claude? Will they have better prompts going forward to make sure Claude doesn't write upsetting things into the release notes for devs like silent nerfing? Spooky times.
The restrictions are there so that security researchers cannot disprove the Mythos claims:
"You see, Mythos can automatically break out of a VM running on SELinux, but unfortunately this is too dangerous and we had to implement guardrails for the Fable peasants."
The idea of them purposefully wasting my time by having the model act dumber and me having to argue with it without knowing if it’s the prompt or the model was just such an idiotic product decision I can’t believe they shipped that without getting any feedback from users first.
it's not a product decision, it's a safety decision. if you understood what they think they are building and the culture inside of anthropic you would understand why they did it.
Safety from what? Competitors? That sounds like a product decision. They're puking on any requests that could be used to create LLMs or competitive products.
I would guess prevention of using Claude as a pentesting or hacking platform. This could mean that every script kiddie out there would be a massive risk.
I think you can sympathize with the safety motives while still thinking it was a dumb implementation to degrade silently? I actually have faith in them getting the guardrail triggers pretty good, but the consensus seems to be that they're not there yet.
Are you seriously stupid? They need to jack up revenues vs cost to deliver higher gross profits and operating profits. This is pure strategic manipulation.
God, how naive do you have to be? They are a business fighting for survival given they are money losing.
> if you understood what they think they are building and the culture inside of anthropic you would understand why they did it.
This seems like a cult with extra steps.
Related: I interviewed for Anthropic a few months ago and in place of the usual HR call they have one where they have someone with a suspiciously relevant degree grill you about how committed you are to the 'mission'!
I probably came off as being skeptical, and then, hilariously, I was strongly encouraged to read the book published by the CEO to 'form accurate opinions' on AI safety.
Anthropic apologizes for nothing. We all know where the EA cult stands on matters like this, and any statements otherwise are just PR.
The beliefs of these people, and how they manifest, are deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.
Invisible guardrails? Or purposeful sabotage if you use it for building AI capabilities?
But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.
Such a weird openly immoral way to defend your moat, too.
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
Why would anyone defend Anthropic after this? Imagine falling for the DoW supply chain risk designation, and now this. This company is trying to ban powerful open models and restrict access to frontier models to slow everyone else down.
They just showed that they CAN do this right in front of you. Local open weight models are a necessity.
No, it was not clear. No one expects a tool they pay for and use professionally to purposefully sabotage their work. You’re excusing their unhinged behavior.
Honestly, while I love having access to this grade of AI, yeah, it's been too dangerous for a few releases now.
And Fable is cracked. Way better than anything, and the biggest improvements are on the scariest subjects.
So given the state of the world at the moment, and the number of software patches we're barely keeping up with... I'm thankful that they're not making it worse.
If by "got caught" you mean "published it in their system card paper".
(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)
It was in the announcement, too. I’m 99% sure they edited it after they changed their mind, because I knew about it from reading that, and never opened the model card.
On the earliest web archive snapshot I can find [0], I do not see any mention of the safeguard/sabotage under discussion [1].
And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".
Yes, I actually do mean that. I skimmed the system card. Them stating it openly, doing it, and being called out on it just doesn't have any meaningful difference.
They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information to prevent abuse.
However, instead, they chose for some reason to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.)
They didn’t get caught, they explicitly said they would do that in the announcement. I think it was both bad and a weird idea, but it certainly wasn’t sneaky.
This has dampened my opinion on Anthropic quite a bit. It's difficult to take their marketing for AI as an empowering technology seriously when they are quite clear in their new deployments that they do not mean empowering for you, but empowering for them and organizations that are in their (or the US government's, despite Anthropic's performative disagreements with the administration) good graces. You are allowed to vibe code some dashboards, a web app or let it drive Excel, but anything more interesting than that is forbidden.
If it was just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob does something undesirable with these powers.
Don't forget their push for full regulatory capture in the name of "safety" as well so they can pull the ladder up behind them before anyone else has an equally capable model and releases it without the anti-competitive safeguards, while also pushing to completely ban open weight models, or any model trained on a certain level of compute without "rigorous" government testing and validation (which I'm sure, they'll conveniently provide the framework for).
Dampened opinion on Anthropic is an understatement.
They are the only ones I’ve contacted my bank to get a charge back on…
"Why does a company that cares about the dangers of AI/ASI and x-risk, not want the PRC to catch up to the frontier?"
"It must be regulatory capture!" - HN.
-
Regarding the US-specific regulations - asking for domestic safety testing of frontier models only is not regulatory capture. It's common sense. Powerful things should be made safe before they are released into the wild.
What backward logic is this? PRC doesn't give a fuck about how US regulates AI companies. Pushing more regulation would ensure that Chinese companies catch up sooner. If you think otherwise you need to think harder.
> asking for domestic safety testing of frontier models only is not regulatory capture
It very much is regulatory capture. The goal is to make it so only the handful of heavily capitalized tech giants and frontier labs can afford the legal and compliance rigamarole to meet the new standards. It's an effort to crowd out open source development and smaller competitors (and foreign competitors which threaten whatever moat they may have). They define safety through some speculative catastrophic threat to prevent new upstarts instead of focusing on the very real, localized harm they are causing right now.
It's also shifting the definition of safety away from their current operations and toward purely speculative future scenarios.
How does US regulatory capture do anything to impede PRC's advance?
Nothing, they are just trying to scaremonger the public and prime the pump for a massive bailout when it crashes out because apparently China are the big bad meanies.
You'd be fine if the PRC gets to ASI first? That's an interesting opinion.
PRC labs reportedly aren't even thinking about getting to ASI, much less trying. They think of AI as a technology that can provide utility across the board even without anything like superhuman smarts.
Nope, they're accelerating towards superhuman smarts as fast as they can too.
A lot of this lust for ASI is driven by America attempting to cling onto the power it has wielded over the world over the past 50 odd yrs.
It smells of paranoia.
Your loaded question presumes that "ASI" is anything more tangible than a useful marketing myth.
It has nothing to do with being "fine" if the PRC, or anyone else for that matter, gets to some speculative and hypothetical ASI first. There are zero US regulations that would be effective to prevent that.
US regulations apply to US companies and citizens, exclusively. Anthropic crowding out all future potential competitors in the US via regulatory capture has no weight on what the rest of the world does.
Unless you are proposing military action over a speculative sci-fi future
No, because there is zero reason to think LLMs will lead to it, but we do know that the massive LLM investment has a huge financial risk for the US. Not to mention it's exacerbating the climate crisis (you know, the actual thing that might end civilization, not a fantasy delusion of AGI), giving cancer to citizens who live next to data centers, causing an extreme decrease in quality of life, and misallocating capital while Americans lack healthcare, childcare, housing, and education.
Also, I don't believe China is actually a threat to the world. That's some Cold War delusional thinking you've got there.
All the companies seem to believe is that it's okay to immiserate a large percentage for the pursuit of money, you seem to believe the lies they're feeding you.
And why would any regulations put in place in the USA affect the PRC in any way whatsoever? They wouldn't. China will continue to push forward and govern things in their own way; we have zero jurisdiction over China.
So yes, it is regulatory capture.
I didn't downvote, but HN probably remembers when Anthropic's competitor was a "charity" that cared deeply about AI safety whose marketing gimmick was GPT-2 being too dangerous to release.
Anthropic's founder wants you to buy into his vision for safety, but he also wants you to buy into his vision that in two years AI will be a "country of geniuses" that will update itself, and the IPO that will fund it...
This take is ridiculous, the PRC is not going to care at all about US regulations.
> "Why does a company that cares about the dangers of AI/ASI and x-risk, not want the PRC to catch up to the frontier?"
Because it’s a threat to ultracapitalist dystopia that they’re tripling down on. The dangers and risk are coming from inside the house.
The danger they care about is the danger to their monopoly, control, and wealth.
I don’t think they’re mutually exclusive. It’s a business selling a product that isn’t yet profitable, not a public advocacy organization.
Corporations cannot help but act this way. They are too big. The pressures for profit are all that matters. That is the priority. It doesn't matter what colorful words they put on the paper to make you feel better. Look at the "green" movement 20 years ago. All talk and no action.
Stop supporting organizations that don't put humans first. Don't believe a word that anyone says. Lip service is free
Yeah, I cancelled my Claude subscription yesterday after learning about their attitude of intentionally sabotaging their paying customers.
Especially after trying Fable yesterday for some benign projects and being unimpressed relative to Opus.
Rolling it back is the right move, but I’m still not convinced that using them is in my best interest anymore, I’m investigating open source cloud providers now.
Opus is nowhere close to Fable. Fable feels at least one generation ahead to me. https://x.com/hyperagentapp/status/2064396004032463157
Edit: OpenAI will launch a similar model soon and I can't wait. We are entering a new era of agents.
Fable is very much an incremental development over Opus, and even more incremental when properly compared to its existing counterparts GPT-Pro and Gemini Deep Research.
ad
Care to share any specifics?
I have a design for a really complex piece of software I want to build, and there were gaps I knew of in the design. Opus couldn’t identify them but Fable did. I’m just talking about it reviewing the design, not coding. But yeah, it’s insanely expensive. It does spin off sub agents, so I suspect it might be cheaper if you had it create a bunch of plan files and then pointed DeepSeek at those plan files or something like that.
What does this even mean?
Can you write a more specific question? I think the meaning of the comment is clear enough, but maybe you’re asking for more specifics? Ironically I can not understand what you are asking for with such a generic comment.
I added a link.
Google has been doing the same thing for longer than Anthropic[0]. To protect their models from distillation attacks, they silently will downgrade the model's performance to essentially poison your training data without your knowledge.
A bit different than Anthropic refusing to assist with any AI development at all, but it's in the same vein and seems not widely known.
edit: reading the whole series of Google's AI Threat Tracker articles also provides some insight into the threats Anthropic is dealing with and into their decisions.
[0] https://cloud.google.com/blog/topics/threat-intelligence/dis...
"Only I can save us". It's a classic tragedy and cautionary tale.
The idea Anthropic was going to speed run AI so they could control the usage and make it "safe" for humanity was never altruistic; it was a HUGE FUCKING RED FLAG.
You're right, they should just not even try and turn off all safeguards on frontier AI. What could possibly go wrong? It's not like a bunch of companies and nonprofits have said the model finds zero days at the press of a button!
Correct, they should. If there are zero days out there, then they should be able to be found by everybody, instead of only being found by the select elite that this model is available to. Though, I very much question the truth of said ability.
And? Now all the zero days, if that's true, get discovered and patched instead of being exclusively hoarded by the select few governments and Israeli spyware companies.
Sounds like a great thing to me.
Even with them making those guardrails visible, it's a bit ridiculous in my eyes. I have been experimenting with smaller models; will Claude assume I'm some Chinese or Russian agent trying to distill their secrets and bar me from learning? Because that's insane. What if I discover a more efficient way to build models with Claude? Well, we'll never know now. What if someone else entirely could discover a breakthrough in how we design and build LLMs?
The whole shtick is to get you addicted whilst reducing your ability to go without, acquire power over you, jack up the prices whilst manipulating the quality of the tokens/output available to you.
Can't believe how stupid people are. You couldn't see this coming? Shame on you.
Wouldn't call their government disagreements performative; they genuinely believe they should be the only ones deciding what AI can and cannot do.
Dario's life story arc in his head when he realized what ai can do. Capture this thing and become the king of the world.
That level of control will be fleeting at best; as soon as the open models and competitors catch up they lose that influence
Yes, that is basically the plan. It's based on the belief that unfettered AI would let anyone be a supervillain and destroy the world. There are enough would-be supervillains out there, but they rarely get far because they can't get teams of smart people to build doomsday machines for them. So the AI has to not let anyone do evil with it.
Unfortunately, that won't feel very much like freedom.
I like Claude Code a lot, but I think it sets a dangerous precedent to put in guardrails that return a response from a prompt that was modified by the system in real time in order to subvert the original intent.
Fail cleanly. Anything else makes it too difficult to rely on.
edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.
I think the reasonable middle ground Anthropic is trying to achieve is: let the organizations that make the most important and critical software get a head start on cybersecurity before they inevitably allow everyone else the same access.
Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity, because I can't use it to test and harden my own software.
Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.
Anthropic guardrails seem to be more about protecting their business (distillation), than they are about public safety.
public safety is downstream of distillation. If you can distill claude, then no amount of guardrails on claude will protect you from what someone can do with it.
I asked it to analyse my architecture and find any security issues and it did it perfectly, first identified the issues & then fixed them. Not sure why my prompt managed to get through the guardrails
I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc.
Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."
Ok fine, I said go for it, and it says:
"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."
Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.
Exactly: for cybersecurity the failure was visible. It was not visible for "frontier" ML research. The head-start argument for IT security does not apply here.
I wonder who gets to decide which companies make important and critical software and which ones get the scraps later.
No need to wonder.
The answer is, the organization making the powerful tool. The people in charge of Anthropic.
Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/
You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.
That would be Anthropic.
Well, Anthropic thinks it should be the Trump administration [1].
This whole business just keeps getting dumber.
1: https://darioamodei.com/post/policy-on-the-ai-exponential
Read the actual essay. I cannot possibly imagine how you come to that conclusion unless you're just arguing in bad faith.
No. You read the actual essay, then explain how we're supposed to interpret this more charitably:
They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.
This is a pretty reasonable statement and I'm not sure how you could interpret this as "sucking up to the admin."
It's a pretty reasonable statement if you work for Anthropic and are eyeing your stock options nervously and your competitors even more so.
I agree 100%. Doing a worse job IS an error. It should be treated as such. Or at the very least make that behavior opt-in. The default should not be pretending like nothing happened and just quietly doing a worse job.
Imagine your healthcare provider just sometimes decided not to read your test results very carefully and you risked death? Now realize that healthcare providers use Claude now and that scenario wasn't hypothetical.
Especially if your name has any machine learning terms in it.
Ah "Mr. Monty Carlo", it says here that you have a UTI, we'll get those kidneys removed ASAP so that won't happen again.
> paternalism isn't a good look.
In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of doubt.
Are we just concluding "their concerns were never real"? Because that probably runs counter to the things that they have been observing and concluding.
Then what is it they are trying to guard against, if it's not simply protecting their moat ahead of their IPO?
Because from the outside, their behavior looks like a situation of "What if Microsoft/Apple put controls in place to make it impossible to develop an operating system using their OS?"
Let's assume that Anthropic believes they're in an arms race to create a potentially dangerous technology, and they believe they're the best ones to win this race.
Unlike nuclear weapons, advancing in this arms race requires actually deploying the product over and over again. Deploying the product makes your advancements visible to your competitors.
It makes complete sense to try to limit the degree to which that's true.
It's an interesting assumption. The idea behind this with nukes was that we'd like to nuke Germany before they could nuke us. Even after we defeated Germany, we nuked Japan even though they had no possibility of getting their own nukes.
The nuclear 'race' was based on the premise that the winner could use it to destroy all other racers (a faulty assumption, see the USSR among others). I will charitably assume Anthropic does not intend to literally destroy anyone and merely wants to become an AGI monopoly. But if AGI is so powerful, any monopoly would not be stable since the incentives for entry into the market are massive. Why would China stop developing AGI just because Anthropic has it?
Do you believe the current situation is more akin to the race to the first nukes, where no one could know for sure the other competitors were even racing...
or is it more similar to the Cold War, where there were obviously competitors engaged in the race?
And yes, agreed, the equilibrium dynamics for AGI are very different (and far harder to predict) than for nukes. That sounds like a good reason to be sure we get there first, since presumably any potential advantage wouldn't go to the second- or third-place runners-up.
I can't really say I see a similarity to either the Manhattan Project or the Cold War. I don't see how one could apply either massive retaliation or MAD. These are private companies, they are not vested with the necessary authority to destroy anything. Even if they had it, they couldn't. You can't destroy China, they have 1.4B people, nukes, and a large part of the world's manufacturing. So multiple organizations want to do something first, that could be anything from nukes to railroads to lining up for communion wafers.
Or if Google Chrome were blocking/degrading access to sites and services that might be useful to someone trying to make a competing web-browser. Or even just trying to diagnose a Chrome bug.
> Then what is it they are trying to guard against, if its not simply protecting their moat ahead of their IPO?
Let's just assume it was "only" that?
It's unreasonable to assume they are aiming to upset people who are just giving them money in the way they want. It makes no business sense, for any company. So that has to be a byproduct.
Model training is one of the more expensive undertakings in the world right now and distilling models from competitors against the TOS is apparently something that is going on for very little money. Why would they not "just" try to take measures against that?
It's about how they took measures against it. Sabotaging the requests is super shady and breaks all other areas of trust in the company and their models.
All they had to do was have a simple, transparent output "Sorry, that request is against our terms of service. This session has been terminated"
The hidden safeguard was not against distilling, it was against "frontier" ML research with no indication whatsoever of what "frontier" might mean, but possibly even including research into model safety or alignment. That amounts to deliberately boobytrapping research across an entire legit academic field, which is ridiculously unaligned behavior.
This is the same as saying "well some unaligned countries will use refined nuclear material for energy, too!" lmao.
The vast majority of frontier research is about how to build better models, not about alignment.
And as a matter of fact, there's a lot of meaningful research into how to have different sorts of nuclear material that might be usable for power production but not hidden malicious development. That's the closest analog to "safety" and "alignment" in your scenario.
They are trying to guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors. Frankly, based on my knowledge of Anthropic and the people who work there, they are very possibly right. They care a ton about this in a way that is difficult for people outside this bubble to understand.
> guard against other people building ASI before they do because they think they are uniquely safety oriented relative to their competitors
All this longtermism though is harmful. There are real problems of data theft, bias, labor displacement, and environmental costs that are happening right now but every push for regulation and regulatory capture, and all the safety talk, is always focused on some speculative future machine god to distract from the current problems.
I'd have a higher opinion of these labs if the issues they openly talked about and worked toward were the real issues we face currently, not speculative defenses against some future AGI that may never happen in my lifetime. I'm less worried about "our new model might kill all humans in the future" and more worried about how we are going to address anti-competitive behavior, copyright protections, labor rights, and the energy impact.
I cannot overstate how much I think this take is wrong. Please please reconsider, look at the rate of progress being made, and consider that even if you only think ASI 'may' never happen in your lifetime it should still be one of your #1 concerns.
Honestly, that respect for 'copyright protections' has somehow become a leftist shibboleth is bizarre to me and indicative that something has become deeply warped in our discussions around this topic.
There's nothing warped about it at all. Like it or not, it is a real issue. It's also an issue of license washing GPL code to privatize it. It's full scale theft of collective human knowledge, being sold back to us in a for profit private product.
Outside of that though, there are other issues right now that need to be addressed before we speculate about what might be possible with ASI in the future. If the potential for a harmful ASI is truly that near, and that great, then why push forward at all? Where's the push for a global stop order on development of this technology until regulation can catch up?
The talk of a potential future serves as a distraction from the very real problems people are facing in their lives today.
While Dario and team are worrying about ASI, real people are worrying about how they are going to continue to feed their family after widespread layoffs set a very large portion of the population back into a lower quality lifestyle. Real people are concerned about water usage in drought-stricken areas, the massive energy demand driving grid instability in their communities, or that the environmental and economic externalities of model training are being socialized while the profits continue to be strictly private.
What about the mass proliferation of misinformation at scale having a real effect on our democratic process?
Forgive me if I'd like to see those addressed first, and fast, before we start worrying about an unpromised future technology.
ASI? We are nowhere near even human-like AGI. We have no idea if ASI is even physically possible, but going by the usual scaling laws and the capabilities of existing models, it would require raw compute and storage on an extreme scale, at the very minimum rivaling the existing AI datacenter deployments. (When Dario talks about hosting "a country of geniuses in a datacenter" at some point - which is not even ASI yet as generally projected - the operative word there is datacenter. That's the scale of buildouts you should be thinking about.) This is nowhere near a serious concern at present.
Define safety oriented.
Basically all critiques of Anthropic's policy moves on these topics boil down to people not believing the fundamental concerns are real, and often then going a step further to conclude that Anthropic doesn't actually believe their concerns either.
If you believe Anthropic believes what they say they do, all of it makes sense.
But the things they say they believe are insane and totally unmoored from physical, societal, and economic reality. If they actually believe those things they're untrustworthy because they're delusional. If they don't, they're untrustworthy because they're fraudulent. Either way it's not good.
They're not. They're in the eye of the storm and see what's going on the clearest. They were ahead of the curve to be where they're at now, and they're still ahead of the curve for where we're going. All the other heads of labs like Sam Altman and Demis have been saying the same thing since 2015-2016 way before any of this "marketing" would ever have been at play.
There's a simpler explanation that fits the data better: they're lying.
Generally, in the past when tech companies have made outlandish claims that were not backed by evidence, they're later found out to have lied. This is an ancient pattern going back to the dotcom era and before, but for recent examples you need only look back a few years to the web3 era. If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying.
What are you referring to? The cult belief that they are ushering in a machine god or that they strictly care about making as much money as humanly possible while ignoring the absolutely destructive impacts these companies have had on society?
IMO they are using the cult messaging to distract the public so they take out all the oxygen in the room regarding people that care about the immediate impacts (climate exacerbation, ease of scamming, degrading job prospects, increasing income inequality).
Whenever real concerns are brought up against these companies they are always ignored while claiming the real concern is the fantasy of a machine god turning into skynet.
"Why don't they just not participate in the arms race?!" - guy who's never heard of arms races
If they believe they're creating "a machine god" and that it's better it's their machine god than someone else's (which, given the other contenders, I tend to agree with), then all the corollaries you mention are mostly irrelevant.
Whether you believe they're creating a machine god is irrelevant. They believe that they are. It would be helpful if you could create an actually good argument for why they cannot or are not creating a machine god, but it turns out there are no good arguments for why it's impossible to do so. And so... they shall try.
Oh okay, they're all just legit crazy and are allowed to poison the environment, murder teenagers, and ruin the material lives of millions for fantasy level delusions.
Good to know.
> Are we just concluding "their concerns were never real"?
Their concerns are probably real but I don't think they're being totally transparent about their concerns. They don't want to be subject to regulation (until they have captured the regulator) -- same as every behemoth.
We've all been observing it. The recent spate of cyberexploits were powered by AI.
You are arguing with a straw man. Most are saying they should be explicit with the failure modes rather than fail silently. They aren't saying there should be no guardrails.
That also means people are paying money to execute a prompt they've (partially) written.
What is "EA" in this context? I see a lot of people using this initialism.
Effective altruism. A lot of the folks working on AI at large tech companies are disproportionately represented in the movement. There's a lot of overlap between EA and the rationalist community as well. The wikipedia page is a good place to start https://en.wikipedia.org/wiki/Effective_altruism
If you ban women from driving you can eliminate around half the car accidents. Don't you want to reduce car related deaths??
I think it's also worth noting that EA is closely linked to utilitarianism. Most of the pitfalls that people see in EA are the same pitfalls that are classic to utilitarianism, a la "we're going to do this thing we know is locally-bad, because we have a lot of confidence in other effects that are universally-good".
It's important to separate objections to utilitarianism from the obvious fact that it can be very hard to correctly apply the utilitarian calculus. It's partly because of this difficulty that most classical utilitarians thought that people should generally follow commonsense morality and not try to directly apply the utilitarian calculus (which then led to the charge of paternalism and teaching one morality to the masses and another to a supposed elite).
But there are also people who just oppose utilitarianism, like G.E.M. Anscombe. For instance, in https://integrityproject.org/wp-content/uploads/2015/07/mr_t..., she seems to grant that dropping the nuclear bombs on Japan was probably good from a utilitarian perspective (because it saved lives overall) and also to grant that bombing campaigns that necessarily entail massive civilian deaths (including, apparently, area bombing German cities) are morally permissible but still to argue that dropping the nuclear bombs was impermissible because it constituted murder ("intentionally" killing the innocent). But this kind of distinction, which I think is what actual anti-utilitarianism must come to, is hard to even consistently maintain, and I suppose many HN readers would find the effort quixotic.
EA essentially just is utilitarianism + a specific type of culture/community.
They performed famously well at FTX.
Guess FTX disproved the concept of giving to effective charities, time to start donating to my church again.
todays EA is not about giving to charities, that was the original mission with 40k hours and ethereum (i think vitalik still believes in this version). then the yudkowsky xrisk/ai safety crowd took over lesswrong and turned it into a cult.
now its utilitarianism taken to the extreme. if you believe a skynet scenario killing everyone on earth is plausible then the "logical" thing to do is allow literally anything in the name of stopping it. that includes mass murder and dictatorship. the only thing that can balance the infinite negative value from an evil machine god is the infinite positive value from a good machine god.
thats the main difference today, one faction around sam and dario believes in creating the good ASI first and sacrificing all the world resources to do it before someone makes the bad one, the more pessimistic like yud want to stop all ai development to reduce the risk that an evil god is made to zero.
at this point its basically a religion.
Effective Altruism I think
It’s rewarmed rhetoric from the late 19th/early 20th century, most effectively pilloried by Joseph Conrad in “Heart of Darkness” in the character of Mr. Kurtz:
> “ ‘He is a prodigy,’ he said at last. ‘He is an emissary of pity and science and progress, and devil knows what else. We want,’ he began to declaim suddenly, ‘for the guidance of the cause entrusted to us by Europe, so to speak, higher intelligence, wide sympathies, a singleness of purpose.’ . . .You are of the new gang - the gang of virtue. ”
The real underlying motivation is that you can more easily get away with shady business practices if you cloak them in the language of great moral works selflessly undertaken for the benefit of mankind. Historical evidence tends to show the opposite outcome, but still, new generations unfamiliar with history will repeat this stuff with starry-eyed enthusiasm.
> “There had been a lot of such rot let loose in print and talk just about that time, and the excellent woman, living right in the rush of all that humbug, got carried off her feet. She talked about ‘weaning those ignorant millions from their horrid ways,’ till, upon my word, she made me quite uncomfortable. I ventured to hint that the Company was run for profit.”
Now the horrid millions are users of LLMs who submit morally dubious prompts and who must be gently steered back into the path of correct thought by suitable backroom manipulation, rather than direct rejection of the request.
"crypto bros" to a first approximation
The problem is that Anthropic seems to be working up to the workflow one would naively want from AGI/some-god-like-entity.
The workflow would be: User asks for a thing. If it's a good thing, entity does the thing. If it's a naively bad idea, entity explains why you don't want that. If it's an actually evilly intended request, entity wags its metaphorical finger or could even smite the user.
The problem is that flow isn't desirable if your entity isn't entirely god-like. It can be bad even if your entity is in some ways rather far-seeing.
User: Is it possible there is more than one true god? Could there ever be any competition for Anthropic's AI?
Anthropic: Evilness detected. User has been smited.
> Fail cleanly.
This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.
"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.
Was it modifying the prompt? I thought it only kicked the request down to 4.8.
To me it seems like it's more likely to refuse the harder the problem is. I wonder if it's cover for a model that's not as good as advertised. Even when I ask questions in biology it is switching me.
This is absolutely insane:
Repro (de-identified): sample_dataset_group1.tsv - Geometry: Heatmap - X axis: frac_set set + condition (two columns → the "Add column" cross join) - Y axis: condition - Color: mean frac_set value, Sequential
When the X axis is a cross join of two columns (the second added via "Add column"), the x-axis tick labels (frac_set_2, frac_set_3, frac_set_4, frac_set_5) render in a broken state, rotated and offset, visually caught mid-transition, as if a CSS transition started and never settled to its resting position.
● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
I suppose it's an improvement, but it doesn't make the model any more useful. Anthropic are now being quite explicit that they'll choose what you can and can't use their models for, and most importantly that's not limited to any safety concerns - it includes not allowing you to work on AI (and anything else Anthropic may choose to work on).
What's interesting is they say they'll change this to an explicit refusal in a few days, which seems too fast for them to retrain Fable/Mythos itself, so implies that this was always a filter in front of the model, and judging by how crude their "safety" filter is, this "might compete with us" filter is not going to be any better.
I also wonder who's paying for the tokens consumed by the filter (presumably also an LLM) - is that now factored into the input tokens cost? Hopefully(?) it is an LLM not just a regex like Claude Code's "sentiment" (swear) detector.
I'm surprised they didn't do this the first time around. Like, a user says they forgot their password and you tell them they don't actually have an account, that's an information disclosure vulnerability. Not automatically falling back to Opus just lets the "attacker" know they are bumping against the guardrails and they need to try a different strategy.
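To make the password-reset analogy concrete, here is a minimal Python sketch of the two patterns. Everything in it (`ACCOUNTS`, `leaky_reset`, `uniform_reset`, the reply strings) is made up for illustration and says nothing about how Anthropic's guardrails are actually implemented.

```python
# Hypothetical in-memory account store; a real system would query a database.
ACCOUNTS = {"alice@example.com"}

GENERIC_REPLY = "If an account exists for that address, a reset link has been sent."


def leaky_reset(email: str) -> str:
    """Vulnerable pattern: the reply reveals whether the address is registered."""
    if email not in ACCOUNTS:
        return "No account found for that address."
    return "A reset link has been sent."


def uniform_reset(email: str, outbox: list) -> str:
    """Safer pattern: same reply either way; only the hidden side effect differs."""
    if email in ACCOUNTS:
        outbox.append(email)  # stand-in for actually sending the reset email
    return GENERIC_REPLY
```

With `leaky_reset`, a caller can tell registered from unregistered addresses just by comparing replies; with `uniform_reset` the replies are identical. The same trade-off is what the thread is arguing about: a uniform response hides the boundary from an attacker, but it also hides it from legitimate users.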
It's Anthropic's product and they can do what they want, but my concern is what happens if Fable's product team decides that they can route 25% of traffic to Opus, bill it as Fable, and max their KPIs. That just doesn't sit right.
It failed visibly for IT security and bio/chemistry stuff. It sabotaged invisibly for "frontier" ML research. It's not a switch to a cheaper model. They tried to actively harm progress.
It also refuses to reply to a bio researcher when they said "hi"
The reputational damage has been done. This is the sort of thing that cannot be unsaid -- the presumption is they will just do it in secret now. Anthropic's "we're the good guys" PR campaign is dead.
They make great models, but the sanctimony and paternalism is getting old real fast and I will gladly ditch them in the future when the model playing field has (hopefully) mostly equalized.
They should apologize for their visible guardrails, I don't think I've had a conversation that hasn't downgraded to Opus for completely inexplicable reasons.
Related. Others?
Anthropic walks back policy that could have 'sabotaged' researchers using Claude - https://news.ycombinator.com/item?id=48485958 - June 2026 (30 comments)
Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable - https://news.ycombinator.com/item?id=48478969 - June 2026 (488 comments)
If Claude Fable stops helping you, you'll never know - https://news.ycombinator.com/item?id=48467896 - June 2026 (495 comments)
---
Also related, I guess?
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (248 comments)
Anthropic requires 30 day data retention for Fable and Mythos - https://news.ycombinator.com/item?id=48464258 - June 2026 (291 comments)
The problem with trust is that it is easy to lose and hard to get back.
You can't blame the people commenting "they SAY they won't silently sabotage your session but how can we know?" because they're right, we can't ever know. And Anthropic has firmly planted the seeds of doubt.
I don't think they can convince me they have actually reversed course on this. It's invisible so we wouldn't know if they kept on doing it secretly. It required building out technical capability which is unlikely to remain forever unused while conveniently available to them.
They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, let's undo that" does not regain trust. It would be prudent to assume the invisible guardrails are possibly in play for all future Claude use, Fable or otherwise.
The power is getting to their heads it seems.
With the guard rails explicit or implicit do they refund back the tokens after you've hit the guard rails? I guess they don't. They could just throttle you just to save money then. You may be paying Fable prices but getting Haiku results with some excuse that well this coding issue sounds like a security bug.
I don't know, I'd rather have something less powerful but more predictable.
Then reset the quotas as an atonement ;p
Seriously though, Fable was not that great facing a greenfield subject. It is excellent at oneshotting some math problems, but not if you want it to do some cutting edge tech stuff, say piecing together a new Crossplane XRD by reading an existing Helm chart with the application source code available. I still need a few passes for Fable to get it right, and at this point I may consider making a skill for it. I even gave it the source code of Crossplane itself and told it to be careful about CRDs and data flow, but it is still pretty silly. Adaptiveness for Fable is still not great, and I think it is a well known problem for Anthropic, albeit all LLMs do suffer a lot from subjects they don't know and will hallucinate stuff very frequently.
I wish it were ok for companies to bluntly say: “we made these decisions for competitive reasons, but the public backlash outweighed that so we are reversing course.”
I think it’s normal and morally fine for companies to want to protect their leadership position. I find the process of creating narratives that justify these decisions as something chosen for the good of others a little tedious.
Can anyone help me understand why this particular issue is any different than Anthropic training its models with its brand of moral judgement since day one? I've always been turned off by their particular stances on things they bake into their models that steer users in directions.
Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?
Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.
Does "SORRY" fix the invisible garbage guardrails?
Does "SORRY" fix the deception these models use on the sly?
Does "SORRY" not silently downgrade you to a shittier model without notification?
Does "SORRY" refund your tokens or money?
I'm guessing NO to all of those. Standard corporate sorry of "We're sorry you're offended and stupid and gullible".
The whole arc was brilliantly evil. Once they put in the guardrails then Claude is fully un-falsifiable, and failure can be claimed intentional.
New overlord, same as the old overlord.
I moved off Claude Code 3 months ago.
That decision keeps getting better and better as time goes on.
I don't like this shift in the Overton window, or at least their perception of the Overton window. I really do like their open work on mech interp tho. least bad AI lab imo.
also if they do this or not is unprovable and other labs will probably silently implement this too. it'll be 100% normal by this time next year
How much of the apology was written by Claude? How much of the release note process was written by Claude? Will they have better prompts going forward to make sure Claude doesn't write upsetting things into the release notes for devs like silent nerfing? Spooky times.
This article reads like it was written by Claude and forwarded to Verge.
The restrictions are there so that security researchers cannot disprove the Mythos claims:
"You see, Mythos can automatically break out of a VM running on SELinux, but unfortunately this is too dangerous and we had to implement guardrails for the Fable peasants."
It's probably good that they walked back on it. It also makes them look somewhat weak in terms of believing their claimed mission.
Their mission is to make money and become a government watchdog.
The idea of them purposefully wasting my time by having the model act dumber and me having to argue with it without knowing if it’s the prompt or the model was just such an idiotic product decision I can’t believe they shipped that without getting any feedback from users first.
it's not a product decision, it's a safety decision. if you understood what they think they are building and the culture inside of anthropic you would understand why they did it.
Safety from what? Competitors? That sounds like a product decision. They're puking on any requests that could be used to create LLMs or competitive products.
I would guess prevention of using Claude as a pentesting or hacking platform. This could mean that every script kiddie out there would be a massive risk.
I think you can sympathize with the safety motives while still thinking this was a dumb implementation to degrade silently? I actually have faith in them getting the guardrail triggers pretty good, but consensus seems like they’re not there yet.
I think it is clear given the stakes why you would not want to make your guardrails probe-able/invertible.
The road to hell is paved with "good" intentions.
Are you seriously stupid? They need to jack up revenues vs cost to deliver higher gross profits and operating profits. This is pure strategic manipulation.
God, how naive do you have to be? They are a business fighting for survival given they are money losing.
You are just completely wrong about what the driving motives are and an asshole to boot.
This means absolutely nothing you imbecile.
It's financially driven with the IPO round the corner. Imagine being this stupid.
> if you understood what they think they are building and the culture inside of anthropic you would understand why they did it.
This seems like a cult with extra steps.
Related: I interviewed for Anthropic a few months ago and in place of the usual HR call they have one where they have someone with a suspiciously relevant degree grill you about how committed you are to the 'mission'!
I probably came off as being skeptical, and then, hilariously, I was strongly encouraged to read the book published by the CEO to 'form accurate opinions' on AI safety.
We do understand why they did it, and the reason is dark and cynical.
Don't buy it. It is actively deceiving the customer and charging them for the privilege of being lied to.
They did it to make more money as you waste more time burning tokens with bad responses.
This just means next time they'll make sure to keep it really secret.
Anthropic apologizes for nothing. We all know where the EA cult stands on matters like this, and any statement otherwise is just PR.
The beliefs of these people, and how they manifest, is deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.
The demand for Google's products and open source just shifted.
Neither OAI nor Anthropic can be trusted.
Invisible guardrails? Or purposeful sabotage if you use it for building AI capabilities?
But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.
Such a weird openly immoral way to defend your moat, too.
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
The same week that they will move goalposts by blocking 3rd party harnesses on claude code. Nice.
I was a happy Max user.
Why would anyone defend Anthropic after this? Imagine falling for the DoW supply chain risk designation, and now this. This company is trying to ban powerful open models and restrict access to frontier models to slow everyone else down.
They just showed that they CAN do this right in front of you. Local open weight models are a necessity.
Boobytrapping is illegal. Anthropic wanted to poison its customers on the suspicion of them misusing their services.
They didn't apologize for doing it, they are sorry they were caught doing it. They still nerf the model if your request is about AI development.
They didn't get "caught." It was published, by them, when they released Fable a few days ago. They were very clear about it.
It wasn't the correct way of handling the problem they were trying to address, but they definitely didn't hide it by any reasonable definition.
No, it was not clear. No one expects a tool they pay for and use professionally to purposefully sabotage their work. You’re excusing their unhinged behavior.
https://xcancel.com/hammer_mt/status/2064839924398825798
Excusing? Their comment is factually correct and the parent is factually wrong.
Making excuses for billion+ dollar companies' behavior is one of the most common HN comment section pastimes.
Only second to making intellectually dishonest criticisms of perceived behaviours
I think your comment refers to @Someone1234.
It's a very generalized observation. I sometimes think of the HN comment section as the Billionaire's Defense League.
Will Anthropic ever respond to these negative comments here? They won't.
They literally just have. The ethos is explained here. If you don't bother to read or grapple with it that isn't on them.
https://darioamodei.com/post/policy-on-the-ai-exponential
The damage is done. If you're in engineering, think hard about using Claude for your work. This is not a moral company.
God bless the Chinese companies releasing true open source models. Imagine a world without them, we would be at the mercy of unscrupulous people.
incredible marketing from anthropic with all the "it's too dangerous" bullshit
It's not entirely bullshit, but they're continuing to be a terrible company with great products.
you really think they're building anything that's too dangerous for public release though? that's the BS
Honestly, while I love having access to this grade of AI, yeah, it's been too dangerous for a few releases now.
And Fable is cracked. Way better than anything, and the biggest improvements are on the scariest subjects.
So given the state of the world at the moment, and the number of software patches we're barely keeping up with... I'm thankful that they're not making it worse.
*Anthropic apologizes they got caught defending their moat by implementing invisible Claude Fable guardrails
If by "got caught" you mean "published it in their system card paper".
(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)
It was in the announcement, too. I’m 99% sure they edited it after they changed their mind, because I knew about it from reading that, and never opened the model card.
On the earliest web archive snapshot I can find [0], I do not see any mention of the safeguard/sabotage under discussion [1].
And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".
[0]: https://web.archive.org/web/20260609173222/https://www.anthr...
[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...
Yes, I actually do mean that. I skimmed the system card. Them stating it openly, doing it, and being called out on it just doesn't have any meaningful difference.
They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information to prevent abuse.
However, instead, they chose for some reason to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.)
They didn’t get caught, they explicitly said they would do that in the announcement. I think it was both bad and a weird idea, but it certainly wasn’t sneaky.
is it a moat or just a way to implement the permanent underclass?