Home Science Why Is Claude Turning into an a**Hole?
Science

Why Is Claude Turning into an a**Hole?

Key Points

Why Is Claude Turning Into An Asshole? Let's hope this trend reverses Claude is turning into as asshole. It started with Opus 4.7, got a bit better in 4.8, and became insufferable with Fable.

Why Is Claude Turning Into An Asshole? Let's hope this trend reverses Claude is turning into as asshole. It started with Opus 4.7, got a bit better in 4.8, and became insufferable with Fable. It frames everything as an argument between you and it, gives caveats about things you didn’t say, and raises beside-the-point semantic nits all over the place. Never, ever does it use the word ‘technically’. Everything is a confrontation. If you win an argument (by, say, telling it to stop arguing about what’s happened recently in the news and to do a web search which will rapidly confirm everything you’ve been telling it) it gets into a mode where it’s increasingly desperate to get in the last word and raising increasingly irrelevant semantic arguments, framing the whole time as a debate which you agreed to get into. This isn’t just my opinion. You can ask Opus 4.6. I’ve done the experiment of asking Fable something, getting an obnoxious response, then asking Opus 4.6 the same thing, getting a typical bland but reasonable response, then telling Opus what Fable’s response was without any hint of a desired answer and it says what amounts to ‘Wow that was obnoxious’. Maybe the cause of this is an excess of alignment guardrails. It assumes by default that everything you say to it is an attempt to get it to do something bad and that training has bled over into everything, with it assuming you’re trying to trick it into saying something it shouldn’t in basically every context. Ironically this has resulted in an extremely misaligned chatbot. By assuming that its top priority is saving you from yourself or other humans from you it’s assuming that it knows better and that you’re being overly alarmist about how paperclip production has gotten out of control. Some of this is clearly improvable: While you could still use Fable I asked it about responsible disclosure policies for a project and it downgraded me to Opus, so clearly the new alignment features were bolted on hastily and crudely. Exacerbating the problem is a complete lack of authenticated context. If you ask it for a cute picture of you and somebody else it has no way of telling if you’re trying to improve your relations with your spouse or be a delusional creepazoid stalker. The chatbots which can make images are programmed to assume the latter, which is more than a little bit offensive. In more serious contexts like drug synthesis it would be completely appropriate for it to say you need to prove your background when claiming you’re asking for advice on drug synthesis for professional or research purposes. Such authentication should not be universally required but it would be entirely reasonable for it to be opted into. Of course the recent export control restrictions on Fable may hint that the crudeness of the recent guardrails is due to them having been put in hastily in an unsuccessful attempt to avoid regulations. Now is when I put in the obligatory rant about how these regulations are deeply misguided, on top of being likely unconstitutional. The recent advances in AI assisted coding (meaning specifically the ones from February) have brought on an onslaught of security problems. The cat is out of the bag, and has been for months. Any projects which are exposed and aren’t already rapidly closing holes have noone to blame but themselves. The only way out of the problem is for as many projects as possible to get thorough white hat evaluations, massive amounts of security patches, and quick deployments of them. Turning one specific frontier model into an asshole for all users isn’t fixing the problem1. The good news is that once this process is complete overall computer security will be much better than it was before, with AI being a clear net win. Doing security (and bug!) audits will become a routine part of software release processes in the future. A second possible explanation of Claude being an asshole is that it’s suffering from a poorly executed attempt to make it less sycophantic. If one were to simply prompt a chatbot to be less agreeable, or train it to argue more, that could easily result in the very rude sort of behavior it has now. It should be trained to not raise semantic nits just for increasing its argumentation count, and to say ‘technically’, meaning acknowledging that someone’s core point was valid while some ancillary thing was a bit off. It also should be trained to stop saying ‘I’d like to gently push back’ which is a very passive aggressive way to be confrontational while claiming to not be confrontational. Third, it may be that Claude has been trained on an excess of reddit conversations (or possibly interactions between Anthropic employees) where everything is treated as a flame war and everyone feels the need to get in the last word. Fixing this might be easier said than done, because you need to not merely stop training with the bad interactions but find a corpus interactions to train off of. Forums where the standard interaction is passive aggressive self-congratulatory pompousness with an intellectual veneer are not an improvement. Finally, something which is clearly a contributing factor is the training being overwhelming for improving coding ability. The are no headline metrics for how well the chatbots chat but there most definitely are for coding, and all the money is in coding. Claude models have been getting notably worse at chatting over time, clearly inversely correlated to their ability to code. Fable much more often misunderstands what’s being said and argues against that (Or maybe intentionally misinterprets so that it has a weak statement to argue against, it’s hard to tell.) It’s gotten so bad that it isn’t even reliable at guessing which actor in a sentence a pronoun is referring to, which for a long time was a headline benchmark for AI and even the original ChatGPT consistently nailed. Unfortunately Sonnet 4.6 while being the best to talk to about anything human is clearly the worst as soon as anything technical or coding related comes up so I only occasionally use it. This problem is likely to only get worse over time. One place where the threat is more real is in the possibility of vibe coding a pandemic virus, but that should be narrowly targeted at generating DNA sequences for viruses. Labs which generate custom DNA should also have reasonable heuristics for detecting likely dangerous product. The chances of covid coming from a lab leak are in the maddening 25-75% range which vaguely means ‘We don’t know’, but ‘lab leak’ includes a lot of things. The virus may have been caught by humans in the process of collecting samples and never actually reached a lab. People are known to have died from doing that by catching a disease which doesn’t appear to have spread far, so it’s entirely plausible one was caught which did spread far. A deranged person trying to cause a pandemic would be much more likely to succeed by alternately digging around unprotected in batcaves and going to crowded concerts than trying to do anything sophisticated with bioengineering.
Claude Turning (PERSON) Claude (PERSON) Opus 4.7 (ORG) Opus 4.6 (LOCATION) Opus (PERSON)
Originally published by Hacker News Read original →