philip lelyveld The world of entertainment technology

1 Jun 2026

These AI models are free, private, and will never say ‘no’

... another type of AI model will never refuse to provide what the user asks for. In recent months, these models have become more accessible and popular.

"Everybody can download and operate their own state-of-the-art model and use it for great things and terrible things," said Noam Schwartz, CEO of Alice, an AI security company that has conducted red-teaming and safety evaluation for AI model developers. ...

But there's a whole other class of AI models whose guardrails are much easier to strip away. They're known as open-weight models. Some are made by tech giants, such as OpenAI and Alibaba, while others are put out by smaller outfits like China's DeepSeek. Like their better-known proprietary counterparts, many possess advanced capabilities such as writing functional code or generating lifelike images. Unlike with ChatGPT, Claude or Gemini, their built-in safety guardrails are easier to remove permanently, and the companies behind them have no idea how they're being used. ...

Safety guardrails of open-weight models can be weakened or removed in many ways. This is largely because the model developers have made what's known as the model weights available to the public. Model weights are sets of parameters, like knobs and dials in a machine, telling the models how to process information.
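
To make the "knobs and dials" description concrete, here is a toy sketch in Python. It is not taken from the article or from any real model: the function `tiny_model` and all the numbers are invented for illustration. Real models have billions of weights, but the principle is the same: the weights are plain numbers, and anyone holding them can change them and get different behavior.

```python
def tiny_model(weights, bias, inputs):
    """Hypothetical one-layer 'model': a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Illustrative values only; a real open-weight release ships billions of these.
inputs = [1.0, 2.0, 3.0]
published_weights = [0.5, -1.0, 2.0]
bias = 0.25

# Output using the weights exactly as "published".
original_output = tiny_model(published_weights, bias, inputs)

# Turn a single dial: copy the weights and change one value.
edited_weights = list(published_weights)
edited_weights[1] = 1.0
edited_output = tiny_model(edited_weights, bias, inputs)

print(original_output, edited_output)
```

The same input produces a different output once a single weight is edited, which is why publishing the weights hands control over a model's behavior to whoever downloads it.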

One recently developed method called "abliteration" has caught the attention of AI and national security researchers. By tweaking model weights, people can take away the model's ability to say "no."

Hugging Face, which hosts open-source AI models, currently lists over 6,000 abliterated models, compared to about 600 in 2024. ...

Models without guardrails can be both useful and dangerous

It is difficult to get a comprehensive picture of how people are using open-weight models, because these models are run locally on users' computers, and don't need the internet to function. ...

An individual in a pro-ISIS chat room claimed they used an "uncensored" AI to research the amount and type of explosives needed to destroy "Trump Tower in the U.S.," according to the Counter Extremism Project, a nonprofit that focuses on counterterrorism. ...

See the full story here: https://www.npr.org/2026/05/31/nx-s1-5816391/ai-safety-concerns-danger-open-weight-models-risks


