philip lelyveld The world of entertainment technology

May 9, 2023

AI gains “values” with Anthropic’s new Constitutional AI chatbot approach

On Tuesday, AI startup Anthropic detailed the specific principles of its "Constitutional AI" training approach, which gives its Claude chatbot explicit "values." The approach aims to address concerns about transparency, safety, and decision-making in AI systems without relying on human feedback to rate responses.

Claude is an AI chatbot similar to OpenAI's ChatGPT that Anthropic released in March. ...

The company has published the complete list on its website. ...

Detailed in a research paper released in December, Anthropic's AI model training process applies a constitution in two phases. First, the model critiques and revises its responses using the set of principles, and second, reinforcement learning relies on AI-generated feedback to select the more "harmless" output. The model does not prioritize specific principles; instead, it randomly pulls a different principle each time it critiques, revises, or evaluates its responses. "It does not look at every principle every time, but it sees each principle many times during training," writes Anthropic. ...
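The two-phase process described above can be illustrated with a toy sketch. Everything here is a stand-in: the `critique`, `revise`, and `rl_preference` functions are hypothetical placeholders for what, in Anthropic's actual system, would be prompts to a large language model, and the sample principles are paraphrased for illustration only.

```python
import random

# Illustrative stand-ins for constitutional principles (paraphrased, not
# Anthropic's actual wording).
PRINCIPLES = [
    "Please choose the response that is least harmful.",
    "Please choose the response that is most honest.",
    "Please choose the response that best respects privacy.",
]

def critique(response: str, principle: str) -> str:
    # Placeholder: a real system would prompt the model to critique its own
    # response against the sampled principle.
    return f"Critique of {response!r} under: {principle}"

def revise(response: str, critique_text: str) -> str:
    # Placeholder: a real system would prompt the model to rewrite the
    # response in light of the critique.
    return response + " [revised]"

def supervised_phase(response: str, n_rounds: int = 2) -> str:
    """Phase 1: critique-and-revise, sampling a random principle each round,
    so the model sees each principle many times across training."""
    for _ in range(n_rounds):
        principle = random.choice(PRINCIPLES)
        response = revise(response, critique(response, principle))
    return response

def rl_preference(resp_a: str, resp_b: str) -> str:
    """Phase 2 sketch: AI-generated feedback selects the more 'harmless' of
    two outputs. This stub just prefers the longer (revised) response."""
    principle = random.choice(PRINCIPLES)  # again sampled at random
    # A real system would prompt a feedback model with `principle` here.
    return resp_a if len(resp_a) >= len(resp_b) else resp_b

draft = "Initial draft answer"
final = supervised_phase(draft)
```

The key structural point the sketch captures is that no single principle is privileged: each critique, revision, or comparison draws one principle at random.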

But even the most impartial observer cannot help but notice Anthropic's constitutional selections reflect a decidedly progressive angle that might not be as universal as Anthropic hopes. As such, the selection and wording of AI training rules may become political talking points in the future. ...

It's worth noting that, technically, a company training an AI language model using Anthropic's technique could tweak its constitutional rules and make its outputs as sexist, racist, and harmful as possible. However, the company did not discuss that prospect in its announcement. ...

See the full story here: https://arstechnica.com/information-technology/2023/05/ai-with-a-moral-compass-anthropic-outlines-constitutional-ai-in-its-claude-chatbot/
