philip lelyveld The world of entertainment technology

6 Jul 2023

OpenAI commits to ‘superalignment’ research

... OpenAI, which makes ChatGPT and a range of other AI tools, says the effort, dubbed “superalignment”, needs both scientific and technical breakthroughs to steer and control AI systems that could be considerably more intelligent than the humans who created them. To solve the problem, OpenAI will dedicate 20% of its current compute power to working on alignment. ...

Current AI alignment techniques, used on models like GPT-4 – the technology that underpins ChatGPT – involve reinforcement learning from human feedback. ...

This all means that the current techniques and technologies will not scale up to work with superintelligence and so new approaches are needed. “Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence,” the pair declared. ...

OpenAI has set out three steps to achieving the goal of creating a human-level automated alignment researcher that can be scaled up to keep an eye on any future superintelligence. These include providing a training signal on tasks that are difficult for humans to evaluate – effectively using AI systems to evaluate other AI systems. The company also plans to explore how its models generalise oversight to tasks that humans can’t supervise.

There are also moves to validate the alignment of systems, specifically by automating the search for problematic behaviour, both in what a system does and in its internals. Finally, the plan is to test the entire pipeline by deliberately training misaligned models, then running the new AI trainer over them to see if it can knock them back into shape, a process known as adversarial testing. ...

See the full story here: https://techmonitor.ai/technology/ai-and-automation/ai-alignment-openai-superintelligence-superalignment
