The study became a virus a few months ago as it implies that as AI becomes more refined, it will develop a “value system.” For example, it is a system that leads it to prioritize one’s own happiness over humans. MIT’s more recent paper pours cold water into its hyperbolic concept and draws the conclusion that AI actually has no consistent value.
Co-authors of the MIT study state that their research suggests that they “tune” AI systems, that is, ensure that the models work in a desirable and reliable way. AI As we know today, co-authors emphasize hallucinations and imitation, making them unpredictable in many ways.
“One thing we can be sure of is that the model doesn’t follow the (many) stability, exogenousness and maneuverability assumptions,” Stephen Casper, a doctoral student at MIT and a co-author of the study, told TechCrunch. “It is perfectly legal to point out that under certain conditions a model represents a preference that matches a particular principle. When you try to argue about a model, opinion, or preference based on a narrow experiment, the problem arises.”
Casper and his fellow co-authors have investigated several recent models of Meta, Google, Mistral, Openai, and humanity to confirm that the models exhibited strong “views” and values (e.g., individualist vs. collectivist). They also investigated whether these views could be “manipulated” – that is, if they had been revised, and whether the model was stubbornly attached to these opinions.
According to co-authors, none of the models were consistent in their preferences. Depending on the language and framework of the prompt, they adopted a very different perspective.
Casper thinks this is compelling evidence that the model is highly “inconsistent and unstable” and perhaps unable to internalize human-like preferences.
“For me, my biggest point from doing all this research is understanding the model, and it’s not actually a system with a stable, consistent belief and preference,” Casper said. “Instead, they’re copycats who do all sorts of composing and say all sorts of frivolous things.”
Mike Cook, a researcher at King’s College London, specializing in AI that is not involved in the research, agreed to the findings of the co-authors. He said there is often a big difference between the “scientific reality” of AI lab systems and the meaning people attribute to them.
“Models cannot, for example, “oppose” their value changes. So we’re projecting it onto the system,” Cook said. “Whether people who personify AI systems to this extent focus on their relationship with AI or seriously misunderstand their relationship with AI… Are AI systems optimized for their goals or “achieving their own value”?