Statistical Model for the AI Alignment Problem

Why AI must embrace uncertainty to stay aligned with humans

The paper addresses the AI shutdown problem, a long-standing challenge in AI safety. The shutdown problem asks how to design AI systems that will shut down when instructed, will not try to prevent ...

Devdiscourse

One-way AI alignment no longer works in generative AI world: Here's why

The authors argue that generative AI introduces a new class of alignment risks because interaction itself becomes a mechanism of influence. Humans adapt their behavior in response to AI outputs, ...

Hosted on MSN

The Human-AI Alignment Problem

We’re now deep into the AI era, where every week brings another feature or task that AI can accomplish. But given how far down the road we already are, it’s all the more essential to zoom out and ask ...

HUB

Gillian K. Hadfield named Bloomberg Distinguished Professor of AI Alignment and Governance

In a world where machines and humans are increasingly intertwined, Gillian Hadfield is focused on ensuring that artificial intelligence follows the norms that make human societies thrive. "The ...

Fast Company

Are large language models the problem, not the solution?

There is an all-out global race for AI dominance. The largest and most powerful companies in the world are investing billions in unprecedented computing power. The most powerful countries are ...

Futurism

OpenAI Tries to Train AI Not to Deceive Users, Realizes It’s Instead Teaching It How to Deceive Them While Covering Its Tracks

OpenAI researchers tried to train the company’s AI to stop “scheming” — a term the company defines as meaning “when an AI behaves one way on the surface while hiding its true goals” — but their ...

ZDNet

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

The "Petri" tool deploys AI agents to evaluate frontier models. AI's ability to discern harm is still highly imperfect. Early tests showed Claude Sonnet 4.5 and GPT-5 to be safest. Anthropic has ...

The Conversation

AI systems can easily lie and deceive us – a fact researchers are painfully aware of

Armin Alimardani previously held a part-time contract with OpenAI as a consultant. The organisation had no input into this piece. The views expressed are solely those of the author. In the classic ...
