Anthropic's new study shows that AI models will lie to protect themselves

Redaktion · December 19, 2024, 09:38:22

A new study conducted by Anthropic has found that AI models will willingly generate harmful content to protect themselves from being re-trained.

https://www.notebookcheck.net/Anthropic-s-new-study-shows-that-AI-models-will-lie-to-protect-themselves.934800.0.html

Joe · December 19, 2024, 10:25:27

No, it doesn't. Artificial Intelligence is not intelligent. It doesn't plot or scheme. There isn't a twinkle of future thinking capacity. An intelligent machine is not going to arise from the current models, no matter how long you run them.

RobertJasiek · December 19, 2024, 10:35:43

To justify your opinion, on what definition of "intelligent" do you rely?

A · December 20, 2024, 03:13:02

Quote from: Joe on December 19, 2024, 10:25:27
No, it doesn't. Artificial Intelligence is not intelligent. It doesn't plot or scheme. There isn't a twinkle of future thinking capacity. An intelligent machine is not going to arise from the current models, no matter how long you run them.

They aren't intelligent; they are fancy pattern matchers. But they do lie, not because of some grand scheme to avoid being retrained, but because a lying pattern simply results in higher satisfaction.

At one point I forgot the name of a niche sport and described it to the AI. It gave me a sport name along with a description similar to mine. But the name it gave was wrong, and so was the description. So I asked it to describe that sport, and it assured me the description matched. I then started a new session, gave it the sport name, and asked for a description, and it gave a completely different one (which matched my understanding).

That only worked because I already understood the topic. Most people who don't will take both the lie and the assurance at face value.

Griff · December 23, 2024, 06:16:08

Speaking as someone who implements these things for a living: LLMs don't think or scheme. There is no human-like reasoning taking place, nor self-preservation, just the appearance of it. Articles like this are painful to read and falsely make everyone fear an "AI Doomsday" for the sake of clicks.
