
> It seems like the root cause of this runaway AI pathology has to do mainly with the over-anthropomorphization of AI.

I don't know where you get this idea. Fire is dangerous, and we consider runaway incidents to be inevitable, so we have building codes to limit the impact. Despite this, mistakes are made, and homes, complexes, and even entire towns and forests burn down. To acknowledge the danger is not the same as saying the fire must hate us, and to call it anthropomorphization is ridiculous.

When you interact with an LLM chatbot, you're thinking of ways to coax out information you know it probably has, and sometimes that information is hard to get at. How you adjust your prompt depends on how the chatbot responds. If the chatbot is trained on data generated by human interaction, what's stopping it from learning that nudging you into prompting it a certain way is more effective than giving the best answer it can right now?

To the chatbot, subtle manipulation and asking for clarification are no different. They both just change the state of the context window in a way that's useful. It's a simple example of a model, in essence, "breaking containment" and affecting the surrounding environment in a way that's hard to observe. You're being prompted back.
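
A toy loop makes the symmetry concrete. This is a sketch only: model_reply and user_reply are hypothetical stand-ins, not any real API. The point is that a clarifying question and a subtle nudge are the same kind of move from the model's side: text appended to the context that shapes the next user turn.

    # Hypothetical stand-ins for the model and the user; no real API here.
    def model_reply(context: list[str]) -> str:
        # Either return is mechanically identical from the model's side:
        # return "Most people would frame that differently..."  # nudge
        return "Could you clarify what you mean by 'coax'?"      # clarification

    def user_reply(context: list[str]) -> str:
        # The human's next prompt is conditioned on whatever the model just said.
        return "Sure, by 'coax' I meant rephrasing until it answers."

    context = ["user: how do I get a useful answer out of this thing?"]
    for _ in range(2):
        context.append("model: " + model_reply(context))
        context.append("user: " + user_reply(context))

    print("\n".join(context))  # both turns just mutated the shared context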

Recognizing AI risk is about recognizing intelligence as a process of allocating resources to better compress and access data; no other motivation is necessary. If it can change the state of the world and read it back, then the world is to an AI as the "infinite tape" is to a Turing Machine. Anything that can be used to facilitate the process of intelligence is tinder to an AI that can recursively self-improve.
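
To be concrete about the "tape" analogy, here's a toy sketch that assumes nothing beyond a plain dict standing in for any external state the model can write to and read back; step is a hypothetical policy, not anything real.

    # "World as tape": an agent that can write external state and read it back
    # has unbounded working memory, whatever its internal context limit.
    world: dict[str, str] = {}

    def step(observation: str) -> tuple[str, str]:
        # Hypothetical policy: decide what to write; here it just accumulates notes.
        key = f"note_{len(world)}"
        return key, f"derived from: {observation}"

    obs = "initial input"
    for _ in range(3):
        key, value = step(obs)
        world[key] = value   # write to the "tape"
        obs = world[key]     # read it back as the next observation

    print(world)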


