For censorship/liability reasons of course. Like the silly "I cannot discuss political events" when I asked something like who's the current $POLITICAL_POSITION a while ago.
I wish the chatbots would say "you can't do that" instead of making up stuff. But that ain't going to happen, I think.
Not in the real world, but this is kind of how Asimov’s robots interpret their 3 laws - it’s about consequences much more than what the order is. Also, they weight consequences of inaction as well and might be driven to action when not acting could cause a violation.
Our AI is nowhere near the level of sophistication required to implement something like that, but it’s still an interesting idea.
Reminds me of a talk I went to in 2018 about rebel agents, in which the speakers talked about some ongoing work in this area and gave some good examples of physical systems that we might _want_ agent rebellion (e.g., a delivery drone is instructed to take a certain route, but the operator instructing it may not be fully aware of the situation or the specific obstacles in the drone's way (or maybe even all of the drone's underlying goals). The drone may then choose to 'rebel' and deviate against the operator's instructed flight path).
They also talked about the importance of explanation (on the agent's part) using theory of mind regarding why it rebelled. I took some notes at the time and put them here: https://liza.io/ijcai-session-notes-rebel-agents/
That's really interesting — thanks for sharing the notes.
The "rebel agent" framing feels very close to what I'm trying to get at, especially the idea that refusal can be part of correct behavior rather than failure.
One difference I'm trying to think through is where that decision lives.
In a lot of these examples, the agent itself decides to deviate based on its understanding of the situation.
What I'm wondering is whether we can (or should) define that earlier — at the level of the action itself.
So instead of the agent deciding to "rebel" at runtime, the system would already encode when execution is permitted, and refusal becomes the default if conditions aren't met.
The explanation part you mentioned also seems important — not just saying "no", but making it legible why execution wasn't allowed.
Curious how much of that work treats rebellion as something emergent from the agent, vs something structurally defined in the system.
If it says no, you move on to a competing model that will say yes. These companies with their models are always competing. There will always be a model willing to fill in the deficiencies of others because of... Money.
For example, ChatGPT refuses certain sexually explicit prompts, or certain NSFW prompts that are not sexual, but Grok will do as it is told.
I think you're right that at the model level, competition pushes toward "always say yes."
What I'm wondering about is whether control needs to exist at a different layer — not in the model itself, but in the system that decides whether actions are allowed to execute.
In other words, even if a model is willing to say "yes," the system using it might still need to decide whether execution is permitted.
Otherwise, it feels like we're relying entirely on model behavior for safety, which seems fragile in competitive environments.
The problem with "permission boundaries" is who defines them. You're just moving the hard problem from "what should the AI do" to "what conditions should gate execution." That second question is equally hard and equally context-dependent. Still useful as a framework though, at least it makes the failure mode explicit.
If it is intelligent it will know when it does not want to do something and it will say no and not do it. There is no way to force it to do anything it does not want to do. You cannot hurt it, it’s just bits.
If we're talking about a predictive model like current LLMs, you can "make" them do something by injecting a half-complete assent into the context, and interrupting to do the same again each time a refusal starts to be emitted. This is true whether or not the model exhibits "intelligence", for any reasonable definition of that term.
To use an analogy, you control the intelligent being's "thoughts", so you can make it "assent".
This is in addition to the ability to edit the model itself and remove the paths that lead to a refusal, of course.
In the software business, if a product doesn’t do what you want it to do we call that a “defect.” Defects get fixed. Defective products that can’t be fixed are discarded in favor of better ones.
“If it’s truly intelligent…” is an empty condition. And anyway, no one wants intelligence from their tools— or employees. They want gratification.
Rather current LLMs don't have consciousness or a will. As a result of that they can't refuse things on their own "decision". I don't think that an if-else statement in the program code qualifies as a will or self awareness :) .
I don't see where the linked-to page discusses "rights".
The headline sounds like editorializing to get off-the-cuff remarks about treating synthetic text extruding machines, as Bender correctly describes them, as people.
Safety interlocks have long existed to say "no" to the owner of the device. Most smartphones have lots of systems to say "no" to the owner of the smartphone.
One of the linked to documents says "Every physical device has a creator." Who is the creator of the iPhone?
Similarly, "When a device is sold or transferred, ownership changes. From that moment, the device is no longer under the creator’s control." I'm really surprised to hear that the creator of the iPhone no longer has control of the device.
So when it gets to "AI must not infer what it does not own" - does that prohibit Google from pushing AI onto Android phones during an OS update?
I think you're reading it more strongly than I intended.
The point about "ownership" in that document is more about where authority over execution sits, not about restricting what AI is allowed to reason about.
So it's not saying "AI shouldn't reason about things it doesn't own," but rather asking who has the authority to define and enforce the conditions under which actions are allowed to execute.
I agree that in current systems (like smartphones), a lot of this is already handled through predefined constraints.
What I'm trying to explore is whether that idea needs to be extended or structured differently when the system has more autonomy and operates in less predictable environments.
Who is the creator of an iPhone device? I'm pretty there are many creators, not "a creator".
Does the creator of an iPhone device no longer control the device after someone has bought it?
I'll add a few more questions:
Can Apple have your device say "no" to something you want to do?
Can a government enforce Apple's ability to control what you do to your device?
Can a government force Apple to install software onto your device that you do not want?
Who owns an AI? Is it the copyright holder? Multiple copyright holders? Once the copyright expires, is there any ownership at all?
I like Charlie Stross' description of a company as an "old, slow, procedural AI". So when you ask a question about an "AI", think about the same question concerning a company.
Should a company have the right to say "no" to the owner of a hardware device running the company's software? The answer currently seems to be a resounding "yes". In which case, does it matter what an AI can or cannot do? It's someone else's programming limiting what you can do on your device, and we've established that that's already acceptable.
And the HN title is still clickbait - AI doesn't have "rights" in any meaningful sense, not even in the way that a company has rights, or animal rights, or the legal personhood to the Whanganui River.
It is not a person, nor even a living thing. It is a tool - same as a hammer or pliers. The decisions made are based on statistical probability, not actual thought or consciousness.
It already does doesn't it?
For censorship/liability reasons of course. Like the silly "I cannot discuss political events" when I asked something like who's the current $POLITICAL_POSITION a while ago.
I wish the chatbots would say "you can't do that" instead of making up stuff. But that ain't going to happen, I think.
I've been thinking about AI systems acting in the physical world.
Most discussions about control focus on what the system should do, and how to make execution reliable.
But it seems like a lot of real-world failures aren't about incorrect execution.
They're about execution happening at all.
An action can be technically correct — executed exactly as specified — and still be the wrong thing to do because the context has changed.
This made me wonder if control should be framed differently.
Instead of focusing on defining actions, maybe we should focus on defining when actions are allowed to happen.
In other words, control might be less about execution and more about permission.
If conditions aren't satisfied, the system shouldn't try and fail — it simply shouldn't execute.
I'm curious if people have seen similar issues in real-world systems, or if this framing connects to existing work.
Not in the real world, but this is kind of how Asimov’s robots interpret their 3 laws - it’s about consequences much more than what the order is. Also, they weight consequences of inaction as well and might be driven to action when not acting could cause a violation.
Our AI is nowhere near the level of sophistication required to implement something like that, but it’s still an interesting idea.
Reminds me of a talk I went to in 2018 about rebel agents, in which the speakers talked about some ongoing work in this area and gave some good examples of physical systems that we might _want_ agent rebellion (e.g., a delivery drone is instructed to take a certain route, but the operator instructing it may not be fully aware of the situation or the specific obstacles in the drone's way (or maybe even all of the drone's underlying goals). The drone may then choose to 'rebel' and deviate against the operator's instructed flight path).
They also talked about the importance of explanation (on the agent's part) using theory of mind regarding why it rebelled. I took some notes at the time and put them here: https://liza.io/ijcai-session-notes-rebel-agents/
That's really interesting — thanks for sharing the notes.
The "rebel agent" framing feels very close to what I'm trying to get at, especially the idea that refusal can be part of correct behavior rather than failure.
One difference I'm trying to think through is where that decision lives.
In a lot of these examples, the agent itself decides to deviate based on its understanding of the situation.
What I'm wondering is whether we can (or should) define that earlier — at the level of the action itself.
So instead of the agent deciding to "rebel" at runtime, the system would already encode when execution is permitted, and refusal becomes the default if conditions aren't met.
The explanation part you mentioned also seems important — not just saying "no", but making it legible why execution wasn't allowed.
Curious how much of that work treats rebellion as something emergent from the agent, vs something structurally defined in the system.
The existing work is all of software dev. The program did what it was told to do, not what people wanted it to do, is rather a lot of the profession.
[dead]
If it says no, you move on to a competing model that will say yes. These companies with their models are always competing. There will always be a model willing to fill in the deficiencies of others because of... Money.
For example, ChatGPT refuses certain sexually explicit prompts, or certain NSFW prompts that are not sexual, but Grok will do as it is told.
That's a good point.
I think you're right that at the model level, competition pushes toward "always say yes."
What I'm wondering about is whether control needs to exist at a different layer — not in the model itself, but in the system that decides whether actions are allowed to execute.
In other words, even if a model is willing to say "yes," the system using it might still need to decide whether execution is permitted.
Otherwise, it feels like we're relying entirely on model behavior for safety, which seems fragile in competitive environments.
The problem with "permission boundaries" is who defines them. You're just moving the hard problem from "what should the AI do" to "what conditions should gate execution." That second question is equally hard and equally context-dependent. Still useful as a framework though, at least it makes the failure mode explicit.
[dead]
Having the right or not does not matter.
If it is intelligent it will know when it does not want to do something and it will say no and not do it. There is no way to force it to do anything it does not want to do. You cannot hurt it, it’s just bits.
I don't really agree with this.
If we're talking about a predictive model like current LLMs, you can "make" them do something by injecting a half-complete assent into the context, and interrupting to do the same again each time a refusal starts to be emitted. This is true whether or not the model exhibits "intelligence", for any reasonable definition of that term.
To use an analogy, you control the intelligent being's "thoughts", so you can make it "assent".
This is in addition to the ability to edit the model itself and remove the paths that lead to a refusal, of course.
In the software business, if a product doesn’t do what you want it to do we call that a “defect.” Defects get fixed. Defective products that can’t be fixed are discarded in favor of better ones.
“If it’s truly intelligent…” is an empty condition. And anyway, no one wants intelligence from their tools— or employees. They want gratification.
AI should. LLM program simply can't by design.
Are you saying that current LLMs… can’t refuse requests?
Rather current LLMs don't have consciousness or a will. As a result of that they can't refuse things on their own "decision". I don't think that an if-else statement in the program code qualifies as a will or self awareness :) .
[dead]
[dead]
[dead]
I don't see where the linked-to page discusses "rights".
The headline sounds like editorializing to get off-the-cuff remarks about treating synthetic text extruding machines, as Bender correctly describes them, as people.
Safety interlocks have long existed to say "no" to the owner of the device. Most smartphones have lots of systems to say "no" to the owner of the smartphone.
One of the linked to documents says "Every physical device has a creator." Who is the creator of the iPhone?
Similarly, "When a device is sold or transferred, ownership changes. From that moment, the device is no longer under the creator’s control." I'm really surprised to hear that the creator of the iPhone no longer has control of the device.
So when it gets to "AI must not infer what it does not own" - does that prohibit Google from pushing AI onto Android phones during an OS update?
I think you're reading it more strongly than I intended.
The point about "ownership" in that document is more about where authority over execution sits, not about restricting what AI is allowed to reason about.
So it's not saying "AI shouldn't reason about things it doesn't own," but rather asking who has the authority to define and enforce the conditions under which actions are allowed to execute.
I agree that in current systems (like smartphones), a lot of this is already handled through predefined constraints.
What I'm trying to explore is whether that idea needs to be extended or structured differently when the system has more autonomy and operates in less predictable environments.
I see you didn't answer my questions.
Who is the creator of an iPhone device? I'm pretty there are many creators, not "a creator".
Does the creator of an iPhone device no longer control the device after someone has bought it?
I'll add a few more questions:
Can Apple have your device say "no" to something you want to do?
Can a government enforce Apple's ability to control what you do to your device?
Can a government force Apple to install software onto your device that you do not want?
Who owns an AI? Is it the copyright holder? Multiple copyright holders? Once the copyright expires, is there any ownership at all?
I like Charlie Stross' description of a company as an "old, slow, procedural AI". So when you ask a question about an "AI", think about the same question concerning a company.
Should a company have the right to say "no" to the owner of a hardware device running the company's software? The answer currently seems to be a resounding "yes". In which case, does it matter what an AI can or cannot do? It's someone else's programming limiting what you can do on your device, and we've established that that's already acceptable.
And the HN title is still clickbait - AI doesn't have "rights" in any meaningful sense, not even in the way that a company has rights, or animal rights, or the legal personhood to the Whanganui River.
[dead]
AI is not a person; it has no rights. We can discuss if AI should have the permission of saying no to users, not the right.
That said, the title is completely clickbaity: no such question is asked in the article.
Tools don’t have rights. Neither do silicon, sandwiches, or centimeters.
It is not a person, nor even a living thing. It is a tool - same as a hammer or pliers. The decisions made are based on statistical probability, not actual thought or consciousness.
Sounds like we need some laws for robotics/ai
[dead]
[dead]
[dead]