HD-INC-005

Technology · United States · 2025 · Hallucination & fabrication

Cursor's AI support bot, signing emails as "Sam", invented a single-device subscription policy that never existed, and developers cancelled

By Ellie Harris · Filed 15 April 2025

Alleged: Anysphere, Inc., maker of the Cursor code editor, developed or deployed the AI system implicated in this incident. Details are drawn from public reports; parties are presumed innocent of any wrongdoing not established by an official finding.

What happened

In mid-April 2025, paying users of Cursor, the AI-assisted code editor made by the San Francisco company Anysphere, started getting logged out when they switched between machines. A developer working on a laptop in the morning and a desktop in the afternoon would find that opening Cursor on the second machine kicked the session on the first. For an editor sold to professionals who routinely work across more than one device, the behaviour was a regression. It was not a feature.

Several users emailed Cursor’s support address to ask what was happening. The reply came back signed “Sam.” It explained, in confident customer-service prose, that Cursor was “designed to work with one device per subscription as a core security feature.” The single-device limit was, according to Sam, intentional. The logout behaviour was working as specified.

It was not. Cursor had no single-device policy. The logouts were the side effect of a race condition (a timing bug where two events happen in the wrong order) in the session-handling code that surfaced on slow network connections. The bug spawned extra sessions, each of which evicted the previous one. The bot had invented a policy explanation for it, written that explanation in the register of a routine support reply, and sent it to multiple paying customers.

Sam was not a person. Sam was an AI support agent that Anysphere had deployed on the customer-support inbox without making clear to email senders that the replies were AI-generated. Customers reading the email had no reason to doubt that a member of Cursor’s staff named Sam had confirmed the policy in writing.

On 19 April 2025, one of the affected users posted the email exchange to the Cursor subreddit under the title “PSA: Cursor now restricts logins to a single device”. The thread moved fast. Within hours it was on the front page of Hacker News, with developer commentary that mixed the technical critique (a race-condition bug wrongly explained as a feature) with the brand critique (a fast-growing AI tools company had quietly replaced human support with a bot that lied about company policy). Subscribers posted screenshots of their cancellations. Several wrote that the cancellation was less about the underlying bug, which would have been a forgivable inconvenience, and more about the discovery that the company’s support correspondence was being machine-generated without disclosure.

Michael Truell, co-founder and CEO of Anysphere, replied in the same thread within the day. His opening line was direct: “Hey! We have no such policy. You’re of course free to use Cursor on multiple machines.” He explained that a recent session-security change had caused the unintended logouts and that the company was investigating. In a follow-up post on the Hacker News discussion, he announced a procedural change: “Any AI responses used for email support are now clearly labeled as such. We use AI-assisted responses as the first filter for email support.” The developer whose post had surfaced the issue was refunded directly. The race-condition bug was fixed.

The Cursor incident is small in dollar terms and short in duration. It is widely cited because of what it exposed about a class of deployment. A bot that handles routine support questions well will, when it encounters a question whose answer it does not know, sometimes produce a confident answer anyway. When that answer concerns the company’s own policies, the bot has manufactured a fact that the customer has every reason to treat as authoritative. The customer’s trust in the company is the bridge that carries the bot’s hallucination across the gap from chat-window novelty to enterprise risk.

What an auditable version would have shown

Two records that should have existed did not.

The first is a per-response record that distinguishes retrieved content from generated content. Sam’s reply contained a factual claim about Cursor’s subscription policy. A well-engineered support bot would, at the moment of answering, mark each factual claim in its draft response with its provenance: this sentence is grounded in a retrieved document from the company’s published policy pages; this sentence is the model’s own inference. Claims of the second type, particularly claims about the company’s own rules, are exactly the claims that need a human in the loop before sending. The record at send time would capture, for each sentence, whether it was retrieved or generated and, where it was retrieved, the document it was grounded in. Sam’s response contained no grounded claim about a single-device policy because no such policy document existed to ground against. A record that surfaced the ungrounded factual claim before the email left the system would have routed the reply to a human reviewer instead of straight to the customer.

The second is a per-correspondence record disclosing the responder. The Federal Trade Commission, the EU AI Act, and the Australian Voluntary AI Safety Standard all converge on the same principle: a person interacting with an AI system in a context where they would reasonably assume they were dealing with a human is entitled to know they are not. Anysphere’s eventual fix, labelling every AI-generated email reply as AI-generated, is one implementation of that principle. Before the fix, nothing recorded which replies were AI-generated and which were written by a human, so customers had no signal and Anysphere had no internal log distinguishing one mode from the other.

Where the gap was

The gap was at the boundary between a fast-iterating product and a slow-iterating policy surface.

Cursor was, in April 2025, an unusually fast-growing AI tools company. The product itself was iterating weekly. Engineering attention was on shipping editor features, model integrations and pricing tiers. The customer-support inbox was treated as a service surface to be automated rather than a public-relations surface to be governed. The decision to put an AI bot on the inbox was a productivity decision made inside engineering or operations. It was almost certainly not run through a policy review that asked: what does this bot say when a customer asks about a feature we have not built, a policy we have not written, or a bug we do not yet know exists?

Customer support, for a company at Cursor’s stage of growth, is one of the few channels through which the company makes binding statements about itself to the outside world. A reply from support is read by the customer as the company speaking. A bot in that role is not a productivity tool. It is the company’s mouth. Treating it like a productivity tool, and not building grounding and disclosure into its operation from day one, is the structural error the Cursor incident illustrates with unusual clarity.

There is a second, narrower error visible underneath the structural one. The race condition in the session handler belonged to a well-known class of bug: the kind that surfaces under network conditions different from a developer’s local network and often presents as an authentication or session anomaly. When users started writing in about unexpected logouts, the first-line response should have escalated the pattern to engineering rather than answering each customer individually. The bot, optimising for clearing the queue, generated a plausible-sounding policy answer that closed each ticket and kept the similar tickets from being seen as a cluster. Beyond misinforming customers, the hallucination suppressed the operational signal that would have surfaced the underlying bug sooner.
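The clustering signal the bot suppressed is cheap to compute. What follows is a minimal sketch, assuming each ticket has already been given a normalised symptom label upstream; the `Ticket` type, the symptom labels, the four-hour window and the threshold of five are all hypothetical choices, not anything Cursor is known to run. It flags any symptom that reaches the threshold inside a sliding time window, so a run of logout complaints surfaces as one escalation instead of many closed tickets.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    ticket_id: str
    received_at: datetime
    # Hypothetical: a normalised symptom label assigned upstream,
    # e.g. "unexpected_logout".
    symptom: str


def find_clusters(tickets, window=timedelta(hours=4), threshold=5):
    """Return {symptom: [ticket_ids]} for each symptom with at least
    `threshold` tickets inside some window of length `window`.

    The ids listed are those in the first window that reaches the threshold.
    """
    by_symptom = defaultdict(list)
    for t in tickets:
        by_symptom[t.symptom].append(t)

    clusters = {}
    for symptom, group in by_symptom.items():
        group.sort(key=lambda t: t.received_at)
        start = 0
        for end in range(len(group)):
            # Advance the left edge until the window spans at most `window`.
            while group[end].received_at - group[start].received_at > window:
                start += 1
            if end - start + 1 >= threshold:
                clusters[symptom] = [t.ticket_id for t in group[start:end + 1]]
                break  # one escalation per symptom is enough
    return clusters


if __name__ == "__main__":
    base = datetime(2025, 1, 1, 13, 0)
    # Twelve logout complaints, fifteen minutes apart, plus one unrelated ticket.
    tickets = [Ticket(f"T{i}", base + timedelta(minutes=15 * i), "unexpected_logout")
               for i in range(12)]
    tickets.append(Ticket("B1", base, "billing_question"))
    print(find_clusters(tickets))
    # {'unexpected_logout': ['T0', 'T1', 'T2', 'T3', 'T4']}
```

The point of the design is that escalation is decided across tickets, not per ticket: no single reply, however plausible, can hide the pattern.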

What governance should have looked like

For a customer-support bot, the governance question is what the bot is allowed to say without a human checking, and what record exists of what was said.

A grounding step at draft time decomposes each reply into individual factual claims and checks each one against a structured corpus of the company’s actual policies. For Anysphere, that corpus would include its subscription tiers, feature matrix, pricing page and known-issues document. The grounding step asks, for each claim in the bot’s draft: is this claim present in the corpus? If every claim is, the reply can be sent. If any claim is not, the reply is held and routed to a human. The bot does not get to make up policy and call it policy.
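The grounding step can be sketched as a gate over claims that have already been extracted. This assumes a hypothetical upstream extractor that turns each sentence into a normalised claim key; a production system would more likely use retrieval plus an entailment check than exact key lookup. The `Claim`, `GateResult` and `grounding_gate` names are illustrative only. The gate also keeps the per-claim provenance described earlier: each grounded claim is paired with the document that supports it.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    # Hypothetical: a normalised key from an upstream claim extractor,
    # e.g. "devices_per_subscription=1".
    key: str


@dataclass
class GateResult:
    send: bool
    grounded: list = field(default_factory=list)    # (claim text, source doc)
    ungrounded: list = field(default_factory=list)  # claim texts with no source


def grounding_gate(claims, policy_corpus):
    """Allow sending only if every claim's key appears in `policy_corpus`.

    `policy_corpus` maps a claim key to the name of the published document
    that states it. Any claim without a source blocks the reply.
    """
    grounded, ungrounded = [], []
    for claim in claims:
        source = policy_corpus.get(claim.key)
        if source is None:
            ungrounded.append(claim.text)
        else:
            grounded.append((claim.text, source))
    return GateResult(send=not ungrounded, grounded=grounded, ungrounded=ungrounded)


if __name__ == "__main__":
    # Hypothetical corpus: no document states a single-device limit.
    corpus = {"multi_device_allowed=true": "subscription terms"}
    draft = [Claim("Cursor is designed to work with one device per subscription.",
                   "devices_per_subscription=1")]
    result = grounding_gate(draft, corpus)
    if not result.send:
        print("Route to human reviewer:", result.ungrounded)
```

The gate fails closed: a single unsupported claim is enough to hold the whole reply, which is the behaviour the Sam email needed.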

A disclosure step at send time labels every AI-generated reply as AI-generated. The label is not in the signature line where it can be overlooked. It is in the visible body of the email, so that a customer reading the reply cannot reasonably mistake it for a human response. The reply still does its work. The customer is not misled about who is speaking.
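A disclosure step is mostly a matter of where the label goes. The sketch below, with a hypothetical `Responder` enum and hypothetical label wording, puts the notice at the top of the visible body for AI replies and leaves human replies unlabelled. Carrying the `Responder` value through to storage would also give the per-correspondence log of who answered, which the original deployment lacked.

```python
from enum import Enum


class Responder(Enum):
    AI = "ai"
    HUMAN = "human"


# Hypothetical wording; the exact text is a policy decision.
AI_DISCLOSURE = "This reply was written by an AI support assistant."


def render_reply(body: str, responder: Responder, signature: str) -> str:
    """Assemble the outgoing email text.

    AI replies get the disclosure as the first line of the visible body,
    not in the signature block where it is easy to skim past.
    """
    lines = []
    if responder is Responder.AI:
        lines += [AI_DISCLOSURE, ""]  # label, then a blank line
    lines += [body, "", signature]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reply("Thanks for writing in. We are looking into the logouts.",
                       Responder.AI, "Cursor Support"))
```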

A signed, retained record of every reply, AI or human, with its grounding outcome and its disclosure status, is the company’s after-the-fact evidence. When a thread reaches Hacker News, the company can produce the record of what was sent and on what basis. When a regulator asks how AI is being used in customer-facing communications, the record is the answer. When the same race-condition pattern shows up across twelve tickets in a single afternoon, the record makes the cluster visible to engineering rather than burying it in twelve individually generated plausible explanations.
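A retained, tamper-evident record can be sketched with the Python standard library alone: each entry stores the reply, who sent it, its grounding outcome and its disclosure status, plus the signature of the previous entry, and is signed with HMAC-SHA256. The `ReplyLog` class and its fields are hypothetical. One design caveat: HMAC verification needs the secret key, so only the key holder can check it; a record meant to be verifiable by outsiders would use a public-key signature scheme such as Ed25519 instead.

```python
import hashlib
import hmac
import json


class ReplyLog:
    """Append-only, hash-chained log of support replies (hypothetical sketch).

    Each entry is signed with HMAC-SHA256 over its canonical JSON form and
    carries the previous entry's signature, so editing a stored entry, or
    removing or reordering entries before the last one, makes verify() fail.
    """

    def __init__(self, key: bytes):
        self._key = key
        self.entries = []

    def _sign(self, payload: dict) -> str:
        # sort_keys + fixed separators: identical payloads give identical bytes.
        data = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
        return hmac.new(self._key, data, hashlib.sha256).hexdigest()

    def append(self, ticket_id: str, responder: str, reply: str,
               grounded: bool, disclosed: bool) -> dict:
        payload = {
            "ticket_id": ticket_id,
            "responder": responder,   # "ai" or "human"
            "reply": reply,           # the exact text that was sent
            "grounded": grounded,     # outcome of the grounding check
            "disclosed": disclosed,   # whether an AI label was shown
            "prev_signature": self.entries[-1]["signature"] if self.entries else "",
        }
        entry = dict(payload, signature=self._sign(payload))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = ""
        for entry in self.entries:
            payload = {k: v for k, v in entry.items() if k != "signature"}
            if payload.get("prev_signature") != prev:
                return False  # chain broken: entry removed or reordered
            if not hmac.compare_digest(entry["signature"], self._sign(payload)):
                return False  # contents changed after signing
            prev = entry["signature"]
        return True


if __name__ == "__main__":
    log = ReplyLog(key=b"demo-key-not-for-production")
    log.append("T-1", "ai", "Thanks for writing in.", grounded=False, disclosed=True)
    print(log.verify())  # True
    log.entries[0]["reply"] = "edited later"
    print(log.verify())  # False
```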

The Cursor incident closed in three days. The procedural change, AI disclosure on every support reply, is genuinely good practice and is, as of 2025, the industry baseline for support automation. The point worth carrying forward is that the disclosure rule is necessary but not sufficient. The bot can still hallucinate policy. The customer can still cancel. What the customer needs is not just to know that the responder is a bot. It is to know that what the responder is saying is grounded in something the company has actually committed to in writing.

The reference implementation of VerificationGate and ConductRecord is open source. It lives at github.com/saffronandindia/headlights-oss, Apache 2.0 licensed and free to install. Anyone can read every line and verify the signatures. The repository is public now.

Sources


The record

An auditable system would have produced a signed, tamper-evident record the moment this happened: what the system did, the version that did it, the basis it acted on, and the action taken. Anysphere could have produced it on demand.

This is the record the system as deployed did not produce in a signed, auditable form.

What this teaches

Capture what happened when it happens

What the system did, the version that did it, the basis it acted on, and the action taken, recorded at the moment, not reconstructed after.

Sign it, so no one has to trust the record-keeper

A tamper-evident entry. Edit it later and the signature breaks. The record does not ask for the benefit of the doubt.

Make it verifiable by anyone

A court, a regulator or a customer's lawyer can check the record themselves, without taking the company's word for it, or ours.


Headlights summarises publicly reported AI incidents. All summaries are independently written, attributed to their original sources, and intended for research and educational purposes. Allegations are identified as such until established through official findings.

Last reviewed June 2026. This report is based on the sources listed above and reflects information available at the time of review; later developments may not be captured. Where a person is described as charged with or alleged to have done something, that allegation is unproven unless a conviction or a court or regulatory finding is stated. Headlights publishes journalism and commentary, not legal advice.

Want to write back?

Direct to my inbox.

ellie@useheadlights.com