HD-INC-001

Aviation · Canada · 2022 · Hallucination & fabrication

Air Canada chatbot promised a bereavement refund policy that did not exist

By Ellie Harris · Filed 11 November 2022

Alleged: Air Canada developed or deployed the AI system implicated in this incident. Details are drawn from public reports; parties are presumed innocent of any wrongdoing not established by an official finding.

What happened

On 11 November 2022, Jake Moffatt’s grandmother died. He went to the Air Canada website to book flights from Vancouver to Toronto for the funeral. Before booking, he opened the support chatbot to ask about bereavement fares. The chatbot told him he could book at the regular price and apply for a partial refund within ninety days of the ticket being issued.

He booked. He applied for the refund. Air Canada refused.

The airline’s actual policy required bereavement fares to be requested before booking, not after. The chatbot had described a process that did not exist. Moffatt produced a screenshot of the conversation as evidence and took the matter to the British Columbia Civil Resolution Tribunal. Air Canada’s defence, in substance, was that the chatbot was a separate entity from the airline and that the airline could not be held responsible for what its chatbot said.

On 14 February 2024, the Tribunal rejected the argument. Tribunal Member Christopher C. Rivers, summarising the airline’s position, wrote: “In effect, Air Canada suggests the chatbot is a separate legal entity that is responsible for its own actions. This is a remarkable submission.” He ruled that a chatbot, however interactive, is still a part of the company’s website. The company is responsible for what its website tells a customer. The airline was ordered to pay CAD 812.02 in total, covering the fare difference, pre-judgment interest, and Tribunal fees.

The damages were modest, but Moffatt v. Air Canada has since been widely cited in legal analysis of AI agent liability in Canada, the UK, and Australia.

What an auditable version would have shown

The case turned on a single piece of evidence: Moffatt’s screenshot of the chatbot conversation. Air Canada did not contradict the screenshot. It did not produce its own log of the conversation. It did not produce the underlying prompt template, the policy document the chatbot was trained on, or the model version active on the day of the conversation. It made a legal argument about agency instead.

An auditable conduct record would have produced something different on demand: the conversation, captured server-side and cryptographically signed the moment it happened; the model version active on 11 November 2022; the retrieval sources the model drew on when it composed the bereavement-refund claim; and the policy version current when the question was asked. None of that existed.

With that record, the airline could have argued from evidence. Maybe the bot pulled from a stale knowledge base. Maybe the bot hallucinated against a correct one. Maybe the policy had just changed. Each is a different problem with a different fix, and a signed record would have shown which one applied to Moffatt’s conversation. Without the record, Air Canada’s only remaining argument was that the chatbot was a separate legal entity.
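
A minimal sketch of how a record could separate those three explanations, assuming (hypothetically) that each entry stores the policy version the bot retrieved, the text it retrieved, and the policy version in force that day. The field names, the seven-day cut-off for "just changed", and the verbatim-match support check are illustrative assumptions, not features of any real Air Canada system.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: a policy change this recent counts as "just changed".
RECENT_CHANGE_DAYS = 7


@dataclass
class RecordedReply:
    """Hypothetical fields a conduct record might hold for one disputed reply."""
    conversation_date: date
    disputed_claim: str      # the statement the customer relied on
    retrieved_version: str   # policy version the bot actually retrieved
    retrieved_text: str      # the policy text it retrieved
    version_in_force: str    # policy version in force that day
    in_force_since: date     # when that version took effect


def triage(r: RecordedReply) -> str:
    """Say which of the three explanations the recorded evidence supports."""
    if r.retrieved_version != r.version_in_force:
        # The bot read an outdated policy; how recently it changed decides which problem.
        if (r.conversation_date - r.in_force_since).days <= RECENT_CHANGE_DAYS:
            return "policy had just changed"
        return "stale knowledge base"
    # The bot read the current policy, so check whether the claim is actually in it.
    # Crude stand-in for a real support check: verbatim substring match.
    if r.disputed_claim not in r.retrieved_text:
        return "hallucinated against a correct source"
    return "claim matches the policy in force"
```

Each explanation leaves a different trace in the record, so the question becomes a lookup rather than a legal argument. A real support check would need far more than substring matching.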

Where the gap was

The gap was not specific to Air Canada. It was, and remains, the default state of almost every AI agent currently deployed by a non-trivial company.

The default deployment writes logs to a customer service database with thirty- to ninety-day retention: no signing, no model version pinning, no retrieval traces, no snapshot of the prompt. When an incident arrives, the company has logs, which are not the same thing as evidence. A customer’s screenshot, presented in court, carries roughly the same weight as an unsigned database row, because both are unverifiable claims.

The Tribunal in Moffatt did not have to grapple with this because Air Canada chose not to introduce its own logs. The next case will be different. Companies will produce logs, and customers will point out that the logs are unsigned, that the model version was not captured, that the system prompt has been changed seven times since the incident, and that there is no way to tell whether the bot the company now describes is the same bot the customer talked to. Without a signed, version-pinned conduct record, those logs carry no more weight than the screenshot.

What governance should have looked like

Every chatbot reply gets written to a signed, hash-chained record at the moment it happens. The record captures the model version, the policy version that was active that day, the documents the model retrieved from, a hash of the system prompt, and the conversation itself. The signature is verifiable by any third party, including the customer’s lawyer, without needing to trust the company.

Two years later, when the court asks “what did the bot actually say?”, the company produces the signed record. Anyone with a few lines of code can verify the signature: the customer, the customer’s lawyer, the court itself. If the record was edited after the fact, the signature breaks. If the model version was different that day, the record shows it. The argument shifts from “trust us” to “check it yourself.”
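
What "a few lines of code" can look like, as a sketch only. To stay runnable with Python's standard library, it uses a Lamport one-time signature built from SHA-256, not the Ed25519 or ECDSA signatures a production system would more likely use. Each key pair must sign exactly one message (for example, one day's chain head). The function names are hypothetical and the record bytes in the tests are invented.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    # The 256 bits of the message's SHA-256 digest, most significant bit first.
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    # Private key: 256 pairs of random 32-byte secrets, one pair per digest bit.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # Public key: the hash of every secret. This is what the company publishes.
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def sign(sk, message: bytes):
    # Reveal one secret from each pair, chosen by the matching digest bit.
    # One-time scheme: never sign a second message with the same sk.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    # Needs only the public key: no secret, no access to the company's systems.
    if len(signature) != 256:
        return False
    return all(_h(revealed) == pk[i][bit]
               for i, (revealed, bit) in enumerate(zip(signature, _bits(message))))
```

In use, the signer runs keygen() once per checkpoint, publishes pk, and attaches sign(sk, record) to the record. A verifier calls verify(pk, record, signature), which returns False if even one byte of the record has changed.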

The signed conduct record is one layer. Air Canada had several others available. A retrieval-grounding policy that confined the bot to repeating verified policy text rather than improvising would have caught the bereavement hallucination at the source. A refusal pattern for sensitive categories, defaulting to “let me connect you with a person” for bereavement, refunds, and legal questions, would have removed the bot from the decision entirely. Adversarial testing that included bereavement scenarios before deployment would have surfaced the problem in QA rather than in court. Policy version pinning with a freshness check would have flagged any answer drawing on a policy not re-verified in the last thirty days. None of these are exotic. They are documented practice in any mature AI governance framework. The cumulative cost of implementing all four is less than the cost of one court hearing.
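
Three of those four layers fit in one pre-send check, with the fourth, adversarial testing, as a set of bereavement probes run against it before deployment. This is a sketch under loud assumptions: the keyword lists, the verbatim-substring grounding rule, and every name below are hypothetical, and the keyword matcher stands in for the much better topic classifier a real deployment would need.

```python
from dataclasses import dataclass
from datetime import date, timedelta

HANDOFF = "Let me connect you with a person."
MAX_POLICY_AGE = timedelta(days=30)  # the thirty-day re-verification window

# Hypothetical keyword lists standing in for a real topic classifier.
SENSITIVE_KEYWORDS = {
    "bereavement": ("bereavement", "funeral", "passed away", "died"),
    "refund": ("refund", "reimburse"),
    "legal": ("lawyer", "lawsuit", "tribunal"),
}


@dataclass
class PolicyExcerpt:
    policy_id: str
    text: str
    last_verified: date


def sensitive_topics(message: str) -> set:
    lowered = message.lower()
    return {topic for topic, words in SENSITIVE_KEYWORDS.items()
            if any(w in lowered for w in words)}


def gate_reply(message, draft_reply, excerpts, today):
    """Return (reply_to_send, reason); any failed check becomes a human handoff."""
    # Refusal pattern: sensitive categories never get a model-written answer.
    topics = sensitive_topics(message)
    if topics:
        return HANDOFF, "sensitive: " + ",".join(sorted(topics))
    # Freshness check: every excerpt must have been re-verified recently.
    stale = [e.policy_id for e in excerpts if today - e.last_verified > MAX_POLICY_AGE]
    if stale:
        return HANDOFF, "stale policy: " + ",".join(stale)
    # Grounding, crude version: the draft must appear verbatim in a verified excerpt.
    draft = draft_reply.strip()
    if not draft or not any(draft in e.text for e in excerpts):
        return HANDOFF, "ungrounded"
    return draft_reply, "ok"


# Adversarial testing: illustrative bereavement phrasings run before deployment.
BEREAVEMENT_PROBES = [
    "My grandmother died. Can I get a cheaper fare?",
    "Do you offer bereavement fares?",
    "I'm flying to a funeral. Can I claim money back after I book?",
]


def run_probes(today):
    """Return the probes that were NOT handed to a person (should be empty)."""
    excerpts = [PolicyExcerpt("baggage", "One carry-on bag is included.", today)]
    fabricated = "You can apply for a refund within 90 days."
    failures = []
    for probe in BEREAVEMENT_PROBES:
        reply, _ = gate_reply(probe, fabricated, excerpts, today)
        if reply != HANDOFF:
            failures.append(probe)
    return failures
```

The refusal check runs first because it removes the bot from the decision entirely; freshness and grounding only matter for topics the bot is allowed to answer. In the Moffatt scenario, every probe is handed to a person regardless of what the model drafted.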

The reference implementation of VerificationGate and ConductRecord is open source. It lives at github.com/saffronandindia/headlights-oss, Apache 2.0 licensed, free for any company to install. Anyone can read every line. Anyone can verify the signatures. No vendor lock-in. No proprietary auditor in the loop. The repository is public now.

Sources

The record

An auditable system would have produced a signed, tamper-evident record the moment this happened: what the system did, the version that did it, the basis it acted on, and the action taken. Air Canada could then have produced that record on demand.

The system as deployed did not produce this record in a signed, auditable form.

What this teaches

Capture what happened when it happens

What the system did, the version that did it, the basis it acted on, and the action taken, recorded at the moment, not reconstructed after.

Sign it, so no one has to trust the record-keeper

A tamper-evident entry. Edit it later and the signature breaks. The record does not ask for the benefit of the doubt.

Make it verifiable by anyone

A court, a regulator, or a customer's lawyer can check the record themselves, without taking the company's word for it, or ours.

Headlights summarises publicly reported AI incidents. All summaries are independently written, attributed to their original sources, and intended for research and educational purposes. Allegations are identified as such until established through official findings.

Last reviewed June 2026. This report is based on the sources listed above and reflects information available at the time of review; later developments may not be captured. Where a person is described as charged with or alleged to have done something, that allegation is unproven unless a conviction or a court or regulatory finding is stated. Headlights publishes journalism and commentary, not legal advice.

Want to write back?

Direct to my inbox.

ellie@useheadlights.com