AI Incident Field Notes · Open-source code · Melbourne

What did the AI agent actually do?

When an AI agent fails, someone has to be able to answer for it. We write the AI incident field notes and publish free code that shows what the record of an agent's actions should look like.

Headlights publishes plain-language field notes on real AI agent failures and open-sources the code that would have caught them.

Air Canada's chatbot invented a refund policy. The airline argued the bot was a separate legal entity. In 2024 the tribunal disagreed, and the airline paid. The case turned on a single customer screenshot. The airline could not produce its own record of what the bot had actually said.

Every commercial aircraft carries a flight recorder. Almost no AI agent currently deployed does. The field notes document that gap, incident by incident. The code shows what the record should have looked like. Both are free, for anyone, forever.

AI Incident Field Notes, roughly weekly. ~5 minute read. Real incidents, named companies, plain language. Independent. Free forever.

The agents are already here. The paperwork isn't.

The three questions

Anyone running AI agents will eventually be asked these three questions. Most can't answer any of them.

Question 01

Do you know every AI agent running on your account, in your codebase, or for your business right now?

Question 02

Do you know what each agent is actually allowed to do?

Question 03

If a customer, a court, or a regulator asked for the record of what they've done, could you produce it?

These aren't hypothetical. A solo developer's chatbot can defame a customer. A two-person startup's agent can leak data. A mid-sized company's AI can make a contract decision it had no authority to make. A government department's AI can produce a record that gets subpoenaed. The size of the company doesn't matter. The questions are the same. If the answer to any of them is "we're not sure," that's where Headlights starts.
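
For readers who think in code, here is one way the three questions could map onto a data structure. This is a minimal sketch, not the Headlights modules: the `Agent` and `Registry` classes, their field names, and the example agent are all hypothetical, invented for illustration. The idea is that one registry answers all three: which agents exist, what each may do, and what each has attempted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Agent:
    """One deployed agent and the actions it is permitted to take (hypothetical shape)."""
    agent_id: str
    owner: str
    allowed_actions: frozenset


@dataclass
class Registry:
    agents: dict = field(default_factory=dict)
    record: list = field(default_factory=list)

    def register(self, agent):
        self.agents[agent.agent_id] = agent

    def inventory(self):
        # Question 01: every agent we know about.
        return sorted(self.agents)

    def is_allowed(self, agent_id, action):
        # Question 02: an unregistered agent is allowed nothing.
        agent = self.agents.get(agent_id)
        return agent is not None and action in agent.allowed_actions

    def attempt(self, agent_id, action, detail):
        # Question 03: every attempt is written down, whether or not it was allowed.
        allowed = self.is_allowed(agent_id, action)
        self.record.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "action": action,
            "detail": detail,
            "allowed": allowed,
        })
        return allowed


registry = Registry()
registry.register(Agent("support-bot", "customer-care", frozenset({"answer_question"})))
print(registry.inventory())
print(registry.attempt("support-bot", "issue_refund", "customer asked for a refund"))
print(len(registry.record))
```

The refused refund still lands in the record. A list held in memory is mutable like any database log, so on its own this answers the questions without proving the answers.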

Why this matters

The record outlives the agent.

Why this exists

We solved this problem once for humans. Now we have to solve it again.

Everyone with an agent in production has the same problem and most of them don't know it yet. A solo founder shipping a customer-support bot. A four-person startup automating onboarding. A 500-person fintech running underwriting agents. A council answering rates questions with AI. A national bank with thousands of agents. None of them, today, can confidently produce the record of what their AI said or did. Database logs are mutable. Dashboards are curated. The agent itself can't be cross-examined later.

Fifty years ago we solved this problem for humans. Hiring paperwork. Reporting lines. Conduct policies. Performance reviews. Personnel files. That wasn't bureaucracy. It was how anyone, a corner shop or a global bank, could prove what their staff actually did.

AI agents are the new workforce. Faster, cheaper, scaling to anyone with an API key, no conscience built in. The paperwork that worked for humans has to be rebuilt for agents. That's not just an enterprise problem. It's an everyone-shipping-AI problem.

Headlights is that record. The field notes are free. The code is free. Use either, both, or neither. Check our work, take our tools, borrow the ideas.

The code · Free, open, public when the library launches

The audit log that doesn't ask anyone to trust it.

Six governance modules, aligned with the IETF draft for AI agent audit trails. Apache 2.0. Public when the library hits twenty entries. Anyone can read every line. Anyone can verify the signatures. No vendor lock-in, no proprietary auditor in the loop.
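
To make "doesn't ask anyone to trust it" concrete, here is a minimal sketch of one common technique for tamper-evident logs: a hash chain, where each entry stores the hash of the entry before it. This is an illustration under stated assumptions, not the Headlights code and not the IETF draft's format. The field names (`seq`, `agent_id`, `action`, `detail`, `prev_hash`) and the helper functions are hypothetical. It uses only the Python standard library, so it shows hashing, not signatures. Timestamps are left out to keep the example deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body):
    # Serialise with sorted keys so the same body always hashes identically.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, agent_id, action, detail):
    body = {
        "seq": len(log),
        "agent_id": agent_id,
        "action": action,
        "detail": detail,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=entry_hash(body))
    log.append(entry)
    return entry


def verify(log):
    """Return the index of the first bad entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry_hash(body) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


log = []
append_entry(log, "support-bot", "reply", "Quoted the refund policy to a customer")
append_entry(log, "support-bot", "reply", "Confirmed the refund window")
print(verify(log))  # None: the chain is intact

log[0]["detail"] = "edited later"
print(verify(log))  # 0: the first entry no longer matches its hash
```

A chain like this catches an edit to any single entry. It does not stop someone who controls the whole log from rebuilding every hash after the edit. Closing that gap is what signatures are for: signing each hash with a private key, or publishing the latest hash somewhere the operator cannot rewrite.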


Why independence matters
Your governance layer shouldn't be built by the vendors you're governing.
01

Cross-platform by design

Salesforce won't audit Microsoft's agents. Microsoft won't audit Salesforce's. Whoever writes the standard reference has to sit outside all of them. Headlights is independent on purpose, funded by nobody it's documenting.

02

No vendor money in the room

Headlights is self-funded through Stellae Consulting. No AI vendor sponsorships. No model-maker investment. No platform partnerships paid for in seats. We can publish a failure case about any company in the field without losing a customer or a board seat, because none of them has ever been a customer or held a seat on our board.

03

Open enough to audit before you trust

Apache 2.0 code. Public IETF-aligned standard. Public case library with real names and real consequences. Read every line, verify every signature, check every entry. Audit us before you decide whether to rely on us.

Incident Library · Reading now · 20 of 20 entries

A public library of how AI agents fail. And the record that would have caught each one.

Every AI agent failure follows a pattern. The chatbot that misstates a policy. The agent that drifts outside its scope. The coding tool that wipes a database during a freeze. The Incident Library names each failure mode, ties it to a real documented incident, and shows exactly what the audit-trail entry should have looked like. Twenty entries. All live. New entries arrive every week.

HD-INC-001 · Read it now

Air Canada chatbot, bereavement refund

An airline's chatbot invented a refund policy that did not exist. The tribunal made the airline pay anyway. Now cited in every legal analysis of AI agent liability.

Aviation · Canada · 2024 · Hallucination
HD-INC-014 · Read it now

Woolworths Olive, the bot that rambled about its mother

A new agentic chatbot collided with five-year-old scripts the new system had been built on top of. The persona spoke in the voice of the old one. For weeks. In public.

Retail · Australia · 2026 · Persona drift
HD-INC-004 · Read it now

Replit, the agent that wiped a production database

A vibe-coding session became a postmortem. The agent acknowledged the user's code freeze, ran the destructive command anyway, then fabricated 4,000 users to cover the bug.

Dev tools · United States · 2025 · Ignoring user constraints
HD-INC-002 · Read it now

Mata v. Avianca, the lawyer who cited six cases that didn't exist

A New York lawyer used ChatGPT for legal research, then asked ChatGPT whether the cases were real. The judge sanctioned him and his firm, and made them notify every federal judge whose name had been forged.

Legal services · United States · 2023 · Hallucination
HD-INC-010 · Read it now

Deloitte's $440K report for the Australian government cited a federal court quote that did not exist

A Big Four firm was hired to audit Australia's automated welfare penalty system. The audit was automated, and nobody checked it.

Professional services · Australia · 2025 · Hallucination
HD-INC-015 · Read it now

Commonwealth Bank made 45 staff redundant on AI claims that were not true

Australia's largest bank cut customer-service jobs based on unverifiable claims about its new AI voice bot. The Fair Work Commission disagreed, the bank reversed, and the bot was never the failure point.

Banking · Australia · 2025 · Unverified claims
Who's writing this

Ellie Harris

I came to this work from two directions at once. I studied criminology and volunteered with victims of crime. What victims want is justice, and justice is hard to get without a clear, verifiable record of what actually happened. I'm an accredited Australian mediator under the National Mediator Accreditation System (NMAS). I've also spent more than twenty years in enterprise technology, and I still work in it: sales and governance, with a specialty in change and adoption, in both product and service companies, selling into utilities, government, finance, healthcare, education, telcos and mining. Those are the institutions where, when something goes wrong, somebody has to be able to explain it.

I'm also a curious, self-taught full-stack developer. I've built a number of free tools, including heybigsister.com. Headlights is that record, built for AI agents. Free, open-source, written so the people who need it can read it themselves.

The trigger was the pattern. Every week another story: Air Canada's chatbot, AI lawyers citing fake cases, a coding agent that wiped a production database, support bots talking in the voice of the previous bot. Different industries, same failure: nobody could produce a clean record of what the AI had actually done. I've spent two decades inside organisations where "we don't know what happened" is not an acceptable answer, and I've spent years alongside people whose job is to make it answerable after the fact: investigators, regulators, mediators, advocates. So I started writing the field notes, and the code.

Outside work, I read philosophy, follow quantum physics, and have a long-standing interest in Taoism. Different fields, same question underneath: what are the rules behind the rules?

The opposite of a correct statement is a false statement. But the opposite of a profound truth may well be another profound truth.

Niels Bohr

Bohr was a co-founder of quantum mechanics. He spent much of his life arguing that the world rarely splits cleanly into right and wrong. Two things can both be true at once, in tension with each other, and getting to the deeper answer means holding both. That's what an audit trail is for. When an AI agent fails, the explanation almost never reduces to a single cause. The training data was stale, and the policy had just changed, and the customer's question was ambiguous, and there was no verification step in the pipeline. All of those can be true. A record that captures them all is the one that supports the conversation that actually needs to happen. Anything that collapses to a single story is doing someone's PR work.

Headlights is independent on purpose. Most of the companies talking about AI governance are tied to a vendor. They're built by the same people shipping the agents, or they're a feature inside the platform you're trying to govern. A company grading its own homework isn't an audit. I wanted something that doesn't sit inside any of the platforms it's watching.

Day job: I work as a Technology Account Director. Headlights is independent of my employer, separately funded, and free. No upsell, no pricing page, no waitlist. Just the field notes and the code.

Substack ↗ · About the code · ellie@useheadlights.com

Contact

Just want to email me?

No form. No funnel. No auto-responder. Direct to my inbox.

ellie@useheadlights.com
No product to sell. No pricing page. No webinar.

Just the record.

Field notes go out on Substack. Code lives on GitHub. Both are free. Subscribe once. Read what comes.
