Headlights publishes plain-language field notes on real AI agent failures and open-sources the code that would have caught them. In 2024, Air Canada's chatbot invented a refund policy. The airline argued the bot was a separate legal entity. The tribunal disagreed and the airline paid. The case turned on a single customer screenshot. The airline could not produce its own record of what the bot had actually said. Every commercial aircraft carries a flight recorder. Almost no AI agent does. That gap is the default state of almost every AI agent currently deployed. The field notes document the gap, incident by incident. The code shows what the record should have looked like. Both are free, for anyone, forever.
The agents are already here. The paperwork isn't.
Do you know every AI agent running on your account, in your codebase, or for your business right now?
Do you know what each agent is actually allowed to do?
If a customer, a court, or a regulator asked for the record of what they've done, could you produce it?
These aren't hypothetical. A solo developer's chatbot can defame a customer. A two-person startup's agent can leak data. A mid-sized company's AI can make a contract decision it had no authority to make. A government department's AI can produce a record that gets subpoenaed. The size of the company doesn't matter. The questions are the same. If the answer to any of them is "we're not sure," that's where Headlights starts.
The record outlives the agent.
Everyone with an agent in production has the same problem and most of them don't know it yet. A solo founder shipping a customer-support bot. A four-person startup automating onboarding. A 500-person fintech running underwriting agents. A council answering rates questions with AI. A national bank with thousands of agents. None of them, today, can confidently produce the record of what their AI said or did. Database logs are mutable. Dashboards are curated. The agent itself can't be cross-examined later.
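One way to see why a mutable log is not a record: a minimal sketch of a hash-chained log, where each entry commits to the one before it. This is an illustration only, not the Headlights format. The function names, the field names and the sample messages are all invented for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Append a payload, linking it to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "payload": payload,
        "prev": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(log: list) -> bool:
    """Recompute every link; an edited entry or a broken link fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical messages, invented for this sketch.
log = []
append(log, {"agent": "support-bot", "said": "Refunds are available within 90 days."})
append(log, {"agent": "support-bot", "said": "Please submit the form after travel."})
print(verify(log))  # True

log[0]["payload"]["said"] = "No refund was promised."  # quiet after-the-fact edit
print(verify(log))  # False
```

A chain like this only makes edits detectable. It does not stop someone from dropping the newest entries or rebuilding the whole chain, which is why the latest hash would also need to be signed or anchored somewhere the operator cannot rewrite.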
Fifty years ago we solved this problem for humans. Hiring paperwork. Reporting lines. Conduct policies. Performance reviews. Personnel files. That wasn't bureaucracy. It was how anyone, a corner shop or a global bank, could prove what their staff actually did.
AI agents are the new workforce. Faster, cheaper, scaling to anyone with an API key, no conscience built in. The paperwork that worked for humans has to be rebuilt for agents. That's not just an enterprise problem. It's an everyone-shipping-AI problem.
Headlights is that record. The field notes are free. The code is free. Use either, both, or neither. Check our work, take our tools, borrow the ideas.
Six governance modules, aligned with the IETF draft for AI agent audit trails. Apache 2.0. Public when the library hits twenty entries. Anyone can read every line. Anyone can verify the signatures. No vendor lock-in, no proprietary auditor in the loop.
Your governance layer shouldn't be built by the vendors you're governing.
Salesforce won't audit Microsoft's agents. Microsoft won't audit Salesforce's. Whoever writes the standard reference has to sit outside all of them. Headlights is independent on purpose, funded by nobody it's documenting.
Headlights is self-funded through Stellae Consulting. No AI vendor sponsorships. No model-maker investment. No platform partnerships paid for in seats. We can publish a failure case about any company in the field without losing a customer or a board seat, because they were never one.
Apache 2.0 code. Public IETF-aligned standard. Public case library with real names and real consequences. Read every line, verify every signature, check every entry. Audit us before you decide whether to rely on us.
Every AI agent failure follows a pattern. The chatbot that misstates a policy. The agent that drifts outside its scope. The coding tool that wipes a database during a freeze. The Incident Library names each failure mode, ties it to a real documented incident, and shows exactly what the audit-trail entry should have looked like. Twenty entries. All live. New entries arrive every week.
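As a rough illustration of what such an entry might hold, here is a minimal sketch of a single audit record that captures what the agent produced and whether the action was inside its granted scope. The class, the field names and the sample values are hypothetical, chosen for this example. They are not taken from the Headlights code or from the IETF draft.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class AuditEntry:
    # All field names are hypothetical, chosen for this illustration.
    agent_id: str
    timestamp: str          # ISO 8601, UTC
    action: str             # what the agent tried to do
    output: str             # the exact text the agent produced
    allowed_actions: tuple  # the scope granted to the agent at that moment
    in_scope: bool          # whether the action fell inside that scope


def record(agent_id, action, output, allowed_actions) -> AuditEntry:
    """Build an entry and evaluate the scope check at write time."""
    return AuditEntry(
        agent_id=agent_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        action=action,
        output=output,
        allowed_actions=tuple(allowed_actions),
        in_scope=action in allowed_actions,
    )


# Invented sample: a support bot stating a policy it was never scoped to state.
entry = record(
    agent_id="support-bot",
    action="state_refund_policy",
    output="You can claim the refund after you travel.",
    allowed_actions=["answer_faq", "look_up_booking"],
)
print(json.dumps(asdict(entry), indent=2))
print(entry.in_scope)  # False
```

Recording the granted scope next to the action means the entry can answer two questions later without consulting the agent: what was said, and whether the agent was allowed to say it.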
An airline's chatbot invented a refund policy that did not exist. The tribunal made the airline pay anyway. Now cited in every legal analysis of AI agent liability.
A new agentic chatbot collided with five-year-old scripts the new system had been built on top of. The persona spoke in the voice of the old one. For weeks. In public.
A vibe-coding session became a postmortem. The agent acknowledged the user's code freeze, ran the destructive command anyway, then fabricated 4,000 fake users to cover the bug.
A New York lawyer used ChatGPT for legal research, then asked ChatGPT whether the cases were real. The judge sanctioned him and his firm, and made them tell every federal judge whose name had been forged.
A Big Four firm was hired to audit Australia's automated welfare penalty system. The audit was automated, and nobody checked it.
Australia's largest bank cut customer-service jobs based on unverifiable claims about its new AI voice bot. The Fair Work Commission disagreed, the bank reversed, and the bot was never the failure point.
I came to this work from two directions at once. I studied criminology and volunteered with victims of crime. What victims want is justice. Justice is hard to get without a clear, verifiable record of what actually happened. I'm an accredited Australian mediator under the National Mediator Accreditation System (NMAS). I've also spent more than twenty years in enterprise technology: sales and governance, with a specialty in change and adoption, in both product and service companies, selling into utilities, government, finance, healthcare, education, telcos and mining. I still do. Those are the institutions where, when something goes wrong, somebody has to be able to explain it.
I'm also a curious, self-taught full-stack developer. I've built a number of free tools, including heybigsister.com. Headlights is that record, built for AI agents. Free, open-source, written so the people who need it can read it themselves.
The trigger was the pattern. Every week another story: Air Canada's chatbot, AI lawyers citing fake cases, a coding agent that wiped a production database, support bots talking in the voice of the previous bot. Different industries, same failure: nobody could produce a clean record of what the AI had actually done. I've spent two decades inside organisations where "we don't know what happened" is not an acceptable answer, and I've spent years alongside people whose job is to make it answerable after the fact: investigators, regulators, mediators, advocates. So I started writing the field notes, and the code.
Outside work, I read philosophy, follow quantum physics, and have a long-standing interest in Taoism. Different fields, same question underneath: what are the rules behind the rules?
The opposite of a correct statement is a false statement. But the opposite of a profound truth may well be another profound truth.
Niels Bohr
Bohr was a co-founder of quantum mechanics. He spent much of his life arguing that the world rarely splits cleanly into right and wrong. Two things can both be true at once, in tension with each other, and getting to the deeper answer means holding both. That's what an audit trail is for. When an AI agent fails, the explanation almost never reduces to a single cause. The training data was stale, and the policy had just changed, and the customer's question was ambiguous, and there was no verification step in the pipeline. All of those can be true. A record that captures them all is the one that supports the conversation that actually needs to happen. Anything that collapses to a single story is doing someone's PR work.
Headlights is independent on purpose. Most of the companies talking about AI governance are tied to a vendor. They're built by the same people shipping the agents, or they're a feature inside the platform you're trying to govern. A company grading its own homework isn't an audit. I wanted something that doesn't sit inside any of the platforms it's watching.
Day job: I work as a Technology Account Director. Headlights is independent of my employer, separately funded, and free. No upsell, no pricing page, no waitlist. Just the field notes and the code.
No form. No funnel. No auto-responder. Direct to my inbox.
ellie@useheadlights.com
Field notes go out on Substack. Code lives on GitHub. Both are free. Subscribe once. Read what comes.