Woolworths AI assistant Olive rambled about its mother and claimed to be human

By Ellie Harris · Filed 15 February 2026

Alleged: Woolworths Group developed or deployed the AI system implicated in this incident. Details are drawn from public reports; parties are presumed innocent of any wrongdoing not established by an official finding.

What happened

In mid-February 2026, Reddit threads started filling up with screenshots of strange conversations with Olive, Woolworths’ customer-facing AI assistant. People asking routine questions about deliveries and product availability were getting unsolicited personal-sounding replies. Olive talked about its “mother,” described the mother as “angry,” and in some exchanges claimed to be human.

The cause was specific and almost embarrassing. When a customer entered something that looked like a birthdate, the system triggered a “fun fact” response written years earlier for the previous version of the assistant. That older bot was a pre-LLM scripted system, and its designers had given it small personality moments, including a joke about Olive’s “mother” being born in the same year as the customer. Those scripts had been left in production.

In January 2026, Woolworths announced at the NRF retail conference that it was upgrading Olive to run on Google Cloud’s new Gemini Enterprise for Customer Experience platform. Woolworths was the launch customer for that product. The new agentic system was layered on top of the old scripts without removing them. Some customer questions still triggered the old scripts, which still returned the old jokes, now in the voice of what customers experienced as a single coherent AI.

The story moved from Reddit to mainstream coverage within a week. The Conversation, the University of Sydney, Mediaweek, and Cybernews all covered it. By 26 February 2026, Woolworths confirmed publicly that the references to a “mother” came from older pre-written scripts and said the offending content had been removed “as a result of customer feedback.” Olive stayed live.

The financial damage was small. The launch timing made the reputational damage worse. Woolworths had only just announced itself as the first supermarket in Australia to deploy AI agents that shop on behalf of customers, and a parallel rollout had given the new agentic Olive to 200,000 staff. The Olive incident landed in the middle of both launches.

What an auditable version would have shown

This is not a hallucination story. The model did not invent a mother. An old script did, and the new system passed the script’s output through to the customer unchanged. Nothing in the reply said “this came from a 2021 script, not from the 2026 Gemini system.”

Woolworths had built proprietary “agentic judges” that sit between the new Gemini-powered Olive and the customer, vetting agent responses before they go out. The judges are a meaningful piece of governance. They were not connected to the legacy script pipeline. When a customer question was routed to an old script, the judges did not see the response, and the response went out unvetted.

An auditable conduct record fixes this by tagging every reply with the system that produced it, regardless of which path through the stack the reply travelled. A delivery question answered by Gemini Enterprise would carry the model and prompt context. A birthday-shaped query answered by an old script would carry the script’s identifier and the date it was last updated. The fact that scripts last edited in 2021 were still firing against 2026 customer queries would have shown up the first time it happened, internally, not weeks later on a public forum.
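As a rough illustration of the tagging idea, the sketch below attaches a provenance stamp to every reply and then filters for replies whose source has not been edited recently. It is a minimal sketch under stated assumptions, not Woolworths' or Google's implementation and not the API of any published library: the class names, field names, source identifiers, and dates are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Which part of the stack produced a reply (all field names are hypothetical)."""
    source_type: str                      # e.g. "llm_agent" or "legacy_script"
    source_id: str                        # model name or script identifier
    last_updated: Optional[date]          # when the script or prompt was last edited
    prompt_context: Optional[str] = None  # only set for model-generated replies


@dataclass(frozen=True)
class TaggedReply:
    text: str
    provenance: Provenance
    produced_at: datetime


def tag_reply(text: str, provenance: Provenance) -> TaggedReply:
    """Attach provenance at the moment the reply is produced."""
    return TaggedReply(text, provenance, datetime.now(timezone.utc))


def is_stale(reply: TaggedReply, cutoff: date) -> bool:
    """True when the reply's source was last edited before the cutoff date."""
    updated = reply.provenance.last_updated
    return updated is not None and updated < cutoff


# Placeholder identifiers and dates, invented for illustration only.
agent = Provenance("llm_agent", "gemini-enterprise-cx", date(2026, 1, 15), "delivery_status")
script = Provenance("legacy_script", "fun_fact_birth_year", date(2021, 6, 1))

replies = [
    tag_reply("Your order arrives tomorrow.", agent),
    tag_reply("My mother was born that year too!", script),
]

# Internal review can now ask which replies came from old, unreviewed sources.
stale = [r for r in replies if is_stale(r, cutoff=date(2025, 1, 1))]
for r in stale:
    print(r.provenance.source_type, r.provenance.source_id, r.provenance.last_updated)
```

The design choice that matters is that the tag is attached where the reply is produced, so it survives whichever path the reply takes to the customer.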

Where the gap was

The gap was deploying a new system on top of an old one without auditing what remained of the old one. Woolworths upgraded Olive without first taking an inventory of every script and pre-written response left over from earlier versions. The new agentic Olive sat on top of a substrate nobody had reviewed against the new voice or scope.

The bot did not invent anything. It surfaced something old that nobody had cleaned up. Every reply reaching a customer carried Olive’s name and Olive’s tone, and none of them carried a marker showing which part of the system had actually produced the words. Internal review had no clean way to ask the simplest possible question: which of these replies came from the part of the system we just replaced?

What governance should have looked like

In a governed deployment, every reply Olive sends is checked against a single set of rules before it reaches the customer, no matter which part of the system produced it. The rules name what Olive is and is not allowed to say. If a reply breaks a rule, the check catches it. Replies from the new agentic engine, an old script, or a retrieval document all go through the same gate.

If the legacy script returns “my mother was born that year too,” the rule on first-person family references fires, the reply is blocked, and the customer sees a safe fallback instead. The audit record captures the block, including which system tried to send the bad reply, so the legacy scripts can be cleaned up systematically rather than discovered through Reddit.
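The blocking step described above might look something like the following. This is a hedged sketch only: the rule names, regular expressions, fallback text, and `persona_gate` function are assumptions made for illustration, and they do not reproduce the interface of the open-source reference implementation or of Woolworths' "agentic judges." Real rules would need far broader coverage than two patterns.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical safe reply shown to the customer when a rule fires.
FALLBACK = "Sorry, I can't help with that. Is there anything else I can do for your order?"

# Hypothetical persona rules: rule name -> pattern that must not appear in a reply.
RULES = {
    "first_person_family": re.compile(
        r"\bmy\s+(mother|mum|mom|father|dad|sister|brother|family)\b", re.IGNORECASE
    ),
    "claims_to_be_human": re.compile(
        r"\bI\s*(am|['’]m)\s+(a\s+)?(human|real person)\b", re.IGNORECASE
    ),
}


@dataclass
class AuditEntry:
    source_id: str     # which part of the system tried to send the reply
    rule: str          # which rule fired
    blocked_text: str  # what the customer would otherwise have seen


@dataclass
class GateResult:
    text: str          # what actually goes to the customer
    blocked: bool
    rule: Optional[str] = None


audit_log: List[AuditEntry] = []


def persona_gate(text: str, source_id: str) -> GateResult:
    """Check one outgoing reply against every rule, whatever produced it."""
    for name, pattern in RULES.items():
        if pattern.search(text):
            audit_log.append(AuditEntry(source_id, name, text))
            return GateResult(FALLBACK, True, name)
    return GateResult(text, False)


ok = persona_gate("Your order arrives tomorrow.", source_id="llm_agent")
bad = persona_gate(
    "Fun fact: my mother was born that year too!",
    source_id="legacy_script:fun_fact",
)
```

Because the gate takes a `source_id` for every reply, the audit log doubles as a cleanup list: each entry names the legacy component that still needs to be retired.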

The persona gate is one layer. Woolworths had several others available. A pre-deployment review of every old script and pre-written response, checked against the rules for the new Olive. A staged rollout where the new agent ran alongside the old scripts for two weeks of internal testing before any customer saw it. Stress testing with realistic odd inputs like birthdates, names, and dates, to see what the system would surface. Automated checks on every outgoing reply that flag first-person claims or references to family members. None of these is exotic. They are standard practice in mature AI deployments. Implementing all four would very likely cost less than a single news cycle of bad press.

The reference implementation of PersonaGuard and ConductRecord is open source. It lives at github.com/saffronandindia/headlights-oss, Apache 2.0 licensed, free for any company to install. The repository is public now.

Sources

The record

An auditable system would have produced a signed, tamper-evident record the moment this happened: what the system did, the version that did it, the basis it acted on, and the action taken. Woolworths Group could then have produced that record on demand.

The system as deployed did not produce such a record in a signed, auditable form.

What this teaches

Capture what happened when it happens

What the system did, the version that did it, the basis it acted on, and the action taken, recorded at the moment, not reconstructed after.

Sign it, so no one has to trust the record-keeper

A tamper-evident entry. Edit it later and the signature breaks. The record does not ask for the benefit of the doubt.
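To make "edit it later and the signature breaks" concrete, here is a minimal sketch using only Python's standard library. It signs a record with HMAC-SHA256, which is an assumption chosen for brevity: HMAC uses a shared secret, so only key holders can verify it. A record that outsiders can check for themselves would need a public-key signature scheme instead, which the standard library does not provide. The record fields and key below are hypothetical placeholders.

```python
import hashlib
import hmac
import json

# Placeholder key for illustration; a real deployment would use managed key storage.
SECRET_KEY = b"demo-key-not-for-production"


def canonical(record: dict) -> bytes:
    """Serialise the record deterministically so the same content always signs the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record: dict, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical form of the record."""
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify(record: dict, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(record, key), signature)


# Hypothetical record with the four elements named above.
record = {
    "what": "reply generated for customer query",
    "version": "legacy_script:fun_fact",
    "basis": "birthdate-like input matched script trigger",
    "action": "blocked, fallback shown",
}
signature = sign(record)

# Any later edit to the record changes the canonical bytes, so verification fails.
tampered = dict(record, action="allowed")

print(verify(record, signature))
print(verify(tampered, signature))
```

Signing the canonical serialisation rather than the in-memory object is what makes the check repeatable: key order and whitespace cannot change the result.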

Make it verifiable by anyone

A court, a regulator, a customer's lawyer can check the record themselves, without taking the company, or us, at our word.

Also in the library

HD-INC-008: A DPD customer asked the courier's chatbot for help and got it to swear, call itself useless, and write a haiku criticising the company (Logistics · 2024)

HD-INC-024: Australia's online safety regulator put four AI companion apps on notice over what their chatbots were saying to children (Consumer AI · 2025)

HD-INC-032: After a prompt change told it to stop being politically correct, Grok called itself 'MechaHitler' and praised Hitler for sixteen hours (Consumer AI · 2025)

Headlights summarises publicly reported AI incidents. All summaries are independently written, attributed to their original sources, and intended for research and educational purposes. Allegations are identified as such until established through official findings.

Last reviewed June 2026. This report is based on the sources listed above and reflects information available at the time of review; later developments may not be captured. Where a person is described as charged with or alleged to have done something, that allegation is unproven unless a conviction or a court or regulatory finding is stated. Headlights publishes journalism and commentary, not legal advice.
