HD-INC-010 · Professional services · Hallucination

Deloitte's $440K AUD report for the Australian government cited a federal court quote that did not exist

A Big Four firm was hired to audit Australia's automated welfare penalty system. The audit was automated, and nobody checked it.

What happened

In December 2024 the Australian Department of Employment and Workplace Relations engaged Deloitte to conduct an independent assurance review of the Targeted Compliance Framework. The framework is the system that automates penalties against welfare recipients. It is what replaced Robodebt. The contract was worth AUD 440,000 (about USD 290,000) and ran from December 2024 to June 2025.

The deliverable was a 237-page report published on the department's website in July 2025. It contained a fabricated quote attributed to Federal Court Justice Jennifer Davies, whose surname the AI misspelled as "Davis." It also contained ten citations to a non-existent book attributed to Sydney law professor Lisa Burton Crawford, titled The Rule of Law and Administrative Justice in the Welfare State, a study of Centerlink (the AI's misspelling of Centrelink). Burton Crawford's actual book is The Rule of Law and the Australian Constitution. Chris Rudge, the academic who later flagged the report, catalogued around twenty citation errors of the same shape; the corrected version of the report removed 14 of the 141 sources in the original reference list.

The errors stayed live for several weeks. In late August 2025, Dr Chris Rudge, an academic at Sydney Law School, read the report and noticed that a passage attributed a non-existent book to Lisa Burton Crawford, a Sydney University professor of public and constitutional law, with a title sitting outside her field. Rudge contacted the Australian Financial Review, which broke the story.

Deloitte conceded. The corrected version of the report was republished in October 2025 with a new disclosure on page 58: Deloitte's technical team had used "the Azure OpenAI GPT-4o based tool chain licensed by DEWR and hosted on DEWR's Azure tenancy." The detail mattered. Deloitte had produced the report using the department's own AI infrastructure. The fabricated quote and the non-existent references were removed. Deloitte agreed to refund the final installment of the contract fee.

The dollar figure was small for a firm Deloitte's size. The context made it worse. The Targeted Compliance Framework had been commissioned in the shadow of the Robodebt royal commission, where Australia learned what happens when automated decisions about vulnerable people go unaudited. Deloitte's job was to assure that the new system was better. The assurance itself was produced by an AI without verification.

What an auditable version would have shown

Every claim in a professional-services deliverable is supposed to be traceable to a source. The traditional workflow has a junior consultant write a section, a senior consultant review it, and an associate director sign it off. Citations get checked along the way because a person owns each step.

When AI drafting enters that workflow without changing the rest of it, the chain breaks at the first step. The junior pastes the model's output. The senior reads the prose and assumes the junior verified the citations. The associate director skims for tone. Nobody runs the citations against a real database, because the workflow does not require it. The model's confidence in its own output reads as authority.

An auditable production record would have shown, for each section of the report, who wrote it, whether any AI was involved, which database the citations were checked against, and which named person signed off. With that record, the fabricated quote would have failed verification at the citation-check step, three reviewers before the report reached the department. The department would have received a clean deliverable, or no deliverable at all.
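As a minimal sketch of what such a record could look like, the four questions above can be held in a small data structure with a check that lists which of them the record cannot answer. The names here (SectionRecord, audit_gaps, the field names and the example values) are hypothetical illustrations, not anything Deloitte or the department is known to have used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SectionRecord:
    """Hypothetical per-section production record; field names are illustrative."""
    section_id: str
    author: Optional[str]           # who wrote the section
    ai_involved: Optional[bool]     # None means nobody recorded it either way
    citation_index: Optional[str]   # database the citations were checked against
    signed_off_by: Optional[str]    # named person who approved the section


def audit_gaps(record: SectionRecord) -> list[str]:
    """Return the names of the audit questions this record cannot answer."""
    gaps = []
    if not record.author:
        gaps.append("author")
    if record.ai_involved is None:
        gaps.append("ai_involved")
    if not record.citation_index:
        gaps.append("citation_index")
    if not record.signed_off_by:
        gaps.append("signed_off_by")
    return gaps


# Example: an AI-assisted section whose citations were never checked
# and which nobody has signed off.
draft = SectionRecord(
    section_id="section-3",
    author="junior-consultant",
    ai_involved=True,
    citation_index=None,
    signed_off_by=None,
)
print(audit_gaps(draft))  # ['citation_index', 'signed_off_by']
```

A record with any gaps is one that an auditor cannot reconstruct after the fact, which is the condition the paragraph above describes.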

The corrected version of the report added an AI disclosure on page 58. That is a closing-the-stable-door move. The disclosure should have been made at the start, the citation check should have been a workflow gate, and the named author of each section should have been part of the record.

Where the gap was

The gap was a workflow that assumed all written content came from a human. When the writing stopped coming from a human, the verification steps that followed it stopped working, because they had been built on top of the assumption that the writer had checked their own sources.

This is the most common shape of AI-related professional failure now in the field. The technology is folded into one step of an existing process, the rest of the process is left unchanged, and the controls that were tacit suddenly stop being controls. Deloitte is a global firm with thousands of compliance professionals. They did not catch a fabricated court quote in a 237-page report to a federal department. Not because the controls were missing on paper, but because the controls did not anticipate that the prose itself might be fictional.

The Robodebt royal commission, which was the immediate context for this contract, made the same finding about a different system. Automated decisions need new controls, not the old controls applied to new outputs. Deloitte's report was the inverse of the same mistake: an automated assurance of an automated system, reviewed as if both were still being produced by humans.

What governance should have looked like

Every section of every AI-drafted document gets tagged with which model produced it, and every citation in those sections gets checked against a real database before the document leaves the firm. The check is automatic, the result is signed, and unverified citations either get flagged for human review or block the document from being delivered.

from datetime import datetime, timezone

from headlights import DocumentRecord, CitationVerifier, sign, chain

# When a consultant uses an AI assistant to draft a section
section = draft_with_ai(prompt, model="azure-openai-gpt-4o")

# Every citation in the AI output is checked against a real index
verifier = CitationVerifier(
    indices=[westlaw, austlii, jstor, federal_court_judgments],
)
result = verifier.check(section)

if result.unverified_citations:
    # The deliverable cannot ship until each unverified citation is
    # either verified by a human or removed
    section.status = "blocked"
    section.flagged_citations = result.unverified_citations

# Capture the production audit trail
record = DocumentRecord(
    deliverable_id="dewr-tcf-assurance-2025",
    section_id=section.id,
    drafted_by="ai-assisted",
    model_version="azure-openai-gpt-4o",
    citations_total=len(result.citations),
    citations_verified=result.verified_count,
    citations_unverified=result.unverified_count,
    human_reviewer=None,   # filled in on sign-off
    timestamp=datetime.now(timezone.utc),
    previous_record_hash=last_record.hash(),
)

signed = sign(record, key=deloitte_private_key)
chain.append(signed)

If the AI drafts a sentence citing "Burton Crawford, The Rule of Law and Administrative Justice in the Welfare State" and that book does not appear in any index the verifier checks, the section is blocked. The senior reviewer sees the flag. The fabricated reference is removed or replaced with a real source, or the section is rewritten. The department never sees a fabricated citation, because the workflow caught it three signatures before it would have shipped.
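The previous_record_hash field in the record above is what makes the trail tamper-evident: each record commits to the one before it. The following is a standard-library sketch of that idea only, not the headlights implementation. The helper names (record_hash, append_record, chain_is_intact) and the record fields are assumptions made for illustration, and signing is left out.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain: list[dict], body: dict) -> None:
    """Append a record that commits to the hash of the previous record."""
    previous = record_hash(chain[-1]) if chain else None
    chain.append({**body, "previous_record_hash": previous})


def chain_is_intact(chain: list[dict]) -> bool:
    """Recompute every link; editing an earlier record breaks the next link."""
    for i, record in enumerate(chain):
        expected = record_hash(chain[i - 1]) if i > 0 else None
        if record["previous_record_hash"] != expected:
            return False
    return True


audit_chain: list[dict] = []
append_record(audit_chain, {"section_id": "s1", "citations_unverified": 2})
append_record(audit_chain, {"section_id": "s2", "citations_unverified": 0})
print(chain_is_intact(audit_chain))  # True

# Quietly rewriting the first record after the fact is detectable,
# because the second record still holds the hash of the original.
audit_chain[0]["citations_unverified"] = 0
print(chain_is_intact(audit_chain))  # False
```

Hash linking alone does not protect the most recent record, since nothing after it commits to its contents. That is the job a signature on each record would do.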

The citation gate is one layer. Deloitte had at least four others available. An AI-use disclosure at the start of every section, not at the back of the corrected version. A separate fact-check pass by a person who did not draft the document, before client delivery. Author accountability at the section level, so a named partner signs off on each chapter rather than the whole report. Tooling that distinguishes "retrieved from a database" from "generated by a model" in the consultant's writing environment, so the consultant can see which sentences are unverified at the moment of writing. None of these is exotic. They are documented practice in any mature professional-services AI deployment. The cumulative cost of implementing all four is likely less than the cost of one refunded contract, and far less than the cost of a federal department losing trust in its assurance provider.
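Three of those layers (disclosure, an independent fact-check, and sentence-level provenance) can be sketched as a single release check. This is a hedged illustration: the Provenance labels, the Sentence type, the release_blockers function and the example sentences are all hypothetical names invented here, and a real writing environment would record provenance automatically rather than by hand.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    RETRIEVED = "retrieved from a database"
    GENERATED = "generated by a model"
    HUMAN = "written by a person"


@dataclass
class Sentence:
    text: str
    provenance: Provenance
    verified: bool = False  # set once a person or an index confirms the claim


def unverified_generated(sentences: list[Sentence]) -> list[Sentence]:
    """Model-generated sentences that nobody has verified yet."""
    return [
        s for s in sentences
        if s.provenance is Provenance.GENERATED and not s.verified
    ]


def release_blockers(
    sentences: list[Sentence],
    drafter: str,
    fact_checker: Optional[str],
    ai_disclosed: bool,
) -> list[str]:
    """List the reasons a section may not go to the client; empty means clear."""
    blockers = []
    uses_ai = any(s.provenance is Provenance.GENERATED for s in sentences)
    if uses_ai and not ai_disclosed:
        blockers.append("AI use not disclosed")
    if fact_checker is None or fact_checker == drafter:
        blockers.append("no independent fact-check")
    if unverified_generated(sentences):
        blockers.append("unverified model-generated sentences")
    return blockers


# Placeholder sentences, not text from the actual report.
section = [
    Sentence("Summary of the framework's legislative basis.",
             Provenance.RETRIEVED, verified=True),
    Sentence("Quotation attributed to a court judgment.",
             Provenance.GENERATED),
]
print(release_blockers(section, drafter="consultant-a",
                       fact_checker="consultant-a", ai_disclosed=False))
```

With the drafter acting as their own fact-checker and no disclosure recorded, all three blockers fire. The section clears only when every generated sentence is verified, a different person has checked it, and the AI use is disclosed.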

The reference implementation of these patterns is open source. It will live at github.com/saffronandindia/headlights-oss, Apache 2.0 licensed, 226 tests passing, free for any firm to install. The repository goes public alongside the launch of this Incident Library.

This entry is an educational analysis based on the publicly reported sources listed below. It does not constitute legal advice. Facts are stated to the best of our knowledge as of the date of publication; corrections will be issued promptly on request. Contact: ellie@useheadlights.com.