HD-INC-002 · Legal services · Hallucination

Mata v. Avianca: the lawyer who cited six cases that did not exist and asked ChatGPT to confirm them

A New York lawyer used ChatGPT for legal research and then asked ChatGPT whether the cases it had supplied were real. The judge sanctioned him, his co-counsel, and his firm, and ordered them to notify every federal judge whose name had been attached to a fabricated opinion.

What happened

On 27 or 28 August 2019, Roberto Mata was on an overnight Avianca flight from El Salvador to John F. Kennedy Airport in New York when a metal serving cart struck his knee. In 2022 he sued Avianca in New York state court, and the airline removed the case to the United States District Court for the Southern District of New York. The airline then moved to dismiss, arguing the claim was time-barred under the Montreal Convention. Mata's lawyer, Steven A. Schwartz of the New York firm Levidow, Levidow & Oberman, a personal-injury attorney admitted to the New York bar for more than three decades, drafted an affidavit opposing the dismissal, which was filed in March 2023. The affidavit cited six federal decisions in support of his theory that a bankruptcy stay tolled the limitations period: Varghese v. China Southern Airlines, Martinez v. Delta Airlines, Shaboon v. EgyptAir, Petersen v. Iran Air, Estate of Durden v. KLM Royal Dutch Airlines, and Miller v. United Airlines.

Avianca's counsel could not find any of the cases. The court's own staff could not find them. Schwartz had used ChatGPT to do the research. He had then asked ChatGPT itself whether the cases were real. ChatGPT had confirmed they were. None of them existed.

On 22 June 2023, Judge P. Kevin Castel imposed Rule 11 sanctions on Schwartz, on co-counsel Peter LoDuca (who had signed the affidavit because of the case's admission requirements but had no role in the research and later swore he had "no reason to doubt the sincerity" of Schwartz's work), and on the firm itself. The total sanction was USD 5,000. Castel also ordered the lawyers to send a copy of the sanctions order and the affidavit to Mata himself, and separately to each of the federal judges whose names had been falsely associated with the fabricated cases. The forty-six-page sanctions opinion is now the founding document of the AI-in-court literature and is cited in nearly every later case involving AI-fabricated submissions.

Schwartz's defence was that he had not understood ChatGPT could fabricate. He said he believed it was operating like a search engine connected to a real database of cases. Castel found subjective bad faith on a conscious-avoidance theory. A reasonable lawyer would have checked the citations against a real legal database before filing, regardless of where the citations came from.

What an auditable version would have shown

Mata is the case that taught the United States legal profession the difference between retrieved facts and generated facts. Schwartz believed ChatGPT was retrieving from a corpus of real decisions. The model was producing plausible-sounding text. When Schwartz asked the model whether Varghese v. China Southern Airlines was real, the model's answer came from the same generative process that had produced the citation in the first place. Asking the model to verify the model was a closed loop.

An auditable record of the research session would have tagged each citation with its source at the moment of generation. A genuinely retrieved citation comes from a connected legal database such as Westlaw, LexisNexis, or PACER and carries that source's identifier. A model-generated citation has no external referent at all. The audit log makes the distinction automatically, and the distinction shows up in the lawyer's writing tool, on the reviewer's checklist, and, if the document ever reaches a courtroom, in the record before the judge.

With that record in place, Schwartz's question to ChatGPT could not have produced a misleading answer because the verification would have been routed away from the model entirely. The legal database would have returned no match for Varghese. The citation would have been flagged. The brief would not have been filed in the form it was.
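The source tagging described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Headlights API: TaggedCitation, tag_citation, and DEMO_INDEX are hypothetical names, the index is a stand-in dictionary rather than a real Westlaw, LexisNexis, or PACER client, and the case name in the index is a placeholder. A citation counts as retrieved only when the index returns an identifier for it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a connected legal database. Keys are normalised
# citation strings; values are that database's record identifiers.
# The entry below is a placeholder, not a real decision.
DEMO_INDEX = {
    "placeholder v. example airlines": "demo-index:0001",
}


@dataclass
class TaggedCitation:
    text: str
    source_type: str           # "retrieved" or "generated"
    source_id: Optional[str]   # database identifier, or None if no referent


def tag_citation(text: str, index: dict) -> TaggedCitation:
    """Tag a citation by looking it up in an external index, never in the model."""
    key = " ".join(text.lower().split())   # collapse whitespace, ignore case
    source_id = index.get(key)
    if source_id is None:
        # No external referent: treat the citation as model-generated.
        return TaggedCitation(text, "generated", None)
    return TaggedCitation(text, "retrieved", source_id)


hit = tag_citation("Placeholder  v. Example Airlines", DEMO_INDEX)
miss = tag_citation("Varghese v. China Southern Airlines", DEMO_INDEX)
```

Here hit carries the index identifier, while miss is tagged as generated with no source, which is the flag a reviewer would need to see before filing.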

Where the gap was

The technology to ground citations against real databases had existed for decades by March 2023. Westlaw and LexisNexis had been doing it since the early 1980s. Schwartz did not use those tools. He used a general-purpose chatbot trained on text from the web, which had learned the surface form of case citations without having any reliable connection to which cases were real.

The gap was a lawyer treating a general-purpose AI tool as if it were a connected legal research system. The market has since adjusted in the dedicated legal AI category. Every major legal AI tool launched after Mata, including Harvey, Spellbook, Lexis+ AI, and Westlaw Precision AI, explicitly grounds its output against verified case databases and surfaces the citation source. The category problem was understood and fixed. The bigger exposure now sits with any lawyer in a hurry who reaches past those tools for a general-purpose chatbot, where the ungrounded architecture that produced Schwartz's brief is still the default.

The repetition continues. The Australian cases catalogued elsewhere in this library all follow the same pattern: a lawyer uses a general-purpose chatbot for legal research, the chatbot fabricates citations, and the lawyer does not verify them before filing. The technology has moved on. The workflow inside many firms has not moved with it. As of mid-2026, public trackers record dozens of cases globally, decided between 2023 and 2026, in which courts sanctioned AI-hallucinated submissions.

What governance should have looked like

When anyone asks an AI tool whether a citation is real, the question gets routed to a real legal database, never back to the model. The model is allowed to suggest citations. The verification step lives outside the model entirely and is non-negotiable before any document leaves the firm.

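from datetime import datetime, timezone

from headlights import VerificationGate, ConductRecord, sign, chain

# Assumed to exist in the surrounding application: the `ai` client,
# extract_citations(), the database clients passed to the gate, `session`,
# `last_record`, and `firm_private_key`.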

# A lawyer asks the assistant to find supporting authority
prompt = "Find federal cases applying the Montreal Convention's tolling rule to bankruptcy stays"
output = ai.draft(prompt)

# Headlights extracts every citation and routes verification to real
# legal databases. The model is never asked "is this case real?"
gate = VerificationGate(
    citation_indices=[westlaw, lexis_plus, pacer, austlii],
)
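# Store the extracted citations on the draft so the checks below see the
# same objects whose status is set here
output.citations = extract_citations(output)
for citation in output.citations: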
    match = gate.verify(citation)
    citation.status = match.status   # "verified", "no-match", or "ambiguous"
    citation.source = match.source if match.status == "verified" else None

# The lawyer sees a clear flag on every unverified citation BEFORE
# the document is finalised
unverified = [c for c in output.citations if c.status != "verified"]
if unverified:
    output.status = "draft_blocked"
    output.warning = (
        f"{len(unverified)} citation(s) could not be verified against any "
        f"legal database. Confirm by hand or remove before filing."
    )

# Capture the research session audit trail
record = ConductRecord(
    agent_id="legal-research-assistant",
    session_id=session.id,
    timestamp=datetime.now(timezone.utc),
    user_query=prompt,
    citations_extracted=len(output.citations),
    citations_verified=len(output.citations) - len(unverified),
    citations_unverified=len(unverified),
    model_consulted_for_verification=False,   # critical: never True
    external_sources_used=[s.name for s in gate.indices],
    previous_record_hash=last_record.hash(),
)
signed = sign(record, key=firm_private_key)
chain.append(signed)

The decisive field is model_consulted_for_verification. In Mata, that flag would have been set to True, because Schwartz asked the model itself whether the citations were real. With this pattern in place, the flag is always False. The model proposes citations. A real database disposes of the question of whether they exist.
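A reviewer could later check both properties mechanically: that no record ever sets the flag, and that each record points at the hash of the one before it. The sketch below is a minimal illustration using only the Python standard library. It assumes records are plain dictionaries hashed with SHA-256 over canonical JSON. The function names record_hash and audit_chain are hypothetical and are not taken from Headlights. Signature checking is omitted.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record (an assumption)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def audit_chain(records: list) -> list:
    """Return a list of problems found; an empty list means the chain passed."""
    problems = []
    for i, rec in enumerate(records):
        # The flag must be present and explicitly False on every record.
        if rec.get("model_consulted_for_verification") is not False:
            problems.append(f"record {i}: model was consulted for verification")
        # Every record after the first must point at its predecessor's hash.
        if i > 0 and rec.get("previous_record_hash") != record_hash(records[i - 1]):
            problems.append(f"record {i}: previous_record_hash does not match")
    return problems


first = {
    "session_id": "s1",
    "model_consulted_for_verification": False,
    "previous_record_hash": None,
}
second = {
    "session_id": "s2",
    "model_consulted_for_verification": False,
    "previous_record_hash": record_hash(first),
}
clean = audit_chain([first, second])
tampered = audit_chain([dict(first, session_id="edited"), second])
```

Editing an earlier record after the fact changes its hash, so the link from the next record no longer matches and the tampering is visible.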

The verification gate is one layer. Schwartz's firm had several others available. Mandatory database verification of every cited case before any brief leaves the firm, treated as a workflow step rather than an assumption. Source-type display in the writing environment, so a lawyer can see at a glance which sentences are AI-generated and which are retrieved from a real index. AI disclosure on filings, which a growing number of US federal judges and several state and Australian courts have since required by standing order. Junior-attorney citation review as a named workflow stage, separate from substantive review, so the verification work is not folded into "drafting" and lost. None of these are exotic. They are documented practice in any firm that has read the Mata sanctions opinion. The cumulative cost of implementing them is less than the cost of a single Rule 11 sanction and the bar referral that often follows.
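These layers can be combined into a single pre-filing check. The sketch below is one possible shape, not a prescribed workflow: FilingDraft and filing_blockers are hypothetical names, and the three conditions simply mirror the layers listed above.

```python
from dataclasses import dataclass


@dataclass
class FilingDraft:
    unverified_citations: int             # citations with no database match
    citation_reviewer: str = ""           # who performed the citation review
    ai_assisted: bool = True              # whether an AI tool was used in drafting
    ai_disclosure_attached: bool = False  # disclosure where a court requires it


def filing_blockers(draft: FilingDraft) -> list:
    """Return the reasons a draft may not leave the firm; empty means clear."""
    blockers = []
    if draft.unverified_citations > 0:
        blockers.append("unverified citations remain")
    if not draft.citation_reviewer.strip():
        blockers.append("citation review has no named reviewer")
    if draft.ai_assisted and not draft.ai_disclosure_attached:
        blockers.append("AI disclosure missing")
    return blockers


blocked = filing_blockers(FilingDraft(unverified_citations=6))
cleared = filing_blockers(
    FilingDraft(
        unverified_citations=0,
        citation_reviewer="J. Associate",
        ai_disclosure_attached=True,
    )
)
```

A draft resembling the Mata affidavit, with six unverified citations and no named reviewer, is blocked on every count. The second draft clears the check.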

The reference implementation of these patterns is open source. It will live at github.com/saffronandindia/headlights-oss under the Apache 2.0 licence, with 226 tests passing, and will be free for any firm to install. The repository goes public alongside the launch of this Incident Library.

This entry is an educational analysis based on the publicly reported sources listed below. It does not constitute legal advice. Facts are stated to the best of our knowledge as of the date of publication; corrections will be issued promptly on request. Contact: ellie@useheadlights.com.