HD-INC-003 · Legal services · Hallucination

Michael Cohen gave his lawyer fake case citations he had got from Google Bard, and his lawyer filed them in a federal court

A former personal attorney to a president used a chatbot to find legal precedent and handed the fake cases to his lawyer, who put them in front of a federal judge. The judge noticed, ordered an investigation, and declined to impose sanctions, but the case has been taught in nearly every continuing legal education AI module since.

What happened

On 29 November 2023, Michael Cohen's attorney David M. Schwartz filed a motion in the Southern District of New York seeking the early termination of Cohen's term of court-ordered supervised release. The motion argued that District Court decisions, affirmed by the Second Circuit, had previously granted early termination in comparable cases. To make the point, the motion cited three:

  • United States v. Figueroa-Florez, 64 F.4th 223 (2d Cir. 2022)
  • United States v. Ortiz (No. 21-3391), 2022 WL 4424741 (2d Cir. Oct. 11, 2022)
  • United States v. Amato, 2022 WL 1669877 (2d Cir. May 10, 2022)

None of the three exist. The citation for Figueroa-Florez points to a page in the middle of an unrelated Fourth Circuit decision. The citation for Amato corresponds to a decision of the Board of Veterans Appeals. The remaining citation, for Ortiz, was simply invented.

Cohen had completed his federal sentence on campaign-finance and tax-related convictions arising from the 2016 presidential election and had been on supervised release since. He decided to find precedent that would support cutting that release short. He used Google Bard. He treated the chatbot as what he later described to the court as "a super-charged search engine" and assumed its outputs were retrieved from a database of real cases. They were not. Bard generated case-shaped text that looked like search results, with confident citations in the standard federal reporter format. Cohen passed the citations to Schwartz. Schwartz included them in the filing without independently verifying any of them.

Judge Jesse M. Furman noticed. On 12 December 2023 he issued an order requiring Schwartz to produce copies of the three decisions. Schwartz could not produce them. The court ordered a show-cause hearing. On 29 December 2023, Cohen filed a sworn affidavit explaining that he had used Google Bard, that he had not understood that the tool could fabricate citations, that he had assumed it functioned like a search engine, and that he had not told Schwartz the source was an AI chatbot. Schwartz filed his own affidavit acknowledging that he had relied on Cohen's citations without independent verification.

On 20 March 2024 Judge Furman issued a written opinion declining to impose Rule 11 sanctions on Schwartz. He found that Schwartz had been negligent but had not acted in bad faith. He declined to find Cohen in contempt, observing that the court had "no basis to question Cohen's representation that he believed the cases to be real" and that it would have "been downright irrational" for Cohen to have submitted citations he knew were fabricated. The published opinion was picked up immediately by legal-ethics scholars and added to the small but growing body of caselaw on AI-generated citations, alongside Mata v. Avianca (HD-INC-002), decided in the same court in June 2023. There were no monetary penalties. There was a public record. The case has been cited in nearly every continuing legal education AI-ethics module produced since.

What an auditable version would have shown

The case turned on the absence of two records that should have existed and did not.

The first was at the chatbot end. Cohen could not show what prompt he had given Bard. Bard could not show what response it had generated, in what session, against what model version. Bard's logs, to the extent they persisted, were not produced in court, were not signed, and were not version-pinned. When Judge Furman asked the questions that would matter in any other forensic inquiry (what did the chatbot actually say, on what date, in response to what query, with what model active), none of the parties had the record. The investigation became an exercise in reconstruction from memory and emails.
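A minimal sketch of what such a chatbot-end record could look like, using only the Python standard library. The field names, the `GenerationRecord` type and the demo key are illustrative assumptions, not anything Bard or any other product is known to emit. HMAC is used here only to keep the example self-contained; a record meant to be checked by third parties would more plausibly carry an asymmetric signature.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationRecord:
    """Hypothetical record of one chatbot exchange (all fields are assumptions)."""
    session_id: str
    model_version: str
    prompt: str
    response: str
    generated_at: str  # ISO 8601 timestamp, UTC


def canonical_bytes(record: GenerationRecord) -> bytes:
    # Deterministic serialisation, so the same record always yields the same bytes.
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: GenerationRecord, key: bytes) -> str:
    # Keyed digest over the canonical bytes of the record.
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_record(record: GenerationRecord, signature: str, key: bytes) -> bool:
    # Constant-time comparison of the recomputed and the stored signature.
    return hmac.compare_digest(sign_record(record, key), signature)


# Placeholder values only; none of this is taken from the actual session.
demo_key = b"demo-only-key"
record = GenerationRecord(
    session_id="session-0001",
    model_version="example-model-2023-11",
    prompt="placeholder prompt",
    response="placeholder response, stored verbatim",
    generated_at="2023-11-01T00:00:00+00:00",
)
signature = sign_record(record, demo_key)
```

With a record of this shape, the four questions (what was said, when, in response to what, on which model) are answered by reading four fields, and any later edit to the stored response makes `verify_record` return `False`.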

The second was at the firm end. Schwartz could not show that any verification step had taken place between receiving the citations from Cohen and submitting them to the court. A signed log of the citation-verification step that should have run (each citation checked against Westlaw, LexisNexis or PACER, each result captured as a structured outcome) would have been a one-line answer to the court. Either the log would show that the check ran and the citations failed, in which case the question is why Schwartz filed them anyway, or it would show that the check did not run at all, in which case the question is why a federal filing left the firm without it. The absence of the log was, in itself, evidence of the conduct.

With both records present, the inquiry collapses from weeks of show-cause hearings to two structured queries. With neither present, the court had to reconstruct from affidavits and memory.
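As a rough illustration of what "two structured queries" could mean in practice, the sketch below treats each log as a list of dictionaries and asks one question of each. The log shapes, field names and sample rows are assumptions made for the example; a real system would query whatever store holds the signed records.

```python
from typing import Iterable


def chatbot_output(generation_log: Iterable[dict], session_id: str) -> list[dict]:
    """Query 1: what did the chatbot produce in this session, and on which model?"""
    return [g for g in generation_log if g["session_id"] == session_id]


def unverified_citations(verification_log: Iterable[dict], matter_id: str) -> list[str]:
    """Query 2: which citations on this matter were not confirmed before filing?"""
    return [
        v["citation"]
        for v in verification_log
        if v["matter_id"] == matter_id and not v["found"]
    ]


# Placeholder rows, not real case data.
generation_log = [
    {"session_id": "session-0001", "model_version": "example-model", "response": "placeholder"},
    {"session_id": "session-0002", "model_version": "example-model", "response": "placeholder"},
]
verification_log = [
    {"matter_id": "matter-A", "citation": "placeholder citation 1", "found": False},
    {"matter_id": "matter-A", "citation": "placeholder citation 2", "found": True},
    {"matter_id": "matter-B", "citation": "placeholder citation 3", "found": False},
]
```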

Where the gap was

The gap was on three sides at once.

On the user side, Cohen treated Bard as a search engine. This is the most common misconception about general-purpose chatbots and remains so. In this exchange Bard did not retrieve the cases from a database of real decisions. It generated plausible text that resembled what a search engine would have returned. A signed indication on the chatbot's output ("this is generated text, not retrieved data") would have given Cohen a clear, structured warning that what he was holding was a draft and not a citation.

On the chatbot side, Bard did not warn the user that its output was not retrieved. The interface presented generated text indistinguishably from retrieved text. The session produced no structured output that downstream tools could parse for source attribution. The user received plain English that looked like a search result.
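One way to picture the missing structured output is a response envelope in which every segment carries a provenance label that downstream tools can parse. The JSON shape, the `provenance` and `source` fields, and the sample content are all assumptions for illustration; no chatbot is known to emit exactly this format.

```python
import json

# Hypothetical response envelope; the citation text is a labelled placeholder.
response_json = """
{
  "model_version": "example-model-2023-11",
  "segments": [
    {"text": "PLACEHOLDER v. PLACEHOLDER (not a real citation)",
     "provenance": "generated", "source": null},
    {"text": "Text quoted from a retrieved document.",
     "provenance": "retrieved", "source": "https://example.com/doc"}
  ]
}
"""


def generated_segments(raw: str) -> list[str]:
    """Return the text of every segment that is not backed by a retrieved source."""
    envelope = json.loads(raw)
    return [
        seg["text"]
        for seg in envelope["segments"]
        if seg.get("provenance") != "retrieved" or not seg.get("source")
    ]


# Everything in this list is a draft that still needs checking, not a citation.
needs_verification = generated_segments(response_json)
```

The design choice is to treat a segment as unverified by default: a missing label or a missing source is handled the same way as an explicit "generated" label.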

On the firm side, Schwartz did not run a verification step before submitting. The standard legal-research databases would have flagged the three citations as nonexistent in under a minute. The step was skipped. The firm's filing pipeline had no checkpoint that required a citation to be confirmed against an authoritative source before a document went to court. Mata v. Avianca had established roughly six months earlier that this checkpoint was a professional necessity. The story spread quickly; the lesson did not propagate at the same speed.

What governance should have looked like

The verification of a citation before it leaves the firm is not an AI problem. It is a discipline problem that AI has made urgent. The fix is mechanical: every citation in every court filing passes through a verifier that confirms it exists in an authoritative database, and the verifier's output is signed and retained as part of the matter file.

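from datetime import datetime, timezone

from headlights import CitationVerifier, ConductRecord, sign, chain


class FilingBlocked(Exception):
    """Raised when a filing fails citation verification and must not be released."""


# Assumed to exist already in the firm's environment:
#   last_record       the most recent entry in the signed record chain
#   firm_private_key  the firm's signing key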

# Before any filing goes out, every citation passes through the verifier.
verifier = CitationVerifier(
    authoritative_sources=["westlaw", "lexisnexis", "pacer", "courtlistener"],
    fail_on_unverified=True,
)

filing_citations = [
    "United States v. Figueroa-Florez, 64 F.4th 223 (2d Cir. 2022)",
    "United States v. Ortiz (No. 21-3391), 2022 WL 4424741 (2d Cir. Oct. 11, 2022)",
    "United States v. Amato, 2022 WL 1669877 (2d Cir. May 10, 2022)",
]

results = verifier.check_all(filing_citations)
# results = [
#   {"citation": "...Figueroa-Florez...", "found": False, "sources_checked": [...]},
#   {"citation": "...Ortiz...",          "found": False, "sources_checked": [...]},
#   {"citation": "...Amato...",          "found": False, "sources_checked": [...]},
# ]

# Build a signed record of the verification attempt, every check, every miss.
record = ConductRecord(
    workflow="court_filing_pre_submission",
    matter_id="us-v-cohen-supervised-release-termination",
    drafted_by="attorney_id_DMS",
    drafted_with_ai_assistance=True,
    ai_tool="google-bard",
    citations_submitted=filing_citations,
    verification_results=results,
    verification_passed=all(r["found"] for r in results),
    timestamp=datetime.now(timezone.utc),
    previous_record_hash=last_record.hash(),
)

signed = sign(record, key=firm_private_key)
chain.append(signed)

# If verification failed, the filing is blocked at the gate.
if not record.verification_passed:
    raise FilingBlocked(
        "Three citations not verified in any authoritative source. "
        "Review required before submission."
    )
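The `previous_record_hash` field above implies a hash-chained log, in which each entry commits to the one before it. The sketch below shows that idea with the standard library only. It is not the `headlights` implementation, whose internals are not described here; the function names and record layout are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # stand-in "previous hash" for the first entry


def record_hash(entry: dict) -> str:
    # Hash a deterministic serialisation of the whole entry.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(entries: list[dict], body: dict) -> None:
    # Each new entry stores the hash of the entry before it.
    previous = record_hash(entries[-1]) if entries else GENESIS_HASH
    entries.append({"body": body, "previous_record_hash": previous})


def chain_is_intact(entries: list[dict]) -> bool:
    # Walk the log and recompute every link.
    expected = GENESIS_HASH
    for entry in entries:
        if entry["previous_record_hash"] != expected:
            return False
        expected = record_hash(entry)
    return True


log: list[dict] = []
append_record(log, {"workflow": "court_filing_pre_submission", "verification_passed": False})
append_record(log, {"workflow": "court_filing_pre_submission", "verification_passed": True})
```

Editing any earlier entry after the fact changes its hash and breaks the link stored in the next entry. The final entry is only protected once something outside the log, such as a signature or a later entry, commits to its hash.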

Two records, side by side, would have surfaced the problem at the firm level long before it reached Judge Furman.

Citation verification at the filing gate. The firm's filing pipeline should refuse to release any document to a court until every citation has been confirmed against an authoritative source. The verification runs in seconds against Westlaw or PACER. Three citations that did not exist in any source would have been flagged immediately. Schwartz would have asked Cohen where they came from. The conversation would have happened inside the firm rather than in front of a federal judge.

AI disclosure on the matter record. Cohen did not tell Schwartz the citations came from Bard. A conduct record that captured the source of every citation, including "generated by AI chatbot, not retrieved from authoritative source", would have made the disclosure mechanical rather than discretionary. The professional duty to disclose AI assistance in court filings has since been formalised in standing orders by a number of federal and state judges across the United States. The record makes compliance with that duty verifiable rather than self-reported.
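A small sketch of how a per-citation source field could make disclosure mechanical: the disclosure text is derived from the record instead of being written from memory. The `CitationSource` values, the `CitationEntry` type and the sample entries are hypothetical and stand in for whatever the matter record actually stores.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CitationSource(Enum):
    AUTHORITATIVE_DATABASE = "retrieved from authoritative source"
    AI_GENERATED = "generated by AI chatbot, not retrieved from authoritative source"
    CLIENT_SUPPLIED = "supplied by client, origin not recorded"


@dataclass(frozen=True)
class CitationEntry:
    citation: str
    source: CitationSource
    tool: Optional[str] = None  # name of the AI tool, when one was used


def disclosure_lines(entries: list[CitationEntry]) -> list[str]:
    """List every citation that did not come from an authoritative database."""
    return [
        f"{e.citation}: {e.source.value}" + (f" ({e.tool})" if e.tool else "")
        for e in entries
        if e.source is not CitationSource.AUTHORITATIVE_DATABASE
    ]


# Placeholder entries, not real citations.
matter_citations = [
    CitationEntry("placeholder citation 1", CitationSource.AI_GENERATED, tool="example-chatbot"),
    CitationEntry("placeholder citation 2", CitationSource.AUTHORITATIVE_DATABASE),
]
disclosures = disclosure_lines(matter_citations)
```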

Cohen is the second-most cited case in the AI-hallucination-in-legal-filings literature, behind Mata v. Avianca. It is the more instructive of the two. Mata established that the lawyer is responsible. Cohen established that the client is too, and that without a verifiable record at the point of generation the post-hoc inquiry is slow, ambiguous, and dependent on memory. With signed records on both sides, the chatbot's generation and the firm's verification, the inquiry becomes a two-line query.

This entry is an educational analysis based on the publicly reported sources listed below. It does not constitute legal advice. Facts are stated to the best of our knowledge as of the date of publication; corrections will be issued promptly on request. Contact: ellie@useheadlights.com.