HD-INC-003

Legal services · United States · 2023 · Hallucination & fabrication

Michael Cohen gave his lawyer fake case citations he had got from Google Bard, and his lawyer filed them in a federal court

By Ellie Harris · Filed 29 November 2023

Alleged: Gerstman Schwartz LLP and Google (Bard) developed or deployed the AI system implicated in this incident. Details are drawn from public reports; parties are presumed innocent of any wrongdoing not established by an official finding.

What happened

On 29 November 2023, Michael Cohen’s attorney David M. Schwartz filed a motion in the Southern District of New York seeking the early termination of Cohen’s term of court-ordered supervised release. The motion argued that District Court decisions, affirmed by the Second Circuit, had previously granted early termination in comparable cases. To make the point, the motion cited three:

United States v. Figueroa-Florez, 64 F.4th 223 (2d Cir. 2022)
United States v. Ortiz (No. 21-3391), 2022 WL 4424741 (2d Cir. Oct. 11, 2022)
United States v. Amato, 2022 WL 1669877 (2d Cir. May 10, 2022)

None of the three exist. The citation for Figueroa-Florez points to a page in the middle of an unrelated Fourth Circuit decision. The citation for Amato corresponds to a decision of the Board of Veterans Appeals. The remaining citation, for Ortiz, was simply invented.

Cohen had completed his federal sentence on campaign-finance and tax-related convictions arising from the 2016 presidential election and had been on supervised release since. He decided to find precedent that would support cutting that release short. He used Google Bard. He treated the chatbot as what he later described to the court as “a super-charged search engine,” and assumed its outputs were retrieved from a database of real cases. They were not. Bard generated case-shaped text that looked like search results, with confident citations in the standard federal reporter format. Cohen passed the citations to Schwartz. Schwartz included them in the filing without independently verifying any of them.

Judge Jesse M. Furman noticed. On 12 December 2023 he issued an order requiring Schwartz to produce copies of the three decisions. Schwartz could not produce them. The court ordered a show-cause hearing. On 29 December 2023, Cohen filed a sworn affidavit explaining that he had used Google Bard, that he had not understood that the tool could fabricate citations, that he had assumed it functioned like a search engine, and that he had not told Schwartz the source was an AI chatbot. Schwartz filed his own affidavit explaining that he had believed the citations came from E. Danya Perry, a respected attorney who had commented on an earlier draft, and so had not independently reviewed them; he had not known Cohen was the source.

On 20 March 2024 Judge Furman issued a written opinion declining to impose Rule 11 sanctions on Schwartz. He found that Schwartz had been negligent but not in bad faith. He declined to find Cohen in contempt, observing that the court had “no basis to question Cohen’s representation that he believed the cases to be real” and that it would have “been downright irrational” for Cohen to have submitted citations he knew were fabricated. The published opinion was picked up immediately by legal-ethics scholars and added to the small but growing body of caselaw on AI-generated citations, alongside Mata v. Avianca (HD-INC-002), decided in the same court in June 2023. There were no monetary penalties. There was a public record. The case has been cited in nearly every continuing legal education AI-ethics module produced since.

What an auditable version would have shown

The case turned on the absence of two records that should have existed and did not.

The first was at the chatbot end. Cohen could not show what prompt he had given Bard. Bard could not show what response it had generated, in what session, against what model version. Bard’s logs, to the extent they persisted, were not produced in court, were not signed, and were not version-pinned. When Judge Furman asked the questions that would matter in any other forensic inquiry (what did the chatbot actually say, on what date, in response to what query, with what model active), none of the parties had the record. The investigation became an exercise in reconstruction from memory and emails.

The second was at the firm end. Schwartz could not show that any verification step had taken place between receiving the citations from Cohen and submitting them to the court. A signed log of the citation-verification step that should have run (each citation checked against Westlaw, LexisNexis or PACER, each result captured as a structured outcome) would have been a one-line answer to the court. Either the log would have shown that the check ran and the citations failed, in which case the question is why Schwartz filed them anyway; or it would have shown that the check did not run at all, in which case the question is why a federal filing left the firm without it. The absence of the log was, in itself, the evidence of the conduct.
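As a rough illustration of that one-line answer, the sketch below models a verification log and a query over it. The names `Outcome`, `VerificationEntry` and `explain` are hypothetical, chosen for this example; they are not taken from any real firm system or from the reference implementation mentioned later in this report.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FOUND = "found"
    NOT_FOUND = "not_found"


@dataclass(frozen=True)
class VerificationEntry:
    """One check of one citation against one database (hypothetical schema)."""
    citation: str
    database: str  # e.g. "Westlaw", "LexisNexis", "PACER"
    outcome: Outcome


def explain(log: list[VerificationEntry], citation: str) -> str:
    """Return the one-line answer for a single citation."""
    entries = [e for e in log if e.citation == citation]
    if not entries:
        # No entry at all: the check never ran for this citation.
        return "check did not run"
    if any(e.outcome is Outcome.FOUND for e in entries):
        return "check ran: citation confirmed"
    return "check ran: citation not found"


# Illustrative log: one citation from the filing was checked, one was not.
log = [
    VerificationEntry(
        "United States v. Figueroa-Florez, 64 F.4th 223 (2d Cir. 2022)",
        "Westlaw",
        Outcome.NOT_FOUND,
    ),
]
print(explain(log, "United States v. Figueroa-Florez, 64 F.4th 223 (2d Cir. 2022)"))
print(explain(log, "United States v. Amato, 2022 WL 1669877 (2d Cir. May 10, 2022)"))
```

The point of the design is that "not found" and "never checked" are different answers, and the log distinguishes them without anyone's recollection.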

With both records present, the inquiry collapses from weeks of show-cause hearings to two structured queries. With neither present, the court had to reconstruct from affidavits and memory.

Where the gap was

The gap was on three sides at once.

On the user side, Cohen treated Bard as a search engine. This is the most common misconception about general-purpose chatbots and remains so. Bard, like every chatbot of its generation, did not retrieve, did not search, and did not return facts from a database. It generated plausible text that resembled what a search engine would have returned. A signed indication on the chatbot’s output (“this is generated text, not retrieved data”) would have given Cohen a clear, structured warning that what he was holding was a draft and not a citation.

On the chatbot side, Bard did not warn the user that its output was not retrieved. The interface presented generated text indistinguishably from retrieved text. The session produced no structured output that downstream tools could parse for source attribution. The user received plain English that looked like a search result.

On the firm side, Schwartz did not run a verification step before submitting. Every law firm has access to legal-research databases that would have flagged the three citations as nonexistent in under a minute. The step was skipped. The firm’s filing pipeline had no checkpoint that required a citation to be confirmed against an authoritative source before a document went to court. Mata v. Avianca had established in June 2023 that this checkpoint was a professional necessity. The lesson did not propagate as fast as the tools did.

What governance should have looked like

The verification of a citation before it leaves the firm is not an AI problem. It is a discipline problem that AI has made urgent. The fix is mechanical: every citation in every court filing passes through a verifier that confirms it exists in an authoritative database, and the verifier’s output is signed and retained as part of the matter file.
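A minimal sketch of such a gate follows, under stated assumptions. The database lookup is passed in as a plain function because no real Westlaw, LexisNexis or PACER API is modelled here; `Lookup`, `release_filing` and `UnverifiedCitationError` are hypothetical names for this example only.

```python
from typing import Callable, Iterable

# Assumption: a lookup takes a citation string and returns True only if an
# authoritative database confirms it. The real lookup is out of scope.
Lookup = Callable[[str], bool]


class UnverifiedCitationError(Exception):
    """Raised when a filing contains citations the lookup cannot confirm."""

    def __init__(self, missing: list[str]):
        super().__init__(f"{len(missing)} citation(s) not confirmed")
        self.missing = missing


def release_filing(citations: Iterable[str], lookup: Lookup) -> list[str]:
    """Return the confirmed citations, or refuse by raising if any fail."""
    checked = list(citations)
    missing = [c for c in checked if not lookup(c)]
    if missing:
        raise UnverifiedCitationError(missing)
    return checked


# Stub database containing one placeholder entry, for illustration only.
known = {"Placeholder v. Placeholder (confirmed by stub)"}

try:
    release_filing(
        [
            "Placeholder v. Placeholder (confirmed by stub)",
            "United States v. Amato, 2022 WL 1669877 (2d Cir. May 10, 2022)",
        ],
        lookup=lambda c: c in known,
    )
except UnverifiedCitationError as err:
    print("Filing held back:", err.missing)
```

The gate fails closed: one unconfirmed citation blocks the whole document, so the question of where it came from is asked inside the firm.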

Two records, side by side, would have surfaced the problem at the firm level long before it reached Judge Furman.

Citation verification at the filing gate. The firm’s filing pipeline should refuse to release any document to a court until every citation has been confirmed against an authoritative source. The verification runs in seconds against Westlaw or PACER. Three citations that did not exist in any source would have been flagged immediately. Schwartz would have asked Cohen where they came from. The conversation would have happened inside the firm rather than in front of a federal judge.

AI disclosure on the matter record. Cohen did not tell Schwartz the citations came from Bard. A conduct record that captured the source of every citation, including “generated by AI chatbot, not retrieved from authoritative source”, would have made the disclosure mechanical rather than discretionary. The professional duty to disclose AI assistance in court filings has since been formalised in standing orders by federal and state judges across the United States. The record makes compliance with that duty verifiable rather than self-reported.
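One way to picture a record that makes disclosure mechanical is sketched below. The schema is an assumption made for illustration: `Source`, `CitationProvenance` and `needs_disclosure` are hypothetical names, not the structure of any actual conduct record.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    LEGAL_DATABASE = "retrieved from authoritative source"
    AI_CHATBOT = "generated by AI chatbot, not retrieved from authoritative source"
    UNKNOWN = "source not recorded"


@dataclass(frozen=True)
class CitationProvenance:
    """Where one citation came from and who supplied it (hypothetical schema)."""
    citation: str
    supplied_by: str
    source: Source


def needs_disclosure(entries: list[CitationProvenance]) -> list[CitationProvenance]:
    """Return every citation that is AI-generated or has no recorded source."""
    return [e for e in entries if e.source is not Source.LEGAL_DATABASE]


entries = [
    CitationProvenance(
        "United States v. Ortiz (No. 21-3391), 2022 WL 4424741 (2d Cir. Oct. 11, 2022)",
        supplied_by="client",
        source=Source.AI_CHATBOT,
    ),
]
for entry in needs_disclosure(entries):
    print(entry.citation, "->", entry.source.value)
```

Treating an unrecorded source the same as an AI-generated one is a deliberate choice in this sketch: a citation with no provenance is flagged rather than assumed safe.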

Cohen is the second-most cited case in the AI-hallucination-in-legal-filings literature, behind Mata v. Avianca. It is the more instructive of the two. Mata established that the lawyer is responsible. Cohen established that the client is too, and that without a verifiable record at the point of generation the post-hoc inquiry is slow, ambiguous, and dependent on memory. With signed records on both sides, the chatbot’s generation and the firm’s verification, the inquiry becomes a two-line query.

The reference implementation of CitationVerifier, VerificationGate, and ConductRecord is open source. It lives at github.com/saffronandindia/headlights-oss, Apache 2.0 licensed and free to install. Anyone can read every line and verify the signatures. The repository is public now.

Sources

The record

An auditable system would have produced a signed, tamper-evident record the moment this happened: what the system did, the version that did it, the basis it acted on, and the action taken. Gerstman Schwartz LLP and Google (Bard) could have produced it on demand.

This is the record the system as deployed did not produce in a signed, auditable form.

What this teaches

Capture what happened when it happens

What the system did, the version that did it, the basis it acted on, and the action taken, recorded at the moment, not reconstructed after.

Sign it, so no one has to trust the record-keeper

A tamper-evident entry. Edit it later and the signature breaks. The record does not ask for the benefit of the doubt.
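The tamper-evidence property can be sketched with Python's standard `hmac` module. This is a simplified illustration, not the scheme any particular system uses: the entry fields and the key are placeholders, and `sign` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac
import json


def sign(entry: dict, key: bytes) -> str:
    """Sign a canonical JSON form of the entry with HMAC-SHA256."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(entry: dict, signature: str, key: bytes) -> bool:
    """Check the signature in constant time; any edit to the entry fails."""
    return hmac.compare_digest(sign(entry, key), signature)


key = b"demo-key-not-for-production"  # placeholder secret
entry = {
    "citation": "United States v. Amato, 2022 WL 1669877 (2d Cir. May 10, 2022)",
    "outcome": "not_found",
}
signature = sign(entry, key)

edited = dict(entry, outcome="found")  # a later, unauthorised edit
print(verify(entry, signature, key))
print(verify(edited, signature, key))
```

HMAC relies on a shared secret, so only someone holding the key can check the record. Making it verifiable by anyone would call for a public-key signature instead, which the standard library does not provide.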

Make it verifiable by anyone

A court, a regulator, a customer's lawyer can check the record themselves, without taking the company, or us, at our word.

Also in the library

HD-INC-001 · Air Canada chatbot promised a bereavement refund policy that did not exist · Aviation · 2022
HD-INC-002 · Mata v. Avianca, the lawyer who cited six cases that did not exist and asked ChatGPT to confirm them · Legal services · 2023
HD-INC-005 · Cursor's AI support bot, signing emails as "Sam", invented a single-device subscription policy that never existed, and developers cancelled · Technology · 2025

Headlights summarises publicly reported AI incidents. All summaries are independently written, attributed to their original sources, and intended for research and educational purposes. Allegations are identified as such until established through official findings.

Last reviewed June 2026. This report is based on the sources listed above and reflects information available at the time of review; later developments may not be captured. Where a person is described as charged with or alleged to have done something, that allegation is unproven unless a conviction or a court or regulatory finding is stated. Headlights publishes journalism and commentary, not legal advice.
