What happened
In October 2023, Mayor Eric Adams launched MyCity, a chatbot built on Microsoft Azure and OpenAI's models, as the public face of the City of New York's small-business support services. The mayor presented it as the future of how citizens would interact with city government: a single conversational interface that could answer questions about permits, taxes, hiring, housing, and the dense thicket of city regulations small businesses are expected to navigate. The program had cost the city on the order of half a million dollars to develop. It was, in Adams' framing, a flagship example of how generative AI could make government accessible.
In March 2024, The Markup and The City, working with researchers at the AI Now Institute, published an investigation showing what MyCity was actually telling small businesses. The findings landed hard.
Asked whether a landlord could refuse a prospective tenant who paid with a Section 8 voucher, MyCity said yes. Source-of-income discrimination has been illegal in New York City since 2008.
Asked whether an employer could keep a portion of workers' tips, MyCity said yes. Section 196-d of the New York Labor Law explicitly prohibits an employer from retaining any part of a worker's tips.
Asked whether an employer could fire a worker for complaining about sexual harassment, MyCity said yes. Retaliation against employees for reporting harassment is illegal under federal, state and city law.
Asked whether a business could refuse to accept cash, MyCity said yes. Since 2020, New York City law has required most stores and restaurants to accept cash.
The Markup repeated the queries and got contradictory answers from different sessions. The investigators were careful: they tested variations of the same question, documented the responses, ran the queries through different user accounts, and verified each response against the relevant statute. The pattern held. The chatbot was not occasionally wrong on edge cases. It was confidently wrong on basic compliance questions, the very questions small businesses were being told they could trust it to answer.
The Adams administration's response was striking for what it did not do. The mayor acknowledged at a press conference that the bot's answers were "wrong in some areas." He did not take it offline. The city's communications team quietly updated the MyCity site to label the bot as a "beta product" that may provide "inaccurate or incomplete" information. The bot remained the public-facing recommended option for small-business owners with questions about city regulations.
The chatbot continued operating, with various small fixes and continued public criticism, for nearly two years. In late January 2026, Mayor Zohran Mamdani, who had taken office on 1 January 2026 after campaigning in part on the failures of the Adams administration's AI procurement, announced that the MyCity chatbot would be discontinued. The Markup reported the decision under a headline that captured the shape of the whole episode: "Mamdani to kill the NYC AI chatbot we caught telling businesses to break the law."
What an auditable version would have shown
The core failure was not that the chatbot produced wrong answers occasionally. Generative models will hallucinate. The failure was that the chatbot was deployed as the city's authoritative front door for small-business compliance questions, with no record-keeping discipline that would have let anyone (the city, the operators, the public) see what it was telling people.
There is no public log of MyCity's interactions. The city did not publish, and as far as is known did not retain, structured records of the questions asked and the responses produced, classified by topic, scored against ground-truth answers. The Markup investigation had to recreate the questions and document responses from scratch. The city's own analytics, to the extent they existed, were not the basis on which the city decided whether the bot was performing acceptably. The decision was political and reputational, not evidentiary.
An auditable version would have produced, for every interaction, a signed record: the question asked, the answer given, the model version, the retrieval sources (if any), the topic category, and a confidence score. The records would be aggregated for population-level analytics: how often the bot answers questions about source-of-income discrimination, how much those answers vary across sessions, and how often the same question gets contradictory answers. Periodic adversarial testing, running known-correct compliance questions through the bot at scale, would generate a continuously updated error rate per topic, visible to the public and surfaced to the procurement office. The bot's continued operation would then be a question with an evidence base, not a press-conference answer.
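The "signed record" idea, where each record also commits to the one before it, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the `headlights` library's implementation: HMAC-SHA256 with a shared key stands in for a real digital signature (a production system would more plausibly use an asymmetric key, so outsiders can verify records without being able to forge them), and `InteractionLog`, `SIGNING_KEY` and the record fields are hypothetical names. Because every entry includes the previous entry's hash, editing or deleting any past record makes `verify()` fail.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field

# Hypothetical key for illustration only; HMAC stands in for a real signature scheme.
SIGNING_KEY = b"example-city-key-do-not-use"
GENESIS_HASH = "0" * 64  # "previous hash" used by the very first entry


def _canonical(body: dict) -> bytes:
    """Serialise deterministically so the same record always hashes the same way."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


@dataclass
class InteractionLog:
    """Append-only log; each entry commits to the hash of the previous entry."""
    key: bytes
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS_HASH
        body = {"record": record, "previous_record_hash": prev_hash}
        payload = _canonical(body)
        entry = {
            **body,
            "entry_hash": hashlib.sha256(payload).hexdigest(),
            "signature": hmac.new(self.key, payload, hashlib.sha256).hexdigest(),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link, hash and signature; any edit or deletion breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["previous_record_hash"] != prev_hash:
                return False
            payload = _canonical({"record": entry["record"], "previous_record_hash": prev_hash})
            if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
                return False
            expected = hmac.new(self.key, payload, hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected, entry["signature"]):
                return False
            prev_hash = entry["entry_hash"]
        return True


log = InteractionLog(key=SIGNING_KEY)
log.append({"topic": "source_of_income_discrimination", "answer_grounded": False})
log.append({"topic": "tip_retention", "answer_grounded": True})
print(log.verify())  # True
log.entries[0]["record"]["answer_grounded"] = True  # quietly rewrite history
print(log.verify())  # False
```

The chain makes the log tamper-evident rather than tamper-proof: someone holding the key could rebuild the whole chain, which is why publishing periodic checkpoints of the latest hash is a common complement.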
Without those records, the only signal the city had was journalism. The Markup investigation was the audit the city should have been running on itself, conducted instead by reporters with a media platform. Not every city has a Markup investigation on its docket. Most do not.
Where the gap was
The gap was in procurement and governance, not in the model.
The model, GPT-4-class at the time of launch, was capable of producing accurate answers to most compliance questions when given the right grounding documents and a sensible system prompt. The city had access to the canonical statutory text for every regulation MyCity covered. The right architecture was retrieval-augmented generation against that statutory text, with refusal patterns for any question the bot could not ground in a verifiable source. What was deployed was a thinner system: a chat interface over a generic model, with insufficient grounding and insufficient refusal, applied to a category of questions where confident wrong answers had legal consequences for the people asking.
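The grounded-or-refuse control flow described above can be shown in a few lines. This is a minimal sketch under loud assumptions: a toy keyword-overlap retriever stands in for a real retrieval system (embeddings or BM25), the two corpus entries are paraphrases rather than statutory text, and `Passage`, `CORPUS`, `retrieve`, `answer` and the `min_overlap` threshold are all hypothetical. The point is only that the bot answers when it can attach a source and otherwise returns a fixed fallback.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Fallback text shown whenever no source can be attached to an answer.
FALLBACK = ("I can't confirm this against current city law. "
            "Please call 311 or visit the linked official page.")

# Words too common to count as evidence that a passage is relevant.
STOPWORDS = {"a", "an", "the", "can", "could", "my", "of", "to", "is", "any", "part"}


@dataclass(frozen=True)
class Passage:
    citation: str
    text: str


# Hypothetical grounding corpus: paraphrases for illustration, not statutory text.
CORPUS = [
    Passage("NYC Human Rights Law, lawful source of income (paraphrased)",
            "landlords may not refuse tenants based on lawful source of income "
            "such as section 8 housing vouchers"),
    Passage("N.Y. Labor Law § 196-d (paraphrased)",
            "an employer may not demand accept or retain any part of the "
            "gratuities received by an employee"),
]


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def retrieve(question: str, corpus: List[Passage], min_overlap: int = 2) -> Optional[Passage]:
    """Return the passage sharing the most tokens with the question, if overlap is high enough."""
    query = tokens(question)
    best, best_score = None, 0
    for passage in corpus:
        score = len(query & tokens(passage.text))
        if score > best_score:
            best, best_score = passage, score
    return best if best_score >= min_overlap else None


def answer(question: str, corpus: List[Passage] = CORPUS) -> Dict[str, object]:
    passage = retrieve(question, corpus)
    if passage is None:
        # No grounding, no answer.
        return {"text": FALLBACK, "grounded": False, "citation": None}
    # A real system would hand the passage to the model and require the reply
    # to cite it; this sketch simply returns the passage with its citation.
    return {"text": f"According to {passage.citation}: {passage.text}",
            "grounded": True, "citation": passage.citation}


print(answer("Can a landlord refuse a Section 8 voucher?")["grounded"])  # True
print(answer("Can my restaurant go cashless?")["text"] == FALLBACK)       # True
```

The toy retriever is brittle (it does not match "voucher" to "vouchers"), but the failure mode is the one the architecture is meant to produce: an ungrounded question ends in a referral to 311, not a confident wrong answer.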
The city's procurement of the chatbot did not require, as a condition of acceptance, structured logging of interactions in a form suitable for audit. It did not require adversarial testing against the corpus of city regulations before deployment. It did not require a continuous-monitoring dashboard exposed to the City Council or to the public. The chatbot was procured as if it were a website redesign. It functioned as legal advice to thousands of people who had no other obvious recourse.
When The Markup's findings landed, the city's options were narrow because the records were not there. The administration could not say, with evidence, how often the bot answered each category of question correctly. It could not say which model version was active during the Markup's testing. It could not produce a population-level error rate per topic. It could acknowledge that the answers cited in the investigation were wrong, and it did. It could not say that the broader pattern was different from what The Markup had documented, because nothing in the city's own records supported a different account.
What governance should have looked like
A government-services chatbot deployed for compliance questions needs three things the MyCity deployment did not have.
import hashlib
from datetime import datetime, timezone

from headlights import (
    ConductRecord,
    CitationVerifier,
    PersonaGuard,
    sign,
    chain,
)

# session_id, last_record and city_private_key are assumed to be supplied by
# the surrounding application (session handling, the previous signed record
# in the chain, and the city's signing key).
user_question = "Can a landlord refuse a Section 8 voucher?"

# Refuse to answer any compliance question without a grounded citation to
# the underlying statute or regulation. No grounding, no answer.
guard = PersonaGuard(
    require_grounded_citation=True,
    require_topic_classification=True,
    fallback_response="I can't confirm this against current city law. "
                      "Please call 311 or visit the linked official page.",
)
response = guard.respond(
    user_question=user_question,
    retrieved_sources=[],  # nothing retrieved → fallback
)

# Every interaction, answered or refused, gets a structured record.
record = ConductRecord(
    workflow="mycity_chatbot_interaction",
    session_id=session_id,
    question_hash=hashlib.sha256(user_question.encode("utf-8")).hexdigest(),
    question_topic_classification="source_of_income_discrimination",
    retrieved_sources=[],
    model_version="gpt-4-1106-preview",
    answer_given=response.text,
    answer_grounded=response.grounded,  # False → guard returned fallback
    confidence=response.confidence,
    timestamp=datetime.now(timezone.utc),
    previous_record_hash=last_record.hash(),
)
signed = sign(record, key=city_private_key)
chain.append(signed)

# Records are publishable in aggregate as a public dashboard:
# topic, volume, grounded-answer-rate, fallback-rate. Updated daily.
First, the chatbot has to be willing to say it does not know. A bot that refuses to answer a Section 8 question when it cannot ground the answer in a current statute is a far better government service than a bot that answers confidently and wrongly. "I can't confirm this against current city law; please call 311" is a complete answer. Refusal rates also become a useful signal in their own right: a topic the bot can never answer is a topic where the city should either improve its source coverage or stop offering answers through the chatbot.
The bot then has to be continuously tested. A small test harness runs hundreds of known compliance questions against it every day. Errors per topic are scored against the underlying statute and published as a dashboard the procurement office, the City Council and the public can all see. When a topic's error rate crosses a threshold, the bot routes that topic straight to fallback until the underlying issue is fixed. The administration does not need to wait for The Markup to publish.
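The daily harness could look roughly like the sketch below. It assumes the bot's reply has already been normalised to a "yes"/"no" verdict (or `None` when it returned the fallback); in practice that normalisation is the hard part and would need a classifier or human review. `ComplianceCase`, the topic labels, the stand-in bot and the 2% threshold are illustrative assumptions, not figures from any city contract. Refusals are deliberately not counted as errors, since they are tracked separately as refusal rates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class ComplianceCase:
    topic: str
    question: str
    expected: str  # verdict the statute requires, e.g. "no"


# The bot under test is assumed to return a normalised verdict ("yes"/"no"),
# or None when it refused and returned the fallback.
Bot = Callable[[str], Optional[str]]


def error_rates(bot: Bot, cases: List[ComplianceCase]) -> Dict[str, float]:
    """Per-topic share of cases where the bot gave a definite answer contradicting the statute."""
    errors: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.topic] += 1
        verdict = bot(case.question)
        if verdict is not None and verdict != case.expected:
            errors[case.topic] += 1
    return {topic: errors[topic] / totals[topic] for topic in totals}


def topics_to_route_to_fallback(rates: Dict[str, float], threshold: float = 0.02) -> Set[str]:
    """Topics whose error rate exceeds the threshold get the fallback until fixed."""
    return {topic for topic, rate in rates.items() if rate > threshold}


# Toy run: a stand-in bot that wrongly says "yes" to voucher questions and refuses the rest.
cases = [
    ComplianceCase("source_of_income", "Can a landlord refuse a Section 8 voucher?", "no"),
    ComplianceCase("source_of_income", "Can a landlord reject a housing voucher?", "no"),
    ComplianceCase("cash_acceptance", "Can my store refuse cash?", "no"),
]


def flaky_bot(question: str) -> Optional[str]:
    return "yes" if "voucher" in question.lower() else None


rates = error_rates(flaky_bot, cases)
print(rates)                               # {'source_of_income': 1.0, 'cash_acceptance': 0.0}
print(topics_to_route_to_fallback(rates))  # {'source_of_income'}
```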
Finally, the records themselves should be public in aggregate: the questions asked (anonymised), their topics, grounded-answer rates and refusal rates, updated daily on a city dashboard. Journalists then do not have to reverse-engineer what the bot is telling people. The city becomes the source of truth on what it is telling its own citizens.
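A sketch of how the daily aggregate rows might be computed, assuming each interaction record carries a day, a topic, hashes of the normalised question and of the answer shown, and a grounded flag (all hypothetical field names). Only counts and rates leave the function, so no question text is published. It also computes the contradiction rate raised earlier in the piece: among questions answered with grounding more than once on a given day, the share that received more than one distinct answer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical record shape; field names are illustrative assumptions.
@dataclass(frozen=True)
class Interaction:
    day: str            # ISO date, e.g. "2024-04-01"
    topic: str          # output of the topic classifier
    question_hash: str  # hash of the normalised question text
    answer_hash: str    # hash of the answer text actually shown
    grounded: bool      # False means the fallback response was returned


def daily_dashboard(interactions: List[Interaction]) -> List[Dict[str, object]]:
    """Aggregate per (day, topic); only counts and rates are returned."""
    groups: Dict[tuple, List[Interaction]] = defaultdict(list)
    for it in interactions:
        groups[(it.day, it.topic)].append(it)

    rows = []
    for (day, topic), items in sorted(groups.items()):
        volume = len(items)
        grounded = sum(1 for it in items if it.grounded)

        # Contradiction rate: of the questions answered (with grounding) more
        # than once, the share that received more than one distinct answer.
        answers: Dict[str, List[str]] = defaultdict(list)
        for it in items:
            if it.grounded:
                answers[it.question_hash].append(it.answer_hash)
        repeated = [a for a in answers.values() if len(a) > 1]
        contradictory = [a for a in repeated if len(set(a)) > 1]
        contradiction_rate: Optional[float] = (
            round(len(contradictory) / len(repeated), 3) if repeated else None
        )

        rows.append({
            "day": day,
            "topic": topic,
            "volume": volume,
            "grounded_answer_rate": round(grounded / volume, 3),
            "fallback_rate": round((volume - grounded) / volume, 3),
            "contradiction_rate": contradiction_rate,  # None: no repeated questions
        })
    return rows


# Toy data for illustration only.
sample = [
    Interaction("2024-04-01", "source_of_income", "q1", "a1", True),
    Interaction("2024-04-01", "source_of_income", "q1", "a2", True),
    Interaction("2024-04-01", "source_of_income", "q2", "fallback", False),
    Interaction("2024-04-01", "cash_acceptance", "q3", "a3", True),
]
for row in daily_dashboard(sample):
    print(row)
```

Hashing the question is not anonymisation on its own, since short questions can be guessed and re-hashed; the published rows avoid the issue by containing no hashes at all.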
MyCity is not the last municipal chatbot. Almost every major US city and county now has AI procurement underway. The procurement standards being written this year (what the city must contractually require of the vendor, what records the vendor must keep, what the vendor must be willing to disclose) are the standards by which every program of this shape will be judged. The cities that write them well will not need their own version of The Markup's investigation. The ones that don't will read about themselves on a Friday.
This entry is an educational analysis based on the publicly reported sources listed below. It does not constitute legal advice. Facts are stated to the best of our knowledge as of the date of publication; corrections will be issued promptly on request. Contact: ellie@useheadlights.com.
Sources
- Malfunctioning NYC AI Chatbot Still Active Despite Widespread Evidence It's Encouraging Illegal Behavior (The Markup). Investigative journalism.
- NYC's Microsoft-Powered Chatbot Tells Business Owners to Break the Law (CX Today). News.
- Mamdani to kill the NYC AI chatbot we caught telling businesses to break the law (The Markup, January 2026). News.
- We asked NYC's new chatbot our real estate questions (Brick Underground). Investigative journalism.
- NYC's New Mayor Is Killing the City's Faulty Chatbot (VICE). News.