HD-INC-017 · Real estate · Algorithmic harm

Zillow's home-pricing algorithm bought more than $1B of houses at prices above what they could be sold for, and ended the iBuyer business in a single quarter

An aggressive pricing model bought 7,000 homes during a hot market. The model could not keep pace with the turn. Zillow took a $304 million Q3 write-down, shut the division, and laid off 25% of the company.

What happened

Zillow Offers was the iBuyer arm of Zillow Group: the part of the company that, beginning in 2018, made cash offers on residential homes, bought them, did light renovations, and resold them. The pitch was that an algorithm trained on Zillow's vast home-pricing dataset could outperform a human flipper at scale. This was the same dataset behind the Zestimate, the consumer-facing home-value estimate Zillow had been refining for years. Buy at the algorithm's offer price, hold briefly, sell at the algorithm's predicted exit price, capture the margin.

Through the first half of 2021, Zillow Offers operated in a US housing market that was, in retrospect, peaking. Prices were rising fast. The pricing model was tuned for an environment in which homes appreciated rapidly between purchase and resale; the margin came partly from the appreciation. Zillow's bidding became more aggressive. Offers crept above what comparable homes were transacting for, on the theory that by the time the resale closed, the market would have caught up.

In the second half of 2021, the market turned. The turn was not dramatic and the bubble did not burst, but the rate of appreciation slowed, supply chain disruptions extended the renovation cycle, and labour shortages pushed contractor schedules out. Homes Zillow had bought at peak-aggressiveness pricing began sitting on the books longer than the model assumed. The exit prices the model had projected stopped materialising. By late summer, Zillow had a multi-thousand-home inventory it could not move at the prices it had paid.

On 17 October 2021, Zillow paused new home purchases through Offers, citing capacity constraints. On 2 November 2021, in its Q3 earnings release, the company announced it was winding down Zillow Offers entirely. The Q3 release disclosed an inventory write-down of approximately US$304 million; the total cost of the wind-down, including write-downs across Q3 and the expected Q4 impairment, ran past US$500 million. Approximately 25 percent of Zillow's workforce, around 2,000 employees, would be laid off. The stock dropped over 10 percent the day of the announcement and continued falling in the days that followed.

Co-founder and CEO Rich Barton, in the shareholder letter, framed the decision in terms of risk to the broader business: the algorithm's "inability to accurately forecast the price of homes in the future" combined with "labour and supply shortages" had created exposure too large for Zillow to continue carrying. A securities class action followed in the Western District of Washington, alleging that Zillow had misled investors about the model's performance through 2021. The case settled in 2023.

The Zillow Offers shutdown is the foundational case study in a particular failure mode: an algorithmic system that performs well in the conditions it was trained on, fails when the conditions change, and fails most expensively precisely when its operator has scaled the system up most aggressively.

What an auditable version would have shown

Zillow had structured records. It was, after all, a data company. What Zillow lacked was a structured record of what the model believed about the future at the moment each purchase was authorised, and whether that belief was being independently challenged.

For each home Zillow Offers bought, the relevant question is not just "what did the model offer" but "what did the model expect to sell this for, in how many days, and with what confidence interval around each estimate." For a portfolio of 7,000 homes, that is 7,000 forecasts. Aggregated, the forecasts have a distribution: median expected hold time, variance, confidence intervals, sensitivity to underlying market assumptions. The question that mattered in the third quarter of 2021 was whether the model's view of the future was systematically wrong because the regime had shifted. That should have been answerable, in close to real time, from the structured record of forecasts versus actual outcomes.

Zillow's internal reporting was sufficient to know that homes were sitting longer than expected. It was not, in the public record at least, structured in a way that surfaced the deeper question: was the forecast distribution itself drifting? Were closed sales coming in systematically below model predictions on a pattern that signalled a regime change rather than ordinary noise?

An auditable version would log, for each purchase decision, the model's expectations: predicted resale price, predicted hold time, confidence interval, key assumptions about appreciation and renovation costs. Each closed sale would be matched to its purchase-time forecast, with the gap computed. The aggregate gap, sliced by region, by purchase month, by model version, would be a continuous quality signal. A widening gap between forecast and actual is the early sign that the model's view of the world has stopped matching the world's actual behaviour. Acting on that signal weeks earlier, rather than in a Q3 earnings release, would have meant smaller inventory, smaller write-downs, and a different company.
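The matching and slicing step can be sketched in a few lines of plain Python. This is a minimal illustration, not Zillow's schema or any library's API: the `Purchase` record, its field names, the `mean_gap_by` helper, and every figure in the sample are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Purchase:
    """Hypothetical record: a purchase-time forecast plus the realised sale."""
    region: str
    purchase_month: str          # e.g. "2021-06"
    model_version: str
    predicted_resale_usd: float
    actual_resale_usd: float

    @property
    def residual_pct(self) -> float:
        # Negative means the home sold below its purchase-time forecast.
        gap = self.actual_resale_usd - self.predicted_resale_usd
        return gap / self.predicted_resale_usd


def mean_gap_by(purchases, key):
    """Average forecast residual for each value of the slicing attribute."""
    groups = defaultdict(list)
    for p in purchases:
        groups[getattr(p, key)].append(p.residual_pct)
    return {k: mean(v) for k, v in groups.items()}


# Illustrative numbers only.
closed_sales = [
    Purchase("phoenix", "2021-04", "v9.2", 400_000, 404_000),
    Purchase("phoenix", "2021-07", "v9.3", 450_000, 427_500),
    Purchase("atlanta", "2021-04", "v9.2", 300_000, 303_000),
    Purchase("atlanta", "2021-07", "v9.3", 320_000, 304_000),
]

gap_by_month = mean_gap_by(closed_sales, "purchase_month")
gap_by_region = mean_gap_by(closed_sales, "region")
```

In this made-up sample the April cohort sold about 1% above forecast and the July cohort about 5% below. Sliced by region alone, both markets show the same -2% average, which hides the deterioration; the purchase-month cut is the one that exposes it.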

Where the gap was

The gap was in the feedback loop between the model and the operating decisions, not in the model itself.

The Zillow Offers pricing model was capable. It was working with one of the larger residential-real-estate datasets in the world. It had been refined over years against the Zestimate's predictions of standing home values. Its problem was not technical incapacity. Its problem was that the operating decisions (how aggressively to bid, how many homes to buy, and in which markets) were being made on the basis of recent performance rather than on the basis of how confident the model was in its forward forecasts.

When the model is right consistently, the operator scales up. The operator scales up by giving the model more aggressive bidding parameters and pushing it into more markets. The model's outputs at the scaled-up bidding parameters are still confident; confidence does not necessarily go down just because the bidding got more aggressive. But the sensitivity of each purchase to changes in the model's assumptions does, mechanically, increase. Aggressive bidding means a smaller buffer between purchase price and projected exit. A market turn that would have eaten the buffer at conservative bidding eats well past it at aggressive bidding.
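The buffer arithmetic is simple enough to work through directly. The sketch below expresses everything as a fraction of the comparable-sales value of a home; the `margin` helper and all of the percentages are hypothetical, chosen only to show the mechanism, and are not Zillow's figures.

```python
def margin(offer_premium: float, appreciation: float, costs: float) -> float:
    """Margin as a fraction of comparable value.

    offer_premium: how far above comparable sales the offer was made
    appreciation:  price change between purchase and resale
    costs:         renovation, holding, and selling costs
    """
    return appreciation - offer_premium - costs


COSTS = 0.03                  # hypothetical all-in costs
EXPECTED_APPRECIATION = 0.08  # what the model assumed at purchase time
ACTUAL_APPRECIATION = 0.03    # what a slowing market delivered

CONSERVATIVE_PREMIUM = 0.00   # bid at comparable value
AGGRESSIVE_PREMIUM = 0.04     # bid 4% above comparable value

projected_conservative = margin(CONSERVATIVE_PREMIUM, EXPECTED_APPRECIATION, COSTS)
projected_aggressive = margin(AGGRESSIVE_PREMIUM, EXPECTED_APPRECIATION, COSTS)

realised_conservative = margin(CONSERVATIVE_PREMIUM, ACTUAL_APPRECIATION, COSTS)
realised_aggressive = margin(AGGRESSIVE_PREMIUM, ACTUAL_APPRECIATION, COSTS)
```

Under these assumptions the same five-point shortfall in appreciation takes the conservative bid from a 5% projected margin to break-even, and takes the aggressive bid from a 1% projected margin to a 4% loss on every home.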

The control that was missing was a forecast-skill metric, watched continuously, with an explicit threshold at which bidding parameters revert. Forecast-skill metrics are standard in weather forecasting, in economics, and increasingly in trading systems. They were not, on the available evidence, the basis on which Zillow Offers' bidding aggressiveness was governed. The bidding aggressiveness was governed by what looked, period-on-period, like rising performance. The performance was a function of a market regime that was about to change.
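One common form of forecast-skill metric is the mean-squared-error skill score, which compares the model's error against that of a naive reference forecast: 1 means perfect, 0 means no better than the reference, and negative means worse. The sketch below assumes a reference of "the home resells at a flat 400" purely for illustration; the choice of reference, the `SKILL_FLOOR` threshold, and the prices (in thousands of US dollars) are all hypothetical.

```python
def mse(forecasts, actuals):
    """Mean squared error between paired forecasts and outcomes."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def skill_score(model_forecasts, reference_forecasts, actuals):
    """MSE skill score: 1 - MSE(model) / MSE(reference)."""
    return 1.0 - mse(model_forecasts, actuals) / mse(reference_forecasts, actuals)


# Illustrative resale prices, in thousands of USD.
model = [412, 398, 428]
naive = [400, 400, 400]            # hypothetical naive reference forecast

actuals_rising_market = [410, 395, 430]
actuals_after_turn = [390, 380, 405]

skill_before = skill_score(model, naive, actuals_rising_market)
skill_after = skill_score(model, naive, actuals_after_turn)

SKILL_FLOOR = 0.0                  # hypothetical: revert when no better than naive
should_revert_bidding = skill_after < SKILL_FLOOR
```

With these numbers the model scores roughly 0.98 while prices are rising and roughly -1.5 after the turn, so the same forecasts go from far better than the naive reference to clearly worse. A threshold on this score is a different trigger from a threshold on recent profit, because it reacts to forecast error rather than to realised margin.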

What governance should have looked like

For any algorithmic system that takes operating decisions, the governance question is not whether the model is accurate today. It is whether you have continuous visibility into the model's forecast skill and a written, signed escalation path when that skill degrades.

from headlights import (
    ConductRecord,
    MetricRecord,
    ConstraintGate,
    sign,
    chain,
)
from datetime import datetime, timezone

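# Every purchase decision is recorded with the model's full forecast.
# listing_id, last_record and zillow_private_key are assumed to be supplied
# by the surrounding application; all values shown are illustrative.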
record = ConductRecord(
    decision_type="ibuyer_home_purchase",
    listing_id=listing_id,
    offer_price_usd=412_500,
    model_version="zillow-offers-v9.3",
    forecast={
        "predicted_resale_price_usd": 458_000,
        "predicted_hold_days": 92,
        "confidence_interval_resale": (440_000, 476_000),
        "key_assumptions": {
            "regional_appreciation_quarterly": 0.027,
            "renovation_cost_usd": 18_500,
            "renovation_duration_days": 35,
        },
    },
    decision_made_by="algorithmic_auto_approve",
    human_authoriser_id=None,        # below auto-approve threshold
    timestamp=datetime.now(timezone.utc),
    previous_record_hash=last_record.hash(),
)

signed = sign(record, key=zillow_private_key)
chain.append(signed)

# Once the home sells, log a matched MetricRecord. The gap between
# forecast and outcome is the forecast-skill signal.
outcome = MetricRecord(
    related_decision_hash=signed.hash(),
    actual_resale_price_usd=429_000,
    actual_hold_days=147,
    forecast_residual_usd=-29_000,   # sold below forecast
    forecast_residual_days=+55,      # held 55 days longer than predicted
    timestamp=datetime.now(timezone.utc),
)
chain.append(sign(outcome, key=zillow_private_key))

# The constraint gate watches aggregate residuals. When the population-
# level forecast skill degrades, the gate forces a step-down in bidding
# aggressiveness automatically.
gate = ConstraintGate(
    watch_metric="forecast_residual_resale_pct",
    rolling_window_days=14,
    threshold=-0.025,                # forecasts averaging 2.5% high
    on_breach="reduce_bidding_aggressiveness_one_step_and_alert_humans",
)

The sketch has three parts: capture the model's forecast at the moment of the decision, reconcile that forecast against the eventual outcome, and trigger a pre-agreed response when the match between the two degrades.

Capturing the prediction at the moment of the decision is the part most operators skip. It is easy to log the offer price. It is harder to log the model's full view of the future at the time the offer went in: the predicted resale, the predicted hold time, the confidence interval, the key assumptions about appreciation and renovation. Without that, there is no way to tell, after the fact, whether a loss was bad luck or model failure. The forecast is the record.
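For the forecast to serve as the record, it also has to be hard to rewrite after the fact. One way to get that property is to link each record to the hash of the one before it. The sketch below uses only the Python standard library; the helper names and record fields are hypothetical, and it omits the signing step entirely.

```python
import copy
import hashlib
import json


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(log: list, body: dict) -> dict:
    """Append a record that carries the hash of its predecessor."""
    previous = record_hash(log[-1]) if log else None
    record = {**body, "previous_record_hash": previous}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """True if every record still points at the hash of the one before it."""
    return all(
        later["previous_record_hash"] == record_hash(earlier)
        for earlier, later in zip(log, log[1:])
    )


log = []
append_record(log, {
    "listing_id": "example-001",   # hypothetical identifier
    "offer_price_usd": 412_500,
    "forecast": {"predicted_resale_price_usd": 458_000, "predicted_hold_days": 92},
})
append_record(log, {"listing_id": "example-002", "offer_price_usd": 305_000})

intact = verify_chain(log)

# Quietly lowering an old forecast breaks the link from the next record.
tampered = copy.deepcopy(log)
tampered[0]["forecast"]["predicted_resale_price_usd"] = 430_000
tampering_detected = not verify_chain(tampered)
```

The chain only shows that an earlier record was altered relative to a later one. Proving who wrote a record, or protecting the most recent entry, needs signatures or an external anchor on top of this.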

The other half of the loop is reconciliation. Every closed sale gets matched back to the forecast that authorised the purchase. The gap is the data point. Aggregated across the portfolio, the gaps tell you whether the model is staying calibrated or whether its view of the world is drifting away from what the world is actually doing. A small, stable gap means the model knows what it does not know. A widening gap, especially one that skews in a single direction, is the early sign that the market regime has shifted and the model has not.
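A simple way to ask whether the gaps skew in a single direction is a one-sided sign test: if the model were calibrated, sales would land above and below forecast about equally often. The sketch below is a rough screen under that assumption, and it also assumes the sales are independent, which homes in the same market are not. The function name and the residuals are hypothetical.

```python
from math import comb


def one_sided_sign_test(residuals):
    """Return (negatives, n, tail probability).

    The tail probability is the chance of seeing at least this many
    below-forecast sales if above and below were equally likely.
    """
    nonzero = [r for r in residuals if r != 0]
    n = len(nonzero)
    k = sum(1 for r in nonzero if r < 0)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, tail


# Illustrative residuals (actual minus forecast), in thousands of USD.
balanced = [-3, 2, -1, 4, 1, -2, 3, -4, 2, -1]
skewed = [-3, -2, -5, -1, -4, 2, -6, -3, -2, -4]

k_balanced, n_balanced, p_balanced = one_sided_sign_test(balanced)
k_skewed, n_skewed, p_skewed = one_sided_sign_test(skewed)
```

Five misses below forecast out of ten is unremarkable (tail probability about 0.62). Nine out of ten has a tail probability of about 0.011, which is the kind of one-directional pattern that deserves a closer look before the next purchase batch is approved.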

The third piece is the response. A widening gap is not, on its own, useful unless something happens because of it. Zillow's bidding aggressiveness was governed by recent profit, not by forecast skill. If the bidding parameters had stepped down automatically when the population-level gap crossed a threshold, and a clearly named human had been notified the moment that happened, the conversation Zillow had with the market on 2 November 2021 would have been a conversation Zillow had with itself sometime in August. Lower drama, lower numbers, ongoing business.
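The automatic step-down can be sketched as a small state machine. This is a hypothetical, standalone illustration of the behaviour described, not the implementation of any real gate: the class name, the level names, and the choice to measure the window in closed sales rather than days are all assumptions.

```python
from collections import deque
from statistics import mean


class SteppedBidGate:
    """Steps bidding down one level when the rolling mean residual breaches."""

    def __init__(self, levels, threshold, window):
        self.levels = list(levels)          # ordered most aggressive first
        self.index = 0
        self.threshold = threshold          # e.g. -0.025 = forecasts 2.5% high
        self.recent = deque(maxlen=window)  # last `window` closed-sale residuals
        self.alerts = []                    # stand-in for notifying a named human

    @property
    def level(self):
        return self.levels[self.index]

    def observe(self, residual_pct):
        """Record one closed sale's residual and return the current level."""
        self.recent.append(residual_pct)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and mean(self.recent) < self.threshold:
            breach = mean(self.recent)
            if self.index < len(self.levels) - 1:
                self.index += 1
            self.alerts.append(
                f"mean residual {breach:.3f} breached {self.threshold}; "
                f"bidding now '{self.level}'"
            )
            self.recent.clear()             # need a fresh window to step again
        return self.level


gate = SteppedBidGate(
    levels=["aggressive", "standard", "conservative"],
    threshold=-0.025,
    window=4,
)

# Illustrative residuals: sales drift further below forecast over time.
for residual in [0.01, 0.00, -0.01, -0.02, -0.04, -0.05, -0.06, -0.05]:
    gate.observe(residual)
```

In this run the gate holds at "aggressive" through the first five sales, steps down to "standard" on the sixth when the four-sale mean reaches -3%, and records one alert. Clearing the window after a breach is a design choice: it forces the gate to wait for fresh evidence before stepping down again, at the cost of reacting more slowly to a fast deterioration.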

What broke Zillow Offers was not the model. The model was capable. What broke Zillow Offers was the absence of an instrumented feedback loop between what the model believed and what the world actually did, watched in close to real time, with a written response when the two diverged. The next regime change, in housing, in insurance pricing, in algorithmic underwriting, in hiring, will find whichever operator has not yet built that loop.

This entry is an educational analysis based on the publicly reported sources listed below. It does not constitute legal advice. Facts are stated to the best of our knowledge as of the date of publication; corrections will be issued promptly on request. Contact: ellie@useheadlights.com.