Does our rating reinforce neighborhood segregation? We checked.
We audit whether our 1–5 star Overall rating tracks the racial, income, or education makeup of a daycare's ZIP-code area. Across 15 ACS demographics on 222K daycares, every rating component passes our |r| < 0.25 threshold. Within-state, every |r| is under 0.05.
Data current as of May 2026.
Plain answer: no. Within any given state, no rating component (Process, Structural, or the Overall composite) correlates with any of the 15 ZIP-code-level ACS demographics at |Pearson r| ≥ 0.05. The audit threshold is 0.25, so we have roughly 10× headroom on the within-state composite, which is the test that actually matters.
Why this audit exists
A rating system that ranks businesses based on inputs correlated with neighborhood demographics — race, income, education, language — risks reinforcing existing segregation in childcare access. Parents in already-disadvantaged neighborhoods would see lower-rated options because of where their daycare is, not because of how the daycare cares for kids.
That's a real worry, and it's the most legitimate criticism anyone can raise about our work. So before launch we ran a quantitative audit. The methodology is in how-the-rating-works; this page is the audit numbers.
What we measured
For each of 222,602 displayable daycares in our database (228K total minus ~6K hidden because they're closed/suspended), we joined the ZIP Code Tabulation Area (ZCTA) the daycare sits in to American Community Survey (ACS) demographics. Then we computed the Pearson correlation between each of these 15 demographics and each rating component:
- `pct_white_nonhispanic`
- `pct_black`
- `pct_hispanic`
- `pct_asian`
- `pct_aian` (American Indian / Alaska Native)
- `median_household_income`
- `pct_below_poverty`
- `pct_below_200_poverty` (below 200% of federal poverty line)
- `pct_bachelors_or_higher`
- `unemployment_rate`
- `labor_force_participation`
- `pct_owner_occupied`
- `pct_single_parent_fam`
- `pct_lep_households` (limited English proficiency)
- `median_rent_pct_income`
Threshold for the audit: |r| < 0.25 on every demographic for every rating component. That's a strict test — 0.25 is a "weak" correlation by social-science conventions; we'd have to land below it to claim demographic neutrality.
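Concretely, the pooled check is a small loop over rating components and demographic columns. A minimal sketch with pandas, assuming the joined table is a DataFrame with one column per rating component and per demographic (the column and function names here are illustrative, not the repo's actual schema):

```python
import pandas as pd

def audit_correlations(df, score_cols, demo_cols, threshold=0.25):
    """Max |Pearson r| of each rating component against each ZCTA demographic."""
    rows = []
    for score in score_cols:
        # corrwith() gives the Pearson r of every demographic column vs. this score
        r = df[demo_cols].corrwith(df[score]).abs()
        rows.append({"score": score,
                     "max_abs_r": r.max(),
                     "worst_demo": r.idxmax(),
                     "passes": bool(r.max() < threshold)})
    return pd.DataFrame(rows)
```

Run once per component (Process, Structural, Composite, rounded stars) against the 15 demographic columns; the audit passes when every `passes` flag is True.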
The pooled result (v3)
| Score | Max \|Pearson r\| | Most-correlated demographic |
|---|---:|---|
| Process | 0.090 | median_household_income |
| Structural | 0.173 | pct_hispanic |
| Composite | 0.186 | pct_hispanic |
| Overall stars (rounded) | 0.179 | pct_hispanic |
Pooled correlations rose from v2 (where the composite max was 0.049) because v3 added provider-level inspection, license, and credential data. That new data concentrates in particular states (TX, FL, CA, NY, IL), each with its own demographic profile, so pooled correlations conflate "state policy regimes correlate with state demographics" with "the rating discriminates within neighborhoods." We disaggregate that next.
The within-state result (the one that actually matters)
Pooled correlations conflate two things:
1. Between-state differences: states with different demographic profiles have different policy regimes (which states fund QRIS, which have rich licensing infrastructure, which inspect facilities aggressively).
2. Within-state differences: for two daycares in the same state, does the rating track the demographics of the surrounding ZIP?
The audit's spirit is about (2), not (1). To isolate the within-state signal we demean each variable within state before computing the correlation — the standard "fixed effects" approach.
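The demeaning step is a few lines of pandas. A minimal sketch, assuming hypothetical column names (subtract each state's mean from both variables, then take the ordinary Pearson correlation of the residuals):

```python
import pandas as pd

def within_state_corr(df, state_col, x_col, y_col):
    """Pearson r after demeaning both variables within state
    (equivalent to one-way fixed effects on state)."""
    x = df[x_col] - df.groupby(state_col)[x_col].transform("mean")
    y = df[y_col] - df.groupby(state_col)[y_col].transform("mean")
    return x.corr(y)
```

A toy example of the confound this removes: if scores and a demographic both jump between state A and state B but move independently inside each state, the pooled r is large while the within-state r is ~0.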
| Score | Max \|Pearson r\| within-state | Most-correlated demographic |
|---|---:|---|
| Process | 0.046 | labor_force_participation |
| Structural | 0.030 | pct_aian |
| Composite | 0.022 | pct_black |
| Overall stars | 0.022 | pct_black |
Translation: the pooled |r| between Composite and pct_hispanic was 0.186; within-state it collapses to 0.005. The pooled signal was almost entirely a between-state confound: states with larger Hispanic populations tend to fund QRIS more and inspect more thoroughly, but within any given state the rating doesn't track a neighborhood's Hispanic share.
Within-state, the v3 audit is better than v2 (composite max |r| dropped from 0.030 to 0.022) — adding provider-level structural data didn't introduce neighborhood bias; if anything, it slightly reduced it.
How state-anchored is each score?
The ICC (intraclass correlation coefficient) tells you what fraction of a score's variance is between states vs. within states.
| Score | ICC | Interpretation |
|---|---:|---|
| Provider-level Structural only (where we have real per-facility data) | 0.305 | About 70% of the variance is within state. This is the headline win of v3: for the ~135K facilities where we have provider-level data, the rating genuinely differentiates between facilities in the same state. |
| Final Structural (provider + state-baseline fallback) | 0.666 | Reflects the 60/40 mix between provider-level data and the state-regulatory-baseline fallback. The fallback is state-anchored by design, which pulls the overall ICC up. |
| Process | 0.391 | Substantial within-state variation: facilities in the same state can have meaningfully different QRIS / accreditation / CLASS attachments. |
| Composite (Overall) | 0.514 | Roughly half between, half within. |
In v2, Structural's ICC was 0.907 — essentially a state-level baseline with no within-state variation. v3 introduces provider-level data where it exists, which is why the provider-level-only number drops to 0.305.
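One standard way to estimate that between-state share is the one-way ANOVA plug-in ICC. A sketch under assumed column names; with unequal state sizes, using the mean group size for k is an approximation, and this is a textbook ICC(1) formula, not necessarily the exact estimator the audit script uses:

```python
import pandas as pd

def icc_oneway(df, state_col, score_col):
    """One-way ANOVA plug-in ICC: between-state share of a score's variance.
    ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), k = mean facilities per state."""
    g = df.groupby(state_col)[score_col]
    n, grand = g.size(), df[score_col].mean()
    msb = (n * (g.mean() - grand) ** 2).sum() / (len(n) - 1)   # between-state mean square
    msw = (g.var(ddof=1) * (n - 1)).sum() / (len(df) - len(n)) # within-state mean square
    k = n.mean()
    return (msb - msw) / (msb + (k - 1) * msw)
```

Reading it: scores that are constant within each state but differ across states give ICC ≈ 1 (the v2 Structural situation); scores that vary only within states give ICC ≈ 0.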
What this audit does not address
A few honest limits:
- HI and SD have no provider-level structural data under our current scrape coverage. Both rely on the state-level baseline fallback. HI's DOE-operated and charter pre-K classrooms benefit from a documented public-school override, but private licensed centers in HI receive a state-floor rating. Same for SD. We're working on additional data sources for both.
- Cross-state comparability of inspection rates. Texas and California inspect facilities more aggressively than Florida — so a 20% violation rate means somewhat different things in different states. Within-state comparisons remain valid; cross-state comparisons should be read cautiously.
- Facility-level credential coverage is thin. Per-classroom teacher-credential distributions are available for Florida and Louisiana so far (about 2% of the master). Public-school pre-K classrooms in 11 additional states get program-documented credential overrides. The other 98% rely on state-level teacher requirements.
- Future drift. The ratings get refreshed quarterly; we re-run this audit on every refresh and publish the numbers. If a rating change ever pushes any |r| above 0.15 (well below our 0.25 threshold but a useful early-warning bar), we investigate before publishing.
Why we removed BTP
Earlier drafts of the methodology added a residual-based +0.5 star bonus to daycares that scored higher than a regression on neighborhood demographics would predict — the "Beating the Predicted" or BTP bonus. The launch-gate audit (2026-05-05) showed this didn't meaningfully reduce demographic correlation in the ratings, and on a few demographics slightly increased it. We removed it in v2 and haven't reintroduced it in v3. The natural Process+Structural blend already passes the audit with comfortable headroom; no residual adjustment was earning its keep.
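For reference, the removed adjustment amounted to a residual bonus on top of an OLS fit of score on ZIP demographics. A hypothetical sketch; the one-standard-deviation trigger below is an illustrative assumption, since the actual BTP cutoff isn't specified here:

```python
import numpy as np

def btp_bonus(scores, demographics, bonus=0.5):
    """Residual-based 'Beating the Predicted' bonus (removed in v2).
    Fit OLS of score on demographics; reward daycares whose residual is
    positive beyond one standard deviation (assumed trigger, for illustration)."""
    X = np.column_stack([np.ones(len(scores)), demographics])  # intercept + demographics
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    resid = scores - X @ beta
    return np.where(resid > resid.std(), bonus, 0.0)
```

The audit finding was that this kind of adjustment left the demographic correlations essentially unchanged, which is why it was dropped.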
Reproducing this audit
The full audit pipeline is in the project repo:
```shell
python code/compute_structural_v3.py   # provider-level structural sections + state-level fallback
python code/calculate_ratings_v3.py    # final ratings (Process + Structural + Composite)
python code/segregation_audit_v3.py    # this audit
```
Inputs are the master CSV (clean_data/all_states_master_with_accreditations_2026-05-07.csv), the per-state Structural comparison CSV, the Head Start CLASS grantee file, and the v2-derived ACS demographics file. Outputs are the augmented master + every audit CSV referenced above.
If you spot something off or want to compute alternative correlations, the inputs and outputs are public.