The question
For every Mets season from 1988 to 2025, we calculate expected wins from payroll percentile, pre-season WAR projection, and run differential through August, then compare expected wins to actual wins. We model the distribution of the resulting performance gap across all 30 franchises over the same period. The model answers one question:
Is the Mets' cumulative underperformance relative to their talent and resources, concentrated specifically in September and specifically in high-stakes situations, statistically distinguishable from bad luck?
Model 1: The Performance Gap
Linear regression across all 30 teams, 37 seasons (1,110 team-seasons). Features: payroll percentile, pre-season WAR projection, run differential through August. The residual is actual wins minus predicted wins. The Mets' mean residual: -4.7 wins per season, below prediction. Their z-score: -3.48. Their rank among 30 franchises in cumulative negative residual: dead last. 30th of 30.
Model 2: September Specificity
Is the underperformance randomly distributed across the season? A paired t-test compares the Mets' September win percentage to their April-August win percentage, season by season, across 37 curse-era seasons. The September mean: .438. The April-August mean: .512. The concentration coefficient: .145. The t-statistic: -2.84. The distribution is not random. It is concentrated in September, specifically, repeatedly, across different rosters, different managers, different decades.
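The paired comparison can be sketched in a few lines. This is a minimal illustration, not the platform's code: the function name `paired_t` and the three-season toy input are hypothetical, and the real test would use all 37 season pairs. We also assume, because it reproduces the published figure, that the concentration coefficient is the relative drop from the April-August mean to the September mean.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t-statistic and degrees of freedom for two matched samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Work on the season-by-season differences (first minus second).
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample std dev, n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Toy data, NOT real Mets seasons: three made-up win percentages per split.
toy_september = [0.40, 0.45, 0.50]
toy_april_august = [0.50, 0.50, 0.55]
toy_t, toy_df = paired_t(toy_september, toy_april_august)

# Assumed definition of the concentration coefficient: relative September drop,
# computed from the two means reported in the text.
relative_drop = 1 - 0.438 / 0.512
```

A negative t-statistic means September is worse than the rest of the season; with 37 seasons the statistic has 36 degrees of freedom.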
Model 3: Opponent Specificity
Is the Mets' poor performance randomly distributed across opponents in high-leverage situations? The Marlins Constant: 10.88, the ratio of observed Marlins elimination involvement to expected involvement if losses were distributed evenly among 29 opponents. Three elimination-context losses to the Marlins in eight total opportunities gives an observed share of .375; an even split gives 1/29, about .034. The chi-squared test returns p = .0031.
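The ratio itself is simple arithmetic, shown here as a short sketch. The function name `involvement_ratio` is ours, introduced only for illustration; the inputs (3 losses, 8 opportunities, 29 opponents) come from the text.

```python
def involvement_ratio(observed_hits, opportunities, n_opponents):
    """Observed share of events involving one opponent, divided by the
    share expected if events were spread evenly over all opponents."""
    observed_share = observed_hits / opportunities
    expected_share = 1 / n_opponents
    return observed_share / expected_share


# 3 of 8 elimination-context losses against one of 29 possible opponents.
# Approximately 10.875, which the text reports as 10.88.
marlins_constant = involvement_ratio(3, 8, 29)
```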
Model 4: The Curse Coefficient
Fisher's method combines the three independent p-values from Models 1 through 3. The test statistic, -2 times the sum of the natural logs of the p-values, follows a chi-squared distribution with 6 degrees of freedom under the null hypothesis. The combined p-value is then contextualized via Monte Carlo simulation: for each of 30 franchises, we simulate 10,000 37-year windows with random draws from the observed residual distribution. We count how often any franchise produces a combined signal as extreme as the Mets'.
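For readers who want to see the combination step, here is a minimal sketch of Fisher's method using only the standard library. It relies on the closed-form tail of a chi-squared distribution with an even number of degrees of freedom. The function name `fisher_combined` is ours, and two of the three inputs in the usage lines are placeholders: only .0031 (Model 3) is stated in the text, so the result of the example call is not the published Curse Coefficient.

```python
import math


def fisher_combined(p_values):
    """Fisher's method: return (statistic, combined p-value).

    statistic = -2 * sum(ln p_i), chi-squared with 2k degrees of freedom
    under the null, where k is the number of independent p-values.
    """
    k = len(p_values)
    if k == 0 or any(not (0 < p <= 1) for p in p_values):
        raise ValueError("p-values must be in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Tail probability for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Placeholders, NOT values from the article: Models 1 and 2 report a z-score
# and a t-statistic, not p-values.
P_MODEL_1_PLACEHOLDER = 0.001
P_MODEL_2_PLACEHOLDER = 0.01
P_MODEL_3 = 0.0031  # stated in the text

statistic, combined_p = fisher_combined(
    [P_MODEL_1_PLACEHOLDER, P_MODEL_2_PLACEHOLDER, P_MODEL_3]
)
```

With three inputs the statistic has 6 degrees of freedom, matching the text. The method assumes the inputs are independent; if the three tests share data, the combined p-value will be too small, which is one reason to check it against simulation.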
The answer: in 10,000 simulations across 30 franchises, a combined signal this extreme appeared fewer than 4 times. The Curse Coefficient: p < 0.0004.
"The observed pattern is extremely unlikely to occur by chance in any franchise over any 37-year window. The data has found something."
Incompetence, curse, or both?
The platform's through-line question, answered: the data cannot distinguish between a structural organizational failure that manifests specifically in September under pressure, and a curse. Both produce the same statistical signature. Both are consistent with the observed p-value. The reader may choose their preferred interpretation. The number is the same either way.