I analyzed NASA exoplanet discovery data to test whether yearly discovery counts follow a Poisson distribution or require an alternative model. The results bear on how scientific discoveries accumulate over time: the yearly counts are far more variable than a Poisson process allows.
The dataset used for the analysis is available here:
https://www.kaggle.com/datasets/adityamishraml/nasaexoplanets
Extreme Overdispersion Detected: The data exhibits dramatic overdispersion, with a variance-to-mean ratio of 571.22. A Poisson distribution has a variance equal to its mean, so its ratio is 1; the observed variability is therefore over 500 times greater than a Poisson model would predict, which violates that basic assumption.
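For readers who want to reproduce this kind of check, here is a minimal sketch of the variance-to-mean ratio (index of dispersion). The function name dispersion_index and the example counts are hypothetical, and the sketch assumes the sample variance (n - 1 denominator); the analysis does not state which variance estimator produced 571.22.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Return the sample variance-to-mean ratio (index of dispersion)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    # statistics.variance uses the n - 1 denominator (assumed here).
    return variance(counts) / m

# Hypothetical yearly counts, for illustration only (not the NASA data).
example_counts = [1, 0, 3, 12, 45, 150, 900, 1500]
print(f"variance/mean = {dispersion_index(example_counts):.2f}")
```

A ratio near 1 is consistent with a Poisson model; values far above 1 indicate overdispersion.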
Poisson Model Fails Completely: The chi-square goodness-of-fit test for the Poisson distribution yields infinite test statistics, providing overwhelming evidence against this model. The Poisson distribution cannot accommodate the extreme variability observed in exoplanet discoveries.
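An infinite chi-square statistic typically arises when an expected frequency is zero (or underflows to zero in floating point) while the corresponding observed count is positive. The sketch below illustrates that mechanism with per-value rather than binned frequencies. The mean, the number of years, and the count values are all hypothetical, and this is only one plausible way the infinite value reported above could have come about.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k events, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def chi_square(observed, expected):
    """Pearson chi-square; infinite if a positive count meets a zero expectation."""
    total = 0.0
    for o, e in zip(observed, expected):
        if e == 0.0:
            if o > 0:
                return math.inf
            continue  # 0 observed and 0 expected contributes nothing
        total += (o - e) ** 2 / e
    return total

# Hypothetical numbers for illustration only (not taken from the dataset).
lam = 150.0            # assumed Poisson mean
n_years = 30           # assumed number of yearly observations
values = [150, 1500]   # two count values, each observed once
observed = [1, 1]
expected = [n_years * poisson_pmf(k, lam) for k in values]
print(expected, chi_square(observed, expected))
```

With a mean of 150, the Poisson probability of a year with 1500 discoveries is too small to represent as a float, so a single such year is enough to make the statistic infinite.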
Negative Binomial Model Succeeds: In contrast, the Negative Binomial distribution provides an excellent fit with:
Chi-square statistic: 0.649
P-value: 0.421
Model parameters: r = 0.297, p = 0.0018
The model successfully captures the clustering patterns in discovery data.
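The analysis does not say how r and p were estimated. One common approach, assumed here, is the method of moments: with mean = r(1 - p)/p and variance = r(1 - p)/p^2, the estimates are p = mean/variance and r = mean^2/(variance - mean). Under this parameterization p is simply the reciprocal of the variance-to-mean ratio, and 1/571.22 is about 0.00175, which is consistent with the reported p = 0.0018. The function name below is hypothetical.

```python
def nb_method_of_moments(mean, variance):
    """Return (r, p) for a Negative Binomial with the given mean and variance.

    Parameterization: mean = r(1 - p)/p, variance = r(1 - p)/p**2.
    """
    if variance <= mean:
        raise ValueError("requires variance > mean (overdispersion)")
    p = mean / variance
    r = mean * p / (1 - p)  # equivalent to mean**2 / (variance - mean)
    return r, p

# p depends only on the variance-to-mean ratio, so a unit mean suffices here.
_, p_from_ratio = nb_method_of_moments(1.0, 571.22)
print(f"p = {p_from_ratio:.5f}")
```

Recovering r this way would also need the sample mean, which is not quoted in the summary.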
Comparison of observed exoplanet discoveries with Poisson and Negative Binomial fitted distributions showing significant overdispersion.
The binned frequency analysis clearly demonstrates the superior performance of the Negative Binomial model across all count ranges. While the Poisson model produces either negligible or grossly inflated expected frequencies, the Negative Binomial expected values closely match observations.
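As a rough sketch of how binned expected frequencies can be computed for a Negative Binomial model, the code below sums the probability mass over each count range and scales by the number of observations. The bin edges and the number of years are hypothetical, since the actual binning is not given; only r and p come from the fitted values reported above.

```python
import math

def nb_log_pmf(k, r, p):
    """Log probability of count k under a Negative Binomial with real-valued r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))

def expected_bin_counts(edges, r, p, n_obs):
    """Expected number of observations in each half-open bin [lo, hi)."""
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        prob = sum(math.exp(nb_log_pmf(k, r, p)) for k in range(lo, hi))
        result.append(n_obs * prob)
    return result

# r and p as reported; edges and n_obs are assumptions for illustration.
edges = [0, 10, 100, 1000, 10000]
print(expected_bin_counts(edges, r=0.297, p=0.0018, n_obs=30))
```

Comparing these values with the observed number of years in each bin gives the terms of the chi-square statistic.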