Probability Distributions
Overview
The precip-index package supports five probability distributions for SPI and SPEI calculation. Each distribution offers different characteristics suited to various data types and use cases.
All distributions are implemented in the distributions module with a unified API, and can be selected via the distribution parameter in any SPI/SPEI function.
| Distribution | String Key | Recommended For | Parameters |
|---|---|---|---|
| Gamma | 'gamma' | SPI (default) | alpha, beta, prob_zero |
| Pearson Type III | 'pearson3' | SPEI | skew, loc, scale, prob_zero |
| Log-Logistic | 'log_logistic' | SPEI (alternative) | alpha, beta, prob_zero |
| GEV | 'gev' | Extreme value analysis | shape, loc, scale, prob_zero |
| Generalized Logistic | 'gen_logistic' | European drought indices | shape, loc, scale, prob_zero |
Distribution Selection
When to Use Each Distribution
Gamma (default for SPI)
- Standard choice for precipitation data (McKee et al. 1993)
- Two-parameter distribution: simple, well-understood
- Efficient Numba-optimized fast path in this package
- Best for precipitation-only analysis (SPI)
Pearson Type III (recommended for SPEI)
- Three-parameter distribution that handles skewed data
- Recommended by Vicente-Serrano et al. (2010) for water balance data
- Better fit for the water balance (P - PET), which can take negative values
- Includes skewness parameter for asymmetric tails
Log-Logistic (alternative for SPEI)
- Used in the original SPEI R package by Begueria & Vicente-Serrano
- Good tail behavior for extreme events
- Two-parameter (alpha, beta) fitted via Maximum Likelihood Estimation (MLE)
- Computationally efficient
GEV (Generalized Extreme Value)
- Designed for modeling extreme values (block maxima)
- Three-parameter (shape, location, scale)
- Appropriate for analyzing extreme drought or flood events
- Shape parameter controls tail behavior (Gumbel, Frechet, Weibull subtypes)
Generalized Logistic
- Used in some European drought monitoring systems
- Three-parameter with flexible tail behavior
- Good for regions with moderate climate variability
Quick Recommendation
from indices import spi, spei
# SPI: use Gamma (default)
spi_12 = spi(precip, scale=12)
# SPEI: use Pearson III
spei_12 = spei(precip, pet=pet, scale=12, distribution='pearson3')
# Extreme analysis: use GEV
spi_extreme = spi(precip, scale=1, distribution='gev')
Mathematical Background
Gamma Distribution
PDF:
\[ f(x) = \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha-1} e^{-x/\beta}, \quad x > 0 \]
Parameters:
- \(\alpha\) (shape) > 0: controls the shape of the distribution
- \(\beta\) (scale) > 0: controls the spread
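Estimation: the closed-form Thom (1958) estimator, which this package groups under Method of Moments. Strictly speaking, it is an approximation to the maximum-likelihood solution: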
\[ A = \ln(\bar{x}) - \frac{\sum \ln(x_i)}{n}, \quad \alpha = \frac{1 + \sqrt{1 + 4A/3}}{4A}, \quad \beta = \frac{\bar{x}}{\alpha} \]
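The closed-form estimator above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the package's own code: the function name `thom_gamma_fit` is hypothetical, and the input is assumed to contain only positive, non-constant values.

```python
import numpy as np

def thom_gamma_fit(x):
    """Return (alpha, beta) from the Thom (1958) approximation.

    Hypothetical helper for illustration; expects strictly positive,
    non-constant values (A would be 0 for a constant series).
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    # A = ln(mean) - mean(ln(x)); non-negative by Jensen's inequality
    A = np.log(mean) - np.log(x).mean()
    alpha = (1.0 + np.sqrt(1.0 + 4.0 * A / 3.0)) / (4.0 * A)
    beta = mean / alpha
    return alpha, beta

# Synthetic sample with known shape=2 and scale=5
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=5.0, size=5000)
alpha, beta = thom_gamma_fit(sample)
```

By construction the product `alpha * beta` reproduces the sample mean, so only the shape estimate carries approximation error.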
Pearson Type III Distribution
PDF:
\[ f(x) = \frac{|b|}{\Gamma(a)} \left|b(x - c)\right|^{a-1} e^{-b(x-c)} \]
Parameters:
- skew: skewness coefficient
- loc: location (mean)
- scale: standard deviation
Estimation: Method of Moments using sample mean, variance, and skewness. This approach is computationally efficient and well-suited for the water balance data typically used with SPEI.
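As a rough sketch of the moment-matching step, the three sample moments can be passed directly to `scipy.stats.pearson3`, whose parameters are the skewness, mean (`loc`), and standard deviation (`scale`). The helper name `pearson3_moments_fit` and the synthetic water-balance series are assumptions made for illustration; the package's own estimator may differ in details such as bias correction.

```python
import numpy as np
from scipy import stats

def pearson3_moments_fit(x):
    """Return (skew, loc, scale) from sample moments (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    loc = x.mean()                      # location = sample mean
    scale = x.std(ddof=1)               # scale = sample standard deviation
    skew = stats.skew(x, bias=False)    # bias-corrected sample skewness
    return skew, loc, scale

# Synthetic P - PET series: right-skewed and partly negative
rng = np.random.default_rng(1)
balance = rng.gamma(shape=4.0, scale=20.0, size=2000) - 60.0

skew, loc, scale = pearson3_moments_fit(balance)
dist = stats.pearson3(skew, loc=loc, scale=scale)
cdf_values = dist.cdf(balance)          # probabilities used for standardization
```

In terms of the PDF above, SciPy's parameterization corresponds to \(a = 4/\text{skew}^2\), \(b = 2/(\text{scale} \cdot \text{skew})\), and \(c = \text{loc} - 2\,\text{scale}/\text{skew}\).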
Log-Logistic Distribution
CDF:
\[ F(x) = \frac{1}{1 + (\alpha/x)^\beta}, \quad x > 0 \]
Parameters:
- \(\alpha\) (scale) > 0
- \(\beta\) (shape) > 0
Estimation: Maximum Likelihood Estimation (MLE) using SciPy’s fisk distribution. MLE provides asymptotically efficient parameter estimates for this distribution.
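The following sketch shows how the two-parameter fit maps onto SciPy's `fisk` distribution, where the shape argument `c` plays the role of \(\beta\) and `scale` plays the role of \(\alpha\). Fixing the location at zero with `floc=0` is an assumption made here to match the two-parameter CDF above; the package may handle the location differently, for example for negative water-balance values.

```python
import numpy as np
from scipy import stats

# Synthetic positive sample drawn from a log-logistic with alpha=50, beta=3
rng = np.random.default_rng(2)
sample = stats.fisk.rvs(c=3.0, scale=50.0, size=3000, random_state=rng)

# floc=0 pins the location, so only beta (c) and alpha (scale) are estimated
beta_hat, loc_hat, alpha_hat = stats.fisk.fit(sample, floc=0)

def log_logistic_cdf(x, alpha, beta):
    """CDF exactly as written above, valid for x > 0."""
    return 1.0 / (1.0 + (alpha / x) ** beta)

x = np.array([10.0, 50.0, 200.0])
manual = log_logistic_cdf(x, alpha_hat, beta_hat)
scipy_cdf = stats.fisk.cdf(x, beta_hat, loc=0, scale=alpha_hat)
```

The two CDF evaluations agree, which confirms the parameter mapping between the formula and SciPy's arguments.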
Generalized Extreme Value (GEV)
CDF:
\[ F(x) = \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\} \]
Parameters:
- \(\xi\) (shape): controls tail behavior
- \(\mu\) (location): center of the distribution
- \(\sigma\) (scale) > 0: spread
Special cases:
- \(\xi = 0\): Gumbel (Type I), defined as the limit \(\xi \to 0\) of the CDF above
- \(\xi > 0\): Frechet (Type II, heavy tail)
- \(\xi < 0\): Weibull (Type III, bounded upper tail)
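One practical point when checking these cases against SciPy: `scipy.stats.genextreme` uses a shape parameter `c` with the opposite sign, so \(\xi\) in the formula above corresponds to `c = -xi`. The sketch below compares a direct implementation of the CDF with SciPy; the function name `gev_cdf` is hypothetical, and nothing here is taken from the package's internals.

```python
import numpy as np
from scipy import stats

def gev_cdf(x, xi, mu, sigma):
    """GEV CDF in the xi-convention used above (xi != 0)."""
    x = np.asarray(x, dtype=float)
    t = 1.0 + xi * (x - mu) / sigma
    # Evaluate only where t > 0; substitute 1.0 elsewhere to avoid warnings
    inside = np.exp(-np.power(np.where(t > 0, t, 1.0), -1.0 / xi))
    # Outside the support the CDF is 0 (xi > 0, lower bound) or 1 (xi < 0, upper bound)
    outside = 0.0 if xi > 0 else 1.0
    return np.where(t > 0, inside, outside)

x = np.linspace(-2.0, 6.0, 9)
xi, mu, sigma = 0.2, 1.0, 1.5           # heavy-tailed (Frechet-type) example
manual = gev_cdf(x, xi, mu, sigma)
scipy_cdf = stats.genextreme.cdf(x, c=-xi, loc=mu, scale=sigma)  # note the sign flip
```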
Generalized Logistic
CDF:
\[ F(x) = \frac{1}{1 + \left[1 - k\left(\frac{x-\mu}{\sigma}\right)\right]^{1/k}} \]
Parameters: shape (\(k\)), location (\(\mu\)), scale (\(\sigma\))
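The CDF above can be evaluated directly with NumPy, as in the sketch below. The function name `gen_logistic_cdf` is hypothetical and the code is not the package's implementation. Note that `scipy.stats.genlogistic` implements a different (Type I) generalized logistic, so it is not used here.

```python
import numpy as np

def gen_logistic_cdf(x, k, mu, sigma):
    """Generalized logistic CDF as written above (k != 0)."""
    x = np.asarray(x, dtype=float)
    t = 1.0 - k * (x - mu) / sigma
    # Evaluate only where t > 0; substitute 1.0 elsewhere to avoid warnings
    safe_t = np.where(t > 0, t, 1.0)
    inside = 1.0 / (1.0 + safe_t ** (1.0 / k))
    # Outside the support: CDF is 1 for k > 0 (upper bound), 0 for k < 0 (lower bound)
    return np.where(t > 0, inside, 1.0 if k > 0 else 0.0)

# The location parameter is the median: F(mu) = 0.5 for any k and sigma
p = gen_logistic_cdf(np.array([0.0, 2.0, 4.0]), k=0.3, mu=2.0, sigma=1.5)
```

As \(k \to 0\) the expression approaches the ordinary logistic CDF \(1 / (1 + e^{-(x-\mu)/\sigma})\).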
Zero-Inflation Handling
Precipitation data often contains zero values, requiring a mixed distribution approach. All distributions in this package handle zero-inflation using:
\[ H(x) = q + (1-q) \cdot F(x) \]
Where:
- \(q\) = probability of zero (proportion of zeros in calibration data)
- \(F(x)\) = CDF of the continuous distribution fitted to non-zero values
- \(H(x)\) = mixed CDF used for standardization
This ensures that:
- Zero values receive appropriate probability mass
- The continuous distribution is fitted only to positive values
- The final SPI/SPEI transformation correctly accounts for both components
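The mixed CDF can be illustrated on a small made-up series. This sketch fits the continuous part with SciPy's generic maximum-likelihood `gamma.fit` (location fixed at zero) purely for brevity; that choice, the variable names, and the sample values are assumptions, not the package's fitting path.

```python
import numpy as np
from scipy import stats

# Made-up monthly totals with three dry months
precip = np.array([0.0, 0.0, 12.0, 30.5, 8.2, 0.0, 55.1, 21.7, 3.4, 40.0])

q = np.mean(precip == 0)                 # probability of zero
positive = precip[precip > 0]

# Fit the continuous distribution to the positive values only
shape, _, scale = stats.gamma.fit(positive, floc=0)

F = np.zeros_like(precip)                # F(0) = 0 for the zero entries
F[precip > 0] = stats.gamma.cdf(positive, shape, loc=0, scale=scale)

H = q + (1.0 - q) * F                    # mixed CDF
z = stats.norm.ppf(H)                    # standardized index values
```

Every zero receives the same probability `q`, so all dry months map to the same index value, while positive amounts are spread over the remaining `1 - q` of probability mass.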
Fitting Methods
The package supports three parameter estimation methods, with each distribution having a recommended default:
Method of Moments (Default for Gamma, Pearson III)
- Speed: Fast
- Used for: Gamma distribution, Pearson Type III
- Approach: Match sample moments (mean, variance, skewness) to distribution parameters
- Implementation:
  - Gamma: Thom (1958) approximation using log-transformed data
  - Pearson III: Direct calculation from sample moments
- Advantage: Computationally efficient, well-suited for large gridded datasets
Maximum Likelihood Estimation (Default for Log-Logistic)
- Speed: Moderate (iterative optimization)
- Used for: Log-Logistic distribution
- Approach: Maximize the log-likelihood function
- Implementation: SciPy’s optimized fisk.fit() method
- Advantage: Asymptotically efficient (optimal for large samples)
L-Moments (Default for GEV, Generalized Logistic)
- Speed: Moderate
- Used for: GEV, Generalized Logistic
- Approach: Use linear combinations of order statistics
- Advantage: More robust than conventional moments, especially for small samples and extreme value distributions
- Reference: Hosking (1990)
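To make "linear combinations of order statistics" concrete, the sketch below computes the first two sample L-moments and the L-skewness from unbiased probability-weighted moments, following Hosking (1990). The function name `sample_l_moments` is hypothetical; converting these quantities into GEV or Generalized Logistic parameters is a further step not shown here.

```python
import numpy as np

def sample_l_moments(x):
    """Return (l1, l2, t3): L-location, L-scale, and L-skewness.

    Uses unbiased probability-weighted moments b0, b1, b2 of the
    sorted sample; requires at least three values.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(n)                      # zero-based rank of each order statistic
    b0 = x.mean()
    b1 = np.sum(i * x) / (n * (n - 1))
    b2 = np.sum(i * (i - 1) * x) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3 / l2

# Symmetric sample: l1 = 3, l2 = 1, and the L-skewness t3 is 0
l1, l2, t3 = sample_l_moments([1, 2, 3, 4, 5])
```

Because each value enters only linearly, a single outlier shifts these statistics far less than it shifts the conventional variance or skewness.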
Default Methods by Distribution
| Distribution | Default Method | Rationale |
|---|---|---|
| Gamma | Method of Moments | Fast, proven approach for precipitation (WMO standard) |
| Pearson III | Method of Moments | Efficient for water balance data with moderate skewness |
| Log-Logistic | MLE | Optimal asymptotic properties for this distribution |
| GEV | L-moments | Robust for extreme value analysis |
| Generalized Logistic | L-moments | Robust for heavy-tailed data |
Robustness Features
The distribution fitting includes several safeguards:
Fallback Behavior
If the primary fitting method produces invalid parameters, the system applies fallback strategies:
- Gamma: Method of Moments with robust handling of edge cases (near-zero variance, extreme values)
- Pearson III: Method of Moments → Normal approximation (skew=0) for degenerate cases
- Log-Logistic: MLE via SciPy’s optimizer with parameter bounds
- GEV: L-moments with polynomial approximations for shape parameter
- Generalized Logistic: L-moments with bounded parameter estimates
The package includes L-moments implementations for all distributions, which can be explicitly requested via method=FittingMethod.LMOMENTS. However, Method of Moments is the default for Gamma and Pearson III because it is faster and performs well for typical climate data.
Parameter Bounding
All fitted parameters are bounded to prevent numerical overflow:
- Shape parameters: capped at 1000
- Scale parameters: bounded between \(10^{-10}\) and \(10^{10}\)
- Probabilities: clamped to \((10^{-10}, 1 - 10^{-10})\) before normal quantile transform
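The reason for clamping probabilities is that the normal quantile function returns infinities at exactly 0 and 1. A minimal sketch of that step, using the bound stated above (the helper name `to_standard_normal` is hypothetical):

```python
import numpy as np
from scipy import stats

EPS = 1e-10  # clamping bound stated above

def to_standard_normal(prob):
    """Clamp probabilities to [EPS, 1 - EPS], then apply the normal quantile function."""
    clipped = np.clip(prob, EPS, 1.0 - EPS)
    return stats.norm.ppf(clipped)

# Without clamping, 0.0 and 1.0 would map to -inf and +inf
z = to_standard_normal(np.array([0.0, 0.5, 1.0]))
```

With this bound, the resulting index values are limited to roughly ±6.36.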
Data Diagnostics
Before fitting, the module checks:
- Minimum sample size (30 valid values required)
- Minimum non-zero values (10 required for reliable fitting)
- Maximum zero proportion (95% threshold)
- Minimum variance (prevents degenerate fits)
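The checks listed above might be combined along the following lines. This is a hypothetical helper, not the module's actual function: the name `can_fit`, the order of the checks, the returned messages, and the variance tolerance are all assumptions, and only the thresholds of 30, 10, and 95% come from the list.

```python
import numpy as np

MIN_VALID = 30         # minimum valid (finite) values
MIN_NONZERO = 10       # minimum non-zero values
MAX_ZERO_FRAC = 0.95   # maximum proportion of zeros
MIN_VARIANCE = 1e-12   # assumed tolerance; the actual value is not stated

def can_fit(values):
    """Return (ok, reason) for a 1-D calibration series."""
    values = np.asarray(values, dtype=float)
    valid = values[np.isfinite(values)]
    if valid.size < MIN_VALID:
        return False, "too few valid values"
    nonzero = valid[valid != 0]
    if nonzero.size < MIN_NONZERO:
        return False, "too few non-zero values"
    if 1.0 - nonzero.size / valid.size > MAX_ZERO_FRAC:
        return False, "too many zeros"
    if nonzero.var() < MIN_VARIANCE:
        return False, "variance too small"
    return True, "ok"

ok, reason = can_fit(np.arange(1.0, 41.0))   # 40 distinct positive values
```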
Usage Examples
Basic Distribution Selection
from indices import spi, spei
# SPI with default Gamma distribution
spi_gamma = spi(precip, scale=12)
# SPI with Pearson III
spi_p3 = spi(precip, scale=12, distribution='pearson3')
# SPEI with Log-Logistic (as in original R SPEI package)
spei_ll = spei(precip, pet=pet, scale=12, distribution='log_logistic')
# SPEI with Pearson III (recommended)
spei_p3 = spei(precip, pet=pet, scale=12, distribution='pearson3')
Multi-Scale with Distribution
from indices import spi_multi_scale, spei_multi_scale
# Multi-scale SPI with GEV for extreme analysis
spi_ds = spi_multi_scale(precip, scales=[1, 3, 6, 12], distribution='gev')
# Multi-scale SPEI with Pearson III
spei_ds = spei_multi_scale(precip, pet=pet, scales=[3, 6, 12],
                           distribution='pearson3')
Global-Scale Processing
from indices import spi_global, spei_global
# Global SPI with Pearson III
result = spi_global(
    'chirps_global_monthly.nc',
    'spi_pearson3_12_global.nc',
    scale=12,
    distribution='pearson3'
)
# Global SPEI with Log-Logistic
result = spei_global(
    'chirps_global_monthly.nc',
    'pet_global_monthly.nc',
    'spei_loglogistic_12_global.nc',
    scale=12,
    distribution='log_logistic'
)
Save and Load Parameters
from indices import spi, save_fitting_params, load_fitting_params
# Fit with Pearson III and save parameters
spi_12, params = spi(precip, scale=12, distribution='pearson3',
                     return_params=True)
save_fitting_params(
    params, 'params_pearson3.nc',
    scale=12, periodicity='monthly',
    distribution='pearson3'
)
# Load and reuse parameters
params = load_fitting_params('params_pearson3.nc', scale=12,
                             periodicity='monthly')
# Distribution auto-detected from file
spi_12_new = spi(new_precip, scale=12, fitting_params=params,
                 distribution='pearson3')
Output Variable Naming
Output variables include the distribution name in the variable name:
| Distribution | SPI Variable Name | SPEI Variable Name |
|---|---|---|
| Gamma | spi_gamma_12_month | spei_gamma_12_month |
| Pearson III | spi_pearson3_12_month | spei_pearson3_12_month |
| Log-Logistic | spi_log_logistic_12_month | spei_log_logistic_12_month |
| GEV | spi_gev_12_month | spei_gev_12_month |
| Gen. Logistic | spi_gen_logistic_12_month | spei_gen_logistic_12_month |
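The names in the table follow the pattern index, distribution key, scale, then `month`. When selecting variables from an output dataset programmatically, a small helper such as the one below can build the name. The function `output_var_name` is hypothetical and only reproduces the monthly pattern shown in the table; naming for other periodicities is not covered here.

```python
def output_var_name(index, distribution, scale):
    """Build a variable name like 'spi_gamma_12_month' (monthly pattern only)."""
    return f"{index}_{distribution}_{scale}_month"

# Matches the Pearson III SPEI entry in the table
name = output_var_name("spei", "pearson3", 12)
```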
Performance Notes
- Gamma distribution uses a Numba-optimized fast path with vectorized NumPy/SciPy operations. This is significantly faster than other distributions for large datasets.
- Non-Gamma distributions use the generic distributions.py module, which is SciPy-based and processes cells individually. For global-scale datasets, expect longer computation times.
- For large datasets, consider using the chunked processing pipeline (spi_global(), spei_global()), which manages memory automatically regardless of distribution choice.
References
McKee, T.B., Doesken, N.J., Kleist, J. (1993). The relationship of drought frequency and duration to time scales. 8th Conference on Applied Climatology.
Vicente-Serrano, S.M., Begueria, S., Lopez-Moreno, J.I. (2010). A Multiscalar Drought Index Sensitive to Global Warming: The Standardized Precipitation Evapotranspiration Index. Journal of Climate, 23(7), 1696-1718.
Stagge, J.H., Tallaksen, L.M., Gudmundsson, L., Van Loon, A.F., Stahl, K. (2015). Candidate Distributions for Climatological Drought Indices (SPI and SPEI). International Journal of Climatology, 35(13), 4027-4040.
Hosking, J.R.M. (1990). L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B, 52(1), 105-124.
Thom, H.C.S. (1958). A note on the gamma distribution. Monthly Weather Review, 86(4), 117-122.
See Also
- Validation & Test Results - Quality verification across distributions
- Methodology - Scientific background
- Implementation Details - Code architecture
- API Reference - Function documentation
- User Guides - Practical usage