python - Weighted standard deviation in NumPy?

statsmodels standard-deviation (4)

How about the following short "manual calculation"?

    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- Numpy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values - average)**2, weights=weights)
        return (average, math.sqrt(variance))
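As a quick sanity check of this function (a sketch, not part of the original answer): with integer weights, the weighted result should match the plain mean and population standard deviation of the data with each value repeated `weight` times. The sample data below is made up for illustration.

```python
import math

import numpy as np


def weighted_avg_and_std(values, weights):
    """Return the weighted average and (population) standard deviation."""
    average = np.average(values, weights=weights)
    variance = np.average((values - average) ** 2, weights=weights)
    return average, math.sqrt(variance)


# Made-up sample data; integer weights act as repeat counts.
values = np.array([1.0, 2.0, 4.0])
weights = np.array([3, 1, 2])

avg, std = weighted_avg_and_std(values, weights)

# Expand the data so every value appears `weight` times.
expanded = np.repeat(values, weights)

print(np.isclose(avg, expanded.mean()))  # True
print(np.isclose(std, expanded.std()))   # True (np.std defaults to ddof=0)
```

Note that this is the population form (no degrees-of-freedom correction), which is why it agrees with `np.std` at its default `ddof=0`.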

numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?

There is a very good example proposed by gaborous:

    import pandas as pd
    import numpy as np

    # X is the dataset, as a Pandas' DataFrame
    # w holds the weights, one per row of X

    # Computing the weighted sample mean (fast, efficient and precise)
    mean = np.ma.average(X, axis=0, weights=w)

    # Convert to a Pandas' Series (it's just aesthetic and more
    # ergonomic; no difference in computed values)
    mean = pd.Series(mean, index=list(X.keys()))
    xm = X - mean  # xm = X diff to mean
    # fill NaN with 0 (because anyway a variance of 0 is just void, but at
    # least it keeps the other covariance's values computed correctly)
    xm = xm.fillna(0)
    # Compute the unbiased weighted sample covariance
    sigma2 = 1. / (w.sum() - 1) * xm.mul(w, axis=0).T.dot(xm)

Correct equation for the unbiased weighted sample covariance, URL (version: 2016-06-28)
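For integer (frequency) weights, the 1/(sum(w) - 1) normalization used in the snippet can be cross-checked against `np.cov`, which accepts an `fweights` argument. The following is a minimal NumPy-only sketch; the data and weights are invented for illustration, and it assumes the weights are repeat counts rather than reliability weights.

```python
import numpy as np

# Invented data: 4 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [4.0, 3.0],
              [3.0, 5.0]])
w = np.array([1, 3, 2, 1])  # integer frequency weights, one per row

mean = np.average(X, axis=0, weights=w)  # weighted mean of each column
xm = X - mean                            # deviations from the weighted mean

# Same formula as the snippet: sum_i w_i * xm_i xm_i^T / (sum(w) - 1)
sigma2 = (xm * w[:, None]).T @ xm / (w.sum() - 1)

# np.cov treats rows as variables by default, hence rowvar=False.
reference = np.cov(X, rowvar=False, fweights=w)

print(np.allclose(sigma2, reference))  # True
```

The square roots of the diagonal of `sigma2` give the weighted sample standard deviation of each column under the same assumption.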

There is a class in statsmodels for computing weighted statistics: statsmodels.stats.weightstats.DescrStatsW:

    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW

    array = np.array([1,2,1,2,1,2,1,3])
    weights = np.ones_like(array)
    weights[3] = 100

    weighted_stats = DescrStatsW(array, weights=weights, ddof=0)

    weighted_stats.mean      # weighted mean of data (equivalent to np.average(array, weights=weights))
    # 1.97196261682243

    weighted_stats.std       # standard deviation with default degrees of freedom correction
    # 0.21434289609681711

    weighted_stats.std_mean  # standard deviation of weighted mean
    # 0.020818822467555047

    weighted_stats.var       # variance with default degrees of freedom correction
    # 0.045942877107170932

The nice feature of this class is that if you want to compute several statistical properties, subsequent calls are very fast because results that have already been computed (even intermediate ones) are cached.
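To illustrate the caching idea in general terms, here is a small sketch using `functools.cached_property` (Python 3.8+). The class `CachedWeightedStats` is hypothetical and written only for this example; it is not how statsmodels implements DescrStatsW, just one way such caching could work.

```python
from functools import cached_property

import numpy as np


class CachedWeightedStats:
    """Hypothetical illustration only; not the statsmodels implementation."""

    def __init__(self, data, weights):
        self.data = np.asarray(data, dtype=float)
        self.weights = np.asarray(weights, dtype=float)

    @cached_property
    def mean(self):
        return np.average(self.data, weights=self.weights)

    @cached_property
    def var(self):
        # Reuses the cached mean instead of recomputing it.
        return np.average((self.data - self.mean) ** 2, weights=self.weights)

    @cached_property
    def std(self):
        # Reuses the cached variance.
        return np.sqrt(self.var)


stats = CachedWeightedStats([1, 2, 1, 2, 1, 2, 1, 3],
                            [1, 1, 1, 100, 1, 1, 1, 1])
print(stats.std)  # first access computes mean, var and std
print(stats.var)  # served from the cache, nothing is recomputed
```

Each property is computed on first access and then stored on the instance, so asking for `std` also leaves `mean` and `var` ready for later calls.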

There does not seem to be such a function in numpy/scipy yet, but there is a ticket proposing this added functionality. Included there you will find Statistics.py, which implements weighted standard deviations.