arviz_stats.residual_r2
- arviz_stats.residual_r2(data, pred_mean=None, obs_name=None, summary=True, group='posterior', point_estimate=None, ci_kind=None, ci_prob=None, circular=False, round_to=None)
Residual \(R^2\) for Bayesian regression models.
The \(R^2\), or coefficient of determination, is defined as the proportion of variance in the data that is explained by the model. For details of the residual \(R^2\) see [1].
Briefly, it is defined as:
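\[R^2 = \frac{\mathrm{Var}_{\mu}}{\mathrm{Var}_{\mu} + \mathrm{Var}_{\mathrm{res}}}\]
where \(\mathrm{Var}_{\mu}\) is the variance of the predicted means, and \(\mathrm{Var}_{\mathrm{res}}\) is the residual variance. The residual variance is computed separately for each posterior sample \(s\):
\[\mathrm{Var}_{\mathrm{res}}^s = V_{n=1}^N \hat{e}_n^s,\]
where \(\hat{e}_n^s=y_n-\hat{y}_n^s\) are the residuals for observation \(n\) in posterior sample \(s\), and \(V_{n=1}^N\) denotes the sample variance taken over the \(N\) observations, following the notation of [1].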
The residual \(R^2\) differs from the Bayesian \(R^2\) in that it computes residual variance from the observed data, while for the Bayesian \(R^2\) all variance terms come from the model, and not directly from the data.
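As an illustration of the definition above, the following is a minimal NumPy sketch that computes one \(R^2\) value per posterior sample. It is not the arviz_stats implementation; the helper name `residual_r2_samples`, the array layout (samples along the first axis), and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def residual_r2_samples(y, mu_draws):
    """Hypothetical helper: residual R^2 for each posterior sample.

    y        : observed data, shape (N,)
    mu_draws : predicted means, shape (S, N), one row per posterior sample
    """
    y = np.asarray(y, dtype=float)
    mu_draws = np.asarray(mu_draws, dtype=float)
    # Variance of the predicted means across observations, per sample
    var_mu = mu_draws.var(axis=1, ddof=1)
    # Variance of the residuals y_n - mu_n^s across observations, per sample
    var_res = (y - mu_draws).var(axis=1, ddof=1)
    return var_mu / (var_mu + var_res)


# Synthetic data (assumed for illustration): a noisy line and 200 draws of its slope
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + rng.normal(scale=0.1, size=x.size)
slopes = rng.normal(loc=2.0, scale=0.05, size=200)
mu_draws = slopes[:, None] * x[None, :]

r2 = residual_r2_samples(y, mu_draws)  # array of shape (200,)
```

Because both variances use the same normalization, the choice of `ddof` cancels in the ratio. Each value lies between 0 and 1, and the resulting array can be summarized with a point estimate and a credible interval.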
- Parameters:
- data
xarray.DataTree or InferenceData Input data. It should contain the posterior_predictive and observed_data groups.
- pred_mean
str Name of the variable representing the predicted mean.
- obs_name
str, optional Name of the variable representing the observed data.
- summary: bool
Whether to return a summary (default) or an array of \(R^2\) samples. The summary is a named tuple with a point estimate and a credible interval.
- group
str, optional Group from which to obtain the predicted means (pred_mean). Defaults to “posterior”.
- point_estimate: str
The point estimate to compute. If None, the default value is used. Default values are defined in rcParams[“stats.point_estimate”]. Ignored if summary is False.
- ci_kind: str
The kind of credible interval to compute. If None, the default value is used. Default values are defined in rcParams[“stats.ci_kind”]. Ignored if summary is False.
- ci_prob: float
The probability for the credible interval. If None, the default value is used. Default values are defined in rcParams[“stats.ci_prob”]. Ignored if summary is False.
- circular: bool
Whether to compute the residual \(R^2\) for circular data. Defaults to False. The circular data is assumed to be in radians and to range from -π to π. In this case \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), where \(\mathrm{Var}_{\mathrm{res}}\) is computed using the circular variance, which ranges from 0 to 1. The term \(\mathrm{Var}_{\mu}\) is not used because, as the dispersion of the circular data increases, the dispersion of the mean also increases, so even for a model that explains none of the data \(R^2\) can be much higher than 0.
- round_to: int or str or None, optional
If integer, number of decimal places to round the result. Integers can be negative.
If string of the form ‘2g’, number of significant digits to round the result. Defaults to rcParams[“stats.round_to”] if None. Use the string “None” or “none” to return raw numbers.
- Returns:
Namedtuple or array
See also
arviz_stats.bayesian_r2 : Bayesian \(R^2\).
arviz_stats.loo_r2 : LOO-adjusted \(R^2\).
References
[1] Gelman et al. R-squared for Bayesian regression models. The American Statistician, 73(3) (2019). https://doi.org/10.1080/00031305.2018.1549100 Preprint: http://www.stat.columbia.edu/~gelman/research/published/bayes_R2_v3.pdf.
Examples
Calculate residual \(R^2\) for Bayesian logistic regression:
In [1]: from arviz_stats import residual_r2
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data('anes')
   ...: residual_r2(data, pred_mean='p', obs_name='vote')
Out[1]: residual_R2(mean=0.49, eti_lb=0.45, eti_ub=0.52)
Calculate residual \(R^2\) for Bayesian circular regression:
In [2]: data = load_arviz_data('periwinkles')
   ...: residual_r2(data, pred_mean='mu', obs_name='direction', circular=True)
Out[2]: residual_R2(mean=0.81, eti_lb=0.79, eti_ub=0.82)
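To make the circular variant more concrete, the following NumPy sketch computes \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\) per posterior sample. It assumes the usual definition of circular variance, one minus the mean resultant length. The helper names `circular_variance` and `circular_residual_r2_samples` are hypothetical, and this is not necessarily how arviz_stats computes the quantity internally.

```python
import numpy as np


def circular_variance(angles, axis=-1):
    """Circular variance in [0, 1]: one minus the mean resultant length."""
    return 1.0 - np.abs(np.mean(np.exp(1j * np.asarray(angles)), axis=axis))


def circular_residual_r2_samples(y, mu_draws):
    """Hypothetical helper: R^2 = 1 - circular variance of the residuals.

    y        : observed angles in radians, shape (N,)
    mu_draws : predicted mean angles in radians, shape (S, N)
    """
    residuals = np.asarray(y, dtype=float) - np.asarray(mu_draws, dtype=float)
    # exp(1j * angle) is periodic, so residuals need no explicit wrapping to (-pi, pi]
    return 1.0 - circular_variance(residuals, axis=1)


# Perfect predictions: every residual is zero, so the circular variance is 0
y = np.array([0.1, 1.0, -2.0])
r2_perfect = circular_residual_r2_samples(y, y[None, :])  # array([1.])

# Residuals spread evenly around the circle: circular variance is close to 1
y_spread = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
r2_spread = circular_residual_r2_samples(y_spread, np.zeros((1, 4)))  # close to 0
```

The two cases show the ends of the scale: residuals concentrated at a single angle give \(R^2 = 1\), while residuals spread uniformly around the circle give a value near 0.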