arviz_stats.bayesian_r2
- arviz_stats.bayesian_r2(data, pred_mean=None, scale=None, scale_kind='sd', summary=True, group='posterior', point_estimate=None, ci_kind=None, ci_prob=None, circular=False, round_to=None)
Bayesian \(R^2\) for regression models.
The \(R^2\), or coefficient of determination, is defined as the proportion of variance in the data that is explained by the model.
The Bayesian \(R^2\) (or modeled \(R^2\)) differs from other definitions of \(R^2\) in that it is computed using only posterior quantities from the fitted model. For details of the Bayesian \(R^2\) see [1].
Briefly, it is defined as:
\[R^2 = \frac{\mathrm{Var}_{\mu}}{\mathrm{Var}_{\mu} + \mathrm{Var}_{\mathrm{res}}}\]
where \(\mathrm{Var}_{\mu}\) is the variance of the predicted means and \(\mathrm{Var}_{\mathrm{res}}\) is the modeled residual variance.
For a Gaussian family, the residual variance is \(\sigma^2\). For a Bernoulli family, it is the pseudo-variance \(p(1-p)\), where \(p\) is the predicted probability of success (see [2] for details); this is computed internally if scale is not provided.
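To make the formula concrete, here is a minimal NumPy sketch that computes one \(R^2\) value per posterior draw, assuming the predicted means are a plain array of shape (draws, observations). The helper name `r2_draws` is hypothetical and this is not the arviz_stats implementation: it uses NumPy's default population variance for \(\mathrm{Var}_{\mu}\) and, as an assumption, averages per-observation (pseudo-)variances over observations. The Gaussian case squares a standard deviation (as scale_kind="sd" does), and the Bernoulli case builds \(p(1-p)\) from the predicted probabilities.

```python
import numpy as np


def r2_draws(mu, var_res):
    """Bayesian R^2 per draw (hypothetical helper, not the arviz_stats API)."""
    mu = np.asarray(mu, dtype=float)  # shape (draws, observations)
    var_res = np.asarray(var_res, dtype=float)
    if var_res.ndim == 2:
        # Assumption: per-observation (pseudo-)variances are averaged over observations
        var_res = var_res.mean(axis=1)
    var_mu = np.var(mu, axis=1)  # variance of the predicted means, per draw
    return var_mu / (var_mu + var_res)


# Gaussian family: residual sd per draw, squared to a variance (made-up numbers)
mu = np.array([[1.0, 2.0, 3.0, 4.0],
               [2.0, 2.0, 2.0, 2.0]])
sigma = np.array([1.0, 0.5])
r2_gauss = r2_draws(mu, sigma**2)

# Bernoulli family: pseudo-variance p(1 - p) from predicted probabilities
p = np.array([[0.1, 0.9, 0.1, 0.9],
              [0.5, 0.5, 0.5, 0.5]])
r2_bern = r2_draws(p, p * (1.0 - p))

print(r2_gauss, r2_bern)
```

A draw whose predicted means are all equal has \(\mathrm{Var}_{\mu} = 0\) and therefore \(R^2 = 0\), which is the second draw in both toy inputs.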
For other models, you may need to compute the appropriate scale variable representing the modeled variance (or pseudo-variance) and pass it using the scale argument.
- Parameters:
- data: xarray.DataTree or InferenceData
Input data. It should contain the posterior and posterior_predictive groups.
- pred_mean: str
Name of the variable representing the predicted mean.
- scale: str, optional
Name of the variable representing the modeled variance (or pseudo-variance). It can be omitted for binary classification problems, in which case the pseudo-variance is computed internally.
- scale_kind: str
Whether the variable referenced by scale is a standard deviation (“sd”) or a variance (“var”). Defaults to “sd”. If “sd”, it is squared internally to obtain the variance. Ignored if scale is None.
- summary: bool
Whether to return a summary (default) or an array of \(R^2\) samples. The summary is a named tuple with a point estimate and a credible interval.
- group: str, optional
Group from which to obtain the predicted means (pred_mean) and scale (scale). Defaults to “posterior”.
- point_estimate: str
The point estimate to compute. If None, the default value set in rcParams[“stats.point_estimate”] is used. Ignored if summary is False.
- ci_kind: str
The kind of credible interval to compute. If None, the default value set in rcParams[“stats.ci_kind”] is used. Ignored if summary is False.
- ci_prob: float
The probability for the credible interval. If None, the default value set in rcParams[“stats.ci_prob”] is used. Ignored if summary is False.
- circular: bool
Whether to compute the residual \(R^2\) for circular data. Defaults to False. The circular data are assumed to be in radians, ranging from -π to π. In this case \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), so scale must represent the modeled circular variance and scale_kind must be “var”. The term \(\mathrm{Var}_{\mu}\) is not used because, as the dispersion of circular data increases, the dispersion of the mean increases as well, so \(R^2\) could be much higher than 0 even for a model that explains none of the data.
- round_to: int or str or None, optional
If an integer, the number of decimal places to round the result to; integers can be negative. If a string of the form ‘2g’, the number of significant digits to round the result to. If None, defaults to rcParams[“stats.round_to”]. Use the string “None” or “none” to return raw numbers.
- Returns:
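namedtuple or array
A named tuple with the point estimate and credible interval if summary is True, otherwise an array of \(R^2\) samples.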
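When summary is True, the \(R^2\) samples are reduced to a point estimate and a credible interval, then rounded according to round_to. The sketch below is a rough, hypothetical stand-in for that reduction, assuming a mean point estimate and an equal-tailed interval (ETI); the helpers `summarize_r2` and `_round`, the default round_to=2, and the namedtuple (whose field names only imitate the printed output in the Examples section) are illustrative, not the library's API, and rcParams defaults are not modeled.

```python
from collections import namedtuple

import numpy as np

# Field names imitate the printed output in the Examples section; not the library's type.
BayesianR2 = namedtuple("bayesian_R2", ["mean", "eti_lb", "eti_ub"])


def _round(x, round_to):
    """Round like the round_to description: int -> decimals, '2g' -> significant digits."""
    if round_to in ("None", "none"):
        return float(x)  # raw number, no rounding
    if isinstance(round_to, str):
        return float(format(x, round_to))
    return round(float(x), round_to)


def summarize_r2(r2_samples, ci_prob, round_to=2):
    """Mean and equal-tailed interval of R^2 samples (hypothetical helper)."""
    r2_samples = np.asarray(r2_samples, dtype=float)
    tail = (1.0 - ci_prob) / 2.0  # probability mass left out in each tail
    lb, ub = np.quantile(r2_samples, [tail, 1.0 - tail])
    return BayesianR2(
        mean=_round(np.mean(r2_samples), round_to),
        eti_lb=_round(lb, round_to),
        eti_ub=_round(ub, round_to),
    )


draws = np.linspace(0.0, 1.0, 101)  # stand-in for R^2 samples
result = summarize_r2(draws, ci_prob=0.9)
print(result)
```

An equal-tailed interval is used here only because the example outputs report eti_lb and eti_ub; other ci_kind values would replace the quantile step.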
See also
arviz_stats.residual_r2: Residual \(R^2\).
arviz_stats.loo_r2: LOO-adjusted \(R^2\).
References
[1] Gelman et al. R-squared for Bayesian regression models. The American Statistician, 73(3) (2019). https://doi.org/10.1080/00031305.2018.1549100 Preprint: http://www.stat.columbia.edu/~gelman/research/published/bayes_R2_v3.pdf
[2] Tjur, T. Coefficient of determination in logistic regression models - A new proposal: The coefficient of discrimination. The American Statistician, 63(4) (2009). https://doi.org/10.1198/tast.2009.08210
Examples
Calculate Bayesian \(R^2\) for logistic regression:
```ipython
In [1]: from arviz_stats import bayesian_r2
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data('anes')
   ...: bayesian_r2(data, pred_mean="p")
   ...:
Out[1]: bayesian_R2(mean=0.49, eti_lb=0.43, eti_ub=0.55)
```
Calculate Bayesian \(R^2\) for circular regression. The posterior contains the concentration parameter kappa (from the von Mises distribution), so we compute the circular variance as \(1 - I_1(\kappa) / I_0(\kappa)\), where \(I_0\) and \(I_1\) are the modified Bessel functions of the first kind of orders 0 and 1:

```ipython
In [2]: from scipy.special import i0, i1
   ...: data = load_arviz_data('periwinkles')
   ...: kappa = data.posterior['kappa']
   ...: data.posterior["variance"] = 1 - i1(kappa) / i0(kappa)
   ...: bayesian_r2(data, pred_mean='mu', scale='variance',
   ...:             scale_kind="var", circular=True)
   ...:
Out[2]: bayesian_R2(mean=0.76, eti_lb=0.65, eti_ub=0.84)
```
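The circular computation can also be sketched outside arviz_stats, assuming kappa is a plain NumPy array of posterior draws with one residual circular variance per draw. The helper name `circular_r2_draws` is hypothetical, and the use of `scipy.special.i0e` and `i1e` (exponentially scaled Bessel functions) in place of `i0` and `i1` is a swap made here for numerical stability, not something the example above does.

```python
import numpy as np
from scipy.special import i0e, i1e


def circular_r2_draws(kappa):
    """Residual R^2 per draw from von Mises concentrations (hypothetical helper)."""
    kappa = np.asarray(kappa, dtype=float)
    # i0e(k) = exp(-|k|) * I_0(k) and i1e(k) = exp(-|k|) * I_1(k), so the scaling
    # cancels in the ratio while avoiding overflow for large kappa.
    circular_variance = 1.0 - i1e(kappa) / i0e(kappa)
    # Circular case: R^2 = 1 - Var_res
    return 1.0 - circular_variance


kappa = np.array([0.0, 2.0, 1000.0])  # made-up posterior draws of kappa
r2 = circular_r2_draws(kappa)
print(r2)
```

Since \(R^2 = 1 - (1 - I_1(\kappa)/I_0(\kappa))\), each draw reduces to the ratio \(I_1(\kappa)/I_0(\kappa)\): it is 0 at \(\kappa = 0\) (a uniform distribution on the circle) and approaches 1 as \(\kappa\) grows.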