arviz_stats.bayesian_r2

arviz_stats.bayesian_r2(data, pred_mean=None, scale=None, scale_kind='sd', summary=True, group='posterior', point_estimate=None, ci_kind=None, ci_prob=None, circular=False, round_to=None)

Bayesian \(R^2\) for regression models.

The \(R^2\), or coefficient of determination, is defined as the proportion of variance in the data that is explained by the model.

The Bayesian \(R^2\) (or modeled \(R^2\)) differs from other definitions of \(R^2\) in that it is computed only using posterior quantities from the fitted model. For details of the Bayesian \(R^2\) see [1].

Briefly, it is defined as:

\[R^2 = \frac{\mathrm{Var}_{\mu}}{\mathrm{Var}_{\mu} + \mathrm{Var}_{\mathrm{res}}}\]

where \(\mathrm{Var}_{\mu}\) is the variance of the predicted means, and \(\mathrm{Var}_{\mathrm{res}}\) is the modeled residual variance.
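The formula above can be sketched per posterior draw for a Gaussian model. This is a minimal illustration using only the standard library, not the arviz_stats implementation. The function name bayesian_r2_draws and the toy inputs are hypothetical, and the use of the sample variance (dividing by N - 1) across observations is an assumption.

```python
from statistics import variance


def bayesian_r2_draws(pred_means, sigmas):
    """Return one R^2 value per posterior draw (illustrative sketch only).

    pred_means: list of draws; each draw is a list of predicted means,
                one per observation.
    sigmas:     residual standard deviation for each draw.
    """
    r2 = []
    for mu, sigma in zip(pred_means, sigmas):
        var_mu = variance(mu)   # variance of predicted means across observations
        var_res = sigma ** 2    # Gaussian residual variance for this draw
        r2.append(var_mu / (var_mu + var_res))
    return r2


# Toy input: two posterior draws, three observations each (made-up numbers).
pred_means = [[1.0, 2.0, 3.0], [1.0, 3.0, 5.0]]
sigmas = [1.0, 2.0]
r2_draws = bayesian_r2_draws(pred_means, sigmas)
```

Each draw yields its own \(R^2\), so the result is a distribution over \(R^2\) rather than a single number.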

For a Gaussian family, the residual variance is \(\sigma^2\). For a Bernoulli family, it is \(p(1-p)\), where \(p\) is the predicted probability of success (see [2] for details). In the Bernoulli case, it is computed internally if scale is not provided.

For other models, you may need to compute the appropriate scale variable representing the modeled variance (or pseudo-variance) and pass it using the scale argument.
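As a rough illustration of the Bernoulli case, the sketch below computes \(R^2\) for a single posterior draw from predicted probabilities. It assumes the pseudo-variance \(p(1-p)\) is averaged over observations; the function name bernoulli_r2 and the probabilities are hypothetical, and this is not necessarily how arviz_stats computes it internally.

```python
from statistics import fmean, variance


def bernoulli_r2(p):
    """R^2 for one posterior draw of predicted success probabilities.

    Assumption: the residual pseudo-variance is the mean of p * (1 - p)
    over observations.
    """
    var_mu = variance(p)                         # variance of predicted means
    var_res = fmean(pi * (1 - pi) for pi in p)   # average Bernoulli variance
    return var_mu / (var_mu + var_res)


# Made-up predicted probabilities for three observations.
r2 = bernoulli_r2([0.1, 0.5, 0.9])
```

For another family, the same pattern would apply with var_res replaced by whatever variance (or pseudo-variance) the model implies.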

Parameters:
data: xarray.DataTree or InferenceData

Input data. It should contain the posterior and posterior_predictive groups.

pred_mean: str

Name of the variable representing the predicted mean.

scale: str, optional

Name of the variable representing the modeled variance (or pseudo-variance). It can be omitted for binary classification problems, in which case the pseudo-variance is computed internally.

scale_kind: str

Whether the variable referenced by scale is a standard deviation (“sd”) or a variance (“var”). Defaults to “sd”. If “sd”, it is squared internally to obtain the variance. Ignored if scale is None.

summary: bool

Whether to return a summary (default) or an array of \(R^2\) samples. The summary is a named tuple with a point estimate and a credible interval.

group: str, optional

Group from which to obtain the predicted means (pred_mean) and scale (scale).

point_estimate: str

The point estimate to compute. If None, the default value is used. Default values are defined in rcParams[“stats.point_estimate”]. Ignored if summary is False.

ci_kind: str

The kind of credible interval to compute. If None, the default value is used. Default values are defined in rcParams[“stats.ci_kind”]. Ignored if summary is False.

ci_prob: float

The probability for the credible interval. If None, the default value is used. Default values are defined in rcParams[“stats.ci_prob”]. Ignored if summary is False.

circular: bool

Whether to compute the residual \(R^2\) for circular data. Defaults to False. The circular data are assumed to be in radians and to range from -π to π. In this case \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), so scale must represent the modeled circular variance and scale_kind must be “var”. The term \(\mathrm{Var}_{\mu}\) is not used because, as the dispersion of the circular data increases, the dispersion of the mean also increases, so \(R^2\) can be much higher than 0 even for a model that does not explain any of the data.

round_to: int or str or None, optional

If an integer, the number of decimal places to round the result to. Integers can be negative. If a string of the form ‘2g’, the number of significant digits to round the result to. Defaults to rcParams[“stats.round_to”] if None. Use the string “None” or “none” to return raw numbers.
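The circular case described under circular, \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), can be sketched without external dependencies. The example assumes a von Mises model whose circular variance is \(1 - I_1(\kappa) / I_0(\kappa)\) and evaluates the modified Bessel functions with a truncated power series. The helper names and the kappa values are hypothetical, and the series is only adequate for moderate \(\kappa\).

```python
from math import factorial


def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind via its power series.

    Truncated series; adequate for moderate x, not a general-purpose routine.
    """
    return sum(
        (x / 2) ** (2 * k + order) / (factorial(k) * factorial(k + order))
        for k in range(terms)
    )


def von_mises_circular_variance(kappa):
    """Circular variance of a von Mises distribution with concentration kappa."""
    return 1 - bessel_i(1, kappa) / bessel_i(0, kappa)


def circular_r2(kappa_draws):
    """One R^2 per posterior draw of kappa, using R^2 = 1 - Var_res."""
    return [1 - von_mises_circular_variance(k) for k in kappa_draws]


# Made-up concentration draws: no concentration, moderate, and higher.
circ_r2_draws = circular_r2([0.0, 2.0, 5.0])
```

A concentration of zero gives a circular variance of 1 and therefore an \(R^2\) of 0, and \(R^2\) grows toward 1 as \(\kappa\) increases.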

Returns:
Namedtuple or array
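To make the two return shapes concrete, the sketch below reduces an array of \(R^2\) draws to a named tuple holding a mean and an equal-tailed interval. The tuple name R2Summary, the fixed 90% probability, and the draws are all hypothetical; the actual point estimate, interval kind, and probability come from the arguments or rcParams.

```python
from collections import namedtuple
from statistics import fmean, quantiles

# Hypothetical container; field names mirror a mean plus an equal-tailed interval.
R2Summary = namedtuple("R2Summary", ["mean", "eti_lb", "eti_ub"])


def summarize_r2(draws):
    """Summarize R^2 draws with the mean and a 90% equal-tailed interval."""
    # n=20 yields cut points at 5%, 10%, ..., 95%; keep the outer two.
    cuts = quantiles(draws, n=20, method="inclusive")
    return R2Summary(mean=fmean(draws), eti_lb=cuts[0], eti_ub=cuts[-1])


# Made-up draws evenly spaced on [0, 1].
draws = [i / 100 for i in range(101)]
summary = summarize_r2(draws)
```

With summary=False, the equivalent of draws would be returned directly instead of being reduced.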

See also

arviz_stats.residual_r2

Residual \(R^2\).

arviz_stats.loo_r2

LOO-adjusted \(R^2\).

References

[1]

Gelman et al. R-squared for Bayesian regression models. The American Statistician. 73(3) (2019). https://doi.org/10.1080/00031305.2018.1549100 preprint http://www.stat.columbia.edu/~gelman/research/published/bayes_R2_v3.pdf.

[2]

Tjur, T. Coefficient of determination in logistic regression models - A new proposal: The coefficient of discrimination. The American Statistician. 63(4) (2009). https://doi.org/10.1198/tast.2009.08210

Examples

Calculate Bayesian \(R^2\) for logistic regression:

In [1]: from arviz_stats import bayesian_r2
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data('anes')
   ...: bayesian_r2(data, pred_mean="p")
   ...: 
Out[1]: bayesian_R2(mean=0.49, eti_lb=0.43, eti_ub=0.55)

Calculate Bayesian \(R^2\) for circular regression. The posterior has the concentration parameter kappa (from the von Mises distribution). Thus we compute the circular variance as \(1 - I_1(\kappa) / I_0(\kappa)\):

In [2]: from scipy.special import i0, i1
   ...: data = load_arviz_data('periwinkles')
   ...: kappa = data.posterior['kappa']
   ...: data.posterior["variance"] = 1 - i1(kappa) / i0(kappa)
   ...: bayesian_r2(data, pred_mean='mu', scale='variance',
   ...:             scale_kind="var", circular=True)
   ...: 
Out[2]: bayesian_R2(mean=0.76, eti_lb=0.65, eti_ub=0.84)