arviz_stats.bayesian_r2


arviz_stats.bayesian_r2(data, pred_mean=None, scale=None, scale_kind='sd', summary=True, group='posterior', point_estimate=None, ci_kind=None, ci_prob=None, circular=False, round_to=None)

Bayesian \(R^2\) for regression models.

The \(R^2\), or coefficient of determination, is defined as the proportion of variance in the data that is explained by the model.

The Bayesian \(R^2\) (or modeled \(R^2\)) differs from other definitions of \(R^2\) in that it is computed only using posterior quantities from the fitted model. For details of the Bayesian \(R^2\) see [1].

Briefly, it is defined as:

\[R^2 = \frac{\mathrm{Var}_{\mu}}{\mathrm{Var}_{\mu} + \mathrm{Var}_{\mathrm{res}}}\]

where \(\mathrm{Var}_{\mu}\) is the variance of the predicted means, and \(\mathrm{Var}_{\mathrm{res}}\) is the modeled residual variance.
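The formula can be sketched with plain NumPy to show how one \(R^2\) value arises per posterior draw. This is an illustration under stated assumptions, not necessarily how arviz_stats computes it internally: the helper name `bayesian_r2_draws` is hypothetical, the predicted means are assumed to be stored as an array of shape (n_draws, n_obs), and the variance across observations is taken with NumPy's default `ddof=0`.

```python
import numpy as np


def bayesian_r2_draws(mu, var_res):
    """Per-draw R^2 = Var_mu / (Var_mu + Var_res).

    mu      : array of shape (n_draws, n_obs), predicted means (assumed layout)
    var_res : array of shape (n_draws,), modeled residual variance per draw
    """
    mu = np.asarray(mu, dtype=float)
    var_res = np.asarray(var_res, dtype=float)
    # Variance of the predicted means across observations, one value per draw.
    var_mu = mu.var(axis=1)
    return var_mu / (var_mu + var_res)


# Synthetic "posterior" for a linear model: 200 draws, 50 observations.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
slope = rng.normal(2.0, 0.1, size=200)           # hypothetical posterior draws
sigma = np.abs(rng.normal(1.0, 0.05, size=200))  # hypothetical residual sd draws
mu = slope[:, None] * x[None, :]

r2 = bayesian_r2_draws(mu, sigma**2)  # sd is squared to obtain the variance
print(r2.shape, float(r2.mean()))
```

Because both variances are non-negative, every draw of \(R^2\) lies between 0 and 1, and the collection of draws forms a posterior distribution for \(R^2\) rather than a single number.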

For a Gaussian family, the residual variance is \(\sigma^2\). For a Bernoulli family, it is the pseudo-variance \(p(1-p)\), where \(p\) is the predicted probability of success (see [2] for details). The Bernoulli pseudo-variance is computed internally if scale is not provided.
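For the Bernoulli case, a minimal sketch of the pseudo-variance is given here. It assumes the predicted probabilities are stored as an array of shape (n_draws, n_obs) and that \(p(1-p)\) is averaged over observations within each draw, as described in [1]. The helper name `bernoulli_r2_draws` is hypothetical, and the library's internal computation may differ in detail.

```python
import numpy as np


def bernoulli_r2_draws(p):
    """Per-draw R^2 for a Bernoulli model from predicted probabilities.

    p : array of shape (n_draws, n_obs) with values in [0, 1] (assumed layout)
    """
    p = np.asarray(p, dtype=float)
    # Variance of the predicted means (the probabilities) across observations.
    var_mu = p.var(axis=1)
    # Pseudo-variance p(1 - p), averaged over observations (assumption).
    var_res = (p * (1.0 - p)).mean(axis=1)
    return var_mu / (var_mu + var_res)


# One draw with two observations predicted at 0.1 and 0.9:
# var_mu = 0.16, var_res = 0.09, so R^2 = 0.16 / 0.25 = 0.64.
print(bernoulli_r2_draws([[0.1, 0.9]]))
```

A model that predicts the same probability for every observation has zero variance in its predicted means, so its \(R^2\) under this sketch is 0.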

For other models, you may need to compute the appropriate scale variable representing the modeled variance (or pseudo-variance) and pass it using the scale argument.

Parameters:

data: xarray.DataTree or InferenceData: Input data. It should contain the posterior and posterior_predictive groups.
pred_mean: str: Name of the variable representing the predicted mean.
scale: str, optional: Name of the variable representing the modeled variance (or pseudo-variance). It can be omitted for binary classification problems, in which case the pseudo-variance is computed internally.
scale_kind: str: Whether the variable referenced by scale is a standard deviation ("sd") or a variance ("var"). Defaults to "sd". If "sd", it is squared internally to obtain the variance. Ignored if scale is None.
summary: bool: Whether to return a summary (default) or an array of \(R^2\) samples. The summary is a named tuple with a point estimate and a credible interval.
group: str, optional: Group from which to obtain the predicted means (pred_mean) and scale (scale). Defaults to "posterior".
point_estimate: str: The point estimate to compute. If None, the default value is used. Default values are defined in rcParams["stats.point_estimate"]. Ignored if summary is False.
ci_kind: str: The kind of credible interval to compute. If None, the default value is used. Default values are defined in rcParams["stats.ci_kind"]. Ignored if summary is False.
ci_prob: float: The probability for the credible interval. If None, the default value is used. Default values are defined in rcParams["stats.ci_prob"]. Ignored if summary is False.
circular: bool: Whether to compute the residual \(R^2\) for circular data. Defaults to False. The circular data is assumed to be in radians and to range from -π to π. In this case \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), so scale must represent the modeled circular variance and scale_kind must be "var". The term \(\mathrm{Var}_{\mu}\) is not used because, as the dispersion of the circular data increases, the dispersion of the mean also increases, so even a model that explains none of the data could have an \(R^2\) much higher than 0.
round_to: int or str or None, optional: If an integer, the number of decimal places to round the result to. Integers can be negative. If a string of the form "2g", the number of significant digits to round the result to. Defaults to rcParams["stats.round_to"] if None. Use the string "None" or "none" to return raw numbers.

Returns:

Namedtuple or array
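
To make the two return forms concrete, the sketch below shows how an array of \(R^2\) samples could be reduced to a point estimate and an equal-tailed interval. It is an illustration only: the helper `summarize_r2` and the namedtuple `R2Summary` are hypothetical, the interval probability of 0.9 is an arbitrary choice rather than the library default, and the real function takes its defaults from rcParams.

```python
from collections import namedtuple

import numpy as np

# Hypothetical container mirroring the field names shown in the examples.
R2Summary = namedtuple("R2Summary", ["mean", "eti_lb", "eti_ub"])


def summarize_r2(samples, ci_prob=0.9):
    """Mean and equal-tailed interval of an array of R^2 samples."""
    samples = np.asarray(samples, dtype=float).ravel()
    tail = (1.0 - ci_prob) / 2.0
    # Equal-tailed interval: cut the same probability mass from each tail.
    lower, upper = np.quantile(samples, [tail, 1.0 - tail])
    return R2Summary(float(samples.mean()), float(lower), float(upper))


samples = np.linspace(0.0, 1.0, 101)  # stand-in for posterior R^2 samples
print(summarize_r2(samples))
```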

See also

arviz_stats.residual_r2: Residual \(R^2\).
arviz_stats.loo_r2: LOO-adjusted \(R^2\).

References

[1]

Gelman et al. R-squared for Bayesian regression models. The American Statistician, 73(3) (2019). https://doi.org/10.1080/00031305.2018.1549100. Preprint: http://www.stat.columbia.edu/~gelman/research/published/bayes_R2_v3.pdf

[2]

Tjur, T. Coefficient of determination in logistic regression models-A new proposal: The coefficient of discrimination. The American Statistician, 63(4) (2009). https://doi.org/10.1198/tast.2009.08210

Examples

Calculate Bayesian \(R^2\) for logistic regression:

In [1]: from arviz_stats import bayesian_r2
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data('anes')
   ...: bayesian_r2(data, pred_mean="p")
   ...: 
Out[1]: bayesian_R2(mean=0.49, eti_lb=0.43, eti_ub=0.55)

Calculate Bayesian \(R^2\) for circular regression. The posterior contains the concentration parameter kappa of the von Mises distribution, so we compute the circular variance as \(1 - I_1(\kappa) / I_0(\kappa)\):

In [2]: from scipy.special import i0, i1
   ...: data = load_arviz_data('periwinkles')
   ...: kappa = data.posterior['kappa']
   ...: data.posterior["variance"] = 1 - i1(kappa) / i0(kappa)
   ...: bayesian_r2(data, pred_mean='mu', scale='variance',
   ...:             scale_kind="var", circular=True)
   ...: 
Out[2]: bayesian_R2(mean=0.76, eti_lb=0.65, eti_ub=0.84)
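
The circular variance used above can also be computed in a standalone way. The sketch below is an optional variant, not part of the documented example: it uses SciPy's exponentially scaled Bessel functions `i0e` and `i1e`, whose ratio equals \(I_1(\kappa) / I_0(\kappa)\) but does not overflow when kappa is very large. The helper name `vonmises_circular_variance` is hypothetical.

```python
import numpy as np
from scipy.special import i0e, i1e


def vonmises_circular_variance(kappa):
    """Circular variance 1 - I1(kappa) / I0(kappa) of a von Mises distribution.

    The exponentially scaled functions share the same scaling factor, so it
    cancels in the ratio while keeping both terms finite for large kappa.
    """
    kappa = np.asarray(kappa, dtype=float)
    return 1.0 - i1e(kappa) / i0e(kappa)


kappa = np.array([0.0, 1.0, 10.0, 1000.0])
variance = vonmises_circular_variance(kappa)
# With circular=True the documented definition is R^2 = 1 - Var_res.
r2 = 1.0 - variance
print(variance, r2)
```

At kappa equal to 0 the distribution is uniform on the circle and the circular variance is 1. As kappa grows the variance shrinks toward 0, so \(R^2\) approaches 1.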
