arviz_stats.residual_r2


arviz_stats.residual_r2(data, pred_mean=None, obs_name=None, summary=True, group='posterior', point_estimate=None, ci_kind=None, ci_prob=None, circular=False, round_to=None)

Residual \(R^2\) for Bayesian regression models.

The \(R^2\), or coefficient of determination, is defined as the proportion of variance in the data that is explained by the model. For details of the residual \(R^2\) see [1].

Briefly, it is defined as:

\[R^2 = \frac{\mathrm{Var}_{\mu}}{\mathrm{Var}_{\mu} + \mathrm{Var}_{\mathrm{res}}}\]

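where \(\mathrm{Var}_{\mu}\) is the variance of the predicted means and \(\mathrm{Var}_{\mathrm{res}}\) is the residual variance. Both are computed separately for each posterior sample \(s\); the residual variance is

\[\mathrm{Var}_{\mathrm{res}}^s = V_{n=1}^N \hat{e}_n^s,\]

where \(V_{n=1}^N\) denotes the variance over the \(N\) observations and \(\hat{e}_n^s=y_n-\hat{y}_n^s\) is the residual for observation \(n\) in posterior sample \(s\).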

The residual \(R^2\) differs from the Bayesian \(R^2\) in that it computes residual variance from the observed data, while for the Bayesian \(R^2\) all variance terms come from the model, and not directly from the data.
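The definition above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `residual_r2_draws` and the toy data are hypothetical, and the use of the sample variance (denominator \(N-1\)) is an assumption about the \(V_{n=1}^N\) operator.

```python
from statistics import variance  # sample variance, denominator N - 1 (assumed)


def residual_r2_draws(y, pred_draws):
    """Return one residual R^2 value per posterior draw.

    y          : observed values, length N
    pred_draws : S draws, each a length-N sequence of predicted means
    (Hypothetical helper, for illustration only.)
    """
    r2 = []
    for mu in pred_draws:
        # Residuals use the observed data, unlike the Bayesian R^2.
        residuals = [y_n - mu_n for y_n, mu_n in zip(y, mu)]
        var_mu = variance(mu)
        var_res = variance(residuals)
        r2.append(var_mu / (var_mu + var_res))
    return r2


y = [1.0, 2.0, 3.0, 4.0]
pred_draws = [
    [1.0, 2.0, 3.0, 4.0],  # perfect fit: residual variance is 0
    [2.5, 2.5, 2.5, 2.5],  # constant prediction: Var_mu is 0
    [1.5, 1.5, 3.5, 3.5],  # partial fit
]
r2_draws = residual_r2_draws(y, pred_draws)
```

Each posterior draw yields its own \(R^2\) value, so the result is a distribution of \(R^2\) values rather than a single number.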

Parameters:
data: xarray.DataTree or InferenceData

Input data. It should contain the posterior_predictive and observed_data groups.

pred_mean: str

Name of the variable representing the predicted mean.

obs_name: str, optional

Name of the variable representing the observed data.

summary: bool

Whether to return a summary (default) or an array of \(R^2\) samples. The summary is a named tuple with a point estimate and a credible interval.

group: str, optional

Group from which to obtain the predicted means (pred_mean). Defaults to “posterior”.

point_estimate: str

The point estimate to compute. If None, the default value is used. Default values are defined in rcParams[“stats.point_estimate”]. Ignored if summary is False.

ci_kind: str

The kind of credible interval to compute. If None, the default value is used. Default values are defined in rcParams[“stats.ci_kind”]. Ignored if summary is False.

ci_prob: float

The probability for the credible interval. If None, the default value is used. Default values are defined in rcParams[“stats.ci_prob”]. Ignored if summary is False.

circular: bool

Whether to compute the residual \(R^2\) for circular data. Defaults to False. The circular data is assumed to be in radians, ranging from -π to π. In this case \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), where \(\mathrm{Var}_{\mathrm{res}}\) is computed using the circular variance, which ranges from 0 to 1. The term \(\mathrm{Var}_{\mu}\) is not used because, as the dispersion of circular data increases, the dispersion of the mean also increases, so \(R^2\) could be much higher than 0 even for a model that explains none of the data.

round_to: int or str or None, optional

If an integer, the number of decimal places to round the result to. Integers can be negative.

If a string of the form ‘2g’, the number of significant digits to round the result to. Defaults to rcParams[“stats.round_to”] if None. Use the string “None” or “none” to return raw numbers.

Returns:
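Named tuple or array

A named tuple with a point estimate and a credible interval if summary is True, otherwise an array of \(R^2\) samples.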

See also

arviz_stats.bayesian_r2

Bayesian \(R^2\).

arviz_stats.loo_r2

LOO-adjusted \(R^2\).

References

[1]

Gelman et al. R-squared for Bayesian regression models. The American Statistician. 73(3) (2019). https://doi.org/10.1080/00031305.2018.1549100. Preprint: http://www.stat.columbia.edu/~gelman/research/published/bayes_R2_v3.pdf

Examples

Calculate residual \(R^2\) for Bayesian logistic regression:

In [1]: from arviz_stats import residual_r2
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data('anes')
   ...: residual_r2(data, pred_mean='p', obs_name='vote')
   ...: 
Out[1]: residual_R2(mean=0.49, eti_lb=0.45, eti_ub=0.52)
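To make the summary in the output above concrete, here is a minimal sketch of how a mean and an equal-tailed interval (ETI) could be derived from a set of \(R^2\) draws. It is not the library's code: `eti`, `summarize`, `R2Summary`, the interpolation rule, and the toy draws are all hypothetical, and only the field names mirror the output shown.

```python
from collections import namedtuple
from statistics import mean

# Hypothetical container that mirrors the field names in the output above.
R2Summary = namedtuple("R2Summary", ["mean", "eti_lb", "eti_ub"])


def eti(draws, prob):
    """Equal-tailed interval: cut (1 - prob) / 2 probability from each tail.

    Quantiles are linearly interpolated between order statistics
    (an assumed convention, chosen here for simplicity).
    """
    ordered = sorted(draws)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    tail = (1 - prob) / 2
    return quantile(tail), quantile(1 - tail)


def summarize(draws, prob, digits=2):
    """Point estimate (mean) plus ETI bounds, rounded to `digits` decimals."""
    lb, ub = eti(draws, prob)
    return R2Summary(round(mean(draws), digits), round(lb, digits), round(ub, digits))


draws = [0.40, 0.45, 0.50, 0.55, 0.60]  # toy R^2 draws, not real model output
summary = summarize(draws, prob=0.5)
```

Passing summary=False to residual_r2 returns the underlying \(R^2\) samples instead, which is useful when a different summary is needed.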

Calculate residual \(R^2\) for Bayesian circular regression:

In [2]: data = load_arviz_data('periwinkles')
   ...: residual_r2(data, pred_mean='mu', obs_name='direction', circular=True)
   ...: 
Out[2]: residual_R2(mean=0.81, eti_lb=0.79, eti_ub=0.82)
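The circular case, \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), can be sketched for a single posterior draw as follows. This assumes the usual definition of circular variance as one minus the mean resultant length; the helper names and toy angles are hypothetical, and the library's actual computation may differ in detail.

```python
import math


def circular_variance(angles):
    """Circular variance: 1 minus the mean resultant length, in [0, 1]."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return 1.0 - math.hypot(c, s)


def circular_residual_r2(y, mu):
    """Residual R^2 for one draw of circular data (angles in radians).

    Hypothetical helper: only the residual variance enters the formula,
    the variance of the predicted means is not used.
    """
    residuals = [y_n - mu_n for y_n, mu_n in zip(y, mu)]
    return 1.0 - circular_variance(residuals)


y = [0.1, 1.2, -2.0, 3.0]

# Predictions off by the same angle everywhere: the residuals have no
# dispersion, so the circular variance is 0 and R^2 is 1.
offset_fit = [a - 0.3 for a in y]
r2_constant_residual = circular_residual_r2(y, offset_fit)

# Residuals spread evenly around the circle: circular variance is 1, R^2 is 0.
spread = [0.0, math.pi / 2, math.pi, -math.pi / 2]
r2_spread = circular_residual_r2(spread, [0.0] * 4)
```

Because sine and cosine are periodic, residuals that wrap around ±π are handled without any explicit angle normalization.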