arviz_stats.residual_r2

arviz_stats.residual_r2(data, pred_mean=None, obs_name=None, summary=True, group='posterior', point_estimate=None, ci_kind=None, ci_prob=None, circular=False, round_to=None)

Residual \(R^2\) for Bayesian regression models.

The \(R^2\), or coefficient of determination, is defined as the proportion of variance in the data that is explained by the model. For details of the residual \(R^2\) see [1].

Briefly, it is defined as:

\[R^2 = \frac{\mathrm{Var}_{\mu}}{\mathrm{Var}_{\mu} + \mathrm{Var}_{\mathrm{res}}}\]

where \(\mathrm{Var}_{\mu}\) is the variance of the predicted means and \(\mathrm{Var}_{\mathrm{res}}\) is the residual variance. For each posterior sample \(s\), the residual variance is computed as:

\[\mathrm{Var}_{\mathrm{res}}^s = V_{n=1}^N \hat{e}_n^s,\]

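where \(\hat{e}_n^s=y_n-\hat{y}_n^s\) are the residuals for observation \(n\) in posterior sample \(s\), \(\hat{y}_n^s\) is the corresponding predicted mean, and \(V_{n=1}^N\) denotes the sample variance over the \(N\) observations.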

The residual \(R^2\) differs from the Bayesian \(R^2\) in that it computes residual variance from the observed data, while for the Bayesian \(R^2\) all variance terms come from the model, and not directly from the data.
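The definition above can be sketched directly in NumPy. This is an illustrative sketch only, not the library's implementation: the helper name `residual_r2_samples` and the array layout (one row of predicted means per posterior sample) are assumptions made for the example.

```python
import numpy as np


def residual_r2_samples(y, y_pred):
    """Residual R^2 for each posterior sample (hypothetical helper).

    y      : observed data, shape (N,)
    y_pred : predicted means, shape (S, N), one row per posterior sample
    """
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Variance of the predicted means over the N observations, per sample.
    var_mu = y_pred.var(axis=1, ddof=1)
    # Variance of the residuals y_n - y_hat_n^s over the N observations, per sample.
    var_res = (y - y_pred).var(axis=1, ddof=1)
    # The same ddof is used in both terms, so its choice cancels in the ratio.
    return var_mu / (var_mu + var_res)


# Synthetic data: a noisy line and 200 posterior draws of its slope.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + rng.normal(0, 0.1, size=x.size)
slopes = rng.normal(2.0, 0.05, size=200)
y_pred = slopes[:, None] * x[None, :]

r2 = residual_r2_samples(y, y_pred)  # one R^2 value per posterior sample
```

Because the residuals are computed from the observed `y`, each value lies between 0 and 1 and reaches 1 only when the predicted means reproduce the data exactly.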

Parameters:

data: xarray.DataTree or InferenceData

Input data. It should contain the posterior_predictive and observed_data groups.

pred_mean: str

Name of the variable representing the predicted mean.

obs_name: str, optional

Name of the variable representing the observed data.

summary: bool

Whether to return a summary (default) or an array of \(R^2\) samples. The summary is a named tuple with a point estimate and a credible interval.

group: str, optional

Group from which to obtain the predicted means (pred_mean). Defaults to “posterior”.

point_estimate: str

The point estimate to compute. If None, the default value is used. Default values are defined in rcParams[“stats.point_estimate”]. Ignored if summary is False.

ci_kind: str

The kind of credible interval to compute. If None, the default value is used. Default values are defined in rcParams[“stats.ci_kind”]. Ignored if summary is False.

ci_prob: float

The probability for the credible interval. If None, the default value is used. Default values are defined in rcParams[“stats.ci_prob”]. Ignored if summary is False.

circular: bool

Whether to compute the residual \(R^2\) for circular data. Defaults to False. The circular data is assumed to be in radians, ranging from -π to π. In this case \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), where \(\mathrm{Var}_{\mathrm{res}}\) is computed using the circular variance, which ranges from 0 to 1. The term \(\mathrm{Var}_{\mu}\) is not used because, as the dispersion of circular data increases, the dispersion of the mean also increases, so even a model that explains none of the data could yield an \(R^2\) much higher than 0.

round_to: int or str or None, optional

If an integer, the number of decimal places to round the result to; integers can be negative. If a string of the form ‘2g’, the number of significant digits to round the result to. Defaults to rcParams[“stats.round_to”] if None. Use the string “None” or “none” to return raw numbers.

Returns:

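Namedtuple or array

A named tuple with the point estimate and the credible interval if summary is True, otherwise an array of \(R^2\) samples.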
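To make the summary form concrete, the following sketch reduces an array of \(R^2\) samples to a mean and an equal-tailed interval. It is not the library's code: the helper `summarize_r2`, the tuple name `R2Summary`, and the explicit `ci_prob=0.9` are assumptions for illustration, and only the field names mirror those shown in the example output on this page.

```python
from collections import namedtuple

import numpy as np

# Hypothetical container; field names follow the example output on this page.
R2Summary = namedtuple("R2Summary", ["mean", "eti_lb", "eti_ub"])


def summarize_r2(samples, ci_prob=0.9):
    """Mean and equal-tailed interval of R^2 samples (illustrative only)."""
    samples = np.asarray(samples, dtype=float)
    # An equal-tailed interval leaves the same probability in each tail.
    tail = (1.0 - ci_prob) / 2.0
    lower, upper = np.quantile(samples, [tail, 1.0 - tail])
    return R2Summary(float(samples.mean()), float(lower), float(upper))


summary = summarize_r2(np.linspace(0.0, 1.0, 101), ci_prob=0.9)
```

With summary=False, the function described here skips this reduction and returns the samples themselves.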

See also

arviz_stats.bayesian_r2: Bayesian \(R^2\).
arviz_stats.loo_r2: LOO-adjusted \(R^2\).

References

[1] Gelman et al. R-squared for Bayesian regression models. The American Statistician, 73(3) (2019). https://doi.org/10.1080/00031305.2018.1549100. Preprint: http://www.stat.columbia.edu/~gelman/research/published/bayes_R2_v3.pdf

Examples

Calculate residual \(R^2\) for Bayesian logistic regression:

In [1]: from arviz_stats import residual_r2
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data('anes')
   ...: residual_r2(data, pred_mean='p', obs_name='vote')
   ...: 
Out[1]: residual_R2(mean=0.49, eti_lb=0.45, eti_ub=0.52)

Calculate residual \(R^2\) for Bayesian circular regression:

In [2]: data = load_arviz_data('periwinkles')
   ...: residual_r2(data, pred_mean='mu', obs_name='direction', circular=True)
   ...: 
Out[2]: residual_R2(mean=0.81, eti_lb=0.79, eti_ub=0.82)
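The circular case, \(R^2 = 1 - \mathrm{Var}_{\mathrm{res}}\), can be sketched as follows. This assumes the common definition of circular variance as one minus the mean resultant length; the library's exact computation may differ, and both helper names are hypothetical.

```python
import numpy as np


def circular_variance(angles, axis=-1):
    """One minus the mean resultant length: 0 (concentrated) to 1 (dispersed)."""
    mean_cos = np.cos(angles).mean(axis=axis)
    mean_sin = np.sin(angles).mean(axis=axis)
    return 1.0 - np.hypot(mean_cos, mean_sin)


def circular_residual_r2_samples(y, y_pred):
    """Circular residual R^2 per posterior sample (assumed definition).

    y      : observed angles in radians, shape (N,)
    y_pred : predicted mean angles in radians, shape (S, N)
    """
    # Angular residuals; cos and sin are periodic, so no explicit wrapping is needed.
    resid = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    return 1.0 - circular_variance(resid, axis=1)


# Residuals spread evenly around the circle give an R^2 close to 0.
angles = np.linspace(-np.pi, np.pi, 8, endpoint=False)
r2_uniform = circular_residual_r2_samples(angles, np.zeros((1, 8)))

# Predictions that match the data exactly give an R^2 of 1.
r2_perfect = circular_residual_r2_samples(angles, angles[None, :])
```

Using only the residual term keeps \(R^2\) near 0 for a model that explains nothing, which is the motivation given for the circular option.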
