SBC provides tools to validate your Bayesian model and/or sampling algorithm via the self-recovering property of Bayesian models. The package lets you run SBC easily and provides postprocessing and visualisations of the results to assess computational faithfulness.
To install the development version of SBC, run
devtools::install_github("hyunjimoon/SBC")
To use SBC, you need a piece of code that generates simulated data that should match your model (a generator) and a statistical model + algorithm + algorithm parameters that can fit the model to data (a backend). SBC then lets you discover when the backend and generator don’t encode the same data generating process (up to certain limitations).
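To make the generator idea concrete, here is a minimal sketch of what a hand-written generator might look like, assuming the `SBC_generator_function()` wrapper and its convention of returning prior draws as `variables` and data as `generated` (see the package documentation for the exact interface):

```r
library(SBC)

# One simulation: draw parameters from their priors, then simulate data
# given those draws. Names and priors here are illustrative only.
single_sim <- function(N) {
  mu <- rnorm(1, mean = 0, sd = 1)          # prior draw for mu
  sigma <- abs(rnorm(1, mean = 0, sd = 1))  # prior draw for sigma (half-normal)
  y <- rnorm(N, mean = mu, sd = sigma)      # data simulated from those draws
  list(
    variables = list(mu = mu, sigma = sigma),  # values SBC will try to recover
    generated = list(N = N, y = y)             # data handed to the backend
  )
}

gen_custom <- SBC_generator_function(single_sim, N = 20)
```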
For a quick example, we'll use a simple generator producing normally-distributed data (basically `y <- rnorm(N, mu, sigma)`) with a backend in Stan that mismatches the generator by wrongly assuming Stan parametrizes the normal distribution via variance (i.e. it has `y ~ normal(mu, sigma ^ 2)`).
library(SBC)
gen <- SBC_example_generator("normal")
# interface = "cmdstanr" is also supported
backend_var <- SBC_example_backend("normal_var", interface = "rstan")
You can use `SBC_print_example_model("normal_var")` to inspect the model used.
We generate 50 simulated datasets and perform SBC:
ds <- generate_datasets(gen, n_sims = 50)
results_var <- compute_SBC(ds, backend_var)
The results then give us diagnostic plots that immediately show a problem: the distribution of SBC ranks is not uniform, as witnessed by both the rank histogram and the sample ECDF deviating from the theoretical CDF by more than the expected amount.
plot_rank_hist(results_var)
plot_ecdf_diff(results_var)
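Beyond the plots, the numeric results can be inspected directly. A minimal sketch, assuming the result object stores per-simulation, per-variable statistics (including ranks) in its `stats` element:

```r
# Assumption: compute_SBC() results expose a data frame of per-variable
# statistics (rank, posterior mean, sd, ...) as results_var$stats.
head(results_var$stats)
```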
We can then run SBC with a backend that uses the correct parametrization (i.e. with `y ~ normal(mu, sigma)`):
backend_sd <- SBC_example_backend("normal_sd", interface = "rstan")
results_sd <- compute_SBC(ds, backend_sd)
plot_rank_hist(results_sd)
plot_ecdf_diff(results_sd)
The diagnostic plots show no problems in this case. As with any other software test, we can observe clear failures, but the absence of failures does not imply correctness. We can, however, make the SBC check more thorough by using a larger number of simulations and by including suitable generated quantities to guard against known limitations of vanilla SBC.
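For example, here is a rough sketch of scaling up the check with additional simulations, assuming the `bind_results()` helper for merging SBC results (see the package documentation for the recommended workflow and for adding generated quantities):

```r
# Generate more simulated datasets and fit them with the same backend,
# then merge with the earlier results before re-checking the diagnostics.
# bind_results() is assumed here; consult the package docs for its exact use.
ds_more <- generate_datasets(gen, n_sims = 200)
results_more <- compute_SBC(ds_more, backend_sd)
results_combined <- bind_results(results_sd, results_more)

plot_rank_hist(results_combined)
plot_ecdf_diff(results_combined)
```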
The package vignettes provide additional context and examples. Notably:
Currently SBC supports cmdstanr, rstan, and brms models out of the box. With a little additional work, you can integrate SBC with any exact or approximate fitting method, as shown in the Implementing backends vignette.
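For instance, a minimal sketch of wrapping your own Stan model as a backend, assuming the `SBC_backend_cmdstanr_sample()` constructor and a placeholder `my_model.stan` file (check the backend documentation for the exact arguments):

```r
library(SBC)
library(cmdstanr)

# Compile your own Stan model and wrap it as an SBC backend.
# "my_model.stan" is a placeholder; additional arguments are passed on to
# cmdstanr's sampling method.
model <- cmdstan_model("my_model.stan")
backend_custom <- SBC_backend_cmdstanr_sample(model, chains = 4)

# The custom backend is then used like the example backends above:
# results_custom <- compute_SBC(ds, backend_custom)
```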
Theoretical support
* Validating Bayesian Inference Algorithms with Simulation-Based Calibration, Talts, Betancourt, Simpson, Vehtari, Gelman, 2018
* Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison, Säilynoja, Bürkner, Vehtari, 2021
* Bayesian Workflow, Gelman et al., 2020
* Toward a principled Bayesian workflow in cognitive science, Schad, Betancourt, Vasishth, 2021
* Bayes factor workflow, Schad, Nicenboim, Bürkner, Betancourt, Vasishth, 2021
Application support
* Cognitive science, response time fitting
* Bioinformatics, effect of mutation prediction
* Earth science, earthquake prediction
* Sequential Neural Likelihood
Vignette
* ECDF with codes (a new implementation by Teemu Säilynoja will be available in the bayesplot and SBC packages soon)
How does calibration relate to prediction accuracy?
Comparing the ground truth with the simulated result is the backbone of calibration, and the comparison target greatly affects the calibrated (i.e. trained) result, similar to how the reward shapes the result in reinforcement learning. In this sense, if the U(a(y), theta) term is designed for prediction, the model will be calibrated to give the best predictive performance possible.
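One generic way to write the quantity being calibrated (an illustrative formulation, not necessarily the exact definition used in the references above) is the expected utility over the joint distribution of parameters and data:

$$
\bar{U} = \mathbb{E}_{\theta \sim \pi(\theta)}\, \mathbb{E}_{y \sim \pi(y \mid \theta)} \big[ U(a(y), \theta) \big]
$$

where a(y) is the action (e.g. a prediction) computed from the data alone; choosing U to score predictive accuracy therefore calibrates the workflow toward prediction.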