Getting Started

Caution

This is a work-in-progress.

The following block of code shows how to use PyStan with a model which studied coaching effects across eight schools (see Section 5.5 of Gelman et al (2003)). This hierarchical model is often called the “eight schools” model.

Every Stan model starts with Stan program code. Begin by assigning the program code to the variable schools_code.

import stan

schools_code = """
data {
  int<lower=0> J;         // number of schools
  array[J] real y;              // estimated treatment effects
  array[J] real<lower=0> sigma; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
"""

Like most Stan models, this model references observations. Assign the data to the dictionary schools_data.

schools_data = {"J": 8,
                "y": [28,  8, -3,  7, -1,  1, 18, 12],
                "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}

Fitting a model using PyStan takes two steps. First we build the model using stan.build().

posterior = stan.build(schools_code, data=schools_data, random_seed=1)

This function returns an instance of stan.model.Model. (For reproducibility, we specify a random seed using the random_seed argument.) Building, in this context, involves converting the Stan program code into C++ code and then compiling that C++ code. This step may take some time.

Now we draw samples using the method stan.model.Model.sample(). By setting num_chains to 4, we will draw samples in parallel using four CPU cores.

fit = posterior.sample(num_chains=4, num_samples=1000)

This method returns an instance of stan.fit.Fit(). This instance holds everything produced by the Stan sampler. We can extract draws associated with a single variable using the familiar Python syntax.

eta = fit["eta"]  # array with shape (8, 4000)

Alternatively, we can extract all variables into a pandas DataFrame.

df = fit.to_frame()

Using the to_frame() method requires pandas. (Installing pystan will not install pandas.) Install pandas with python3 -m pip install pandas.