**StATS: Fitting a beta binomial model using BUGS (April 17, 2007)**

I've spent a bit of time trying to learn how to run a program called BUGS. The acronym stands for Bayes Using Gibbs Sampling. Here is my first serious attempt to run a BUGS program.

There are several web pages about this program, most notably The BUGS Project Welcome Page at

That page had links to several resources about BUGS, including a recently published book by George G. Woodworth, Biostatistics: A Bayesian Introduction. The author placed a PDF version of the Appendix describing how to use WinBUGS, the Windows version of BUGS on the web at

There is also a version of BUGS that works under R, called BRugs. A nice description of BRugs appears at the website devoted to the book by Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin, Bayesian Data Analysis."

In particular, the page

offers some very useful advice.

Dr. Woodworth starts with the simplest of possible examples, the beta-binomial model. To fit this, you need to prepare a model statement

`MODEL {`

p ~ dbeta(a,b)

x ~ dbin(p,n)

}This model specifies that the number of successes x is represented by a binomial distribution with probability of success p and number of trials n. It also specifies a prior distribution for p which is beta with parameters alpha and beta equal to a and b, respectively.

You also need a data statement

`DATA list(a=1,b=1,x=11,n=25)`

This data list provides information about the observed data. You collected data on 25 subjects and 11 of them were classified as success. It also provides specifications for the prior distribution. The values a=1 and b=1 produce a beta(1,1) distribution which is also known as a uniform distribution. This represents a "flat" or uninformative prior. Data analysts select a flat prior when they have little information prior to the start of the study about the true value of p or if they prefer to let the observed data provide most of the weight in the resulting posterior distribution.

The steps in running BUGS are

- check your syntax,
- load the data set,
- compile your model,
- generate initial values,
- monitor the nodes,
- update the model,
- exclude the early samples,
- calculate summary statistics.
Here are the eight steps in a BUGS model using the beta-binomial model..

Step 1. check your syntax. In WinBUGS, you need to open thedialog boxSpecification Toolby selecting

from the menu. Highlight any set of characters inside the curly braces of the MODEL statement and then click on theModel | Specificationbutton. In BRugs, you need to store the model as a text file and then call the function:check model

`checkModel("BBModel.txt")`

In either case, you should get a message that reads

`model is syntactically correct`

Step 2. load the data set. In WinBUGS, highlight the word "list" and click on thebutton. In BRugs, you need to store the data as a text file and then call the functionload data

`modelData("BBData.txt")`

In either case, you should get a message that reads

`data loaded`

Step 3. compile your model. In WinBUGS, click on thebutton. In BRugs, call the functioncompilemodelCompile()

You will see a message

model compiled

Step 4. Generate initial values. The simulation methods in BUGS require starting values. For simple models, BUGS can generate those starting values for you, but for more complex models, you have to provide starting values. The beta-binomial model is simple enough that BUGS can do this.In WinBUGs, click on the

button and in BRugs, call the functiongen inits

`modelGenInits()`

You will see a message

`initial values generated, model initialized`

Step 5. monitor the nodes.In WinBUGS, open the

dialog boxSample Monitor Toolby selecting

from the menu. Type "p" in theInference | Samplesfield and click on thenodebutton. In BRugs, call the functionset

`samplesSet("p")`

You will get the diagnostic message

`monitor set for variable 'p'`

Step 6. update the model. In WinBUGS, open thedialog boxUpdate Toolby selecting

from the menu. For this analysis, 5,000 is a reasonable value for theModel | Updatefield. Click on theupdatesbutton after you have made this change. In BRugs, call the functionupdate

`modelUpdate(5000)`

You will see the diagnostic message

`5000 updates took 0 s`

That was fast!

Step 7. exclude the early values. The Markov Chain is dependent on the initial starting value and sometimes takes a while to settle down to a process that reflects the true probability distribution (this is called a steady state). The early values in the Markov Chain may thus exhibit some artefactual patterns that can provide misleading results. You will often get better simulation results if you ignore the first portion of the chain. In WinBUGS, adjust thedialog box to exclude the first thousand samples by changing theSample Monitor Toolfield from 1 to 1000. In BRugs, call the functionbeg

`samplesSetBeg(1000)`

Step 8. calculate summary statistics. In WinBUGS, click on thebutton in thestatsdialog box. In BRugs, call the functionSample Monitor ToolsamplesStats("p")

This produces the output

`mean sd MC_error val2.5pc median val97.5pc start sample`

p 0.4421 0.09435 0.001237 0.2634 0.441 0.6286 1000 5000

What do these numbers mean? In this example, you observed x=11 successes in a sample of size n=25. The classical estimate of the probability of success would be 11/25=0.44. In the Bayesian example, your prior distribution is beta(1,1). A simple algebraic solution for the posterior distribution is easy enough to find (it is a beta distribution with alpha=12 and beta=15), but using WinBUGS or BRugs, you can also illustrate how a simulation could produce roughly the same answer.The value labeled "mean" represents the mean of the posterior distribution. The simulation estimates this value to be 0.4421. If you used the algebraic solution, you would get a posterior mean of 0.4444. The value labeled "sd" represents the standard deviation of the posterior distribution.

The value labeled "MC_error" is the uncertainty associated with this particular simulation. If this value is much smaller than the standard deviation of the posterior distribution, then your simulation has done well. If it is not at least an order of magnitude smaller, then you should consider changing the simulation (e.g., increasing the number of samples).

The values labeled "val2.5pc" and "val97.5pc" represent an interval that contains 95% of the posterior distribution. This interval is called the Credible Interval (CrI) and represents the Bayesian alternative to a confidence interval. The simulated values 0.2634 and 0.6286 show that there is a fair amount of uncertainty about the true proportion p, which is hardly surprising given the small sample size. These values for the CrI computed by the simulation compare nicely with the exact values from the algebraic solution (0.2659 and 0.6308).

The value labeled "median" represents the median of the posterior distribution. In this example, the posterior distribution is almost symmetric, so this value differs only slightly from the mean of the posterior distribution. The simulated value of the median (0.4410) and the exact value from the algebraic solution (0.4431) are very close.

The values labeled "start" and "sample" simply remind you what you set for the beginning and end of the Markvo chain simulation

Combining these steps. In BRugs, you can combine these steps by using the function BRugsFit. Here's an example:

`BBOutput <- BRugsFit(modelFile="BBModel.txt",data="BBData.txt",`

numChains=1,parametersToSave="p",nBurnin=1000,nIter=5000)

The diagnostic messages produced by this function are shown below:

`model is syntactically correct`

data loaded

model compiled

initial values generated, model initialized

1000 updates took 0 s

deviance set

monitor set for variable 'p'

4000 updates took 0 sThe output from this function is stored in BBOutput. When you list BBOutput, you get the following information:

`> BBOutput`

$Stats

mean sd MC_error val2.5pc median val97.5pc start sample

p 0.4428 0.09491 0.001376 0.2626 0.4416 0.6296 1001 5000

$DIC

Dbar Dhat DIC pD

x 4.641 3.677 5.605 0.9639

total 4.641 3.677 5.605 0.9639Notice that the output here is slightly different than the output shown above. The mean, for example, is 0.4428 rather than 0.4421. This should not be too surprising. The BUGS software computes estimates using a simulation, and these change each time you run the program. The difference between the two estimates is about the same order of magnitude as the Monte Carlo error.

Also notice that the BRgusFit function, by default, includes an additional set of statistics known as the Deviance Information Criterion. This statistic is useful for comparing models of differing complexity. There is no way to interpret this value for a single model in isolation.

Further steps.There are additional graphical and statistical summaries available in both WinBUGS and BRugs. There are also diagnostic tools to help you determine whether the markov chain has settled down sufficiently to provide good estimates.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at Category: Bayesian statistics.