P.Mean >> Category >> Mixed linear regression models (created 2007-07-03).

Mixed linear regression models, also known as random coefficient models, extend the simple linear regression model to cases where you have to characterize variation both between patients and within patients. Articles are arranged by date with the most recent entries at the top. Also see Category: Analysis of variance and Category: Linear regression.


10. P.Mean: A simple example of a mixed logistic regression (created 2010-10-12). I am working on a project that will require the use of mixed linear and mixed logistic regression models. I thought I should spend some time working with the latter models to familiarize myself with how they work.


9. P.Mean: Accounting for clusters in an individually randomized clinical trial (created 2009-10-13). I have a clinical trial with clusters (the clusters are medical practices), but unlike a cluster randomized trial, I am able to randomize within each cluster. From what I've read about this, I can provide an estimate for the Intraclass Correlation Coefficient (ICC) that will decrease my sample size. But I'm uncomfortable doing this. Can you help?

8. The Monthly Mean: Generalized Estimating Equations (March/April 2009)


7. P.Mean: Comparing pre and post data with a parallel control group (created 2008-09-25). I am retrospectively comparing pre- and post-treatment heart rates for two different populations. I was going to use a paired t-test for comparison within each population. Can I still use an independent t-test for comparison of the post-treatment differences between the two populations? If not, what would be the most appropriate test?

Outside resources:

Peter Diggle. Analysis of longitudinal data. 2nd ed. New York: Oxford University Press; 2002. Description: "Diggle, Liang, and Zeger's book provides an excellent overview of methods for longitudinal models which are the source of some of the greatest complexity in Statistics today. These authors, who have pioneered some of the most important work in this area, lay out both theoretical and practical information about analysis of longitudinal data. This book is for students who want more mathematical details."

Hilary Browne. Centre for Multilevel Modelling (CMM). Excerpt: "The Centre for Multilevel Modelling (CMM) is a research centre based at the University of Bristol within the Graduate School of Education, the School of Geographical Sciences and the Department of Clinical Veterinary Science and forming part of the Bristol Institute of Public Affairs (BIPA)" [Accessed December 5, 2009]. Available at: http://www.cmm.bristol.ac.uk/.

Michael Proschan, Dean Follmann. Cluster without fluster: The effect of correlated outcomes on inference in randomized clinical trials. Statistics in Medicine. 2008;27(6):795-809. Abstract: "Inference for randomized clinical trials is generally based on the assumption that outcomes are independently and identically distributed under the null hypothesis. In some trials, particularly in infectious disease, outcomes may be correlated. This may be known in advance (e.g. allowing randomization of family members) or completely unplanned (e.g. sexual sharing among randomized participants). There is particular concern when the form of the correlation is essentially unknowable, in which case we cannot take advantage of the correlation to construct a more efficient test. Instead, we can only investigate the impact of potential correlation on the independent-samples test statistic. Randomization tends to balance out treatment and control assignments within clusters, so it is logical that performance of tests averaged over all possible randomization assignments would be essentially unaffected by arbitrary correlation. We confirm this intuition by showing that a permutation test controls the type 1 error rate in a certain average sense whenever the clustering is independent of treatment assignment. It is nonetheless possible to obtain a 'bad' randomization such that members of a cluster tend to be assigned to the same treatment. Conditioned on such a bad randomization, the type 1 error rate is increased. Published in 2007 by John Wiley & Sons, Ltd." [Accessed December 5, 2009]. Available at: http://dx.doi.org/10.1002/sim.2977.

Journal article: Baoyue Li, Hester Lingsma, Ewout Steyerberg, Emmanuel Lesaffre. Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes. BMC Medical Research Methodology. 2011;11(1):77. Abstract: "BACKGROUND: Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. METHODS: We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. RESULTS: The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. CONCLUSIONS: On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain." [Accessed on May 24, 2011]. http://www.biomedcentral.com/1471-2288/11/77.

UCLA Academic Technology Services. SPSS Paper Examples: Using SAS Proc Mixed to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models. Description: "This website shows how to use SPSS to match analysis in SAS in the paper 'Using SAS Proc Mixed to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models' by Judith Singer" [Accessed December 5, 2009]. Available at: http://www.ats.ucla.edu/stat/spss/paperexamples/singer/default.htm.

Judith D. Singer. Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models. Journal of Educational and Behavioral Statistics. 1998;23(4):323-355. Abstract: "SAS PROC MIXED is a flexible program suitable for fitting multilevel models, hierarchical linear models, and individual growth models. Its position as an integrated program within the SAS statistical package makes it an ideal choice for empirical researchers and applied statisticians seeking to do data reduction, management, and analysis within a single statistical package. Because the program was developed from the perspective of a "mixed" statistical model with both random and fixed effects, its syntax and programming logic may appear unfamiliar to users in education and the social and behavioral sciences who tend to express these models as multilevel or hierarchical models. The purpose of this paper is to help users familiar with fitting multilevel models using other statistical packages (e.g., HLM, MLwiN, MIXREG) add SAS PROC MIXED to their array of analytic options. The paper is written as a step-by-step tutorial that shows how to fit the two most common multilevel models: (a) school effects models, designed for data on individuals nested within naturally occurring hierarchies (e.g., students within classes); and (b) individual growth models, designed for exploring longitudinal data (on individuals) over time. The conclusion discusses how these ideas can be extended straightforwardly to the case of three level models. An appendix presents general strategies for working with multilevel data in SAS and for creating data sets at several levels." [Accessed December 5, 2009]. Available at: http://gseweb.harvard.edu/%7Efaculty/singer/Papers/Using%20Proc%20Mixed.pdf.

Creative Commons License All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-10-12. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


6. Stats: Simplifying repeated measurements (March 12, 2008). I received an email inquiry about a project that involved four repeat assessments on 10 different subjects. The question started out as, is my sample size 10 or is it 40?


5. Stats: The complexities of having a variable number of measures per patient (November 16, 2006). A series of messages on the MedStats email discussion group emphasized the difficulty in analyzing data where subjects contribute a variable number of measurements to the data set. If there is a relationship between the prognosis and the frequency of measurement, then you might produce some serious biases.

4. Stats: A simple example of a mixed linear regression model (October 18, 2006). I want to illustrate how to run a simple mixed linear regression model in SPSS. I will use some data on the plasma protein levels of turtles at baseline, after fasting 10 days, and after fasting 20 days.

3. Stats: (Seminar notes) Issues in the analysis of mixed linear models (July 17, 2006). The keynote address at the 18th Annual Applied Statistics in Agriculture Conference, sponsored by Kansas State University was "Random Observations with Mixed Feelings", given by Oliver Schabenberger, SAS Institute Inc. The original title was "Estimating Gene Expression Profiles Using All Available Information." Here are my notes from that seminar.


2. Stats: Profile analysis and MANOVA (April 18, 2005). Someone asked me about profile analysis as an alternative to MANOVA (Multivariate Analysis of Variance). Typically you would use profile analysis when the outcome variables are measuring (more or less) the same thing, but possibly at different times or in different ways.

No date

1. Stats: Longitudinal data models (no date). Longitudinal data are data where each patient is observed on multiple occasions over time. Analysis of longitudinal data is challenging because measurements on the same subject are correlated. Another way to think about this is that two measurements on the same subject will have less variation than two measurements on different subjects.
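
The last point, that two measurements on the same subject vary less than two measurements on different subjects, can be illustrated with a small simulation. This is only a sketch under assumed conditions, not code from any of the articles listed here: it assumes a random intercept model with normally distributed subject effects, and the function names (simulate, summarize) and the standard deviations (2.0 between subjects, 1.0 within subjects) are made up for illustration.

```python
import random
import statistics


def simulate(n_subjects=2000, sd_between=2.0, sd_within=1.0, seed=1):
    """Simulate two measurements per subject from a random intercept model.

    Each subject has its own level u (between-subject variation), and each
    measurement adds independent noise (within-subject variation).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_subjects):
        u = rng.gauss(0.0, sd_between)
        first = u + rng.gauss(0.0, sd_within)
        second = u + rng.gauss(0.0, sd_within)
        pairs.append((first, second))
    return pairs


def summarize(pairs):
    """Compare differences within a subject to differences across subjects."""
    # Difference between two measurements on the same subject.
    same = [a - b for a, b in pairs]
    # Difference between a measurement on one subject and a measurement
    # on the next subject in the list.
    different = [pairs[i][0] - pairs[i + 1][1] for i in range(len(pairs) - 1)]
    var_same = statistics.variance(same)
    var_different = statistics.variance(different)
    # Under this model var_same estimates 2 * sd_within**2 and var_different
    # estimates 2 * (sd_between**2 + sd_within**2), so one minus their ratio
    # is a rough estimate of the intraclass correlation.
    icc = 1.0 - var_same / var_different
    return var_same, var_different, icc


if __name__ == "__main__":
    var_same, var_different, icc = summarize(simulate())
    print("variance of same-subject differences:     ", round(var_same, 2))
    print("variance of different-subject differences:", round(var_different, 2))
    print("rough intraclass correlation estimate:    ", round(icc, 2))
```

With these assumed settings the true intraclass correlation is 4 / (4 + 1) = 0.8, so the same-subject differences should show much less variation than the different-subject differences. A mixed linear model estimates these two sources of variation directly rather than through this shortcut.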

