P.Mean >> Category >> Linear regression (created 2007-06-13).

The linear regression model provides a framework for making quantitative predictions of a continuous outcome variable using one or more predictor variables. Also see Analysis of variance, Covariate adjustment, Logistic regression, Mixed models, Modeling issues, Nonlinear regression, and Poisson regression. Articles are arranged by date with the most recent entries at the top. You can find outside resources at the bottom of this page.


24. P.Mean: Making predictions based on just the correlation (created 2012-03-07). Dear Professor Mean, I have a math question. If the correlation, r, between two measurements is 0.1462, and I have one measurement, can I calculate the other? I know it probably won't be accurate, but can I get a rough approximation?
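The rough approximation Professor Mean has in mind is the regression equation, which needs the means and standard deviations of both measurements in addition to r: y-hat = YBAR + r*(SDy/SDx)*(x - XBAR). A minimal Python sketch, with the means and SDs invented for illustration (they are not part of the original question):

```python
def predict_from_correlation(x_new, r, x_mean, x_sd, y_mean, y_sd):
    """Rough prediction of y from x using only the correlation and the
    means/SDs of the two measurements:
    y-hat = y_mean + r * (y_sd / x_sd) * (x_new - x_mean)."""
    return y_mean + r * (y_sd / x_sd) * (x_new - x_mean)

# Hypothetical summary statistics (not from the question):
print(predict_from_correlation(120, 0.1462, 100, 15, 50, 10))  # about 51.9
```

Notice that even though x sits 1.3 standard deviations above its mean, the prediction sits barely 0.2 standard deviations above the mean of y; with r = 0.1462 the prediction hardly moves away from YBAR, which is exactly why it is so rough.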


23. The Monthly Mean: Understanding multivariate regression models, Part 1 (December 2011)

22. What is a calibration curve? (March/April 2011)


21. P.Mean: Why the least squares regression line has to pass through XBAR, YBAR (created 2010-10-01). An issue came up about whether the least squares regression line has to pass through the point (XBAR,YBAR), where the terms XBAR and YBAR represent the arithmetic mean of the independent and dependent variables, respectively. The line does have to pass through those two points and it is easy to show why.
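The reason is the intercept formula: least squares estimates the intercept as a = YBAR - b*XBAR, so plugging XBAR into the fitted line gives a + b*XBAR = YBAR by construction. A quick Python check on made-up data:

```python
def least_squares(x, y):
    """Slope and intercept of the least squares regression line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    intercept = ybar - slope * xbar  # this forces the line through (XBAR, YBAR)
    return slope, intercept

x = [1, 2, 3, 4, 5]                 # invented data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, a = least_squares(x, y)
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
print(abs((a + b * xbar) - ybar) < 1e-9)  # True
```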

20. The Monthly Mean: Understanding the interaction of two continuous variables (May/June 2010)


19. P.Mean: The controversy over standardized beta coefficients (created 2009-09-12). I have a client who is working on her dissertation. I always warn people working on dissertations or theses that they should listen more to what their committee members say about statistics than what I say about statistics. If the committee loves the statistical analysis and I hate it, you still get your degree. If I love the statistical analysis and the committee hates it, you get nothing. For this client, a committee member asked if she could produce standardized beta coefficients in her regression models. I helped her write an argument as to why the unstandardized coefficients are better, but the committee member gave a reasonable counter-argument, so there was no point in persisting. Still, it would be helpful here to outline some of the controversy over standardized beta coefficients.


18. The Monthly Mean: Elbow regression. (November 2008). I was asked to look at some data that involved monitoring glucose and potassium levels before, during, and after a special infusion. You would expect, perhaps, that there would be a flat trend before, and upward or downward trend (possibly linear, possibly not) during administration, and a different trend (possibly linear, possibly not) after infusion. There's a simple regression model for this, which is sometimes called a piecewise linear regression, segmented regression, join point regression, or elbow regression.

17. P.Mean: How do I fit a piecewise linear regression (created 2008-10-07). I was asked to look at some data that involved monitoring glucose and potassium levels before, during, and after a special infusion. You would expect, perhaps, that there would be a flat trend before, and upward or downward trend (possibly linear, possibly not) during administration, and a different trend (possibly linear, possibly not) after infusion. There's a simple regression model for this, which is sometimes called a piecewise linear regression, segmented regression, join point regression, or elbow regression.
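When the elbow (knot) location is known, the model is still ordinary least squares: alongside the intercept and time, add the basis term max(0, t - knot), whose coefficient is the change in slope at the elbow. A self-contained Python sketch with an invented knot location and invented data (not the glucose data from the article):

```python
def fit_ols(X, y):
    """Least squares via the normal equations and Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    v = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    for c in range(p):                              # forward elimination
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], v[c], v[piv] = A[piv], A[c], v[piv], v[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
            v[r] -= f * v[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):                  # back substitution
        beta[r] = (v[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta

knot = 5.0                                   # assumed elbow: start of the infusion
t = [float(i) for i in range(11)]
y = [2.0] * 6 + [3.0, 4.0, 5.0, 6.0, 7.0]    # flat, then rising after t = 5
X = [[1.0, ti, max(0.0, ti - knot)] for ti in t]
b0, b1, b2 = fit_ols(X, y)
print(round(b1, 2), round(b1 + b2, 2))       # slope before (near 0) and after (near 1)
```

If the elbow location is unknown, it becomes a nonlinear parameter; a common workaround is to refit over a grid of candidate knots and keep the one with the smallest residual sum of squares.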

16. P.Mean: Pearson correlation and ordinal data don't mix (created 2008-07-11). Dear Professor Mean, I feel uncomfortable using a Pearson correlation coefficient for two variables that are measured on an ordinal scale (for example, 1=unaware, 2=aware, 3=fairly aware, 4=moderately aware, 5=very aware). But I can't explain why I am uncomfortable with this. Can you help?
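The usual remedy is Spearman's rank correlation, which uses only the ordering of the categories: replace each variable by its ranks (with ties sharing their average rank) and compute the Pearson correlation on the ranks. A Python sketch with hypothetical 1-to-5 ratings (the data are invented):

```python
def ranks(values):
    """1-based ranks, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

awareness = [1, 2, 2, 3, 4, 5, 5]   # hypothetical ordinal ratings
outcome   = [3, 2, 4, 5, 6, 8, 7]   # hypothetical second measurement
print(round(spearman(awareness, outcome), 3))  # 0.927
```

The rank step discards the assumption that the gap between "unaware" and "aware" equals the gap between "moderately aware" and "very aware", which is exactly the assumption that makes Pearson uncomfortable on ordinal data.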

Outside resources:

Applied Regression Analysis Third Edition. Norman R. Draper, Harry Smith (1998) New York: John Wiley & Sons, Inc. Description: Draper and Smith's book is the most comprehensive guide to regression that I know of. If you can't find it in Draper and Smith, it isn't important. This book is for students who want more mathematical details.

Four assumptions of multiple regression that researchers should always test. Osborne, Jason & Elaine Waters (2002). Practical Assessment, Research & Evaluation, 8(2). Retrieved October 20, 2008 from PAREonline.net/getvn.asp?v=8&n=2. Excerpt: Most statistical tests rely upon certain assumptions about the variables used in the analysis. When these assumptions are not met the results may not be trustworthy, resulting in a Type I or Type II error, or over- or under-estimation of significance or effect size(s). As Pedhazur (1997, p. 33) notes, "Knowledge and understanding of the situations when violations of assumptions lead to serious biases, and when they are of little consequence, are essential to meaningful data analysis". However, as Osborne, Christensen, and Gunter (2001) observe, few articles report having tested assumptions of the statistical tests they rely on for drawing their conclusions. This creates a situation where we have a rich literature in education and social science, but we are forced to call into question the validity of many of these results, conclusions, and assertions, as we have no idea whether the assumptions of the statistical tests were met. Our goal for this paper is to present a discussion of the assumptions of multiple regression tailored toward the practicing researcher.

Residuals and Influence in Regression. R. Dennis Cook and Sanford Weisberg. Excerpt: The book Residuals and Influence in Regression, by R. Dennis Cook and Sanford Weisberg, was published in 1982 by Chapman & Hall, ISBN 041224280X. This book is out of print, and the copyright to the book has been returned to the authors. As the copyright holders, the authors have decided to make the book available for free on this website. You can get a pdf copy of the book, provided that the downloaded file not be sold. URL: www.stat.umn.edu/rir

Raymond J. Carroll, David Ruppert. Transformation and Weighting in Regression. 1st ed. Chapman and Hall/CRC; 1988. Description: This is a bit dated, but it has some interesting ideas, like transforming both sides of the equation to fix heteroscedasticity while still maintaining linearity. Excerpt: "This monograph provides a careful review of the major statistical techniques used to analyze regression data with nonconstant variability and skewness. The authors have developed statistical techniques--such as formal fitting methods and less formal graphical techniques--that can be applied to many problems across a range of disciplines, including pharmacokinetics, econometrics, biochemical assays, and fisheries research. While the main focus of the book is on data transformation and weighting, it also draws upon ideas from diverse fields such as influence diagnostics, robustness, bootstrapping, nonparametric data smoothing, quasi-likelihood methods, errors-in-variables, and random coefficients. The authors discuss the computation of estimates and give numerous examples using real data. The book also includes an extensive treatment of estimating variance functions in regression."

Creative Commons License All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2011-01-01. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. Copyright laws.


15. Stats: What statistic should I use when? (January 4, 2008). Someone was asking about a multiple choice question on a test that reads something like this: A group of researchers is investigating differences in patients with diabetes on the basis of demographic characteristics and the level of diabetic control. Select the most appropriate statistical method to use in analyzing the data: a t-test, ANOVA, multiple linear regression, or a chi-square test. This is one of the more vexing things that people face--what statistic should I use when?


14. Stats: Accounting for cyclical trends in a regression model (June 13, 2007). One of the doctors brought in a data set that showed the average volume of business (number of beds filled) in a month for 28 consecutive months starting in January 2005. The number of beds filled is highest in the wintertime and lowest in the summertime. Also, there is a slight upward trend over time. If you were trying to estimate the magnitude of this slight upward trend, you would need to account for the cyclical pattern as well. A simple way to estimate a cyclical pattern is to use a bit of trigonometry.
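The trigonometric trick is to add sin(2*pi*t/12) and cos(2*pi*t/12) as two extra regressors; together they can absorb a 12-month sinusoidal cycle of any phase, so the coefficient on time estimates the trend adjusted for seasonality. A Python sketch on simulated monthly data (the numbers are invented, not the bed counts from the article), reusing a small generic least squares helper:

```python
import math

def fit_ols(X, y):
    """Least squares via the normal equations and Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    v = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], v[c], v[piv] = A[piv], A[c], v[piv], v[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
            v[r] -= f * v[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (v[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta

months = list(range(28))                     # 28 consecutive months, t = 0 is January
# Simulated bed counts: upward trend of 0.5/month plus a cycle peaking in winter
y = [100 + 0.5 * t + 10 * math.cos(2 * math.pi * t / 12) for t in months]
X = [[1.0, t, math.sin(2 * math.pi * t / 12), math.cos(2 * math.pi * t / 12)]
     for t in months]
beta = fit_ols(X, y)
print(round(beta[1], 3))  # the underlying trend, 0.5, recovered despite the cycle
```

Using both the sine and the cosine term matters: their two fitted coefficients jointly determine the amplitude of the cycle and its phase (which month the peak falls in), so you do not need to know the peak month in advance.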

13. Stats: Tests of hypothesis and confidence intervals involving a correlation coefficient (January 18, 2007). Suppose you compute a correlation coefficient from a sample of patients. Can you test a hypothesis about this correlation? Can you place confidence limits around this correlation? Yes, you can, but there are a wide array of approaches that you could use.
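The most common approach is the Fisher z transformation: z = atanh(r) is approximately normal with standard error 1/sqrt(n-3), so you build the interval on the z scale and transform back with tanh. A Python sketch with hypothetical values, r = 0.45 from a sample of n = 50 patients:

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation
    via the Fisher z transformation."""
    z = math.atanh(r)               # 0.5 * log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = correlation_ci(0.45, 50)
print(round(lo, 2), round(hi, 2))   # roughly 0.2 to 0.65
```

Since this interval excludes zero, the matching test of the null hypothesis that the population correlation is zero would be rejected at the 0.05 level; the interval also shows, more usefully, how wide the plausible range still is at n = 50.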


12. Stats: Fitting a quadratic regression model (November 15, 2006). Someone came in asking about how to examine for non-linear relationships among variables. In particular, they wanted to look for a U-shaped pattern where a little bit of something was better than nothing at all, but too much of it might backfire and be as bad as nothing at all. The simplest way, but not necessarily the best way, to examine for a nonlinear relationship is to fit a quadratic model, but when I told this person about quadratic regression, I just got a blank stare.
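A quadratic model is still a linear regression: you simply add x squared as a second predictor, and the fitted curve b0 + b1*x + b2*x^2 turns around at x = -b1/(2*b2), which estimates the "sweet spot" of a U-shaped (or inverted-U) relationship. A Python sketch on invented data that lie exactly on a parabola, using a small generic least squares helper:

```python
def fit_ols(X, y):
    """Least squares via the normal equations and Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    v = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], v[c], v[piv] = A[piv], A[c], v[piv], v[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
            v[r] -= f * v[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (v[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta

x = [1, 2, 3, 4, 5, 6, 7]                 # invented dose levels
y = [(xi - 4) ** 2 + 2 for xi in x]       # U-shape with its minimum at x = 4
X = [[1.0, xi, xi ** 2] for xi in x]
b0, b1, b2 = fit_ols(X, y)
print(round(-b1 / (2 * b2), 3))           # estimated turning point: 4.0
```

With real data the fit will not be perfect, and it is worth plotting the fitted curve against the points: the quadratic is the simplest check for nonlinearity, but splines or other smooth fits are often better behaved, which is why it is not necessarily the best way.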

11. Stats: An amusing correlation (June 5, 2006). I always like simple amusing examples that illustrate an important statistical point. An email by JW on EDSTAT-L offer a couple of examples.

10. Stats: Interpretation of the correlation coefficient (April 4, 2006). There are many "rules of thumb" about how to interpret a correlation coefficient. They vary slightly from one to another, but all say about the same thing. Here are a couple of interpretations I found on the web today:

9. Stats: Can you use the coefficient of determination for categorical variables (April 4, 2006). Dear Professor Mean, How can you compute a coefficient of determination (R squared) for a model that has a dichotomous variable? I thought that you could only compute this in a linear regression model?

8. Stats: What is a beta coefficient? (April 4, 2006). When you are examining the relative impact of several independent variables on an outcome variable, the estimated slopes may be deceptive. A variable with a wide range might have a very flat slope compared to a variable with a narrow range, even though the former may be a much more powerful predictor. You can see this intuitively by drawing a graph with a large aspect ratio (much wider than it is tall) and comparing it with the same graph with a smaller aspect ratio (closer to square). The slope looks so much bigger in the square graph, but nothing has fundamentally changed. The statistics community has developed "beta coefficients," which are the regression coefficients computed using standardized variables. When you standardize, you allow for a "fair" comparison of the predictive power of variables measured on disparate ranges or even expressed in noncomparable units of measurement.
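In simple regression the beta coefficient is just the slope rescaled by the ratio of the standard deviations, b * SDx / SDy, and it equals the slope you would get after converting both variables to z-scores. A Python check on invented data:

```python
import statistics

def slope(xs, ys):
    """Least squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

x = [2, 4, 6, 8, 10, 12]                  # invented predictor
y = [3.1, 4.0, 5.2, 5.9, 7.1, 8.0]        # invented outcome
b = slope(x, y)
sx, sy = statistics.stdev(x), statistics.stdev(y)
beta = b * sx / sy                        # standardized (beta) coefficient
zx = [(a - statistics.fmean(x)) / sx for a in x]
zy = [(a - statistics.fmean(y)) / sy for a in y]
print(abs(beta - slope(zx, zy)) < 1e-9)   # True: same answer either way
```

With a single predictor this beta coefficient is exactly the Pearson correlation; with several predictors, the betas are what let you compare variables measured in noncomparable units.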

7. Stats: Economic evaluations (February 2, 2006). Several years ago, BMJ had a whole series of articles on economic evaluations. I saved the references at the time, and am just now getting back to review them. There are a lot of important lessons in these articles, and like all articles in BMJ (except for their most recent 12 months of publications), the full free text is available on the web.


6. Stats: Interpreting linear regression coefficients (June 24, 2002). In linear regression, we use a straight line to estimate a trend in data. We can't always draw a straight line that passes through every data point, but we can find a line that "comes close" to most of the data. This line is an estimate, and we interpret the slope and the intercept of this line as follows.
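As a concrete illustration with invented coefficients (not from the article): suppose infant weight in kilograms is regressed on age in months. The intercept is the predicted outcome when the predictor equals zero, and the slope is the predicted change in the outcome per one-unit increase in the predictor.

```python
# Hypothetical fitted line: weight (kg) = 3.4 + 0.7 * age (months)
intercept, slope = 3.4, 0.7

def predicted_weight(age_months):
    return intercept + slope * age_months

print(predicted_weight(0))   # the intercept: predicted weight at age zero
print(predicted_weight(6))   # six months later, six slopes (0.7 kg each) higher
```

The intercept is only meaningful when zero is a sensible value of the predictor; otherwise it is just where the line happens to cross the axis.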

5. Stats: Exploring interactions in a linear regression model (August 1, 2002). Dear Professor Mean, I have a model with two factors. When I ran the model, it showed a significant interaction between the two factors. What do I do now? --Troubled Trudy

4. Stats: SPSS dialog boxes for linear model examples (June 21, 2002). This handout will show the SPSS dialog boxes that I used to create linear regression examples. I will capitalize variable names, field names, and menu picks for clarity.


3. Stats: Regression to the mean (January 27, 2000). Dear Professor Mean: In a stat course, I was introduced to the term "regression to the mean". Today we administered a pretest to 4th graders. In February we will test again, with the same exam, to see "how much they've learned". I explained to the principal that, of course they would do better, no matter how well they were taught, that this was a classic case of regression to the mean. Am I correct, close, or way off on this?


2. Stats: Guidelines for linear regression models (September 21, 1999). Linear regression models provide a good way to examine how various factors influence a continuous outcome measure. There are three steps in a typical linear regression analysis. 1. Fit a crude model, 2. Fit an adjusted model, 3. Analyze predicted values and residuals. These steps may not be appropriate for every linear regression analysis, but they do serve as a general guideline. In this presentation, you will see these steps applied to data from a breast feeding study, using SPSS software.

1. Stats: R-squared (August 18, 1999). Dear Professor Mean, On my TI-83, when calculating quadratic regression, there is a number that is found called R-squared (R^2). I understand that this is the coefficient of determination. But I thought that R^2 had to do with linear models. What is R^2 finding for this quadratic regression? What does this number mean? Is there a way to find R^2 through a "pencil and paper" process? I know the equation for R^2 for a linear regression, but it's the quadratic I need to know about. Please, anyone, help!
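One pencil-and-paper answer: the formula R^2 = 1 - SSE/SST does not care whether the fitted values came from a line or a parabola. Plug each x into the fitted quadratic to get a predicted value, sum the squared residuals (SSE), sum the squared deviations of y from its mean (SST), and subtract the ratio from one. A Python sketch with invented data and an assumed fitted quadratic:

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST for any set of fitted values."""
    ybar = sum(y) / len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    sst = sum((obs - ybar) ** 2 for obs in y)
    return 1 - sse / sst

x = [0, 1, 2, 3]                      # invented data
y = [1.2, 1.9, 5.1, 9.8]
y_hat = [1 + xi ** 2 for xi in x]     # assumed fitted quadratic: 1 + x^2
print(round(r_squared(y, y_hat), 4))  # 0.9978
```

For a straight-line fit this reduces to the familiar squared correlation between x and y; for the quadratic it is still interpretable as the proportion of the variation in y explained by the model.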


Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2011-01-01.