Monthly Mean newsletter, November 2008
You are viewing the webpage version of the Monthly Mean newsletter for November 2008. This newsletter was sent out on November 5, 2008.
The monthly mean for November is 15.5.
Welcome to the Monthly Mean newsletter for November 2008. If you are having trouble reading this newsletter in your email system, please go to www.pmean.com/news/2008-11.html. If you are not yet subscribed to this newsletter, you can sign on at www.pmean.com/news. If you no longer wish to receive this newsletter (what? it's just the first one!) there is a link to unsubscribe at the bottom of this email. Here's a list of topics.
- Monthly Mean Quote.
- Can I ask you a question?
- Elbow regression.
- Private conflicts of interest.
- What's the difference between regression and ANOVA?
- Monthly Mean Article: The ADVANTAGE seeding trial: a review of internal documents, K. P. Hill et al.
- Monthly Mean Blog: Statistical Modeling, Causal Inference, and Social Science, Andrew Gelman
- Monthly Mean Book: Regression Modeling Strategies, Frank Harrell
- Monthly Mean Website: Textbook examples, UCLA Academic Technology Services.
- Monthly Mean Wikipedia entry: Histogram.
- Nick News: Nicholas the swimmer.
- Very bad joke: A physicist, chemist, and statistician.
- Tell me what you think.
1. Monthly Mean Quote.
"Evidence, which we have means to strengthen for or against a proposition, is our proper means for attaining truth." Florence Nightingale as quoted in www.causeweb.org/cwis/SPT--FullRecord.php?ResourceId=1836.
2. Can I ask you a question?
For several years now, I have enjoyed reading your webpage and have recently purchased your text. Thank you for your efforts in helping those of us attempting to learn biostatistics. I've been struggling with a design/analysis question related to repeated measures design and power analysis. I'm not sure if you are currently accepting questions of this nature, and thought I would check to see if this would be OK. I certainly understand if this is something that you would rather not involve yourself with.
I do appreciate the compliments. I'm starting a new career as an independent statistical consultant, so I am certainly glad to answer questions for anyone who has funds available.
Most of the people who write to me, though, do not have a lot of cash at hand that they can give to a consultant. I've always had the policy of providing quick answers if at all possible. For many questions, I can do this in 15 minutes or less. It's not going to be the optimal response because of time constraints and because email is not a good way to answer complex questions. Any advice I give should not be taken as a substitute for a face-to-face consultation with a professional statistician. Still, a quick answer can sometimes be of great help to someone.
I talk more about how I use these questions on my website and what to do if you don't get an answer to your email question at www.pmean.com/08/CanIAsk.html.
Did you like this article? Visit http://www.pmean.com/category/ProfessionalDetails.html for related links and pages.
3. Elbow regression.
I was asked to look at some data that involved monitoring glucose and potassium levels before, during, and after a special infusion. You would expect, perhaps, that there would be a flat trend before, an upward or downward trend (possibly linear, possibly not) during administration, and a different trend (possibly linear, possibly not) after infusion. There's a simple regression model for this, which is sometimes called a piecewise linear regression, segmented regression, join point regression, or elbow regression.
Fitting these models is fairly easy if there is one transition point (break point, join point, change point) and it is specified in advance. Multiple change points make the problem more tedious, but not substantially harder. If you know that there will be a change, but you don't know when, then the problem is substantially harder. Let me outline the simplest case, a linear regression model where the join point is specified in advance.
First, you need to adjust the independent variable so that the join point occurs at zero. So if you know that the join point is at 20, subtract 20 from all the values of the independent variable.
Then compute an indicator variable that is equal to one if the shifted independent variable is positive and zero otherwise. Then fit a model with the shifted independent variable, the indicator, and an interaction between the shifted independent variable and the indicator.
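The steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming numpy is available; the function name elbow_design and the data are made up for this example and do not come from the glucose and potassium study. Note that the coefficient on the indicator by itself allows a jump at the join point, so a value near zero corresponds to a line that bends without breaking.

```python
import numpy as np

def elbow_design(x, join_point):
    """Build the design matrix for a join point that is specified in advance."""
    shifted = np.asarray(x, dtype=float) - join_point  # join point now sits at zero
    indicator = (shifted > 0).astype(float)            # 1 after the join point, 0 otherwise
    interaction = shifted * indicator                  # change in slope after the join point
    intercept = np.ones_like(shifted)
    return np.column_stack([intercept, shifted, indicator, interaction])

# Made-up, noise-free data: slope -1 up to x = 20, slope -3 afterwards, no jump.
x = np.arange(10, 31)
y = np.where(x <= 20, 50.0 - 1.0 * (x - 20), 50.0 - 3.0 * (x - 20))

X = elbow_design(x, join_point=20)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Order of coefficients: intercept, slope before, jump at the join point, change in slope.
b_intercept, b_slope_before, b_jump, b_slope_change = coef
slope_after = b_slope_before + b_slope_change
```

The slope after the join point is the sum of the coefficient for the shifted variable and the coefficient for the interaction.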
Here's an example of an elbow regression model.
Notice that this regression line has a slight downward trend for the first six years and a sharper downward trend afterwards. I explain how to compute this equation and how to interpret the estimated coefficients. I also show some interesting variants on this model, such as equations that show a discontinuous drop or that constrain the slope on the left hand side to be perfectly flat.
More details about elbow regression can be found at www.pmean.com/08/PiecewiseLinear.html.
Did you like this article? Visit http://www.pmean.com/category/LinearRegression.html for related links and pages.
4. Private conflicts of interest.
The open access journal PLoS Medicine has an interesting editorial that is worth commenting on.
- Making Sense of Non-Financial Competing Interests. The PLoS Medicine Editors. PLoS Med 5(9): e199 doi:10.1371/journal.pmed.0050199. [Full text] [PDF]
While the article includes the words "non-financial" in the title, the authors repeatedly refer to private issues.
Imagine you're a peer reviewer who's received a request to referee a paper. The paper reports the results of a study using cell lines derived from an aborted fetus as a diagnostic tool in identifying certain viral infections. You are also a member of a religious organization morally opposed to fetal cell research. In your review, you raise questions about the study's validity and methodology that might undermine the paper's chance of publication.
Imagine you're an editor and you receive a paper from the scientist who supervised your postdoctoral fellowship. It's been a couple of years since you left his lab, but he has supported your career and you have warm feelings toward him; plus you still join your former lab mates occasionally at their monthly pub night. You select sympathetic reviewers and you fight hard for the paper at the editorial meeting.
In my opinion, this article exaggerates the effects of non-financial conflicts of interest. In fact, I worry that broadening the definition of conflict of interest to include non-financial conflicts is sometimes an attempt to silence legitimate critics. For example, an FDA reviewer, Curt Furberg, was asked to step down from an FDA panel reviewing safety issues about COX-2 inhibitors after he made some public statements critical of a particular drug in this class, VIOXX. The accusation was that he had an "intellectual conflict of interest" and had pre-judged the issue. At the same time, FDA kept on the panel other members who had financial ties to various pharmaceutical companies who market COX-2 inhibitors.
I discuss other examples where non-financial conflicts of interest are raised in an attempt to stifle open debate and discuss financial conflicts that involve a commercial entity versus financial conflicts involving government agencies at www.pmean.com/08/PrivateConflicts.html.
Did you like this article? Visit http://www.pmean.com/category/ConflictOfInterest.html for related links and pages.
5. What's the difference between regression and ANOVA?
Someone asked me to explain the difference between regression and ANOVA. That's challenging because regression and ANOVA are like the flip sides of the same coin. They are different, but they have more in common than you might think at first glance.
A very simple explanation is that regression is the statistical model that you use to predict a continuous outcome on the basis of one or more continuous predictor variables. In contrast, ANOVA is the statistical model that you use to predict a continuous outcome on the basis of one or more categorical predictor variables. Most people will carve out one big exception to the "one or more categorical variables" statement. If you have a single categorical variable, and it only has two levels (in other words, a binary category), then most people would describe the method/approach as a two-sample t-test. A single categorical predictor with three or more levels, or two or more categorical predictor variables with any number of levels, would be considered an ANOVA model.
So if you're trying to predict the duration of breastfeeding in weeks using mother's age as a predictor variable, then you would use a regression model. If you are trying to predict the duration of breastfeeding in weeks using mother's marital status (single, married, divorced, widowed), then you would use an ANOVA model. If you are trying to predict the duration of breastfeeding in weeks using prenatal smoking status (smoked during pregnancy, did not smoke during pregnancy), then you would use a two-sample t-test. If you added delivery type (vaginal/c-section) to prenatal smoking status, then the two binary predictor variables would be analyzed using an ANOVA model.
There are many similarities, however, between regression and ANOVA, enough so that some people use the term regression model to refer to any model that tries to predict a continuous outcome on the basis of any number of categorical and continuous variables. I describe these similarities and explain my specific perspective on the issue at www.pmean.com/08/RegressionAndAnova.html.
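One way to see the overlap is to run the same numbers both ways. The sketch below, assuming numpy is available, computes a one-way ANOVA F statistic by hand and then fits a regression on indicator variables for the same groups. The breastfeeding durations are made up purely for illustration, and only three marital status levels are used to keep the example short.

```python
import numpy as np

# Made-up breastfeeding durations (weeks) by marital status, for illustration only.
groups = {
    "single":   [10.0, 12.0, 8.0, 14.0],
    "married":  [20.0, 18.0, 24.0, 22.0],
    "divorced": [15.0, 13.0, 17.0, 11.0],
}

y = np.concatenate([np.asarray(v) for v in groups.values()])
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])

# Classical one-way ANOVA: between-group versus within-group sums of squares.
grand_mean = y.mean()
ss_between = sum(len(v) * (np.mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum(((np.asarray(v) - np.mean(v)) ** 2).sum() for v in groups.values())
df_between = len(groups) - 1
df_within = len(y) - len(groups)
f_anova = (ss_between / df_between) / (ss_within / df_within)

# The same analysis as a regression on indicator variables,
# using "single" as the reference level.
X = np.column_stack([
    np.ones(len(y)),
    (labels == "married").astype(float),
    (labels == "divorced").astype(float),
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
ss_model = ((fitted - grand_mean) ** 2).sum()
ss_resid = ((y - fitted) ** 2).sum()
f_regression = (ss_model / df_between) / (ss_resid / df_within)
```

The two F statistics agree. The regression intercept is the mean of the reference group, and each remaining coefficient is the difference between a group mean and the reference group mean.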
Did you like this article? Visit http://www.pmean.com/category/AnalysisOfVariance.html for related links and pages.
6. Monthly Mean Article: The ADVANTAGE seeding trial: a review of internal documents, K. P. Hill et al.
The ADVANTAGE seeding trial: a review of internal documents. K. P. Hill, J. S. Ross, D. S. Egilman, H. M. Krumholz. Ann Intern Med 2008: 149(4); 251-8. [Medline] [Abstract] [Full text] [PDF]. Description: This article defines the concept of a seeding trial, a clinical trial whose main purpose is to increase the market share of a new drug under the superficial trappings of a research study. The article highlights some internal documents obtained during litigation over VIOXX that describe the marketing aims of an early trial of VIOXX with the cute acronym ADVANTAGE (Assessment of Differences between Vioxx and Naproxen To Ascertain Gastrointestinal Tolerability and Effectiveness). The authors extracted information from the internal documents using a qualitative research technique known as the constant comparative method. The authors also conducted a systematic review of published literature on seeding trials and provided a qualitative discussion of the findings of six relevant research papers.
7. Monthly Mean Blog: Statistical Modeling, Causal Inference, and Social Science, Andrew Gelman.
Statistical Modeling, Causal Inference, and Social Science. Andrew Gelman. This is a blog for statisticians, and it doesn't pull any punches. Dr. Gelman is a faculty member of the Departments of Statistics and Political Science at Columbia University, and the author of several textbooks. This blog presents a Bayesian perspective to many data analyses. www.stat.columbia.edu/~gelman/blog/
8. Monthly Mean Book: Regression Modeling Strategies, Frank Harrell.
I get a lot of questions about books. Usually it is from people who are just starting to learn statistics or to re-learn the statistics that they forgot years ago. In this section of the Monthly Mean newsletter, I provide recommendations of books that I own and that I have enjoyed.
Regression Modeling Strategies. Frank Harrell. Springer: New York, NY. ISBN: 0387952322. This is a book that every practicing statistician should read. Dr. Harrell takes a modern approach to Statistics that incorporates new developments in the area of statistical modeling. In the preface, he outlines some rather ambitious goals:
This book links standard regression model with
- methods for relaxing linearity assumptions that still allow one to easily obtain predictions and confidence limits for future observations, and to do formal hypothesis tests,
- nonadditive modeling approaches not requiring the assumption that interactions are always linear x linear,
- methods for imputing missing data and for penalizing variances for incomplete data,
- methods for handling large numbers of predictors without resorting to problematic stepwise variable selection techniques,
- data reduction methods (some of which are based on multivariate psychometric techniques too seldom used in statistics) that help with the problem of "too many variables to analyze and not enough observations" as well as making the model more interpretable when there are predictor variables containing overlapping information,
- methods for quantifying predictive accuracy of a fitted model,
- powerful model validation techniques based on the bootstrap, that allow the analyst to estimate predictive accuracy nearly unbiasedly without holding back data from the model development process, and
- graphical methods for understanding complex models.
This book was published in 2001 but it is more current than many books published afterwards.
9. Monthly Mean Website: Textbook examples. UCLA Academic Technology Services.
There's a lot of good information about Statistics on the web, but it takes time to sift through all the sites. In this section of the Monthly Mean newsletter, I try to list web sites that I have encountered that I have found to be particularly useful.
Textbook examples. UCLA Academic Technology Services. Excerpt: This page lists all of the books for which we have developed web pages showing how to solve the examples using common statistical packages. We encourage you to obtain the textbooks associated with these pages to gain a deeper conceptual understanding of the analyses illustrated (see our suggestions on Where to buy books). We are very grateful to the authors of these textbooks for granting us permission to create these pages and to distribute their data files via our web pages. URL: www.ats.ucla.edu/stat/examples/
10. Monthly Mean Wikipedia entry: Histogram.
The quality of Wikipedia entries about Statistics is uneven, with some very good entries and others that are very confusing and poorly explained. In this section of the Monthly Mean newsletter, I'll try to highlight some of the entries that I've found interesting and reasonably well done.
The Wikipedia entry on histogram (en.wikipedia.org/wiki/Histogram) offers a simple explanation of histograms.
In statistics, a histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories.
This is not the same as a bar chart.
A histogram differs from a bar chart in that it is the area of the bar that denotes the value, not the height, a crucial distinction when the categories are not of uniform width.
It explains that histograms represent
one of the seven basic tools in quality control, which also include the Pareto chart, check sheet, control chart, cause-and-effect diagram, flowchart, and scatter diagram.
The page has a nice summary of various formulas for selecting the number of bars in a histogram.
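The point about area versus height is easy to check numerically. Here is a small sketch, assuming numpy is available, with made-up data and deliberately unequal bar widths. It also computes the number of bars from Sturges' rule, which is one commonly cited formula of this kind; I am assuming, not asserting, that it is among the formulas the Wikipedia page summarizes.

```python
import math
import numpy as np

# Made-up data, for illustration only.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 12, 15, 18], dtype=float)
edges = np.array([0.0, 2.5, 5.0, 10.0, 20.0])  # deliberately unequal widths

counts, _ = np.histogram(values, bins=edges)
widths = np.diff(edges)
proportions = counts / counts.sum()

# Height is chosen so that height * width (the area) equals the proportion of cases.
heights = proportions / widths

# numpy produces the same heights when density=True.
density, _ = np.histogram(values, bins=edges, density=True)

# Sturges' rule for the number of bars with n observations: ceil(log2(n)) + 1.
sturges_bars = math.ceil(math.log2(len(values))) + 1
```

With unequal widths, the last bar holds as many cases as the first but is drawn much shorter, because it is four times as wide.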
11. Nick News: Nicholas the swimmer.
I know most of you are not interested in my personal life, but for those who are, I want to provide a few tidbits. More information is available on the personal pages of the pmean website: www.pmean.com/personal. In this section of the Monthly Mean newsletter, I provide one personal update on me and my family. Most often this will be about my son, Nicholas, which is why I call this section Nick News.
Nicholas the swimmer. Summer is over and Nicholas is doing very well in first grade. This summer, he spent a lot of time at the pool and pretty much taught himself how to swim, though he did get some help from my wife, Cathy, and from the teachers at the summer care program he attended. Nicholas could toss an object into the deepest part of the pool, swim out to the middle, dive down and grab the object and then swim to the opposite side. I have an underwater camera and took a few pictures. I can't include them in the newsletter without making it big and unwieldy, but you can find these pictures at www.pmean.com/personal/swimmer.html.
12. Very bad joke: A physicist, chemist, and statistician.
A little bit of humor goes a long way, especially in an area like Statistics that some people consider (incorrectly, I might add!) to be a boring subject. In this section of the Monthly Mean newsletter, I offer a very bad joke, sometimes my own, and sometimes from other sources.
One day there was a fire in a wastebasket in the office of the Dean of Sciences. In rushed a physicist, a chemist, and a statistician. The physicist immediately starts to work on how much energy would have to be removed from the fire to stop the combustion. The chemist works on which reagent would have to be added to the fire to prevent oxidation. While they are doing this, the statistician is setting fires to all the other wastebaskets in the office. "What are you doing?" the others demand. The statistician replies, "Well, to solve the problem, you obviously need a larger sample size."
I can't take credit for this joke. It appears in Gary C. Ramseyer's First Internet Gallery of Statistics Jokes at www.ilstu.edu/~gcramsey/Gallery.html.
13. Tell me what you think.
How did you like this newsletter? I have three short open ended questions that I'd like to ask. It's totally optional on your part. Your responses will be kept anonymous, and will only be used to help improve future versions of this newsletter.
This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-09-23.