The Monthly Mean newsletter, April 2013. Released 2013-05-12.

--> Introduction
--> My model won't run
--> Do you really need those sealed envelopes?
--> Article: Data for a Brighter Democracy
--> Quote: Errors using inadequate data...
--> Trivia: What tune by the band Chicago...
--> Website: Propensity Score Software Page
--> Tell me what you think.
--> Join me on Facebook, LinkedIn and Twitter
--> Permission to re-use any of the material in this newsletter

--> Introduction. Welcome to the Monthly Mean newsletter for April 2013. The Monthly Mean is a newsletter with articles about Statistics with occasional forays into research ethics and evidence based medicine. If you are having trouble reading this newsletter in your email system, please go to the web version (www.pmean.com/news/201304.html). If you are not yet subscribed to this newsletter, you can sign on at the newsletter page (www.pmean.com/news). If you no longer wish to receive this newsletter, there is a link to unsubscribe at the bottom of this email.

--> My model won't run. Every once in a while you will see a weird error message in your statistical program that will frustrate you and keep you from getting the results you want. Here's an example of such a message, in a Stata program for conditional logistic regression (CLR).

9163 (group size) take 8906 (# positives) combinations results in numeric overflow; computations cannot proceed r(1400);

Okay, so what does this mean? I'm not an expert on CLR, so I went to Google and found a nice PDF file explaining CLR at the University of Pennsylvania Department of Biostatistics and Epidemiology website.

CLR is used in studies with a binary outcome where matching or stratification is part of the design. It is a way to avoid having to model a separate intercept for each matched pair or for each stratum. This is especially critical when your sample size is small and you have lots of matched pairs or lots of strata. The article notes that CLR cannot handle large strata well, which is probably why this example failed with a strange error message. One of the strata has 9,163 members, and you need to look at combinations involving 8,906 positive outcomes in this stratum. Even with today's computers, looking at combinations involving a group of this size is untenable.
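To get a feel for the size of the problem, here is a rough sketch in Python (not Stata) that estimates how many ways there are to choose 8,906 positives from a group of 9,163. It assumes that the "numeric overflow" in the message refers to a count that no longer fits in a double precision number; that is my reading of the message, not something the message states. The function name `log10_combinations` is made up for this illustration.

```python
import math
import sys


def log10_combinations(n, k):
    """Return log10 of 'n choose k', computed with log-gamma so that
    the intermediate factorials never overflow."""
    log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_count / math.log(10)


group_size = 9163  # from the error message
positives = 8906   # from the error message

log10_count = log10_combinations(group_size, positives)
log10_double_max = math.log10(sys.float_info.max)  # roughly 308

print(f"log10 of the number of combinations: {log10_count:.1f}")
print(f"log10 of the largest double:         {log10_double_max:.1f}")
print("Too big for a double:", log10_count > log10_double_max)
```

Working on the log scale is the point of the sketch: the count itself is far larger than the largest double, but its logarithm is an ordinary number.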

So what do you do when a stratum has thousands of members? The article suggests estimating an intercept for each stratum in an ordinary plain vanilla logistic regression model.

There's a lesson to be learned here, and it is not about a specific model like conditional logistic regression. If your model fails to run and produces a strange and difficult to decipher error message, consider the possibility that you are using the wrong model for your data. I know that I have the temptation to try to use brute force and fit the model no matter what the computer tells me. Increase the available memory! Upgrade to a new version! Find a faster computer! Try a different piece of software! When your program complains about your model, maybe you need to sidestep the issue by fitting a different but comparable model.

--> Do you really need those sealed envelopes? Someone was about to start the randomization process for a clinical trial and asked me if I usually use the SNOE method. I had to ask what that meant, as I had never heard that acronym before. SNOE stands for Sequentially Numbered Opaque Envelopes. Oh, of course! Now I knew. This is a method of concealed allocation used in a study where full blinding is impossible. You hide the randomization assignment in a series of sealed envelopes. The recruiting physicians will go through the informed consent process and the evaluation of inclusion/exclusion criteria first. When the physician is sure that the patient qualifies and agrees to be a part of the study, they open a sealed envelope to find out what group the patient is randomized into. I talked about the rationale for sealed envelopes or other methods for concealed allocation in the January 2009 newsletter.
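For readers who have never prepared such envelopes, here is a minimal sketch in Python of how the contents of a numbered series might be generated. Everything specific in it is an assumption made for illustration: the function name `envelope_contents`, the arm labels "A" and "B", the fixed seed, and the choice of simple randomization with equal numbers in each arm. A real trial would follow whatever scheme its protocol specifies.

```python
import random


def envelope_contents(n_envelopes, arms=("A", "B"), seed=2013):
    """Return a list of (envelope number, arm) pairs.

    Hypothetical helper: equal numbers per arm, shuffled once.
    The seed is fixed only so the list can be regenerated for an audit.
    """
    if n_envelopes % len(arms) != 0:
        raise ValueError("n_envelopes must be a multiple of the number of arms")
    assignments = list(arms) * (n_envelopes // len(arms))
    random.Random(seed).shuffle(assignments)
    # Number the envelopes 1, 2, 3, ... in the order they must be opened.
    return list(enumerate(assignments, start=1))


for number, arm in envelope_contents(8):
    print(f"Envelope {number:03d}: {arm}")
```

The list would be produced by someone who is not recruiting patients, and the envelopes opened strictly in numerical order.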

Concealed allocation is a great idea for some studies, but sometimes it might be overkill. I would consider sealed envelopes (or some other form of concealed allocation) more seriously if you see a lot of "yes" answers to the following questions.

  1. Are multiple physicians recruiting patients for the study?
  2. Are the two therapies being administered substantially different (e.g., surgery versus drug)?
  3. Are the benefits of the therapy large (e.g., possibly life saving)?
  4. Are the possible side effects serious?
  5. Are there already strong opinions among practicing physicians about which therapy is better?
  6. Are the patients themselves likely to have a strong opinion about which arm of the study they are assigned to?
  7. Are there financial incentives involved (e.g., payment to the doctor for each recruited subject)?
  8. Is there evidence of selection bias in previous studies?

There's no right or wrong answer here. It depends on the resources you have, the financial implications if the study fails, and how risk averse you are. But some of the factors listed above may tilt the scale one way or another if you are unsure of how to proceed.

If your study is fully blinded, then there is no need for concealed allocation because the concealment will be handled behind the closed doors of the pharmacy.

--> Article: Rod Little and Tom Louis. Data for a Brighter Democracy. Huffington Post, April 9, 2013. Excerpt: "The U.S. Census Bureau, an agency of the Department of Commerce, conducts the Decennial Census, which is hard-wired into the U.S. Constitution. Unfortunately, the name fosters the misperception that bureau staff conduct the Decennial Census, twiddle their collective thumbs for nine or so years and then suck it up and do another census."

--> Quote: Errors using inadequate data are much less than those using no data at all. Charles Babbage, as quoted at http://www.citatum.org/category/Skepticism/5

--> Trivia: What tune by the band Chicago has more numbers than words in its title? Hint: it is rumored to be about drugs, but actually is about staying up very late working on a song. First person with a correct answer gets mentioned in the next issue of this newsletter.

Last month, I asked: "Beethoven wrote nine symphonies, but there is something special about the third, sixth, and ninth symphonies (other than the fact that all of these are divisible by three). What is the common tie?" The answer is that these are the three symphonies with commonly used names (Eroica, Pastoral, and Choral). John Field and Jeremy Miles both came up with the correct answer at almost the same time.

--> Website: Stuart, Elizabeth. Propensity Score Software Page, Accessed April 18, 2013. Description: This page has links to various propensity score matching methods in R, Stata, SAS, and SPSS.

--> Very bad joke: As you can see, by late next month...

As you can see, by late next month, you'll have over four dozen husbands. Better get a bulk rate on wedding cake.

This XKCD cartoon was created by Randall Munroe, who has an open source license for all his material. You can find the original comic here.

--> Tell me what you think. How did you like this newsletter? Give me some feedback by responding to this email. Unlike most newsletters where your reply goes to the bottomless bit bucket, a reply to this newsletter goes back to my main email account. Comment on anything you like, but I am especially interested in answers to the following three questions.
--> What was the most important thing that you learned in this newsletter?
--> What was the one thing that you found confusing or difficult to follow?
--> What other topics would you like to see covered in a future newsletter?

If you send a comment, I'll mention your name and summarize what you said in the next newsletter. It's a small thank you and acknowledgement to those who take the time to help me improve my newsletter. If you send feedback and you want to remain anonymous, please let me know.

I received feedback from two people. John Field liked the article on counting, especially the emphasis on the critical need for operational definitions. One anonymous person talked about his experience working with lab technicians on cell counting. Even though they had all this nice equipment to automate counting, they still did it the old fashioned way. The reason: abnormal blood samples, which are the norm in a hospital, could not be counted accurately by an automated system.

--> Join me on Facebook, LinkedIn, and Twitter. I'm just getting started with social media. My Facebook page is www.facebook.com/pmean, my page on LinkedIn is www.linkedin.com/in/pmean, and my Twitter feed name is @profmean. If you'd like to be a Facebook friend, LinkedIn connection (my email is mail (at) pmean (dot) com), or tweet follower, I'd love to add you. If you have suggestions on how I could use these social media better, please let me know.

--> Permission to re-use any of the material in this newsletter. This newsletter is published under the Creative Commons Attribution 3.0 United States License. You are free to re-use any of this material, as long as you acknowledge the original source. A link to or a mention of my main website, www.pmean.com, is sufficient attribution. If your re-use of my material is at a publicly accessible webpage, it would be nice to hear about that link, but this is optional.

Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-12-31.