P.Mean: 2012 archive

P.Mean: Archive organized by date (created 2012-01-01).

This page lists files created in calendar year 2012. There is a big gap between May and August when I had to integrate over a thousand files from my old website (full details here). Also look at the archives for 2013, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, and 1999. You can also browse through an archive of pages organized by topic.

December 2012

33. P.Mean: A single wildly large value makes you less confident that the mean of your data is large (created 2012-12-12). I was working on a project that seemed to be producing some counter-intuitive results. The work involved ratios, and one of the experiments had an unusually large ratio. I tried a log transformation, which tends to pull down that large ratio. It improved the precision of the results, which you might expect. But it also reduced the p-value, which you might not expect. After all, if you use a log transformation to de-emphasize large values, won't that attenuate a test that tries to show that the average value is large? This bothered me for a while, so I developed a series of simple examples to resolve the apparent inconsistency.
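To make the puzzle concrete, here is a minimal Python sketch using made-up ratios (not the data from the project described above). It computes a one-sample t statistic for the hypothesis that the average ratio is 1, first on the raw scale and then on the log scale, where the null value becomes log(1) = 0. The function name `one_sample_t` and the five ratios are assumptions chosen only for illustration.

```python
import math
import statistics

def one_sample_t(values, null_value):
    """One-sample t statistic: (mean - null) / (sd / sqrt(n))."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - null_value) / standard_error

# Hypothetical ratios: four modest values and one wildly large one.
ratios = [1.5, 2.0, 2.5, 3.0, 40.0]

# Raw scale: test whether the mean ratio differs from 1.
t_raw = one_sample_t(ratios, 1.0)

# Log scale: a ratio of 1 corresponds to a log ratio of 0.
log_ratios = [math.log(r) for r in ratios]
t_log = one_sample_t(log_ratios, 0.0)

print(f"t statistic on the raw scale: {t_raw:.2f}")
print(f"t statistic on the log scale: {t_log:.2f}")
```

With these made-up numbers, the single large value inflates the standard deviation on the raw scale much more than it raises the mean, so the t statistic is smaller on the raw scale than on the log scale.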

32. P.Mean: Animations in R (created 2012-12-08). About twenty years ago, computers got fast enough to provide smooth animations of small and moderate sized data sets. There was a lot of effort to incorporate animation, such as 3D rotation of point clouds, into statistical software programs. The results looked stunning, but I'm not sure if it led to many great insights. I experimented with programs like JMP, but never really felt too comfortable with them. So, I gave up on animation for the most part. But there is one area where animation makes sense and that is in teaching.

November 2012

31. P.Mean: Data sharing (created 2012-11-21). I came across several interesting papers and editorials about data sharing.

October 2012

30. P.Mean: Mapping my runs in R (created 2012-10-10). I started running as part of a 2011 New Year's resolution to build up my stamina to the point where I could run a five kilometer race. I didn't care too much how fast I ran, but I did want to run the whole way without stopping and without taking a walking break. I've run in about a dozen different five kilometer and four mile races. In the middle of 2011, I bought an iPhone with a built-in GPS system. It is the coolest thing ever. I used that iPhone to track my runs and started sharing the maps of my runs with other details on my running log. You can produce nice running routes using Google Maps, but I wanted to be able to manipulate the data a bit, so I developed a simple program in R. Here are the details.

29. P.Mean: Debating the validity of snowball sampling (created 2012-10-01). Someone on a discussion forum for IRB members criticized snowball sampling for a range of reasons, but (interestingly, from my perspective) also for the reason that it is bad research. He asked "Why would anybody want to use snowball sampling? As non-probability sampling the results can't be generalized to a known universe." That's an interesting perspective, but one I disagree with. Here are my thoughts on the issue.

September 2012

28. P.Mean: Borderline p-values (created 2012-09-19). Dear Professor Mean, I originally reported a p-value of 0.04 for a Chi-Square test, but I was told to use the Fisher's Exact Test instead. The p-value for Fisher's Exact Test is 0.06. Do I have to drop the discussion of statistical significance?

27. P.Mean: When should I use the Fisher's Exact Test and when should I use the Chi-Square Test (created 2012-09-19). Dear Professor Mean, I was running crosstabs in SPSS for a two-by-two table and the p-values disagree. The p-value for the Pearson Chi-Square is 0.04 and the p-value for the Fisher's Exact Test (2-sided) is 0.06. Which one should I use?
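The two tests disagree because they compute the p-value in different ways: the Pearson Chi-Square test relies on a large-sample approximation, while Fisher's Exact Test sums exact hypergeometric probabilities. Here is a minimal Python sketch of both calculations. The function names and the small two-by-two table are hypothetical (this is not the reader's data), the Chi-Square version has no continuity correction, and the two-sided exact p-value uses one common definition (sum the probabilities of all tables no more likely than the observed one).

```python
import math

def pearson_chi_square_p(a, b, c, d):
    """Two-sided p-value for the uncorrected Pearson chi-square test (1 df)."""
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    observed = table_probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(table_probability, range(low, high + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical table: 3 and 1 in the first row, 1 and 3 in the second.
chi_square_p = pearson_chi_square_p(3, 1, 1, 3)
exact_p = fisher_exact_p(3, 1, 1, 3)
print(f"Pearson Chi-Square p-value: {chi_square_p:.4f}")
print(f"Fisher's Exact Test p-value: {exact_p:.4f}")
```

For this tiny table the gap is large (roughly 0.16 versus 0.49). With larger counts the two p-values move closer together, which is why they can land on opposite sides of 0.05 the way 0.04 and 0.06 do.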

26. P.Mean: What's the name of the test for comparing two proportions? (created 2012-09-12). A commonly used statistical test is the comparison of two independent proportions. For example, you are looking at the rate of steroid induced hyperglycemia among patients receiving high doses of steroids compared to the rate among patients receiving low doses. There are several terms that you can use here because there are several equivalent ways to test this hypothesis. I prefer to refer to the statistical method here as logistic regression. Here's why.

August 2012

24. P.Mean: Data management in R versus SAS (created 2012-08-27). Someone on LinkedIn was arguing that data management is easier in SAS than in R. A lot of times these claims are subjective. What is easier for one person may be more difficult for another. Also, you need to consider whether it is easy in that it is efficient (uses little computer time), fast to program, less likely to need debugging, or simpler for a non-statistician to understand and cross-check the code. It's probably some mix of these. So anyway, this person on LinkedIn was challenging the group to come up with a "simple" way in R to replicate a common data management scenario. Here's my response to that challenge.

July 2012

25. P.Mean: Whiskers in a boxplot (created 2012-07-27). Someone asked about how SPSS drew the "whiskers" in a boxplot. The length of the whiskers is supposed to be the distance from the 25th percentile to the minimum (or from the 75th percentile to the maximum) unless there are outliers. Outliers are defined as anything more than 1.5 box lengths (1.5 times the distance between the 25th and 75th percentiles) away from either end of the box.
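The whisker rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `boxplot_whiskers` is hypothetical, and the quartiles come from Python's `statistics.quantiles`, which may not match the percentile rule that SPSS uses, so the cutoffs could differ slightly from SPSS output.

```python
import statistics

def boxplot_whiskers(values):
    """Return (lower whisker end, upper whisker end, outliers)."""
    data = sorted(values)
    # Quartiles by linear interpolation; SPSS may use a different rule.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1
    lower_fence = q1 - 1.5 * box_length
    upper_fence = q3 + 1.5 * box_length
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return min(inside), max(inside), outliers

# Made-up data with one value far above the rest.
print(boxplot_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
# Made-up data with no outliers: whiskers run to the minimum and maximum.
print(boxplot_whiskers([1, 2, 3, 4, 5]))
```

In the first data set the value 30 lies beyond the upper cutoff, so the upper whisker stops at 9 and 30 is flagged as an outlier. In the second, nothing is flagged and the whiskers reach the minimum and the maximum.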

May 2012

23. P.Mean: I make an amateur mistake in BUGS (created 2012-05-22). I am just learning how to run BUGS software. I've used WinBUGS, which is a stand-alone program for Windows, and OpenBUGS, which is an Open Source version that runs on Windows and Linux (as well as on the Macintosh with the Windows emulation package WINE). My preference, though, is to run BUGS within R using the BRugs package. I wanted to look at a simple extension of the accrual model, and I made a rather amateurish mistake that I want to document here. BUGS is not a program for the faint-hearted, and, as this lesson reinforces, you need to understand the mathematical foundations of these models if you want to use BUGS successfully.

22. P.Mean: Reviewing how the binomial and negative binomial distributions work (created 2012-05-17). When you look at the binomial distribution and the negative binomial distribution side by side, they look almost identical. But the subtle differences are important. I was working on some problems involving these two distributions and thought it might be helpful to review their properties. These properties are indeed well known, but I wanted to get comfortable with them before I started tackling some more complex alternatives to these two distributions.
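A short Python sketch shows how similar the two formulas look. The parameterization is an assumption on my part: the binomial counts successes in a fixed number of trials, and the negative binomial counts failures before a fixed number of successes (other texts count total trials instead). The values of n, r, and p are made up.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in a fixed number n of trials)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def negative_binomial_pmf(k, r, p):
    """P(exactly k failures before the r-th success), r a positive integer."""
    return math.comb(k + r - 1, k) * p**r * (1 - p)**k

n, r, p = 10, 3, 0.4

# Binomial: the number of trials is fixed, the number of successes is random.
binomial_mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

# Negative binomial: the number of successes is fixed, the number of failures
# is random and unbounded, so the sum is truncated far out in the tail.
negative_binomial_mean = sum(k * negative_binomial_pmf(k, r, p) for k in range(500))

print(f"binomial mean: {binomial_mean:.4f} (n * p = {n * p:.4f})")
print(f"negative binomial mean: {negative_binomial_mean:.4f} "
      f"(r * (1 - p) / p = {r * (1 - p) / p:.4f})")
```

Both formulas multiply a counting term by powers of p and 1 - p. The subtle difference is which quantity is fixed in advance and which is random, and that changes the counting term and the range of possible values.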

21. P.Mean: A simple example of change of variable (created 2012-05-15). I need to review some basic mathematical statistics in order to understand some of the Bayesian accrual models that I am developing. One of those things, which is actually quite easy but which I seem to have some trouble with, is the method known as "change of variable." This is a method that allows you to characterize the probability distribution of a random variable that is transformed by a simple function. I wanted to illustrate how this works for a simple, but not trivial case, just to prove to myself that this works.
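As a rough illustration (not necessarily the case worked on the linked page), take X uniform on (0, 1) and Y = X squared. For a monotone transformation with inverse h, the change of variable formula says the density of Y is f_X(h(y)) times the absolute value of the derivative of h. Here h(y) = sqrt(y), so the density of Y is 1 / (2 sqrt(y)) on (0, 1). The Python sketch below compares a probability computed from that density against a simulation; the function names, the interval, and the seed are all assumptions.

```python
import math
import random

def density_of_y(y):
    """Density of Y = X**2 for X ~ Uniform(0, 1), by change of variable."""
    if not 0 < y < 1:
        return 0.0
    uniform_density = 1.0                  # f_X evaluated at sqrt(y)
    jacobian = 1 / (2 * math.sqrt(y))      # |d/dy sqrt(y)|
    return uniform_density * jacobian

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

# P(0.25 <= Y <= 0.64) from the derived density.
from_formula = integrate(density_of_y, 0.25, 0.64)

# The same probability from simulated draws of Y = X**2.
rng = random.Random(2012)
draws = [rng.random() ** 2 for _ in range(100_000)]
simulated = sum(1 for y in draws if 0.25 <= y <= 0.64) / len(draws)

print(f"from the density: {from_formula:.4f}")
print(f"from simulation:  {simulated:.4f}")
```

Both numbers should be close to sqrt(0.64) - sqrt(0.25) = 0.3, which is what you get directly from P(0.5 <= X <= 0.8).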

20. P.Mean: A fishy story about randomization (created 2012-05-12). I was told this story but have no way of verifying its accuracy. It is one of those stories that if it is not true, it should be. It illustrates why randomization is important. A long, long, time ago, a research group wanted to examine a pollutant to find concentration levels that would kill fish. This research required that 100 fish be separated into five tanks, each of which would get a different level of the pollutant.

April 2012

19. P.Mean: Accrual with refusals, exclusions, or dropouts (created 2012-04-22). A common issue with slow accrual is higher than expected rates of refusals, exclusions, or dropouts. If you have information on these rates, you can incorporate them into a Bayesian model of accrual. Here are the details.

18. P.Mean: The simple accrual model, redefined (created 2012-04-19). I have been writing a bit about the simple homogenous accrual model, but I am having some difficulty with the notation. So I want to redefine the model with some simpler and more consistent notation.

17. P.Mean: BUGS model for the simple Poisson accrual model (created 2012-04-18). I have been working on various extensions to the Bayesian model for patient accrual. Most of these extensions would require the use of the program BUGS. The first step to developing these extensions is to program simple models in BUGS, models where there is a closed form analytical solution. Here is an example of using BUGS to model the simple Poisson accrual model.

16. P.Mean: Fitting the homogenous accrual model in BUGS (created 2012-04-13). Several years ago, I wrote some R code for the homogenous accrual model. This is the simplest case for accrual, with an inverse gamma prior on the waiting time between successive patients. I wanted to fit the same model in BUGS, because I want to look at some extensions and I wanted to start with something simple. I am not great at BUGS yet, but I got it to work in an hour. I'm using the R interface to OpenBUGS (BRugs). Here is the code.
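For readers who want to see why a model like this has a closed form solution, here is a minimal Python sketch of the conjugate update. It is not the R or BUGS code from the linked page. It assumes the waiting times are exponential with mean theta and that the inverse gamma prior is placed on theta; the function names, prior values, and waiting times are all hypothetical.

```python
def update_inverse_gamma(prior_shape, prior_scale, waiting_times):
    """Posterior (shape, scale) for theta, the mean waiting time.

    Assumes exponential waiting times with mean theta and an
    InverseGamma(prior_shape, prior_scale) prior on theta.
    """
    posterior_shape = prior_shape + len(waiting_times)
    posterior_scale = prior_scale + sum(waiting_times)
    return posterior_shape, posterior_scale

def inverse_gamma_mean(shape, scale):
    """Mean of an inverse gamma distribution (defined only for shape > 1)."""
    if shape <= 1:
        raise ValueError("the mean is undefined unless shape > 1")
    return scale / (shape - 1)

# Hypothetical prior: a mean wait of 5 days between patients.
prior_shape, prior_scale = 5, 20
# Hypothetical observed waits (in days) for the first four patients.
observed_waits = [3, 7, 10, 4]

shape, scale = update_inverse_gamma(prior_shape, prior_scale, observed_waits)
print(f"prior mean wait: {inverse_gamma_mean(prior_shape, prior_scale):.2f} days")
print(f"posterior mean wait: {inverse_gamma_mean(shape, scale):.2f} days")
```

Under these assumptions the update just adds the number of patients to the shape and the total waiting time to the scale, so a BUGS fit of the same model can be checked against these numbers.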

15. P.Mean: Bad scaling choices for the SPSS ROC curve (created 2012-04-09). I was helping a colleague with an ROC curve in SPSS and when he drew the curve, I couldn't believe what I saw.

14. P.Mean: Iowa talk on accrual (created 2012-04-03). I will be giving a talk, "Slipped deadlines and sample size shortfalls in clinical trials: a proposed remedy using a Bayesian model with an informative prior distribution," at the University of Iowa. Here is the handout for my talk.

March 2012

13. P.Mean: Those pesky tab characters (created 2012-03-21). I frequently move text from one program to another, and one thing that is almost always guaranteed to cause annoyances is the presence of tabs. The tab is a single character, hex 09, that can sometimes be added with the Ctrl-I key combination, or with the TAB key on a standard computer keyboard. The problem with the tab character is that it looks just like a bunch of blanks, but it doesn't always behave like a bunch of blanks.
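A few lines of Python show the difference between a tab and a run of blanks. The sample string is made up, and the tab stop of 8 columns is an assumption, since different programs use different tab stops.

```python
line = "name\tvalue"   # a made-up line with one embedded tab
tab = "\t"

# The tab is a single character with code point 9 (hex 09).
tab_hex = tab.encode("ascii").hex()
print(len(tab), tab_hex)

# Replacing the tab with blanks depends on the assumed tab stop (8 here).
expanded = line.expandtabs(8)
print(repr(line), len(line))
print(repr(expanded), len(expanded))

# The two strings can look the same on screen but are not equal, and
# splitting on a blank treats them differently.
print(line == expanded)
print(line.split(" "))
print(expanded.split(" "))
```

Using `repr` (or any tool that displays the escape sequence) is a simple way to find tabs that are hiding among the blanks.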

12. P.Mean: Free consultation means no co-authorship? (created 2012-03-19). I heard about an interaction between a client and one of the other statisticians working at the UMKC Research and Statistical Consult Service (RSCS). This statistician had mentioned the (very reasonable) expectation of getting co-authorship on any publication emanating from the consultation. Apparently this was a surprise to the client who claimed that co-authorship is inappropriate because the RSCS provides consulting for free.

11. P.Mean: Making predictions based on just the correlation (created 2012-03-07). Dear Professor Mean, I have a math question. If the correlation, r, between two measurements is 0.1462, and I have one measurement can I calculate the other? I know it probably won't be accurate but can I get a rough approximation?
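A rough sketch of what the correlation alone does and does not give you, assuming an ordinary linear regression setting. In standardized units, the predicted z-score of the second measurement is r times the z-score of the first. To get a prediction in the original units you also need the means and standard deviations of both measurements, which the question does not supply, so the values used below are made up, as are the function names.

```python
def predict_z_score(r, z_x):
    """Predicted z-score of y, given the z-score of x and the correlation."""
    return r * z_x

def predict_raw(r, x, mean_x, sd_x, mean_y, sd_y):
    """Predicted y in original units (needs the means and SDs as well as r)."""
    z_x = (x - mean_x) / sd_x
    return mean_y + predict_z_score(r, z_x) * sd_y

r = 0.1462

# Proportion of the variation in y accounted for by x.
variance_explained = r ** 2
print(f"variance explained: {variance_explained:.4f}")

# Hypothetical means and standard deviations, for illustration only.
prediction = predict_raw(r, x=70, mean_x=50, sd_x=10, mean_y=100, sd_y=15)
print(f"predicted y when x = 70: {prediction:.2f}")
```

With a correlation this weak, a value two standard deviations above average on the first measurement predicts a value only about 0.29 standard deviations above average on the second, and the correlation accounts for roughly 2% of the variation, so the approximation is very rough indeed.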

10. P.Mean: Why use a Bayesian adaptive trial? (created 2012-03-07). The Bayesian adaptive trial controls the probability of randomizing a patient to each of the proposed dose groups. As data emerges during the study, the probabilities are updated so that you are less likely to randomize a patient to a dose level that has far too much toxicity, far too little efficacy, or which does not contribute much information about the dose-response curve. The Bayesian adaptive trial also allows you to close certain arms of the trial if the dose is clearly inappropriate for further study.

February 2012

9. P.Mean: How sample size calculations are reported in the literature (created 2012-02-23). I am preparing a webinar on sample size calculations and wanted to examine some examples in the published literature. There were lots of interesting examples in an open source journal called Trials. I only included a few examples in my webinar, but I wanted to save the examples I found here in case I want to expand the talk.

9. P.Mean: Questions for a panel on statistical consulting (created 2012-02-08). I am participating on a panel discussion about statistical consulting. The organizer suggested several questions that we might want to tackle if there are not that many questions from the audience. I thought they were pretty interesting questions.

8. P.Mean: Percentage of care that does not have a medical basis (created 2012-02-06). At a meeting I was attending, a statistic came up that has a controversial heritage: "at least 50% of medical care has no valid scientific basis." The number cited is not always 50%, but it is almost always a number that is low enough to be alarming. Here are some resources on the basis of this statistic.

January 2012

7. P.Mean: Promoting your consulting career in the era of web 2.0 (created 2012-01-27). I am giving a short course in February, "Promoting Your Consulting Career in the Era of Web 2.0." Here is an outline of what I will talk about.

6. P.Mean: Honorable mention for my R code on accrual (created 2012-01-25). Back in October 2011, I entered a contest sponsored by Revolution Analytics, "Applications of R in Business." I spiffed up a bit of my R code on patient accrual and submitted it with a brief explanation and some simple examples. It turns out that I was one of the five honorable mentions in this contest, which was a pleasant surprise, as I am just an amateur at programming in R.

5. P.Mean: Arguing with the material in an ethics training program (created 2012-01-12). I'm taking one of those web based ethics training programs that is required by the UMKC IRB. It's not a punishment for something bad I did. The IRB requires this from all researchers. I'm probably one of the worst people to take these programs because I dissect every assertion and look for the data behind every claim. It takes me forever to finish these things. Anyway, here's an example of the type of thing that drives me crazy.

4. P.Mean: What to report when SPSS says the p-value is zero (created 2012-01-09). Dear Professor Mean, I'm looking at some SPSS output where the p-value is listed as .000. How should you report the value? P < .001? P < .0005? P < .0001?

3. P.Mean: Is sample size justification really different for animal studies compared to human studies? (created 2012-01-06). Dear Professor Mean, I've spent my entire career (so far) in developing statistical analysis plans for human subjects research. Recently, a neuroscientist who performs experiments on rats asked me to assist in a power analysis. My conversation with him reminded me of that YouTube video (Biostatistics vs Lab Research): "I think I only need 3 subjects..." In his case, he seemed fixated on needing only 6 rats per group, which is what he had always done in the past. Are the rules for sample size justification different for animal studies than for human studies?

2. P.Mean: Post hoc power persists because peer-reviewers demand it (created 2012-01-04). I was in the middle of writing a grant looking at best research practices and wanted to give an example of when best practices weren't being followed. The easiest example to find was the use of post hoc power calculations. There have been at least two decades of criticism of this practice and yet it still occurs. The example I found, however, has an interesting twist to the tale.

1. P.Mean: A very silly graph (created 2012-01-01). I know I shouldn't let this bother me, but I saw a graph today that was wrong on so many different levels. Let me explain.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2017-06-15. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at Category: Professional details.