P.Mean: Archive organized by date (created 2010-01-06)
This page lists files created in calendar year 2010. Also look at the
archives for 2012, 2011,
2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, and 1999. You can
also browse through an archive of pages
organized by topic.
- P.Mean: Creating LaTeX formulas on the fly (created
2010-12-20). I don't use LaTeX a lot (though I should) because I am fairly
happy with a proprietary product that I use for formulas, MathType. Still,
there are some times when it would be nice to use a bit of LaTeX, and there's
a web site that makes this easy.
- P.Mean: Location of my UMKC office (created
2010-12-09). I work part-time as an independent statistical consultant and
part time at the University of Missouri-Kansas City (UMKC). If you need to
meet with me for UMKC related work, here is how to get to my office.
- P.Mean: Are certain CAM therapies
undeserving of further study (created 2010-12-01). I have become something
of a celebrity on the Science Based Medicine site, as I have noted in an
earlier webpage. In addition to the blog post I noted earlier, there is a new
post: Of SBM and EBM Redux. Part I: Does EBM Undervalue Basic Science and
Overvalue RCTs? These posts are reminding me how important it is to write
precisely, which is good. I largely agree with many of the comments written in
these particular entries and in others at the Science Based Medicine site, but
there are still areas of fundamental disagreement. One of the major areas
where we disagree is over the value of running randomized control trials for
certain CAM (Complementary and Alternative Medicine) therapies that are
- P.Mean: Poem to help you remember the quotient
rule (created 2010-11-26). I was working on some derivatives that involved
a fraction, and the formula is a bit tricky to remember. There was a short
poem that I learned a long time ago for the derivative of a fraction, and I
can't find it anywhere on the Internet. There are some variants that are
close, but nothing quite like the poem I remember. Everything important has to
be found somewhere on the Internet, so I am posting the poem here. If anyone
can attribute this poem to the original source, please let me know.
- P.Mean: Transforming the parameter
also transforms the prior distribution (created 2010-11-25). All my work on Bayesian models recently has forced me to remember some of
my mathematical statistics that I had not touched since college. Here's
another example of this. Suppose you have a prior distribution on a parameter
θ and you want to find the comparable
prior for a transformation φ=u(θ).
- P.Mean: The odds ratio in logistic
regression is the opposite of what it should be (created 2010-11-22). I
have data in the following table that clearly shows a positive association,
but when I run a logistic regression model, the odds ratio is reported as less
than 1. How can this be?
- P.Mean: BUGS is more than just one program
(created 2010-11-19). I am working on some Bayesian models that use a
program called BUGS. BUGS stands for Bayesian Inference Using Gibbs Sampling.
There are several ways you can run BUGS, and it is worthwhile to note why
there are multiple programs.
- P.Mean: Ambiguity in the definition of
the exponential distribution (created 2010-11-16). I'm trying to run some Bayesian analyses using a program called BUGS (Bayesian
Inference Using Gibbs Sampling), and this requires me to specify a prior distribution for
the parameter associated with an exponential waiting time. I'm having more
trouble than I should because the exponential distribution is defined two
- P.Mean: The Science-Based Medicine
blog defends itself (created 2010-11-09). I get a few fan letters from
people, which are greatly appreciated, but when I get the rare critical
response, I am even more grateful. It doesn't matter if the criticism is valid
or not. Someone who takes on the unpleasant task of critiquing my work offers
some valuable insights on: what I wrote poorly because it was incorrect, or
what I wrote poorly because it was misinterpreted, or what I wrote well but
there is a dissenting opinion. One of my webpages, P.Mean: Is there something
better than Evidence Based Medicine out there (created 2010-09-20), was
highlighted and criticized on the Science Based Medicine blog by David Gorski,
and here are some of the things I learned from that criticism. This is an
expansion of comments I left on their blog entry.
- P.Mean: Would you hire someone who knew theory
or someone who knew practice (created 2010-11-03). Someone on LinkedIn
asked if it was better to hire someone who knew theory or someone who knew
practice. Here's my response.
- P.Mean: Poster presentation at the
Missouri Technology conference (created 2010-10-04). I will be presenting a poster about the Bayesian model for accrual at the
Missouri Technology conference in Columbia, Missouri. There was some confusion
about this, partly because I submitted an abstract at the last minute. Here is
the abstract that I turned in.
- P.Mean: Why the least squares regression line
has to pass through XBAR, YBAR (created 2010-10-01). An issue came up
about whether the least squares regression line has to pass through the point
(XBAR,YBAR), where the terms XBAR and YBAR represent the arithmetic mean of
the independent and dependent variables, respectively. The line does have to
pass through that point and it is easy to show why.
- P.Mean: If you knew that failure was not an
option, what would you do (created 2010-10-01). There is a question and
answer forum on LinkedIn where people ask all sorts of questions. A common
theme among some people there is to ask motivational questions, which I try to
respond to sometimes with an off-beat answer. There was a question along these
lines: "If you knew that failure was not an option, what would you do?" I
started off with a rather flippant answer, but then realized that there was a
more serious answer.
- P.Mean: Is there something better than
Evidence Based Medicine out there (created 2010-09-20). Someone asked me
about a claim made on an interesting blog, Science Based Medicine. The blog
tries to draw a distinction between Science Based Medicine (SBM) and Evidence
Based Medicine (EBM), and claims that SBM is better because
"EBM, in a nutshell, ignores prior probability (unless there is no other
available evidence) and falls for the p-value fallacy; SBM does not." Here's
what I wrote.
- P.Mean: Putting variable names into a model
automatically (created 2010-09-20). I always have trouble with including a
changing variable name into a sequence of statistical models in R, so when
someone wrote about it on the R-Help list, I thought I should try some of the
suggestions and then write them down here so I don't forget.
- P.Mean: Oh those pesky interactions! (created
2010-09-16). Someone was fitting a binary logistic regression model and
regretfully (that was his word) found two significant (p < 0.05) interactions.
The tone was that he was testing for interactions using some type of stepwise
approach, but was hoping that no interactions would appear. When they did
appear, he had a panic, not about how to interpret the interactions, but
rather whether he should include them in his publication. Here's the advice I
- P.Mean: My new twitter account (created 2010-09-15). I
started a new twitter account, mostly to follow the twitter feed of the
Department of Biomedical and Health Informatics at UMKC. I work in that
department part-time. I may use my twitter account to announce new updates to
my website. My twitter feed is @profmean.
- P.Mean: Can you compute a confidence interval for
your p-value? (created 2010-09-10). A question that comes up from time to
time is whether you can calculate a confidence interval for a p-value. It
always gets statisticians into a tizzy because it seems to be such a logical
thing to do, but no one does it. Here's how I like to think about the issue.
- P.Mean: Using information theory to identify
discrepancies within and between text files (created 2010-09-02). I have been experimenting with the use of information theory to identify
patterns in text data files. This work is somewhat preliminary, but it has
some exciting possibilities. If there are certain patterns that occur
frequently at a given column of a text data file (e.g., always the letters "A"
or "B"), then these columns become important for looking for aberrant data
that might be caused by a typographical error, a misalignment of the row of
data, or a deviation from the code book. I want to show some preliminary
graphs that illustrate what these patterns look like for some files I am
working with. Warning: this is a very large webpage with graphics that
extend across dozens of pages!!
- P.Mean: Is it ethical to recruit a panhandler
that you see on the street into your research study (created 2010-09-01).
Someone asked a question about the ethics of approaching a panhandler and
sharing information about a research study. I don't know all the details, but
apparently, this study was examining veterans of the Iraq war, and this
panhandler was holding a sign saying something like please help a veteran of
the Iraq war. There was some concern about whether the monetary incentive
would be disproportionate for someone who had to beg for a living, or it might
be a problem if the panhandler was given money and a flyer about the research
study at the same time. I discussed some of my concerns about this study, but
it was from the perspective of statistical validity rather than from an
- P.Mean: Pooling different measures of risk in a
meta-analysis (created 2010-07-26). Someone on the MEDSTATS email
discussion group asked about how to pool results in a meta-analysis where some
of the summary measures are reported as odds ratios, others as relative risks,
and still others as hazard ratios. There's actually a fourth measure that is
commonly used when the outcome measure is binary (live/dead, improved/not
improved, relapsed/relapse free, etc.). That is the risk difference, and its
inverse, the number needed to treat. Here's what I wrote in response.
- P.Mean: What is a Generalized Estimating Equations
model? (created 2010-08-19). Generalized Estimating Equations (GEE) are a
model for your data that can account for dependence among some of your
measurements due to repeated measures, cluster sampling, or a longitudinal
data set. It represents an extension of the Generalized Linear Model (GLM).
Like the GLM, the GEE model allows you to specify a link function and a
mean-variance relationship. With the appropriate choice of these two items, you can
specify a wide variety of models.
- P.Mean: Is Evidence-Based Medicine too rigid
(created 2010-08-19). Someone asked about criticisms that Evidence-Based
Medicine (EBM) relies too rigidly on grading schemes and the hierarchy of
evidence, and whether EBM instead provides heuristics that can be adapted as
needed. This is hard to respond to, but it is an important question. I view
checklists and hierarchies as a necessary evil, and sometimes they are
applied too rigidly.
- P.Mean: Competing books to the book I
am planning to write (created 2010-08-16). I have been asked by several
publishers to list competing books to the book I am planning to write. My book
is quite different than anything else out there, but perhaps the closest
competition would be books that talk about research methods. Here are some
possible competitors in that area.
- P.Mean: What should clients get from you at
the end of the first consulting session (created 2010-08-14). There has
been a lot of discussion about the nature and role of consulting on the
message boards of the Statistical Consulting section of the American
Statistical Association. One particularly valuable question was what you
should do when starting a new consulting job. Here is an adaptation of one
particularly good response.
- P.Mean: Glossary for my second book (created
2010-08-11). As I mentioned in an earlier webpage,
I am talking to some publishers about writing a second book. Here's a
tentative glossary for that book. I'm only including the terms in the glossary
for now, but will eventually add definitions.
- P.Mean: What's a fair price for SPSS? (created
2010-08-06). There was a discussion on an SPSS email discussion group
about how the SPSS software package was too expensive and how the vendor
should consider offering a discount price for the home user. Everyone was in
favor of lower prices, of course, and compared the pricing of SPSS to that of
Stata and R. In the
spirit of debate, I offered a contrarian viewpoint. It also applies to similar
complaints I have heard about the pricing of SAS software.
- P.Mean: Fighting the claim that any size
difference is clinically important (created 2010-08-05). When working with people to select an appropriate sample size, it is
important to establish the minimum clinically important difference (MCID).
This is a difference such that any value smaller would be clinically trivial,
but any value larger would be clinically important. I get told
quite often that any difference that might be detected is important. I could
be flippant here and then tell them that their sample size is now infinite and
my consulting rate is proportional to the sample size, but I don't make
flippant comments (out loud, at least). Here's how I might challenge such a
- P.Mean: Standard operating procedures for a
statistical consulting center (created 2010-07-30). I asked a question on
one of the American Statistical Association message boards about how I was
setting up a consulting service at the University of Missouri-Kansas City (UMKC),
where I work part-time. I wanted to develop some SOPs (Standard Operating
Procedures) for this center that would supplement the guidance already
available on the web. I asked if anyone else had SOPs (or anything similar)
that I could look at so I wouldn't re-invent the wheel. I got a lot of
- P.Mean: When should research in a given area
end? (created 2010-07-26). Someone asked a rather philosophical question:
is there ever an end to research in a given area? Will there ever be a "last
word" on a research topic? Here's what I wrote in response.
- P.Mean: Sample chapter: The first three steps
in selecting an appropriate sample size (created 2010-07-24). As I
mentioned in an earlier webpage, I am talking to some publishers about
writing a second book. The working title is "Jumpstart Statistics: How to
Restart Your Stalled Research Project." Here's a tentative chapter from that
book. It is not quite complete yet, but I'm hoping to finish it soon. One of
your most critical choices in designing a research study is selecting an
appropriate sample size. A sample size that is either too small or too large
will be wasteful of resources and will raise ethical concerns.
- P.Mean: Tentative table of contents for my second
book (created 2010-07-24). As I mentioned in an earlier webpage, I am
talking to some publishers about writing a second book. The working title is
"Jumpstart Statistics: How to Restart Your Stalled Research Project." Here's
a tentative table of contents.
- P.Mean: Jumpstart Statistics, a proposal for my
second book (created 2010-07-23). I want to talk to some publishers about
writing a second book. Here is what I will propose to them.
- P.Mean: Salary survey for Biostatisticians (created
2010-07-21). I am working part-time at UMKC in the Department of
Informatic Medicine and Personalized Health. They like me and want me to
increase my hours from 10 hours a week (25% time) to something more. I'll
talk to them about this, but at the same time, I want to point out that my
salary is not competitive with my peers. Here's a table from a recent survey
on salaries, published in the Amstat News.
- P.Mean: What is principal components analysis?
(created 2010-07-19). I was asked to help someone who was reviewing a
paper that used principal components analysis (PCA) as part of the
statistical methodology. I have not yet seen the article, so I could only
offer very general advice.
- P.Mean: Another counter-intuitive
probability problem (created 2010-07-04). A recent article in Science
News rekindled the two children problem and offered an odd twist. Here's the
simple version. Suppose you have two children, one of whom is a boy. What is
the probability that both children are boys? The obvious, but incorrect
choice is 1/2. The correct answer is 1/3. How does this work?
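The 1/3 answer to the simple version of the two children problem can be checked by listing the cases. The sketch below is only an illustration and rests on two assumptions not spelled out in the summary above: boys and girls are equally likely, and "one of whom is a boy" is read as "at least one is a boy." The function name is made up for this example.

```python
from fractions import Fraction
from itertools import product


def prob_both_boys_given_at_least_one_boy():
    # All equally likely (older, younger) combinations: BB, BG, GB, GG.
    families = list(product("BG", repeat=2))
    # Condition on the information given: at least one child is a boy.
    at_least_one_boy = [f for f in families if "B" in f]
    # Among those families, count the ones where both children are boys.
    both_boys = [f for f in at_least_one_boy if f == ("B", "B")]
    return Fraction(len(both_boys), len(at_least_one_boy))


print(prob_both_boys_given_at_least_one_boy())  # 1/3
```

Three families remain after conditioning (BB, BG, GB) and only one of them has two boys. The tempting answer of 1/2 comes from treating BG and GB as a single case.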
June 2010 (8 entries)
- P.Mean: Resources using Stack Overflow
(created 2010-06-30). A bunch of Internet resources fell into my lap all
at once. Some of them relate to a new technology (Stack Overflow/Stack
Exchange) that allows people to pose questions like an Internet email
discussion group, but it is web-based and has some of the capabilities
associated with blogs and wikis.
- P.Mean: The SPSS t-test is confusing (created
2010-06-29). I have always disliked how SPSS (now IBM SPSS) presented the output from
their independent samples t-test. I want to explain why it is confusing and
show you an alternative based on the general linear model.
- P.Mean: Classic references in Statistics
(created 2010-06-29). A prominent statistician, Christian Robert, listed
some classic research papers in Statistics that he wanted to present to his
students in a special readings class. This was commented on by another
prominent statistician, Andrew Gelman. I'm not a prominent statistician, but
that won't stop me from adding my two cents.
- P.Mean: What I use for talks instead of
Powerpoint (created 2010-06-28). Someone on LinkedIn asked a question
about what technologies people use for their presentations (laptop,
flipchart, or whiteboard). For most of my presentations, I use none of these
technologies. Instead I create a webpage of my presentation and then print it
and hand out copies.
- P.Mean: The futility of small sample sizes for
evaluating a binary outcome (created 2010-06-16). I'm helping out with a project that involves a non-randomized comparison of
two groups of patients. One group gets a particular anesthetic drug and the
other group does not. The researcher wants to compare rates of hypotension,
respiratory depression, apnea, and hypoxia. I suggested using continuous
outcomes like O2 saturation levels rather than discrete events like hypoxia,
but for a variety of reasons, they cannot use continuous outcomes. Their
original goal was to collect data on about 20 patients in each group.
- P.Mean: An example of a bad survey (created
2010-06-11). I was asked to fill out an Internet survey to define my
"consulting needs." That's a rather strange invitation, and sounds almost
like a cheap way to develop business leads. But it was a request through
LinkedIn, so I thought it was worth filling out. I want to try to build my
contacts at LinkedIn, and filling out a short survey seemed like a small
price to pay to get a potential lead for my own consulting business. When I
went to the webpage with the actual survey, though, I was shocked and
disappointed with what I found.
- P.Mean: An interesting alternative to power
calculations (created 2010-06-09). Someone on the MedStats Internet
discussion group mentioned an alternative to power calculations called
accuracy in parameter estimation (AIPE). It looks interesting. Here are some
- P.Mean: Minimum sample size needed for a time
series prediction (created 2010-06-08). Someone asked what minimum
sample size was needed in a time series analysis model to forecast
future observations. Strictly speaking, you can forecast with two
observations. Draw a straight line connecting the two points and then extend
that line as far as you want in the future. But you wouldn't want to do that.
So a better question might be what is the minimum number of data points that
you would need in order to provide a good forecast of the future.
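The two-observation forecast mentioned in the time series entry above can be written in a couple of lines. This is a minimal sketch that assumes equally spaced observations. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def two_point_forecast(y1, y2, steps_ahead):
    """Extend the straight line through two consecutive observations."""
    slope = y2 - y1  # change per time step, assuming equal spacing
    return y2 + slope * steps_ahead


print(two_point_forecast(10.0, 12.0, 3))  # 18.0
```

Any noise in either observation goes straight into the slope and is multiplied by the forecast horizon. That is one way to see why you would not want to rely on such a forecast.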
May 2010 (9 entries)
- P.Mean: What is the premier conference for
statistical consulting (created 2010-05-28). Someone asked what the
premier conference for statistical consulting is. That's a rather ambiguous
question, because different people will interpret terms like "premier
conference" and "statistical consulting" differently. The answer, however, is
pretty unambiguous. In North America, it would have to be the Joint
Statistics Meetings (JSM).
- P.Mean: Lessons learned the hard way: don't
presume to know how your software handles missing value codes (created
2010-05-28). I'm working on an interesting project that involves summing
up rvu's (resource value units) across certain records for a given patient.
Some of the rvu's are missing. How should the program handle these missing
rvu's? We discussed this by email and agreed to ignore missing rvu's in the
sum. This is effectively the same as replacing the missing rvu's with zero.
There are two cases worth worrying about, though, and handling those cases
makes me realize just how tricky missing values are.
- P.Mean: How I got started in my career as an
independent statistical consultant (created 2010-05-24). LinkedIn has a
question and answer board, and one of the questions inspired me to write up
the story of how I got started in my career as an independent statistical
consultant. Here's the original question: I'm very curious as to what
events or conversations enabled you to change direction in your career. What
thought process did you go through? What resources did you use or uncover?
- P.Mean: How do I handle criticism (created
2010-05-21). Someone asked how I handle criticism. To be honest, I don't
get criticized all that much. Possibly it is that I do very little that
deserves criticism, and possibly, people are intimidated by the area I work
in (unjustifiably intimidated, by the way, but many people are just plain
scared of numbers). It is also important to note that most people don't like
to share negative opinions directly. They certainly will tell others, of
course, if something is wrong, but it takes some boldness and some bravery to
confront a person directly.
- P.Mean: How to avoid charges of plagiarism
(created 2010-05-15). I'm not an expert on this, but I got a question about how to avoid charges
of plagiarism in a thesis, especially the sections of the thesis that reviewed
existing research and theoretical background. Here's how I responded.
- P.Mean: Withdrawing from a study and taking
your data with you (created 2010-05-15). Someone asked me what the phrase
"you can withdraw from the study at any time" really means. Can a research
subject withdraw and take their data with them (that is, ask that their data
be expunged from the database)? What if they raise the objection after the
data analysis is done, because they don't like the results of the study? Can
they ask for their data to be expunged then? What if they raise the objection
after the data is published?
- P.Mean: Lessons learned the hard way: don't throw
good money after bad (created 2010-05-14). I am helping out with data management for a project involving 19 million
records from an insurance database. The file is too big to be read into R in
one piece, so I decided to read in successive segments of 100,000 records and
then write them out again as separate files. This was a big mistake and showed
me the importance of the saying: "Don't throw good money after bad."
- P.Mean: What is a good surrogate measure for
socioeconomic status (created 2010-05-03). I received a question, indirectly, about what might be a good surrogate
measure for socioeconomic status (SES). That raises two questions, actually.
What is SES, and how can we tell if a surrogate is a good surrogate for SES?
- P.Mean: More discussion on
instrumental variables (created 2010-05-03). I attended the May meeting
of the KUMC Statistics Journal Club. The topic of discussion was a paper
outlining the properties and applications of instrumental variables.
April 2010 (7 entries)
- P.Mean: My life so far: fails to meet
expectations (created 2010-04-21). I'm learning how to use LinkedIn, and
there are some people on that site who ask general philosophical questions.
Some are a bit silly but they are still fun to answer. One person asked
people to apply the traditional performance evaluation categories (Exceeds
expectations, Meets expectations, Fails to meet expectations) to their own
lives. So here is what I wrote.
- P.Mean: Interpreting p-values in a
published abstract, part 1 (created 2010-04-14). In one of my recent
webinars, I asked people to read the following abstract and interpret the
p-values presented within. The Outcome of Extubation Failure in a
Community Hospital Intensive Care Unit: A Cohort Study. Seymour CW,
Martinez A, Christie JD, Fuchs BD. Critical Care 2004, 8:R322-R327 (20 July
2004) Introduction: Extubation failure has been associated with poor
intensive care unit (ICU) and hospital outcomes in tertiary care medical
centers. Given the large proportion of critical care delivered in the
community setting, our purpose was to determine the impact of extubation
failure on patient outcomes in a community hospital ICU. Methods: A
retrospective cohort study was performed using data gathered in a 16-bed
medical/surgical ICU in a community hospital. During 30 months, all patients
with acute respiratory failure admitted to the ICU were included in the
source population if they were mechanically ventilated by endotracheal tube
for more than 12 hours. Extubation failure was defined as reinstitution of
mechanical ventilation within 72 hours (n = 60), and the control cohort
included patients who were successfully extubated at 72 hours (n = 93).
Results: The primary outcome was total ICU length of stay after the initial
extubation. Secondary outcomes were total hospital length of stay after the
initial extubation, ICU mortality, hospital mortality, and total hospital
cost. Patient groups were similar in terms of age, sex, and severity of
illness, as assessed using admission Acute Physiology and Chronic Health
Evaluation II score (P > 0.05). Both ICU (1.0 versus 10 days; P < 0.01) and
hospital length of stay (6.0 versus 17 days; P < 0.01) after initial
extubation were significantly longer in reintubated patients. ICU mortality
was significantly higher in patients who failed extubation (odds ratio =
12.2, 95% confidence interval [CI] = 1.5–101; P < 0.05), but there was no
significant difference in hospital mortality (odds ratio = 2.1, 95% CI =
0.8–5.4; P < 0.15). Total hospital costs (estimated from direct and indirect
charges) were significantly increased by a mean of US$33,926 (95% CI =
US$22,573–45,280; P < 0.01). Conclusion: Extubation failure in a community
hospital is univariately associated with prolonged inpatient care and
significantly increased cost. Corroborating data from tertiary care centers,
these adverse outcomes highlight the importance of accurate predictors of
extubation outcome. It is a bit dangerous to read only the abstract, of
course, but this was intended for a general illustration.
- P.Mean: Quiz about p-values (created
2010-04-14). In one of my webinars, I offered the following quiz
question: A research paper computes a p-value of 0.45. How would you
interpret this p-value? 1. Strong evidence for the null hypothesis; 2. Strong
evidence for the alternative hypothesis; 3. Little or no evidence for the
null hypothesis; 4. Little or no evidence for the alternative hypothesis; 5.
More than one answer above is correct; 6. I do not know the answer. This
is actually a bit of a trick question.
- P.Mean: Using entropy and the
surprisal value to measure the degree of agreement with the consensus finding
(created 2010-03-02). One of the research problems that I am working on involves evaluation of a
subjective rating system. I have been using information theory to try to
identify objects where the evaluators agree well and objects where the
evaluators do not agree well. I also am working on identifying objects that an
individual rater does poorly. The method is to measure when the surprisal of
the category that a rater selected is much lower than the entropy (the average
surprisal across all raters).
- P.Mean: What makes a good website (created
2010-04-07). Someone posed a series of questions about what makes a
perfect website design. I am not a big fan of "design" and tried to make that
point in my responses.
- P.Mean: Should I learn R instead of SAS (created
2010-04-05). I got a question from a statistician beginning her career
asking whether she should learn SAS or R. That's a very personal question and
there is no perfect answer. Here is what I wrote.
- P.Mean: Dealing with a large text file that
crashes your computer (created 2010-04-02). At a meeting, a colleague was
describing a text file that he had received that had crashed his system. No
way, I thought, could a simple text file crash your system. I offered to
investigate and he was right. The text file crashed my system too, and
repeatedly. Here's what I did to figure out how a simple text file could
crash your computer.
March 2010 (6 entries)
- P.Mean: What to say when any data
analysis is pointless (created 2010-03-25). Someone on the MEDSTATS email
discussion group asked for help. They were trying to establish a normal range
or reference interval for a set of observations involving gastric emptying.
The sample size, 14, was much too small to produce reliable results, but it
got worse than that. For one of the outcomes, the result was fourteen zeros.
What can you do with such a data set? What can you say? That's a difficult
question, and here is how I would approach such a problem.
- P.Mean: Calculating weights to correct
for over and under sampling (created 2010-03-22). Someone asked how to
use weights to adjust for the fact that certain strata in a study were
recruited more vigorously than other strata. For example, suppose you sampled
at four communities and noted the age distribution as 0-14 years, 15-39
years, and 40+ years. How would you adjust for differential age
- P.Mean: Ordinal surprisals (created
2010-03-20). Closely related to the concept of ordinal entropy are ordinal surprisals.
The surprisal is the negative log base 2 of the probability, and if you
multiply the probabilities with the surprisals and add them up, you get
entropy. Can you define an ordinal surprisal in such a way that when you
multiply the ordinal surprisals by the probabilities, you get the ordinal
- P.Mean: Can sex be an outcome variable
(created 2010-03-16). Someone asked whether it was legitimate to use sex
(gender) as a dependent variable or outcome variable in a logistic regression
model. It seems wrong, on the face of it, to think that various factors can
influence whether we are male or female. It actually is perfectly fine to use
sex as an outcome variable. Here is how I would justify its use.
- P.Mean: Ordinal entropy (created
2010-03-11). I have been using the concept of entropy to evaluate a sperm morphology
classification system and to identify aberrant records in large fixed format
text files. Some of the data I have been using in these areas is ordinal with
three levels, normal, borderline, and abnormal. In all of my work so far, I
have treated all three categories symmetrically. So, for example, the entropy
of a system where 50% of the probability is associated with normal and 50% is
associated with borderline is 1. The entropy of a system where 50% of the
probability is associated with normal and 50% is associated with abnormal is
also 1. It has always bothered me a bit because it seems that the second case,
where the probabilities are placed at the two extremes, should have a higher
level of entropy. Here is a brief outline of how I think entropy ought to be
redefined to take into account the ordinal nature of a variable.
- P.Mean: Finding duplicate records in a 19
million record database (created 2010-03-02). I was asked to help find
duplicate records in a large database (19 million records). The number of
duplicates was suspected to be small, possibly around 90. My
colleague's approach was running PROC FREQ in SAS on the "unique" id and then
looking for ids that have a frequency greater than 1. That did not work--it
took too long or it overloaded the system, or both. So I wanted to look at
alternatives for identifying duplicate records that would do this more
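As a small illustration of the two entropy entries above (Ordinal surprisals and Ordinal entropy), here is a minimal Python sketch of the ordinary, unordered calculation they describe: the surprisal is the negative log base 2 of a probability, and entropy is the probability-weighted sum of the surprisals. The function names `surprisal` and `entropy` are my own choices for this sketch, and it does not attempt the ordinal redefinition, which is only outlined in the linked pages.

```python
import math

def surprisal(p):
    """Surprisal in bits: the negative log base 2 of a probability."""
    return -math.log2(p)

def entropy(probs):
    """Ordinary entropy: sum of probability times surprisal.

    Categories with zero probability contribute nothing, so they are skipped.
    """
    return sum(p * surprisal(p) for p in probs if p > 0)

# Probabilities are listed in the order (normal, borderline, abnormal).
normal_vs_borderline = [0.5, 0.5, 0.0]
normal_vs_abnormal = [0.5, 0.0, 0.5]

print(entropy(normal_vs_borderline))
print(entropy(normal_vs_abnormal))
```

Both calls give the same value, because this calculation ignores the ordering of the categories. That symmetry is the behavior the Ordinal entropy entry finds unsatisfying.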
February 2010 (9 entries)
- P.Mean: Is intuition real? (created 2010-02-25).
Someone asked if intuition is real. My hunch is that intuition may be
real, but it is grossly overrated.
- P.Mean: Abstract submitted to Missouri
Regional Life Sciences Summit (created 2010-02-13). Yesterday, I
submitted the following abstract for a poster session in the Missouri
Regional Life Sciences Summit. I'll find out on Monday if it will be
accepted. "Slipped deadlines and sample size shortfalls in clinical trials: a
proposed remedy using a Bayesian model with an informative prior
- P.Mean: Meta-analysis for a single mean
estimate (created 2010-02-11). Someone noted that the usual meta-analysis
is carried out on studies comparing two treatment groups, usually for a
difference in means. What if you had several studies estimating not a
difference in means, but just a single mean? Could you conduct a
meta-analysis in this situation?
- P.Mean: Exponential interpolation
(created 2010-02-11). Someone wanted an exponential interpolation
formula. It's not quite a statistics question, but it caught my interest.
- P.Mean: Fan page for The Monthly Mean (created
2010-02-11). I've been getting some advice about Facebook. One suggestion
was to set up a "fan page". There are some differences between being a
"friend" on Facebook and being a "fan".
- P.Mean: Humility is a good thing for
researchers to have (created 2010-02-08). I've been writing a series of
articles about the seven deadly sins of researchers. One of these sins is
pride. I might need to talk about the alternative to pride, which is
humility. I believe that researchers should adopt a humble outlook. Humility
is often misunderstood as a bad thing. It is not.
- P.Mean: Consulting remotely versus
consulting in person (created 2010-02-08). Someone was asking whether
there is a trend in consulting to demand a local presence rather than
allowing a consultant to work remotely. I was unable to comment on work
trends, as I have only been an independent consultant for 14 months. I did
point out, however, some of the issues associated with remote consulting.
- P.Mean: What are the characteristics of a
good statistical consultant (created 2010-02-07). Someone was considering
a career as a statistical consultant. Besides building up a network and
gaining experience, what traits would be necessary to be successful in such a
- P.Mean: Proposed poster for the Missouri
Regional Life Sciences Summit (created 2010-02-03). I am preparing a
poster for the Missouri Regional Life Sciences Summit. The poster guidelines
are a bit unusual in that there is only room for a four foot by four foot
square poster. Normally, these posters can be much wider. The tentative title
is "Slipped deadlines, sample size shortfalls, and a proposed Bayesian
solution using an informative prior distribution" and here is a proposed
January 2010 (7 entries)
- P.Mean: Facebook account (created 2010-01-25).
Several people have been encouraging me to set up an account on Facebook. I
did it this evening and two hours later, I had two friends.
- P.Mean: Abstracts for a possible upcoming
talk (created 2010-01-20). I might be asked to give a talk in February and I wanted to offer two
possible choices. Here are the titles and abstracts of those talks.
- P.Mean: SPSS or Stata? (created 2010-01-19). I am an SPSS user. Some of my friends are choosing to leave SPSS and
learn Stata. What are the advantages of Stata over SPSS?
- P.Mean: Masters or Phd in Statistics? (created
2010-01-19). Someone asked me about careers in Statistics and whether you get the best career
with a Master's degree or a PhD. That's a very subjective choice and individual
preferences should weigh strongly in your choice.
- P.Mean: Power calculations for comparison of
Poisson counts across two groups (created 2010-01-11). Suppose you want to compare Poisson count variables across two groups. How
much data would you need to collect? It's a tricky question and there are
several approaches that you can consider.
- P.Mean: Where can I find free online textbooks
(created 2010-01-07). Someone was away from their personal library for a
while and needed a free online statistics reference book. With a free
textbook, you get what you pay for, of course, but there are some exceptions.
- P.Mean: What is residual confounding
(created 2010-01-06). Residual confounding is a frequent explanation for
unusual research findings. Before I define the term and show an example, I
need to address a more basic issue. The term "confounding" is used frequently
but often without careful consideration of the true definition of the term. I
tend to shy away from this term and typically use "covariate imbalance"
This work is licensed under a Creative
Commons Attribution 3.0 United States License. This page was written by
Steve Simon and was last modified on
2011-01-01. Need more
information? I have a page with general help
resources. You can also browse for pages similar to this one at