I’ve been applying for a variety of jobs, and one of them asked for a statement on my research interests. I tried to emphasize the collaborative nature of my research. Here’s what I wrote.

My best research efforts have been collaborative. I believe these efforts represent something that neither I nor my co-authors could have developed alone. I’ve had many wonderful research collaborations over the years, and I want to highlight two of them: patient accrual in clinical trials and mining information from the electronic health record. I also want to describe my efforts to help others succeed in their research.

## Patient accrual

In 2006, I gave a journal club presentation on how to use control charts to monitor the process of patient accrual in a clinical trial. Too many studies, I hate to say, fail to meet their sample size requirements because researchers grossly underestimate the amount of time it takes to recruit patients. I discussed control charts in the context of a clinical trial, but the idea applies to any prospective research study that collects data from human volunteers.
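The journal club chart isn’t described here, so treat the following as an illustration of one common choice rather than the one I presented: a c-chart that treats each month’s accrual count as roughly Poisson, uses the planned monthly accrual as the center line, and puts limits at three standard deviations. The function names `c_chart_limits` and `flag_periods` and the monthly counts are hypothetical.

```python
import math


def c_chart_limits(expected_per_period):
    """Lower limit, center line, and upper limit for a c-chart of accrual counts.

    Assumes (for illustration) that the number of patients enrolled in each
    period is roughly Poisson with mean equal to the planned accrual per period.
    """
    center = float(expected_per_period)
    half_width = 3.0 * math.sqrt(center)  # Poisson sd is sqrt(mean)
    lower = max(0.0, center - half_width)  # counts cannot be negative
    upper = center + half_width
    return lower, center, upper


def flag_periods(counts, expected_per_period):
    """Indices of periods whose accrual count falls outside the control limits."""
    lower, _, upper = c_chart_limits(expected_per_period)
    return [i for i, c in enumerate(counts) if c < lower or c > upper]


# Hypothetical plan of 10 patients per month, with made-up monthly counts.
monthly_counts = [9, 11, 7, 8, 0, 6]
print(c_chart_limits(10))
print(flag_periods(monthly_counts, 10))  # -> [4]
```

A chart like this catches individual months that are badly off plan. A steady, modest shortfall can sit inside the limits for a long time, which is one reason a cumulative chart or a model-based forecast of completion time may be more useful.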

Byron Gajewski, one of the other faculty members at the journal club, suggested that this problem might be better solved with a Bayesian approach. That turned my research around 180 degrees, but it was worth it. His suggestion led to a very productive avenue of research for the two of us.

The beauty of a Bayesian approach is that it requires the specification of a prior distribution. The prior distribution represents what you know and what you don’t know about how quickly volunteers will show up on your doorstep asking to join your study. You can think of the prior distribution as a way to quantify your ignorance. You choose a very broad and variable prior distribution if you know very little about accrual patterns. This might be because you’re new to the area, you’re using a novel recruiting approach, and/or there’s very little experience of others that you can draw upon. You choose a very narrow and precise prior distribution if you’ve worked on this type of study many times before, your recruiting techniques are largely unchanged, and/or there’s lots of experience of others that you can draw upon. What you don’t do, even if you are very unsure about accrual, is to use a flat or non-informative prior. A flat prior would be like admitting “I don’t know: the study might take ten weeks or it might take ten years and I think both possibilities are equally likely.” Someone with that level of ignorance would be unqualified to conduct the research.
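Here is a minimal sketch of how a researcher’s certainty can be turned into a broad or narrow prior. It assumes, for illustration only, that waiting times between volunteers are exponential with mean θ and that θ gets an inverse-gamma prior with shape nP and scale TP, where n is the target sample size, T the planned duration in days, and P a certainty value between 0 and 1. The function `accrual_prior` and the study numbers are hypothetical. Pushing P toward 0 makes the prior very diffuse, which is the flat-prior trap described above.

```python
import math


def accrual_prior(n_target, t_target_days, certainty):
    """Inverse-gamma prior on the mean waiting time (days) between patients.

    Illustrative assumption: waiting times are exponential with mean theta and
    theta ~ InvGamma(shape=n*P, scale=T*P), where n is the target sample size,
    T the planned duration in days, and P in (0, 1] the researcher's certainty.
    Returns (shape, scale, prior mean of theta, prior sd of theta).
    """
    if not 0 < certainty <= 1:
        raise ValueError("certainty must be in (0, 1]")
    shape = n_target * certainty
    scale = t_target_days * certainty
    if shape <= 2:
        raise ValueError("need shape > 2 for a finite prior variance")
    mean = scale / (shape - 1)
    sd = mean / math.sqrt(shape - 2)  # inverse-gamma sd = mean / sqrt(shape - 2)
    return shape, scale, mean, sd


# Hypothetical study: 350 patients planned over 3 years (1095 days).
broad = accrual_prior(350, 1095, certainty=0.1)   # new area, little experience
narrow = accrual_prior(350, 1095, certainty=0.9)  # done this many times before
print(f"broad:  mean wait {broad[2]:.2f} days, sd {broad[3]:.2f}")
print(f"narrow: mean wait {narrow[2]:.2f} days, sd {narrow[3]:.2f}")
```

Both priors center near T/n, a bit over three days per patient, but the low-certainty prior has roughly three times the standard deviation of the high-certainty one.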

The very act of asking someone to produce a prior distribution forces them to think about accrual, and that in and of itself is a good thing. But the real advantage of specifying a prior shows up during the trial itself. As the trial progresses, you get actual data on the accrual rate, and you can combine that with your prior distribution, as any good Bayesian would, to get an updated estimate of how long the trial will take. Here is where the precision of the prior distribution kicks in. If you have a broad prior with high variance, then even a little bad news about accrual during the trial will lead to a drastic revision in your estimated time to complete the trial. You will act quickly, either by adding extra centers to a multi-center trial, hiring an extra research coordinator to beat the bushes for volunteers, or (if the news is bad enough) cutting your losses by ending the trial early for futility. If you have a narrow prior with low variance, then you’ve done this type of trial often enough that you don’t panic over a bit of bad news. If the data keep coming in and show a much slower accrual rate than you expected, you will eventually reach a point where you need to take action. But a premature overreaction has real costs, and a precise prior protects you against them.
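The updating step can be sketched under the same illustrative assumptions: exponential waiting times and an inverse-gamma prior with shape nP and scale TP. This is a simplified model for illustration, not necessarily the exact published one, and `predicted_total_days` and the scenario are hypothetical. With m patients enrolled by day t, the posterior is inverse gamma with shape nP + m and scale TP + t, and the expected time to enroll the remaining n − m patients is (n − m) times the posterior mean waiting time.

```python
def predicted_total_days(n_target, t_target_days, certainty, m_accrued, t_elapsed_days):
    """Posterior expected total duration of a trial (illustrative model).

    Assumes exponential waiting times with mean theta and a prior
    theta ~ InvGamma(n*P, T*P). After m patients have arrived by day t, the
    posterior is InvGamma(n*P + m, T*P + t), and the expected time to enroll
    the remaining n - m patients is (n - m) * E[theta | data].
    For simplicity, t is treated as the arrival time of the m-th patient.
    """
    shape = n_target * certainty + m_accrued
    scale = t_target_days * certainty + t_elapsed_days
    posterior_mean_wait = scale / (shape - 1)
    remaining_days = (n_target - m_accrued) * posterior_mean_wait
    return t_elapsed_days + remaining_days


# Hypothetical: 350 patients planned over 1095 days; after 365 days only 60
# have enrolled, about half of the roughly 117 the plan called for.
for label, p in [("broad prior (P=0.1)", 0.1), ("narrow prior (P=0.9)", 0.9)]:
    total = predicted_total_days(350, 1095, p, m_accrued=60, t_elapsed_days=365)
    print(f"{label}: about {round(total)} days in total")
```

With the broad prior, the same disappointing first year pushes the forecast to about 1829 days, roughly five years. With the narrow prior, it moves only to about 1412 days. That gap is the difference between acting now and waiting for more evidence.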

Dr. Gajewski put one of his graduate students, Joyce Jiang, on the trail. After successfully defending a dissertation on her extensions in this area, she contributed several additional publications. I worked very closely with Drs. Gajewski and Jiang and found an interesting theoretical contribution to Bayesian data analysis hidden in their work.

One of the problems with getting researchers to produce a prior distribution is that they are sometimes wrong, spectacularly wrong. If you have a strong prior that is sharply at odds with the actual accrual data, you’d like a way to discount that prior distribution. But you’d also like to keep the strong prior for the precision it gives you when the prior and the actual accrual data agree. Drs. Gajewski and Jiang came up with a very clever solution: attach a hyperprior to the precision of the prior distribution. If the accrual data and the prior are in sync, the precision stays high. But if there is a serious discrepancy between the accrual data and the prior, the data pull the precision down, which leads to a much weaker prior distribution.

I dubbed the method they proposed the hedging hyperprior and suggested that it might work in other Bayesian settings as well. It turns out to be equivalent to the modified power prior proposed by Yuyan Duan in 2006, but the formulation of the hedging hyperprior is both simpler and more intuitive. I have presented a simple example applying the hedging hyperprior to the beta-binomial model and am preparing a manuscript for publication.
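To make the idea concrete, here is a small grid-based sketch for the beta-binomial case. It is an illustrative setup, not necessarily the one in the example mentioned above. It assumes the informative prior is Beta(1 + λa₀, 1 + λb₀), where a₀ and b₀ are prior pseudo-counts and λ in [0, 1] plays the role of the prior’s precision, with a uniform hyperprior on λ. (For binomial data this has the same form as a normalized power prior.) The function names and the numbers are hypothetical.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hedging_posterior(a0, b0, x, n, grid_size=101):
    """Posterior over the prior weight lam on a grid (illustrative sketch).

    Assumed prior: p ~ Beta(1 + lam*a0, 1 + lam*b0), with lam ~ Uniform(0, 1).
    lam = 1 uses the full prior pseudo-counts; lam = 0 is a flat Beta(1, 1).
    Returns (grid of lam values, normalized posterior weights).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_w = []
    for lam in grid:
        a, b = 1 + lam * a0, 1 + lam * b0
        # Beta-binomial marginal likelihood of x successes in n trials; the
        # binomial coefficient is dropped because it does not depend on lam.
        log_w.append(log_beta(a + x, b + n - x) - log_beta(a, b))
    top = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return grid, [v / total for v in w]


def posterior_mean_weight(a0, b0, x, n):
    grid, w = hedging_posterior(a0, b0, x, n)
    return sum(g * p for g, p in zip(grid, w))


# Hypothetical prior worth 40 successes and 60 failures (about 40%).
agree = posterior_mean_weight(40, 60, x=20, n=50)     # data near 40%
conflict = posterior_mean_weight(40, 60, x=45, n=50)  # data near 90%
print(round(agree, 2), round(conflict, 2))
```

When the data agree with the prior, the posterior on λ leans toward 1 and the prior keeps most of its weight. When the data conflict, the posterior piles up near 0 and the analysis falls back on something close to a flat prior.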

Dr. Jiang’s dissertation was not limited to the accrual problem; it also included a Bayesian application to a validation model using expert opinion. The strength of her work in these two areas led to her appointment to a postdoctoral fellowship at Yale University. Dr. Gajewski and I continue to collaborate with her on these models. So far we have four peer-reviewed publications and an R package, and we plan to collaborate with other researchers in this area.

Closely related to my research on patient accrual is an effort to audit the records of Institutional Review Boards (IRBs). Too often, researchers fail to obtain the sample size that they promised in the original research protocol, mainly because subject recruitment takes longer than expected. It is very easy to check this by comparing the protocol submitted to the IRB with the final report. In a study of 135 submissions to one IRB, more than half of the researchers failed to reach their enrollment targets, and the average shortfall was more than 50%. I have asked many other IRBs to replicate this work, but none have shown any interest. I plan to keep asking anyone who works on an IRB to help me.
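The comparison itself is simple arithmetic. Here is a minimal sketch, assuming each study contributes one pair of numbers: the sample size promised in the protocol and the enrollment reported at the end. The paragraph doesn’t say whether the average shortfall is taken over all studies or only over those that fell short; this sketch assumes the latter. `enrollment_audit` and the toy records are hypothetical and are not data from the audit described above.

```python
def enrollment_audit(studies):
    """Summarize enrollment shortfalls from (promised, actual) sample sizes.

    `studies` holds one (target, actual) pair per protocol, read from the IRB
    submission and the final report. Shortfall is (target - actual) / target,
    averaged over the studies that fell short; this definition is an
    assumption for the sketch.
    Returns (fraction of studies short of target, mean shortfall among them).
    """
    if not studies:
        raise ValueError("need at least one study")
    short = [(t, a) for t, a in studies if a < t]
    fraction_short = len(short) / len(studies)
    mean_shortfall = (
        sum((t - a) / t for t, a in short) / len(short) if short else 0.0
    )
    return fraction_short, mean_shortfall


# Hypothetical toy records, not data from any actual IRB audit.
toy = [(100, 40), (60, 60), (200, 90), (50, 55)]
frac, shortfall = enrollment_audit(toy)
print(f"{frac:.0%} fell short; average shortfall {shortfall:.1%}")
```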

## Mining the electronic health record

In January 2016, I was offered the opportunity to work on a research grant funded by the Patient-Centered Outcomes Research Institute.