PMean: Next stop - BMC Medical Informatics and Decision Making

Steve Simon


I’m working part-time on a research grant and I want to publish some of the work I’ve done on it. The tentative title of the paper is “Validating elastic net generated electronic health record breast cancer phenotypes against hospital tumor registries: a case control study.” My co-authors are Dan Connolly and Russ Waitman. I want to summarize the history of the effort so far and explain why I am considering BMC Medical Informatics and Decision Making as the next place to submit the article.

I started this work back in February 2016

  1. Based on the recommendation of Russ Waitman

This paper didn’t make the cut

1: This paper develops a predictive regression model to determine if a patient has breast cancer based on structured data in the electronic health record. Training and test sets are membership in the breast cancer registry.

The paper is a promising approach to selecting patients with a disease with high confidence. However

The format of this paper is incorrect

Additional comments -

* The details of building the consensus model (i.e. run the regression on control and test sets and then remove the indicators from the control sets) should be included in the methods instead of the discussion.

* Please include more technical details in addition to the statistical models (e.g.

* Table 1 – this table seems unnecessary. There are only two exclusions and the paper only discusses the breast cancer cohort. Also, “belong to second group” is never defined.

* Table 2 – some examples of the metadata are helpful

* Table 3 – The contents of this table are good, but the formatting could use some work to highlight that these are elements in an ascending tree hierarchy. Also

* Why does “malignant neoplasm of upper outer quadrant of female breast” have an odds ratio of only 1.06? P

* Please include a figure with ROC curves. Including all of the consensus models including the final one would be very helpful. Also

* “HER”-> EHR

* Erroneous “I” at end

2: The research describes an approach to matching patients to clinical trials using EHR data. Clinical trial recruitment is an important problem

The methods section describes how models were built to predict which patients would be in a breast cancer tumor registry. The authors are using the registry as a proxy for which patients should be enrolled in the trial. But this seems like a poor indicator of who is eligible. The authors need to do a better job of defending the use of the registry. Why wouldn’t you just use a rules-based approach that compared the EHR data to the eligibility criteria for the clinical trial? Are the patients in the tumor registry only participating in one clinical trial? Maybe you could show that elastic net models are easier to build and just as predictive as a rules-based model to justify your approach.

The description of how each of the consensus models was built was detailed

In the discussion section

You also state that “A physician who has to rely on memory to identify patients is likely to identify patients who are “memorable.”” This is not supported in your methods or results section. I think you just want to discuss how an algorithmic approach is potentially unbiased.

The paragraph that starts “There are two important extensions of this work.” describes exactly the approach that I believe your research should really take. At the very least

I think this research is very timely and useful and I would encourage you to address these issues to improve this paper.

3: The authors discuss an analysis that utilizes ICD9/ICD10 from their clinical data warehouse to predict whether a patient would eventually be diagnosed with breast cancer (and reported in the SEER registry). Codes were categorized into different events (e.g.

4: This is an interesting paper. Authors might also want to address insufficient granularity of EMR diagnoses for oncology trials

5: As noted by reviewers

That feedback arrived quickly (December 2016) and we decided to shoot for a journal publication. The article was short enough that we could submit it as a brief communication to the Journal of the American Medical Informatics Association. This is the same group that sponsored the San Francisco conference. The paper was sent out in August 2017.

A month later

While it is important to gain new insights into inconsistency and incompleteness of information in the EHR

We decided to resubmit

But now I’m back on task. I got several good suggestions

Here’s what they say about themselves on their website:

BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design

This journal

The other interesting thing about BMC Medical Informatics and Decision Making is that they have an open peer review process. That means that the original manuscript and the peer review comments appear on the website in addition to the final manuscript (assuming it gets published, of course).

There are some minor reformatting issues

Wish me luck!