StATS: A novel diagnostic test (January 26, 2006)

A recently published article on diagnosing cancer got a lot of press. The article noted that canines have an unusually sensitive sense of smell and might be able to diagnose cancer by sniffing breath samples from human patients. This is rather intriguing, since dogs have already been trained to locate explosives, cadavers, drugs, and so forth.

The researchers collected breath samples from 55 patients with lung cancer, 31 patients with breast cancer, and 83 volunteers with no prior cancer history.

Eligible patients were men and women older than 18 years with a very recent biopsy-confirmed conventional diagnosis of lung or breast cancer. We specifically requested that recruitment centers refer patients as soon as possible following definitive diagnosis so that breath sampling would not interfere with or delay planned conventional treatment. As we suspected that chemotherapy treatment would change the exhaled chemicals in cancer patients, we sought patients who had not yet undergone chemotherapy treatment. As we also suspected that patients with more advanced disease, and thus larger tumors, might be exhaling higher concentrations of the chemicals associated with cancer cells and would therefore be more easily identified by the dogs, we sought patients with any stage disease.

The collection of breath samples was quite simple.

For breath sampling, we obtained a cylindrical polypropylene organic vapor sampling tube (Defencetek, Pretoria, South Africa). Each tube is open at either end, is 6 inches long, has an outer diameter of 1 inch, has an inner diameter of 0.75 inches, and has removable end caps. A removable 2-inch-long insert of silicone oil-coated polypropylene “wool” captures volatile organic compounds in exhaled breath as breath passes through the tube. To collect breath samples, we asked donors to exhale 3 to 5 times through the tube. We then fitted the tubes with their end caps and sealed them in ordinary grocery store Ziplock-style bags at room temperature between the time of breath sampling and presentation to the dogs.

Each patient and control contributed multiple breath samples to the study, ranging from 4 to 18 samples per person.

The dogs had to be trained to recognize cancer samples, and in the training sessions, the trainer had to be unblinded to the location of the cancer sample, so they could reward the dogs when they identified the cancer samples correctly. The dogs were trained to indicate a positive result by sitting down by the canister that had the cancer breath sample.

During phase 1 of training, the location of the cancer breath sample was known by both experimenter and trainer (Table 2). One station contained a cancer breath sample, and the remaining 4 stations contained blank sample tubes that had not been used in any breath sampling. To encourage the dogs to seek out the exhaled chemicals associated with cancer, we placed a piece of dog food in the station with the cancer breath sample and covered the container with a piece of paper so the food would not be visible.

The second phase of training still used four blank canisters and food rewards in the cancer breath sample canister.

During phase 2 of training, only the experimenter was aware of the location of the cancer breath sample and apart from encouraging the dog with encouraging phrases such as “go to work,” gave no “sit” or other verbal commands to the dog. Clicker signal by the experimenter and subsequent food reward and praise by the trainer were given only after the dog correctly indicated on the cancer breath sample. When the dog indicated incorrectly on a control, the experimenter would not signal with the clicker and the handler would remain silent, not give the dog any praise reward, and mildly rebuke the dog by saying “no.” Samples used in phases 1 and 2 (contaminated with food scent) were not used again.

The third phase of training was similar to the second, except there were no food rewards in the canister with the cancer breath sample. After the dogs had performed sufficiently well during the training session, they were evaluated in a single blind phase.

During the single-blinded canine scent-testing experiment, using samples previously used in phase 3 of training, the level of challenge to the dogs was increased by placing a cancer breath sample in 1 station and control subject breath samples in the remaining 4 stations. Thus, dogs now had to distinguish cancer patient breath samples from those of healthy controls. Furthermore, the handler was blinded to the location and status of patient and control breath samples. Although the experimenter did know the location and status of patient and control breath samples during the single-blinded experiments, the possibility of the experimenter giving the dogs cues was minimized by positioning the experimenter in an adjacent room, behind an opaque curtain that almost completely covered the doorway between the training and observation rooms.

This was followed by a double blind phase, the phase used to evaluate sensitivity and specificity.

We designed our double-blinded experiment so that each dog would have the opportunity to sniff breath samples from each subject and each control. During the entire double-blinded testing phase, all breath samples sniffed by dogs, for both cases and controls, were from completely different subjects not previously encountered by the dogs during training or single-blinded testing. Furthermore, all of these breath samples used during double-blinded testing, for both cases and controls, contribute to the overall results reported in Table 3. For each trial, we used a random number table to determine the location of the sample being tested in the lineup.

All other methods were identical to the single-blinded testing phase, except that we now (1) placed the target breath sample of interest, whether from patient or control, within the lineup along with 4 other controls and (2) blinded both the experimenters and dog handlers to the status of that target sample in the lineup. Whereas in the single-blinded experiments only the dog handler was blinded to knowledge of the target sample, in the double-blinded experiments, both handler and experimenter were blinded to ensure that neither experimenters nor handlers could be giving any clues to the dogs. Since the experimenters now no longer knew the status of the target breath sample, they did not activate the clicker device after a sitting indication by the dog, and therefore the handler did not reward the dog with any food. After being given the opportunity to sniff and indicate on samples, the dog was simply led out of the room. Only after leaving the training room was the dog acknowledged with the phrase “good work!” During double-blinded testing, each tube was used a median of 20 times (mean = 32.35, SD = 24.46; range, 4-99).
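The lineup described in the excerpt above (one target sample, from either a patient or a control, placed at a random station among 4 other controls) can be sketched in a few lines. This is only an illustration: the paper used a random number table, whereas this sketch swaps in Python's pseudorandom generator, and the function name, labels, and seed are hypothetical.

```python
import random

def build_lineup(target_label, n_stations=5, rng=random):
    """Place one target sample at a random station; fill the rest with controls.

    Assumption: stations are numbered 1..n_stations. The paper drew the
    position from a random number table; a pseudorandom generator stands in here.
    """
    position = rng.randint(1, n_stations)  # inclusive on both ends
    lineup = {station: "control" for station in range(1, n_stations + 1)}
    lineup[position] = target_label
    return position, lineup

# A seeded generator makes the sketch reproducible (the seed is arbitrary).
rng = random.Random(2006)
position, lineup = build_lineup("target (status hidden)", rng=rng)
for station in sorted(lineup):
    print(station, lineup[station])
```

Because the target's status is hidden from both handler and experimenter, nothing in the lineup itself should reveal whether the target came from a patient or a control.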

Blinding is very important in a trial like this because of the "Clever Hans" effect, which is the ability of animals to pick up subtle and even subconscious nonverbal cues from the people around them.


[Excerpt] Clever Hans phenomenon: A form of involuntary and unconscious cuing. The term refers to a horse (Kluge Hans, referred to in the literature as "Clever Hans") who responded to questions requiring mathematical calculations by tapping his hoof. If asked by his master, William Von Osten, what is the sum of 3 plus 2, the horse would tap his hoof five times. It appeared the animal was responding to human language and was capable of grasping mathematical concepts. It was 1891 when Von Osten began showing Hans to the public. (Hans could also tell time and name people,* but we will restrict our discussion of his amazing abilities to his mathematical skills.) It was eventually discovered (in 1904) by Oskar Pfungst that the horse was responding to subtle physical cues (ideomotor reaction) or as Ray Hyman puts it "Hans was responding to a simple, involuntary postural adjustment by the questioner, which was his cue to start tapping, and an unconscious, almost imperceptible head movement, which was his cue to stop" (Hyman 1989: 425).


In the trials involving lung cancer patients, 708 of the 712 control canisters were properly identified, and 564 of the 574 cancer canisters were identified. In the trials involving breast cancer patients, 260 of the 275 control canisters were properly identified and 110 of the 116 cancer canisters were identified.
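As a quick check, the counts above can be turned into crude sensitivity and specificity estimates. This is a minimal sketch that simply divides correct identifications by totals; it is not the adjusted analysis the researchers ran, and the variable names are my own.

```python
# Counts quoted above for the double-blinded phase: (correctly identified, total).
results = {
    "lung": {"cancer": (564, 574), "control": (708, 712)},
    "breast": {"cancer": (110, 116), "control": (260, 275)},
}

def proportion(correct, total):
    """Crude proportion, ignoring repeated use of the same donors and tubes."""
    return correct / total

summary = {}
for cancer_type, counts in results.items():
    summary[cancer_type] = {
        # Sensitivity: fraction of cancer canisters correctly identified.
        "sensitivity": proportion(*counts["cancer"]),
        # Specificity: fraction of control canisters correctly identified.
        "specificity": proportion(*counts["control"]),
    }

for cancer_type, stats in summary.items():
    print(f"{cancer_type}: sensitivity = {stats['sensitivity']:.3f}, "
          f"specificity = {stats['specificity']:.3f}")
```

These crude proportions come out at roughly 98% sensitivity and 99% specificity for lung cancer, and about 95% for both in breast cancer.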

It is unclear how these results were tabulated. One possible method would be the following: If the dog did not sit down at any canister, and the fifth canister was a control breath sample, that trial was labeled a true negative. If the dog sat down at one of the four control canisters or hesitated, that trial was labeled a false positive or false negative depending on the contents of the fifth canister.

Another interpretation would be that each canister was scored separately: if the dog sat down at a control canister, that was considered a false positive for that canister, and failure to sit down at a control canister was considered a true negative.
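To make the difference between these two readings concrete, here is a hedged sketch that scores one hypothetical trial both ways. The function names are mine, the dog is assumed to sit at no more than one station, and a hesitation is not modeled. The trial-level outcomes that the description above leaves unstated (for example, sitting at the target itself) are filled in with the usual definitions, which is an assumption on my part.

```python
def score_per_trial(sat_at, target_is_cancer, target_station=5):
    """Trial-level reading: one label for the whole lineup.

    sat_at is the station where the dog sat, or None if it sat nowhere.
    """
    if sat_at is None:
        return "FN" if target_is_cancer else "TN"
    if sat_at == target_station:
        return "TP" if target_is_cancer else "FP"
    # The dog sat at one of the four known controls: the label depends
    # on what the fifth (target) canister contained.
    return "FN" if target_is_cancer else "FP"

def score_per_canister(sat_at, target_is_cancer, target_station=5, n_stations=5):
    """Canister-level reading: each station is its own unit of analysis."""
    labels = []
    for station in range(1, n_stations + 1):
        is_cancer = target_is_cancer and station == target_station
        indicated = station == sat_at
        if is_cancer:
            labels.append("TP" if indicated else "FN")
        else:
            labels.append("FP" if indicated else "TN")
    return labels

# Hypothetical trial: cancer sample at station 5, dog sits at station 2.
print(score_per_trial(2, True))     # one label for the trial
print(score_per_canister(2, True))  # five labels, one per canister
```

Under the canister-level reading, a single trial contributes five observations, so the control counts grow much faster than the cancer counts.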

The wording of the paper seems to favor the latter interpretation:

The dogs’ response to each of the 5 samples sniffed was included in our analysis; dogs were allowed the opportunity to visit each sample station and thus could have potentially indicated every one of the samples in a trial, although in our experiments, this never occurred. Dog handlers did not try to prevent dogs from visiting any individual station. Therefore, since each individual sample station was considered as a unit of analysis, the use of 4 control subject breath samples along with a cancer patient sample in each experimental trial would not change sensitivity or specificity.

On the other hand, the number of control samples during the double blind phase was 987 compared to 690 cancer samples, and it is hard to reconcile these numbers with the fact that at least four control samples were tested in each trial. The ratio of controls to cancers should be at least five to one and probably closer to ten to one.
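The arithmetic behind this concern is easy to reproduce from the counts reported earlier. The sketch below just adds up the lung and breast totals and compares the observed ratio with what four controls per cancer sample would require; the variable names are my own.

```python
# Totals from the double-blinded results quoted earlier.
control_samples = 712 + 275   # lung + breast control canisters
cancer_samples = 574 + 116    # lung + breast cancer canisters

observed_ratio = control_samples / cancer_samples

# If every cancer sample had been scored alongside 4 control canisters,
# the control count would have to be at least this large.
min_controls_if_four_per_trial = 4 * cancer_samples

print(f"controls = {control_samples}, cancers = {cancer_samples}")
print(f"observed ratio = {observed_ratio:.2f} controls per cancer sample")
print(f"controls needed at 4 per cancer sample = {min_controls_if_four_per_trial}")
```

The observed ratio is only about 1.4 controls per cancer sample, far fewer than the 2,760 controls that four per cancer sample would imply.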

Because of the number of tests performed, individual patients were used multiple times in the study and even individual breathing tubes were re-used many times.

During double-blinded testing, each tube was used a median of 20 times (mean = 32.35, SD = 24.46; range, 4-99).

To account for this, the researchers used "general estimating equations (GEE) random effects linear regression, with standard errors adjusted for clustering on donor." The researchers re-analyzed the data including only the first dog-donor combination in each trial of the double blind phase, and found comparable results.

The GEE estimates were also adjusted for current smoking status since there was more smoking among the lung cancer volunteers than the control volunteers.

This research used a case-control design to estimate sensitivity and specificity, which is acceptable for a "proof of concept" study, but the authors do discuss the problem of spectrum bias in this research.

However, our specificity may be overestimated because we used only healthy controls (rather than a broad spectrum of subjects that included, for example, those with bronchitis or emphysema as controls for lung cancer or those with fibrocystic breast disease or mastitis as controls for breast cancer). These questions could be better understood by further study in a prospective cohort design that included both cases and controls representing the full spectrum of disease severity seen in the general population.

There are additional limitations to this research which the authors discuss at the end of the article.

I will include this discussion in the Chance Wiki when I get the time.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.