When you make such a comparison between an exposure/treatment group and a control group, you want it to be a fair comparison. You want the control group to be identical to the exposure/treatment group in all respects, except for the exposure/treatment in question. You want an apples to apples comparison.
To ensure that the researchers made an apples to apples comparison, ask the following three questions:
1.1 Did the authors use randomization?
1.2 Did the authors use matching?
1.3 Did the authors use statistical adjustments?
Vitamin C and Cancer
Paul Rosenbaum, in the first chapter of his book, Observational Studies, gives a fascinating example of an apples to oranges comparison. Cameron and Pauling published an observational study of Vitamin C as a treatment for advanced cancer. For each patient, ten matched controls were selected with the same age, gender, cancer site, and histological tumor type. Patients receiving Vitamin C survived four times longer than the controls (p < 0.0001).
Cameron and Pauling minimize the lack of randomization. "Even though no formal process of randomization was carried out in the selection of our two groups, we believe that they come close to representing random subpopulations of the population of terminal cancer patients in the Vale of Leven Hospital."
Ten years later, the Mayo Clinic conducted a randomized experiment which showed no statistically significant effect of Vitamin C. Why did the Cameron and Pauling study differ from the Mayo study?
The first limitation of the Cameron and Pauling study was that all of the patients who received Vitamin C were followed prospectively, while the control group came from a retrospective chart review. You should be cautious about any comparison of prospective data to retrospective data.
But there was a more important issue. The treatment group represented patients newly diagnosed with terminal cancer. The control group was selected from death certificate records. So this was clearly an apples versus oranges comparison. It doesn't matter how bad the prognosis was for a patient diagnosed with terminal cancer; it can't be as bad as the prognosis of a patient who has a death certificate.
Surgical trial without controls
There's another story, unfortunately fictional, which also highlights the importance of a good comparison group.
A prominent surgeon came to give a special lecture at the School of Medicine. He expounded about the great advance that he had made in a specific surgical procedure. At the end of the lecture he drew thunderous applause from the audience.
At first it seemed like there would be no questions, but then a young student in the front row raised her hand. "Did you use any controls?" she asked.
The surgeon seemed to be offended by this question. "Controls?" he asked. "Are you suggesting that I should have denied my surgical advance to half of my patients?"
The rest of the audience grew very quiet. But the young woman was not intimidated. "Yes," she said, "that's exactly what I meant."
The surgeon grew even angrier at this, slammed his fist on the podium, and shouted, "Why, that would have condemned half of my patients to certain death!"
There was silence for a few seconds. Then the entire auditorium burst out in laughter when the young woman asked "Which half?"
Covariate imbalance
If you want to judge how effective a new therapy is, you need a comparison group. The comparison group would be a group of subjects who receive either the standard therapy or, in some cases, no therapy (e.g., a placebo comparison).
The ideal comparison group should be similar in all respects to the new therapy group except for the therapy itself. For example, the two groups should have a similar range of ages and weights and should be composed of roughly the same proportions in gender and race/ethnicity. The groups should be evaluated concurrently.
Sometimes the groups are dissimilar on some important characteristics. This is known as covariate imbalance. Covariate imbalance is not an insurmountable problem, but it does make a study less authoritative.
In a yet to be published research study here at Children's Mercy Hospital, pre-term infants were randomized either to a group that received normal bottle feeding while they were in the hospital or to a nasogastric (ng) tube feeding group. The researchers wanted to see if the latter group of infants, because they had not become habituated to bottle feeding, would be more likely to breastfeed after discharge from the hospital.
The randomization was only partially effective at preventing covariate imbalance. The infants had comparable birth weights, gestational ages, and Apgar scores. There were similar proportions of cesarean section and vaginal births in both groups. But the mothers in the ng tube group were older on average than the mothers in the bottle-fed group.
Since older mothers are more likely to breastfeed than younger mothers, we had to include mother's age in an analysis of covariance model so that the effect of ng tube feeding could be estimated independently of mother's age.
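To make the idea of adjusting for a covariate concrete, here is a minimal sketch of an analysis of covariance estimate for two groups and one covariate. It is not the analysis used in the Children's Mercy study. The function name `ancova_adjusted_difference`, the continuous outcome, and all of the numbers are assumptions made for illustration, and the data are simulated rather than real. The sketch assumes the covariate has the same slope in both groups.

```python
import random
from statistics import mean


def ancova_adjusted_difference(x_treat, y_treat, x_ctrl, y_ctrl):
    """Return (raw difference, adjusted difference, pooled slope).

    Assumes a common slope for the covariate x in both groups, which is
    the usual analysis of covariance model with one covariate.
    """
    mx_t, my_t = mean(x_treat), mean(y_treat)
    mx_c, my_c = mean(x_ctrl), mean(y_ctrl)

    # Pooled within-group slope of the outcome on the covariate.
    sxy = sum((x - mx_t) * (y - my_t) for x, y in zip(x_treat, y_treat))
    sxy += sum((x - mx_c) * (y - my_c) for x, y in zip(x_ctrl, y_ctrl))
    sxx = sum((x - mx_t) ** 2 for x in x_treat)
    sxx += sum((x - mx_c) ** 2 for x in x_ctrl)
    slope = sxy / sxx

    # Remove the part of the raw difference explained by the covariate gap.
    raw = my_t - my_c
    adjusted = raw - slope * (mx_t - mx_c)
    return raw, adjusted, slope


# Simulated (not real) data: the treatment group is older on average,
# the true treatment effect is 2.0, and the outcome rises 0.5 per year of age.
rng = random.Random(1)
age_ctrl = [rng.uniform(18, 32) for _ in range(200)]
age_treat = [rng.uniform(22, 36) for _ in range(200)]
y_ctrl = [0.5 * a + rng.gauss(0, 1) for a in age_ctrl]
y_treat = [2.0 + 0.5 * a + rng.gauss(0, 1) for a in age_treat]

raw, adjusted, slope = ancova_adjusted_difference(
    age_treat, y_treat, age_ctrl, y_ctrl
)
print(f"raw difference:      {raw:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
print(f"pooled slope on age: {slope:.2f}")
```

In this simulation the raw difference overstates the treatment effect because part of it is really an age effect. The adjusted difference should land much closer to the value of 2.0 that was built into the simulated data.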
Beware of situations where the two treatment groups are handled differently. An example of this would be a study of women who use oral contraceptives. These women visit a doctor at least every six months to get their prescriptions renewed. If these women are compared to women who do not use oral contraceptives, then the former group will probably be evaluated by a doctor more frequently. An increase in the prevalence of certain diseases may actually reflect the fact that these diseases are diagnosed earlier because of the more frequent visits.
Similarly, if a certain drug is suspected to have certain side effects, doctors may question more closely those patients who are on that medication, creating a self-fulfilling prophecy.
1.1 Did the authors use randomization?
If the authors of the study decided who would get the new therapy and who would get the standard therapy, we have an experimental design. If the patient did the choosing, if the patient's doctor did the choosing, or if the groups were intact prior to the start of the research, then we have an observational design.
The distinction between experimental and observational designs is critical. The greater control that is available in an experimental design generally leads to better quality results. In particular, an experimental design allows the use of randomization.
Here are some examples of experimental designs and observational studies.
In Adkinson (1997), 121 children with moderate-to-severe asthma were "randomly assigned to receive subcutaneous injections of either a mixture of seven aeroallergen extracts or a placebo." Since the researchers generated the sequence of random assignment, this is an experimental design.
In Bullock (1989), "80 severe recidivist alcoholics received acupuncture either at points specific for the treatment of substance abuse (treatment group) or at nonspecific points (control group)." Since the researchers controlled the nature of the acupuncture, this is an experimental design.
In Cardo (1997), 33 health care workers who became seropositive to HIV after percutaneous exposure to HIV-infected blood were compared to 665 health care workers with similar exposure who did not become seropositive. Since the researchers did not control who became seropositive, this is an observational study.
In Hu (1997), 80,082 women between the ages of 34 and 59 years were followed for 14 years to look for instances of non-fatal myocardial infarction or death from coronary heart disease. These women were divided into low, intermediate, and high groups on the basis of their consumption of dietary fat. Since the women themselves controlled their diets, rather than having a diet imposed on them by the researchers, this represents an observational design.
Information from an experimental design is generally considered more authoritative than information from an observational design because the researchers can then use randomization. Randomization provides some level of assurance that the two groups are comparable in every way except for the therapy received.
Randomization requires the use of a random device, such as a coin flip or a table of random numbers. Systematic allocation (e.g., alternating between treatments) is not the same as randomization.
The simplest way to randomize is to lay out the treatment schedule in a systematic (non-random) fashion, generate a random number for each value in the schedule, and then sort the schedule by the random number.
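The three steps just described can be sketched in a few lines of Python. This is only an illustration. The function name `randomized_schedule`, the arm labels, the group size, and the seed are all hypothetical choices, not part of any study mentioned here.

```python
import random


def randomized_schedule(treatments, n_per_arm, seed=None):
    """Build a randomized treatment schedule by sorting on random numbers."""
    rng = random.Random(seed)

    # Step 1: lay out the schedule systematically (all of one arm, then the next).
    schedule = [arm for arm in treatments for _ in range(n_per_arm)]

    # Step 2: attach a random number to each slot in the schedule.
    keyed = [(rng.random(), arm) for arm in schedule]

    # Step 3: sort by the random number, which shuffles the arms.
    keyed.sort(key=lambda pair: pair[0])
    return [arm for _, arm in keyed]


# Hypothetical example: 5 patients per arm, fixed seed so the list can be reproduced.
schedule = randomized_schedule(["new", "standard"], n_per_arm=5, seed=2024)
for patient_number, arm in enumerate(schedule, start=1):
    print(patient_number, arm)
```

Because the schedule starts with equal numbers in each arm and sorting only reorders it, the group sizes stay equal. Recording the seed lets someone else regenerate and audit the same assignment list.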
Randomization ensures that both measurable and unmeasurable factors are balanced out across both the standard and the new therapy, assuring a fair comparison.


