Randomization is not always possible or practical. When this is the case, we have to rely on observational data to draw any conclusions. But when randomization is possible, its use makes a research study more authoritative.
Although I do not have a bibliographic citation for this example, I heard an amusing story about a study of water toxicants on fish.
This research required that the fish be separated into five tanks, each of which would get a different level of the toxicant. The researchers caught one fifth of the fish and put them in one tank, then an additional one fifth and put them in a second tank, and so forth. The outcome measurements were related not to the dosage, but to the order in which the tanks were filled, with the worst outcomes being in the first tank filled and the best outcomes in the last tank filled.
What happened was that the slow-moving, easy-to-catch fish were all allocated to the first tank. The fast-moving, hard-to-catch fish ended up in the last tank. It turned out that the sicker fish were also the slow-moving, easy-to-catch fish; the healthiest fish swam faster and avoided early capture.
A better way to design this experiment would have been to allocate the fish to tanks randomly. This would ensure that each tank got a fair share of the fast-and-healthy and the slow-and-sick fish.
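As a rough illustration of random allocation, the sketch below shuffles a list of fish and deals them into five tanks. The function name, the fish labels, the count of 50 fish, and the seed are all hypothetical choices made for this example; the story itself does not say how many fish were involved.

```python
import random

def allocate_randomly(fish_ids, n_tanks=5, seed=None):
    """Shuffle the fish, then deal them into tanks in round-robin order."""
    rng = random.Random(seed)
    shuffled = list(fish_ids)
    rng.shuffle(shuffled)
    # Taking every n_tanks-th fish from the shuffled list gives tanks
    # whose sizes differ by at most one.
    return [shuffled[i::n_tanks] for i in range(n_tanks)]

# Hypothetical example: 50 fish labelled 0..49 in the order they were caught,
# so low numbers stand for the slow, easy-to-catch fish.
tanks = allocate_randomly(range(50), n_tanks=5, seed=2024)
for number, tank in enumerate(tanks, start=1):
    print(f"Tank {number}: {sorted(tank)}")
```

Because the shuffle ignores the order of capture, the early-caught fish are no longer concentrated in a single tank, apart from chance variation.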
Studies without randomization often require either matching or statistical adjustments. While both matching and adjustments can help to some extent with covariate imbalance, these approaches do not work as well as randomization. In particular, some of the covariate imbalance may be due to factors that are difficult to measure. For example, patients may differ
Nevertheless, much can be learned from non-randomized studies. Almost everything we know about the risks of cigarette smoking came from observational designs (Gail 1996).
An editorial in the Journal of the American Medical Association (Sherwin 1997) tries to make sense of recent studies of the effect of dietary fat on obesity, heart disease, and stroke. After reviewing the results of numerous studies, the editorial comments:
"At present, most of this evidence in humans is observational and, consequently, an imperfect basis for causal inference. Large scale experimental studies that would provide more compelling data (such as the Women's Health Initiative) cost hundreds of millions of dollars and take decades to complete. Each study can only address the effects of a single nutritional change. Thus, it is still necessary to base advice to patients on dietary information that is less than certain and complete."
Randomized studies do have some weaknesses. These studies typically rely on the use of volunteers in a narrowly defined research setting. Such situations may not be reflective of how a typical patient behaves in a typical health care setting (Sackett 1997). In this particular aspect, a carefully planned observational design may provide a more relevant comparison.
Another problem with randomized designs is the limit to their size and scope. These limits may make it difficult to detect rare but important side effects. An observational approach like post-marketing surveillance is more likely to be successful in these situations.
Studies of the potential harm caused by environmental exposures (such as lead-based paint, secondhand tobacco smoke, or electromagnetic fields) are often impossible to randomize because of logistical and ethical issues.
These exceptions, however, do not diminish the value of experimental designs. In situations where observational and experimental studies can both be conducted, most researchers will give greater weight to the evidence in an experimental study.
Did the authors use matching?
Matching is the systematic selection, for every subject in the treatment/exposure group, of a control subject with similar characteristics. For example, in a study of fetal exposure to cocaine, you might select infants born to mothers who abused cocaine during pregnancy. For every such infant, you would select an infant who was not exposed to cocaine in utero but who had the same sex, race, and socio-economic status.
Matching will prevent covariate imbalance for those variables used in matching. It will also reduce covariate imbalance for any variables closely related to the matching variables. It will not, however, protect against all covariate imbalance, especially for those covariates that are difficult to measure.
Matching often presents difficult logistical issues, because a matching control subject may not always be available. The logistics are especially difficult when there are several matching variables and when the pool of control subjects that you can draw from is not substantially larger than the pool of treatment/exposed subjects.
Matching is usually reserved for those variables that are known to be highly predictive of the outcome measure. In a cancer study, for example, matching is usually done on smoking. Many neonatology studies will match on gestational age.
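The matching process described in this section can be sketched as a simple exact match without replacement. This is only one of several possible matching schemes, and the function name, field names, and subject records below are invented for illustration.

```python
from collections import defaultdict

def exact_match(exposed, controls, keys):
    """Pair each exposed subject with one unused control that agrees on every key."""
    # Group the control pool by its values on the matching variables.
    pool = defaultdict(list)
    for control in controls:
        pool[tuple(control[k] for k in keys)].append(control)

    pairs, unmatched = [], []
    for subject in exposed:
        candidates = pool[tuple(subject[k] for k in keys)]
        if candidates:
            # Each control is used at most once.
            pairs.append((subject, candidates.pop(0)))
        else:
            unmatched.append(subject)
    return pairs, unmatched

# Hypothetical records; "sex" and "ses" stand in for the matching variables.
exposed = [
    {"id": "E1", "sex": "F", "ses": "low"},
    {"id": "E2", "sex": "M", "ses": "low"},
    {"id": "E3", "sex": "F", "ses": "high"},
]
controls = [
    {"id": "C1", "sex": "F", "ses": "low"},
    {"id": "C2", "sex": "F", "ses": "low"},
    {"id": "C3", "sex": "M", "ses": "low"},
]

pairs, unmatched = exact_match(exposed, controls, keys=("sex", "ses"))
for subject, control in pairs:
    print(subject["id"], "matched with", control["id"])
print("No match available for:", [s["id"] for s in unmatched])
```

In this toy data, subject E3 has no available control, which illustrates the logistical difficulty that arises when the control pool is small relative to the number of matching variables.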
Matching in a randomized design
In some randomized studies, matching will be used as well. Partly, this is a recognition that randomization will not totally remove covariate imbalance, just like a flip of 100 coins will not always result in exactly 50 heads and 50 tails.
More importantly, however, matching in a randomized study will provide extra precision. Matching creates pairs of subjects who will have greater homogeneity and therefore less variability.
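To put a number on the coin-flip analogy used in this section, the short calculation below uses the binomial distribution for 100 fair flips. The threshold of "5 or more heads away from 50" is an arbitrary choice made for this example.

```python
from math import comb

n = 100  # number of fair coin flips

# Exact binomial probability of a perfect 50/50 split.
p_exact_split = comb(n, n // 2) / 2**n

# Probability that the number of heads is 5 or more away from 50
# (45 or fewer heads, or 55 or more heads).
p_off_by_5 = sum(comb(n, k) for k in range(n + 1) if abs(k - n // 2) >= 5) / 2**n

print(f"P(exactly 50 heads)       = {p_exact_split:.3f}")
print(f"P(off by 5 or more heads) = {p_off_by_5:.3f}")
```

A perfect split occurs only about 8% of the time, so some chance imbalance after randomization is the rule rather than the exception.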
The crossover design
The crossover design represents a special type of matching. In a crossover design, a subject is randomly assigned to a specific treatment order. Some subjects will receive the standard therapy first, followed by the new therapy (AB). Others will receive the new therapy first, followed by the standard therapy (BA).
Since the same subject receives both treatments, there is no possibility of covariate imbalance.
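A minimal sketch of assigning treatment orders at random appears below. It assumes a balanced allocation, with half of the subjects receiving AB and half receiving BA; that balance, along with the function name and subject labels, is an assumption of this example rather than something the text requires.

```python
import random

def assign_orders(subject_ids, seed=None):
    """Randomly assign each subject to treatment order AB or BA."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    half = len(ids) // 2
    # Balanced list of orders (an assumption of this sketch), then shuffled.
    orders = ["AB"] * half + ["BA"] * (len(ids) - half)
    rng.shuffle(orders)
    return dict(zip(ids, orders))

# Hypothetical subject labels.
assignment = assign_orders([f"S{i:02d}" for i in range(1, 13)], seed=7)
for subject, order in assignment.items():
    print(subject, order)
```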
When therapies are applied in sequence, timing effects are of great concern. Are the therapies set far enough apart that the effect of one therapy is unlikely to carry over into the other? For example, if the two therapies are different drugs, did the researchers allow enough time for the first drug to be fully eliminated from the body before they administered the second?
Learning and fatigue effects are also potential problems in a crossover design.
Special problems arise when each subject receives the standard therapy first and then the new therapy (or vice versa). Many factors other than the change in therapy can cause a shift in the health of patients over time. Unless the researchers can point to other evidence that shows stability of the condition over time, information from this type of study is worthless.
Sometimes difficult circumstances (such as a general failure to respond to the standard therapy) will force the use of this type of design. Further discussion of lack of randomization or other issues with crossover designs can be found in Louis (1992).
Concurrent controls versus historical controls
Sometimes researchers will assign all of the research subjects to the new therapy. The outcomes of these subjects are compared to historical records representing the standard therapy. This type of study is sometimes called a historical controls study. The very nature of a historical controls study guarantees that there will be a major discrepancy in timing. Thus, you have to consider any factors that have changed over time that might be related to the outcome. To what extent might these factors affect the outcome differentially?
1.3 Did the authors use statistical adjustments?
Statistical adjustments represent one way of correcting for covariate imbalance. There are several ways to make statistical adjustments.
First, there are direct adjustments, such as using a per capita rate. This adjustment occurs most frequently when the outcome measure is some type of count, such as the number of infections, number of medication errors, or number of traffic deaths. The adjustment simply divides the outcome measure by a variable that measures the volume of activity in the process that produced the count. It might be the number of patients (or patient days) at risk in a study of infections. It might be the number of medications dispensed in a study of medication errors. It might be the number of people (or the total number of passenger miles) in a county for a study of county-wide traffic deaths.
Second, there are regression adjustments. [Discuss]
Third, there are weighting adjustments. [Discuss]
[Discuss the problems of non-overlap]
[Discuss the imperfect nature of these adjustments, especially when the adjustment variable is imperfectly measured.]
[Re-iterate the problem with difficult to measure covariates.]
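The first kind of adjustment, a rate per unit of activity, can be sketched in a few lines. The two hospital units, their infection counts, and their patient days are made-up numbers used only to show the arithmetic.

```python
def rate_per(count, exposure, per=1000):
    """Return the number of events per `per` units of exposure."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return count * per / exposure

# Hypothetical data: infections and patient days at risk in two units.
units = {
    "Unit A": {"infections": 12, "patient_days": 4000},
    "Unit B": {"infections": 9, "patient_days": 1500},
}

rates = {
    name: rate_per(data["infections"], data["patient_days"])
    for name, data in units.items()
}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} infections per 1,000 patient days")
```

In this invented example, Unit A has more infections in absolute terms, but Unit B has twice the rate once the volume of patient days is taken into account.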
Summary - Who did the choosing?


