Society for Amateur Scientists

Design Your Experiments Part VIII: Randomization and Other Issues

by Kevin Kilty

Recently an entrepreneur friend of mine asked me to examine a patent that its inventor was trying to market to him. The packet of materials on the patent contained reports of several engineering experiments done on the machine. One of these experiments was done very badly indeed.

The engineers who performed the bad experiment always ran its tests in the same order. They would power the machine and let it reach a stable operating condition. Then they would test the machine at its maximum rated air flow, followed by a second test at its minimum rated air flow. Each time, they found the machine far more efficient at maximum rated air flow than at minimum air flow. Their advice to the inventor was to install a larger fan, as this would increase the machine's efficiency still further. The inventor found this advice not only reasonable but extremely encouraging, since it made him believe he could greatly improve his device.

Unfortunately, by always performing the tests in the same order, these engineers never noticed that the efficiency was always largest on the first run not because of the air flow, but simply because it was the first run. It was apparent from their data that the first run always put the machine into a transient condition, so that it was out of equilibrium during the second run. If they had only followed a basic precept of experiment design--to randomize runs--they would have seen this clearly! Beware, amateur scientists.

Randomization

I have mentioned randomizing in several previous installments of this series, but I am going to devote part of an entire installment to it here. No matter how well I may design an experiment and identify the controlling factors beforehand, I still run the risk that there are important factors I haven't identified and will not control. Unless I take a precaution against them, a systematic variation in one of these factors might occur by chance and make my results appear as though one factor has an effect when in fact it doesn't.

The idea behind randomizing is very simple. Randomizing the runs in an experiment transforms any systematic effect of an uncontrolled factor into random experimental noise. I would always prefer to deal with slightly more experimental noise than with bias, systematic errors, and false signals.
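
To see why this works, here is a minimal simulation sketch in Python, with an invented machine model: a +5-point efficiency bonus on the first run after power-up stands in for the warm-up transient in the story above. Testing in a fixed order lets the transient masquerade as an air-flow effect; flipping a coin for the run order spreads it evenly over both treatments.

```python
import random

def run_machine(first_run):
    """Toy machine model (invented for illustration): true efficiency is
    60 at BOTH air-flow settings, but the first run after power-up gets a
    +5 transient bonus, plus a little measurement noise."""
    transient = 5.0 if first_run else 0.0
    return 60.0 + transient + random.gauss(0.0, 0.5)

random.seed(42)  # fixed seed so the demonstration is repeatable
mean = lambda xs: sum(xs) / len(xs)

# Fixed order: maximum air flow is always measured first.
fixed_max = [run_machine(first_run=True) for _ in range(200)]
fixed_min = [run_machine(first_run=False) for _ in range(200)]

# Randomized order: a coin flip decides which setting is tested first.
rand_max, rand_min = [], []
for _ in range(200):
    max_first = random.random() < 0.5
    rand_max.append(run_machine(first_run=max_first))
    rand_min.append(run_machine(first_run=not max_first))

print("fixed order, apparent effect:      %+.2f" % (mean(fixed_max) - mean(fixed_min)))
print("randomized order, apparent effect: %+.2f" % (mean(rand_max) - mean(rand_min)))
```

With the fixed order the two settings appear to differ by about five efficiency points even though the model treats them identically; with randomization the apparent difference shrinks to ordinary noise.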

What sorts of factors are troublesome?

Let me take some time to describe common examples of uncontrolled, systematic factors, and then I'll describe some means of randomizing experiments.

  • Many processes involving machinery are not perfectly consistent, but they aren't entirely random either. Instead, most machines run at a nominal level disturbed by noise that is correlated from one moment to the next. For example, if a refrigerator has become slightly too cold at one point during the day, it will still be slightly too cold a few minutes later; the same is true of heaters, furnaces, gas flow controls, and so forth. Engineers refer to this behavior as autoregressive. The same thing occurs on a spatial scale--if one part of a refrigerator is slightly too cool, a nearby spot is likely to be so as well. Randomization in time and space will eliminate whatever systematic effects this correlation might have on an experiment.
  • Building conditions vary throughout a day, just as do outdoor conditions.
  • Workers are notoriously inconsistent from one crew to another. An experiment that involves one crew always using treatment A while another uses treatment B is very likely to find a difference between the treatments even if they are identical.
  • People are often biased against or for particular products, ideas, persons, materials, methods of work, companies and so forth. Men using sugar water are almost as likely to report new hair growth as men using Rogaine. Randomization will transform the conscious or unconscious effects of these biases into unbiased noise.
  • Plots of ground have gradients in moisture, fertility, cover from wind, and so forth. The ideal locations in a field will always have better yields, and randomization allows all treatments equal access to these ideal locations. Field trials of treatments are often complete block designs in which each block receives all treatments. However, even the tiny subplots within a block vary to some degree, and the treatments should be randomized within each block. Agronomists have developed Latin squares as a means of randomizing field plots. A Latin square is an NxN patch of sub-plots in rows and columns, organized so that each row and each column contains each of the N treatments. For example, a Latin square suitable for 4 treatments is...
         ACDB
         DBCA
         CABD
         BDAC
    
  • An entire lot of materials used in an experiment can be defective, or just different, in some way that affects the outcome.
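
A Latin square like the one above is easy to construct and check by program. The sketch below (Python; the function names are my own) builds a cyclic square, then shuffles its rows, columns, and treatment labels so the layout used in the field isn't always the same one, and verifies the defining property that each row and column contains every treatment exactly once.

```python
import random

def latin_square(treatments):
    """Build an N x N Latin square by cyclic shifts, then randomize it by
    permuting rows, columns, and treatment labels. Each permutation
    preserves the Latin property, so the result is still a valid square."""
    n = len(treatments)
    square = [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]
    random.shuffle(square)                        # permute rows
    cols = list(range(n))
    random.shuffle(cols)                          # permute columns
    square = [[row[c] for c in cols] for row in square]
    relabel = dict(zip(treatments, random.sample(treatments, n)))
    return [[relabel[x] for x in row] for row in square]

def is_latin(square):
    """True if every row and every column holds each treatment exactly once."""
    n = len(square)
    symbols = {x for row in square for x in row}
    rows_ok = all(len(set(row)) == n for row in square)
    cols_ok = all(len({row[c] for row in square}) == n for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

sq = latin_square(list("ABCD"))
for row in sq:
    print("".join(row))

# The 4x4 example from the text passes the same check:
print(is_latin([list(row) for row in ("ACDB", "DBCA", "CABD", "BDAC")]))  # prints True
```

Permuting rows, columns, and labels of a known square is a common practical shortcut; it does not sample uniformly from all possible Latin squares, but for randomizing a field layout it is usually adequate.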

Ways to randomize

Here is a list of ways to randomize experimental runs. This is not a list of hard-and-fast rules. Keep in mind that the purpose is to prevent systematic patterns that allow uncontrolled factors to produce seemingly meaningful signals.

  • Use the flip of a coin to assign experimental units in a paired block to receive one treatment or the other.
  • Use a table of random numbers, or a dice throw, to decide the order of treatments to apply during an experiment. Never perform experiments in an order that is merely "convenient." For example, if one of the factors involved is temperature, then do not vary the temperature systematically from one run to the next, but vary it randomly instead.
  • Some people advocate running experiments in a particular sequence such as XYYXXYYXXYYX. I would never run in such symmetric patterns, however. Use coin flips instead, or if the pattern must contain equal numbers of Xs and Ys, then at least break the symmetry of the sequence. You can follow rows of a Latin square to make a suitable sequence with no symmetry.
  • Use a table of random numbers to assign shelf space or refrigerator space to experimental units.
  • Use a Latin square to assign treatments to plots in a field, or experimental units to an array of instruments, machines, or work crews.
  • Use either random samples, or stratified random samples, to choose observational units. This is mainly an issue in studies that employ sampling to obtain their data, but it may also present problems when an observational unit is a small portion of an experimental unit.
  • Use coin flips to choose which half to keep when an experimental unit is repeatedly cut down by halves to produce an observational unit of appropriate size.
  • Be cautious about using software random number generators to randomize, since sequences generated from the same seed will always be the same. Certain seeds may cause some random number generators to produce short cycles of repeating or nearly repeating numbers.
  • Never use samples that choose themselves in some way. For example, be wary of volunteers, surveys mailed en masse but returned voluntarily, experimental units that have survived an experimental trauma unrelated to a design treatment, and so forth.
  • Keep in mind that on rare occasions randomization will produce a very systematic pattern. Do not be reluctant to randomize a second time.
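
As a concrete example of several items above, here is one way (a sketch; the factor names and levels are invented) to produce a randomized run order in software while sidestepping the fixed-seed pitfall: Python's `random.SystemRandom` draws its state from the operating system rather than from a software seed, so repeated sessions do not replay the same shuffle.

```python
import random

# Two factors, each combination replicated three times (invented example).
treatments = [(airflow, temp) for airflow in ("min", "max")
              for temp in (20, 40, 60)]
runs = treatments * 3

# SystemRandom takes entropy from the operating system (os.urandom),
# so there is no fixed seed to accidentally reuse between sessions.
rng = random.SystemRandom()
rng.shuffle(runs)

# Print the randomized run sheet for the lab notebook.
for i, (airflow, temp) in enumerate(runs, start=1):
    print("run %2d: airflow=%s, temperature=%d C" % (i, airflow, temp))
```

The shuffle changes only the order, never the composition: every treatment combination still appears exactly three times in the run sheet.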

When is something an experiment?

Peter Baum recently sent me a link to an ask-a-scientist forum in which some parent was wondering about their child's science project. It seems the child had gotten a very poor grade on the science project because it did not contain an independent and dependent variable, which, according to the science fair judge, made it a non-science project. What the child had done was to survey the preferences of some children age 6 to 10 and tabulate the results. "Why," this parent asked, "is this not a science project?"

This reminded me of a debate in the sciences at the present time over which sciences are most "scientific." People claim that some sciences (generally not their own, of course) are not scientific because they do not produce hypotheses that can be tested, they have no control over anything that could be called an experiment, and they cannot usually replicate anything at all. There are sciences which are truly experimental. These include physics, chemistry and biology. There are sciences which are historical--most notably geology, but also astronomy, cosmology, evolutionary biology, and so forth. Then there are sciences referred to as "soft." A few scientists in each of these categories like to bicker with one another.

In the experimental sciences there is a successful model of how science advances. I hate to call it a scientific method, because that term is over-worked, misunderstood, and not at all clear to me. The model is something like this. Theorists propose a model to explain some phenomenon. This model makes certain predictions. Experimentalists take these predictions and try to find them in the outcome of carefully designed experiments. If a prediction does not hold true, or if some completely unexpected result obtains, then people conclude that the experiment has refuted the theory. This simple picture of science leaves many things unstated--like the amount of trouble that experimenters go to in planning an experiment so they may answer all of the expected objections to their results. For example, they have to have very good control of unaccounted-for factors, they must quantify auxiliary hypotheses (things they assume to be true and usually leave unstated), and they have to present their metrology convincingly. However, once the experimentalists handle all these other aspects, experimental science and its results are pretty clear. I may not be able to explain the scientific method, but I know it when I see it.

What is amazing about the category of historical sciences is that it must include astronomy, which is just about as experimental as any science gets. Despite their inability to prepare controlled experimental conditions, astronomers figure out how to make convincingly controlled experiments out of observations. The example of astronomy shows that geologists and evolutionary biologists could generally follow the model of the experimental sciences if they really wished to do so (a few actually do). I personally think geology might be improved if geologists would quantify their theories, make predictions, and then think about where to find observations to refute those theories. Instead they are always looking for confirming evidence. This leads to two things: they don't manage to put many unsuccessful theories to rest permanently, and their conclusions appear hedged and mushy. However, the historical sciences are pretty darned successful despite not following the same model as the experimental sciences. Maybe I'll let the geologists be geologists.

The soft sciences, too, are very successful despite not being experimental. The soft scientists I have learned to respect are skeptical, thorough, and can spot a phoney result or flawed study immediately. Magicians, a group of people I think of as applied psychologists, spot scammers faster than any physicist could. Perhaps, like the historical sciences, I'll let the soft sciences be. However, I will make one observation about the soft sciences that I think is absolutely fair. Those soft scientists the public is most likely to encounter on a daily basis--the psychologist, sociologist, or economist offering expertise on TV--are often not promoting science, but rather pop-science. Evidence, truth, and reason matter not at all to them. Instead, they promote certainty, authority, and rationality, all mixed with their self-interest and biases to a sickening degree. They give science a bad name. Each is, like Dr. Venkman, Bill Murray's self-serving psychologist in Ghostbusters, "... a very bad scientist."

Here is a list of things that people do to learn about the world of interest to them....

  • A. A biologist monitors the metabolism of an insect many times under carefully controlled conditions, and notes the results.
  • B. A geologist observes a group of faults in a particular region and wonders what caused all of this a long time ago.
  • C. A mathematician proposes a model for how gravity works in the large scale universe.
  • D. A high school student surveys a group of little kids to learn their vegetable preferences.
  • E. A climatologist runs many computer simulations to learn about earth climate.
  • F. Astronomers study a black hole that is not only very far away, but is also very far back in time.
  • G. Engineers, guided only by hunches, run a series of factorial experiments to characterize a manufacturing process.
  • H. An oceanographer makes a record of ocean currents in the Korea Strait and organizes the data in a table.

All of these represent some way of systematically studying the world around us. Which are scientific? I think anyone would agree that A, C, and F are scientific. Yet C doesn't involve any experimental work at all, and F doesn't involve any control on the part of the scientists. G is done under controlled conditions, but perhaps with no guiding theory, and the engineers wouldn't be testing any hypothesis even if there were a theory involved, because they hate theory. In E, the climatologist is doing something akin to playing a video game. The oceanographer (H) is simply observing something of interest, though it may bear on some other scientific issue. And D got the high school student a very bad grade on a science project. I think every item on the list is science, except perhaps E. D is poorly done science, perhaps, but only because the student didn't have an experiment design in mind and had no hypothesis to test.

My view is that something is science if it contains the following important features...

  • The activity is intended to learn something about how the world behaves rather than just come to terms with it. This separates science from philosophy or ethics or religion.
  • Nothing involved in organizing the activity preordains its results.
  • Its methods and conclusions are logical. They are rational but not to the point of being unreasonable.
  • The activity is directed toward a useful purpose such as testing a theory or preventing a problem.
  • There is a carefully thought out plan to the activity. In case someone would argue with the conclusion of the activity, the plan could serve to defend the conclusion.
  • There is an effort to make measurements and to estimate uncertainties involved and to identify the source of the uncertainties.
  • There is a concerted effort to control for confusing or extraneous influences.
  • The activity leads to predictions that could be tested if someone were to bother to take the expense and trouble to do so.
  • There is skepticism enough on the part of the participants to allow that the conclusions might be mistaken, wrong, or whatever, and, if correct, are possibly only provisionally correct.

When does computer simulation substitute for experimentation?

Let me explain why I doubt that item (E) on the list of scientific activities is always scientific. The computer allows two very different modes of use. The first is as a type of calculator for problems much too tedious or complex to do by hand. I have no complaint with this. The second use is simulation. I have been a witness to computer simulation for more than 30 years now. Codes written to simulate very specific problems, where the physics is well understood and the boundary and initial conditions needed to find a solution are clear, work just fine. I have no issue with numerical experiments like these. However, earth scientists have turned to computer simulations to provide a controlled laboratory for experiments in unimaginably complex situations. Climate simulations are the most widely known, so I'll concentrate on that topic for a moment.

Whoever uses a computer code to simulate has to be careful about making two types of mistakes. First, the code running the computer may get the physics of something wrong, and we don't know this because we have nothing to compare against. Second, it is possible to get the physics correct in the code, and to successfully demonstrate correctness in simple test situations, but still obtain incorrect answers to other problems because of unusual boundary conditions or assumptions of use that no one has checked out. Climate simulation provides an opportunity for making both mistakes. The physics is fantastically complex. The boundary conditions are not simple; in some instances they aren't even known. The initial conditions aren't known. The codes don't even pretend to do the right thing sometimes. For example, interactions between the ocean and atmosphere, ground and atmosphere, and the effects of thunderstorms and convection are not solved from first principles but are "parameterized." This means that the simulation takes its intermediate results, uses them to approximate an empirical parameter describing one of these interactions, and alters a coefficient or a source term in the simulation. This is so crude that, as late as 1994 (Science, 9 Sep. 1994, p. 1528), global climate models still required large "fudge factors" to keep them from drifting off into unrealistic climate realms.

The history of climate simulation through the 1990s shows that climate model programs improved step-by-step as their masters became aware of one deficiency after another. However, even recognizing these deficiencies did not prevent people from drawing and publicizing conclusions based on flawed simulation "experiments." One good example is the conclusion that temperature trends on Earth never last 100 years because the simulations do not exhibit them (Nature, v. 367, 17 Feb 1994, p. 634-635). In fact, we have evidence of millennial-scale trends.

Using computer simulations in policy decisions raises the stakes of these "experiments" further. I attended an emergency meeting of the Wyoming State Engineer in 1984 where someone had done groundwater flow computer simulations so badly that they actually demonstrated the opposite of what was claimed. A recent article in Science (11 Feb 2000, p. 960) refers to this as a worry about "...semi-stupid programs placed in positions of responsibility."

In complex problems the devil is in the details, and unless the details are absolutely correct, then experiments done with simulations tell us a lot about the program and not so much about the real thing.

I'm off my soap box. The next series of installments focuses on model building.
