Design Your Experiments
Part VIII: Randomization and Other Issues
by Kevin Kilty

Recently an entrepreneur friend of mine asked me to examine a patent that the inventor himself was trying to market to him. The packet of materials on the patent contained reports of several engineering experiments done on the machine. One of these experiments was done very badly indeed.

The engineers who performed the bad experiments always performed them in the same order. They would power the machine and let it reach a stable operating condition. Then they would test the machine at its maximum rated air flow, followed by a second test at its minimum rated air flow. Each time, they found the machine far more efficient at maximum rated air flow than at minimum air flow. Their advice to the inventor was to install a larger fan, as this would increase the machine's efficiency still further. The inventor found this advice not only reasonable but extremely encouraging, since it made him believe he could greatly improve his device.

Unfortunately, because they always performed tests in the same order, these engineers never realized that the machine's efficiency was always largest on the first run not because of the air flow, but because it was the first run. It was apparent from their data that the first run always put the machine into a transient condition, so that it was out of equilibrium during the second run. If they had only followed a basic precept of experiment design--to randomize runs--they would have seen this clearly! Beware, amateur scientists.

Randomization

I have mentioned randomizing in several previous installments of this series, but here I am going to devote part of an entire installment to it. No matter how well I design an experiment and identify the controlling factors beforehand, I still run the risk that there are important factors I haven't identified and will not control.
Unless I take a precaution against them, a systematic variation in one of these factors may occur by chance and make my results appear as though one factor has an effect when in fact it does not. The idea behind randomizing is very simple: randomizing the runs in an experiment transforms any systematic effect of an uncontrolled factor into random experimental noise. I would always prefer to deal with slightly more experimental noise than with bias, systematic errors, and false signals. What sorts of factors are troublesome? Let me take some time to describe common examples of uncontrolled, systematic factors, and then I'll describe some means of randomizing experiments.
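To see how randomizing the run order turns a systematic effect into mere noise, here is a minimal sketch in Python. The machine, its +0.10 warm-up transient, and all the numbers are hypothetical inventions for illustration; the point is only that a fixed run order manufactures a false air-flow effect, while a randomized order does not.

```python
import random
import statistics

def measured_efficiency(flow, run_index, rng):
    """Hypothetical machine: true efficiency is identical at both flow
    settings, but the first run after power-up reads high because of a
    warm-up transient (the uncontrolled factor)."""
    true_efficiency = 0.70            # same at max and min flow
    transient = 0.10 if run_index == 0 else 0.0
    noise = rng.gauss(0.0, 0.01)      # ordinary measurement noise
    return true_efficiency + transient + noise

rng = random.Random(42)

# Fixed order: max flow is always tested first, min flow second.
fixed_max = [measured_efficiency("max", 0, rng) for _ in range(200)]
fixed_min = [measured_efficiency("min", 1, rng) for _ in range(200)]

# Randomized order: a coin flip decides which flow is tested first.
rand_max, rand_min = [], []
for _ in range(200):
    order = ["max", "min"]
    rng.shuffle(order)
    for run_index, flow in enumerate(order):
        sample = measured_efficiency(flow, run_index, rng)
        (rand_max if flow == "max" else rand_min).append(sample)

print("fixed order,  max minus min:",
      round(statistics.mean(fixed_max) - statistics.mean(fixed_min), 3))
print("random order, max minus min:",
      round(statistics.mean(rand_max) - statistics.mean(rand_min), 3))
```

Under the fixed order, the entire transient lands in the maximum-flow group, so the two groups differ by roughly the size of the transient itself. Under the randomized order the transient is split between the groups and shows up only as extra scatter, which is exactly the trade described above.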
Ways to randomize

Here is a list of ways to randomize experimental runs. This is not a list of hard-and-fast rules. Keep in mind that the purpose is to prevent systematic patterns that allow uncontrolled factors to produce seemingly meaningful signals.
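One simple, concrete way to do this is to write out every planned run and then draw the run order by lot. The sketch below does exactly that for a hypothetical two-factor experiment (air flow crossed with a damper setting, each combination replicated three times); the factors, levels, and replicate count are invented for illustration.

```python
import itertools
import random

# Hypothetical experiment: two air-flow settings crossed with three
# damper settings, each combination replicated three times.
flows = ["min", "max"]
dampers = ["open", "half", "closed"]
replicates = 3

runs = list(itertools.product(flows, dampers)) * replicates

# Shuffle the whole schedule so the run order carries no pattern.
# Seeding the generator (here, Random(1)) makes the drawn plan repeatable.
rng = random.Random(1)
rng.shuffle(runs)

for i, (flow, damper) in enumerate(runs, start=1):
    print(f"run {i:2d}: flow={flow:>3} damper={damper}")
```

Note that shuffling permutes only the order: every combination still appears exactly three times, so the design itself is untouched, while any time-dependent factor (warm-up, drift, operator fatigue) is spread at random across the treatments.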
When is something an experiment?

Peter Baum recently sent me a link to an ask-a-scientist forum in which a parent was wondering about their child's science project. It seems the child had gotten a very poor grade on the project because it did not contain an independent and a dependent variable, which, according to the science-fair judge, made it a non-science project. What the child had done was to survey the preferences of some children ages 6 to 10 and tabulate the results. "Why," this parent asked, "is this not a science project?"

This reminded me of a current debate over which sciences are most "scientific." People claim that some sciences (generally not their own, of course) are not scientific because they do not produce hypotheses that can be tested, have no control over anything that could be called an experiment, and cannot usually replicate anything at all. There are sciences that are truly experimental: physics, chemistry, and biology. There are sciences that are historical--most notably geology, but also astronomy, cosmology, evolutionary biology, and so forth. Then there are sciences referred to as "soft." A few scientists in each of these categories like to bicker with one another.

In the experimental sciences there is a successful model of how science advances. I hate to call it the scientific method, because that term is overworked, misunderstood, and not at all clear to me. The model goes something like this. Theorists propose a model to explain some phenomenon. The model makes certain predictions. Experimentalists take these predictions and try to find them in the outcome of carefully designed experiments. If a prediction does not hold true, or if some completely unexpected result obtains, then people conclude that the experiment has refuted the theory.
This simple picture of science leaves many things unstated--like the amount of trouble that experimenters go to in planning an experiment so they may answer all of the expected objections to their results. For example, they must have very good control over unaccounted-for factors, they must quantify auxiliary hypotheses (things they assume to be true that usually go unstated), and they must present their metrology convincingly. Once the experimentalists handle all these other aspects, however, experimental science and its results are pretty clear. I may not be able to explain the scientific method, but I know it when I see it.

What is amazing about the category of historical sciences is that it must include astronomy, which is just about as experimental as any science gets. Despite their inability to prepare controlled experimental conditions, astronomers figure out how to make convincingly controlled experiments out of observations. The example of astronomy shows that geologists and evolutionary biologists could generally follow the model of the experimental sciences if they really wished to do so (a few actually do). I personally think geology might be improved if geologists would quantify their theories, make predictions, and then think about where to find observations that could refute those theories. Instead they are always looking for confirming evidence. This leads to two things: they don't manage to put many unsuccessful theories to rest permanently, and their conclusions appear hedged and mushy. However, the historical sciences are pretty darned successful despite not following the same model as the experimental sciences. Maybe I'll let the geologists be geologists.

The soft sciences, too, are very successful despite not being experimental. The soft scientists I have learned to respect are skeptical, thorough, and can spot a phony result or flawed study immediately.
Magicians, a group of people I think of as applied psychologists, spot scammers faster than any physicist could. Perhaps, like the historical sciences, I'll let the soft sciences be. However, I will make one observation about the soft sciences that I think is absolutely fair. The soft scientists the public is most likely to encounter on a daily basis--the psychologist, sociologist, or economist offering expertise on TV--are often not promoting science but pop science. Evidence, truth, and reason matter not at all to them. Instead, they promote certainty, authority, and rationality, all mixed with self-interest and bias to a sickening degree. They give science a bad name. Each is, like Bill Murray's Dr. Venkman, the self-serving psychologist in Ghostbusters, "... a very bad scientist."

Here is a list of things that people do to learn about the world of interest to them....
My view is that something is science if it contains the following important features...
When does computer simulation substitute for experimentation?

Let me explain why I doubt that item (E) on the list of scientific activities is always scientific. The computer allows two very different modes of use. The first is as a type of calculator for problems much too tedious or complex to do by hand. I have no complaint with this. The second use is simulation. I have witnessed computer simulation for more than 30 years now. Codes written to simulate very specific problems, where the physics is well understood and the boundary and initial conditions needed to find a solution are clear, work just fine. I have no issue with numerical experiments like these. However, earth scientists have turned to computer simulations to provide a controlled laboratory for experiments in unimaginably complex situations. Climate simulations are widely known, so I'll concentrate on that topic for a moment.

Whoever uses a computer code to simulate has to be careful about making two types of mistakes. First, the code running the computer may get the physics of something wrong, and we don't know this because we have nothing to compare against. Second, it is possible to get the physics correct in the code, and to successfully demonstrate correctness in simple test situations, but still obtain incorrect answers to other problems because of unusual boundary conditions or assumptions of use that no one has checked out. Climate simulation provides an opportunity for making both mistakes. The physics is fantastically complex. The boundary conditions are not simple; in some instances they aren't even known. The initial conditions aren't known. The codes don't even pretend to do the right thing sometimes. For example, interactions between the ocean and atmosphere and between the ground and atmosphere, and the effects of thunderstorms and convection, are not solved from first principles but are "parameterized."
This means that the simulation takes its intermediate results, uses them to approximate an empirical parameter describing one of these interactions, and then alters a coefficient or a source term in the simulation. This is so crude that as late as 1994 (Science, 9 Sep 1994, p. 1528) global climate models still required large "fudge factors" to keep them from drifting off into unrealistic climate realms. The history of climate simulation through the 1990s shows that climate models improved step by step as their masters became aware of one deficiency after another. However, even recognizing these deficiencies did not prevent people from drawing and publicizing conclusions from flawed simulation "experiments." One good example is the conclusion that temperature trends on Earth never last 100 years because the simulations do not exhibit them (Nature, v. 367, 17 Feb 1994, pp. 634-635). In fact, we have evidence of millennial-scale trends.

Using computer simulations in policy decisions raises the stakes of these "experiments" further. I attended an emergency meeting of the Wyoming State Engineer in 1984 where someone had done groundwater-flow computer simulations so badly that they actually demonstrated the opposite of what the person claimed. A recent article in Science (11 Feb 2000, p. 960) refers to this as a worry about "...semi-stupid programs placed in positions of responsibility." In complex problems the devil is in the details, and unless the details are absolutely correct, experiments done with simulations tell us a lot about the program and not so much about the real thing. I'm off my soapbox. The next series of installments focuses on model building.

Reprinted from: