Design Your Experiments


Society for Amateur Scientists: Design Your Experiments


Design Your Experiments Part XII: Calibration

By Kevin Kilty

At some time or another, anyone who runs experiments and makes measurements faces the need to calibrate something. Calibration is in all respects just another experiment. It involves a design that controls for all important factors, and it will probably also require fitting measured data to a model.

What are the requirements of calibration?

I must have a model of how the instrument behaves. This is a model of response versus input.
I must design an experiment to apply inputs to the instrument which also controls for spurious factors.
I'll determine the parameters or coefficients of the model by making measurements and fitting the model with regression. These are specifically measurements against a standard of some sort, which might be an artifact, a fixed point, or a calibrated sensor. I mentioned many possibilities in Part XI, which included using WWV for time and frequency calibration, using fixed points for temperature calibration, and using interferometry or gage blocks for distance calibration.
I must analyze the model to make certain it is adequate to the task. If it is not then I need to refine my model.

No matter what it is that I am calibrating, I will perform the calibration in one of two ways. I will either make a comparison against fixed points or standards--an absolute calibration; or, I will make a comparison against a calibrated sensor--a comparison calibration. The variety of problems involved in either is inexhaustible (or maybe exhausting), but a description of two examples will help to illustrate some of the issues involved.

Example 1: Load cell calibration

I had metrology students build a load cell using strain gauges last year. I asked them to calibrate the cell, and it provides a good example.

First, let's look at the construction. We built the cell from a thin aluminum strip clamped at one end, and applied the calibrating force by hanging a precision mass from its free end. The load cell is a cantilever. The sensor consists of two strain gauges placed symmetrically on the upper and lower faces of the cantilever, and wired as a half-bridge. Using two gauges like this automatically compensates for temperature variations (thermal emfs). This is a differential measurement. Elastic theory indicates that the strain we will measure is a linear function of suspended mass. Moreover, logical consistency demands that without any suspended mass the sensor should indicate zero strain. The calibration plan consists of suspending a series of precision masses from the cantilever, measuring the resulting strain, and then building a relationship between mass and strain using a least squares analysis.

What do this construction and calibration plan involve? First, there is the question of whether I plan to measure force or mass with this load cell. If I intend to measure mass, then I'll use the value of each precision mass directly. If, on the other hand, I intend to measure force, then I have to convert the precision mass to a precision force by multiplying by the local value of the acceleration of gravity. This value changes from place to place over a range of 0.5% of the nominal value, and I need a precise local value. Next there is the issue of the precision of the masses themselves. I obtained them from a set of balances and verified their mass values using a recently calibrated electronic scale, although I do not know how accurate the scale actually is. The scale had a resolution of 0.001g, but only 4 digits of display. Therefore, I will know a mass with a nominal value of 1 gram to a precision of 0.0005g, but a mass with a nominal value of 2000g to a precision of only 0.5g. If I were to aim for greater precision I'd have to be concerned about how to handle the weights without contaminating them and altering their masses. Luckily, in this instance I didn't have to be concerned about tricky issues of technique. We simply added masses with forceps.
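The arithmetic in this paragraph can be sketched in a few lines of Python. The function names are hypothetical, and the gravity value used as a default is the conventional standard one, not a measured local value; only the 4-digit display, the 0.001g resolution, and the 1g and 2000g cases come from the discussion above.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2, conventional value; a real calibration needs the local one

def display_precision(mass_g, digits=4, resolution_g=0.001):
    """Half of the last digit shown by a scale that displays `digits` digits."""
    exponent = math.floor(math.log10(mass_g))       # decimal position of the leading digit
    last_digit = 10.0 ** (exponent - digits + 1)    # size of the last displayed digit
    return max(last_digit, resolution_g) / 2.0      # never better than the scale resolution

def mass_to_force(mass_g, g_local=STANDARD_GRAVITY):
    """Force in newtons exerted by a mass given in grams."""
    return mass_g / 1000.0 * g_local

print(display_precision(1.0))      # about 0.0005 g for a 1 g mass
print(display_precision(2000.0))   # about 0.5 g for a 2000 g mass
print(mass_to_force(726.0))        # force of the largest calibration mass, standard gravity assumed
```

Because gravity varies by about 0.5% from place to place, a force computed with the default value above could be off by that much; passing a measured `g_local` removes that error.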

I had students do the calibration experiment itself in one run. They began by nulling the bridge (zeroing the gauges), and then adding successively larger masses and noting the strain. After each mass was added we waited momentarily for the cantilever to stop bouncing, and for the measuring system to equilibrate. In order to maintain some sense of randomizing the experiment, I had students step backward from maximum mass toward zero using completely different mass values than they used on the way to the maximum. At zero mass it was important to make certain that the strain reading was also zero. Otherwise, if the gauge did not return to zero, it would be evidence that the aluminum had yielded plastically during the experiment, which would spoil the entire effort. The resulting data, in order of measurement, is...

Table of calibration data
------------------------------
Mass (g)  Strain (microstrain)
------------------------------
   0             0
 147.5         426
 295           853
 726          2103
 442.5        1277
 179           524
   0             0
------------------------------

Interestingly, the compliance of the beam was very different from what I expected from my theoretical calculations. I concluded that the aluminum had probably strain hardened while being fashioned into a thin beam, most likely from the shears used to cut it.

The simplest model of response of the beam to loading is also the one suggested by theory. The beam strain is proportional to load.

Y = a + bL

Here Y is the measured strain and L is the load. I'll determine the coefficients a and b using Excel, and I show the output of the regression below. The coefficient a should ideally be zero.

Analysis of Variance 
----------------------------------------------------------- 
Source      df      SS     MS       F      Significance F  
===========================================================
Regression   1   3399305 3399305 323155.7    3.2E-13  
Residual     5     52.59  10.52  
-----------------------------------------------------------
Total        6   3399358        
 
Coefficients         Standard Error     t-Stat    P-value   Lower 95%   Upper 95%   
---------------------------------------------------------------------------------
Intercept 0.483        1.788021        0.27      0.797957    -4.11       5.08  
Load      2.894        0.00509       568.47      3.2E-13      2.88       2.91   
---------------------------------------------------------------------------------
  
RESIDUAL OUTPUT  
 
Observation    Predicted Y     Residual     Standard Residual
------------------------------------------------------------- 
    1          0.482748       -0.48275        -0.16305 
    2        427.2951         -1.29507        -0.43742 
    3        854.1074         -1.1074         -0.37403 
    4       2101.267           1.732518        0.585166 
    5       1280.92           -3.91972        -1.32391 
    6        518.4448          5.555176        1.876287 
    7          0.482748       -0.48275        -0.16305 
-------------------------------------------------------------
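For readers without Excel, the same fit can be sketched in Python. This is a minimal sketch assuming NumPy is available; the function name `fit_line` and the dictionary keys are my own, and the data are the seven readings from the calibration table.

```python
import numpy as np

# Calibration data in order of measurement.
mass = np.array([0.0, 147.5, 295.0, 726.0, 442.5, 179.0, 0.0])       # grams
strain = np.array([0.0, 426.0, 853.0, 2103.0, 1277.0, 524.0, 0.0])   # microstrain

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x, with the usual ANOVA quantities."""
    X = np.column_stack([np.ones_like(x), x])        # column of ones carries the intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_resid = float(resid @ resid)
    df_resid = len(y) - X.shape[1]
    ms_resid = ss_resid / df_resid
    ss_total = float(np.sum((y - y.mean()) ** 2))
    f_stat = (ss_total - ss_resid) / ms_resid        # one regression degree of freedom
    std_err = np.sqrt(np.diag(ms_resid * np.linalg.inv(X.T @ X)))
    return {"coef": coef, "std_err": std_err, "resid": resid,
            "ss_resid": ss_resid, "f_stat": f_stat}

fit = fit_line(mass, strain)
a, b = fit["coef"]
print(f"intercept = {a:.3f} +/- {fit['std_err'][0]:.3f}")
print(f"slope     = {b:.4f} +/- {fit['std_err'][1]:.5f}")
print(f"residual SS = {fit['ss_resid']:.2f}, F = {fit['f_stat']:.0f}")
```

The printed intercept, slope, standard errors, residual sum of squares, and F statistic should agree with the Excel output to within rounding.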

This shows that my model explains an overwhelming amount of the variation in the data. An F statistic probability of 3.2x10^-13 is extremely significant. The next issue to address is whether or not the intercept coefficient is needed. An elastic physical model should not contain an intercept. After all, with perfect elasticity in the aluminum, when there is zero mass there is zero strain. The regression output shows that the intercept is not only small, but has a p-value of about 0.80, which indicates it is not significantly different from zero. Notice that the 95% confidence interval for the intercept brackets zero. Therefore, I am going to use zero for the intercept, and I should force Excel to a zero intercept in all future regressions on these data. The final issue to address is whether my model is adequate or if I might do better. I'll decide this using a graph of residual versus load, and another of residual versus order of measurement.

A graph of the residuals as a function of mass shows a slight pattern, which suggests that I might be able to add a term to my model and reduce the residuals slightly.

However, a graph showing residual as a function of the order of the measurement shows something quite interesting. All of the measurements taken up to the maximum load have small residual, but the data taken after maximum load have much larger residual. This is not a likely pattern of residual, and it suggests that my errors do not meet the requirements of being independent and random. I have no explanation of what happened to the beam after maximum load. There was no plastic yielding because the beam returned to zero strain when it was unloaded. It is possible that the student didn't let the instrument equilibrate fully after dropping the load to the next lower mass. Yet this doesn't explain the change of sign between the 5th and 6th residual.
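The pattern described here can be checked without a graph. The following is a small sketch in plain Python that splits the residuals from the regression output at the maximum load and compares their root-mean-square sizes; the split point and the helper name `rms` are my own choices, not part of the Excel analysis.

```python
import math

# Masses and residuals from the regression output, in order of measurement.
mass = [0.0, 147.5, 295.0, 726.0, 442.5, 179.0, 0.0]
residual = [-0.48275, -1.29507, -1.1074, 1.732518, -3.91972, 5.555176, -0.48275]

def rms(values):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

i_max = mass.index(max(mass))        # observation taken at maximum load
loading = residual[: i_max + 1]      # readings up to and including the maximum
unloading = residual[i_max + 1:]     # readings taken on the way back down

print(f"RMS residual while loading:   {rms(loading):.2f}")
print(f"RMS residual while unloading: {rms(unloading):.2f}")
```

With only seven readings this is an indication and not a formal test, but a large ratio between the two values is the same warning sign that the residual-versus-order graph gives.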

I am not optimistic about improving on my model after analyzing the residual, but if I were to improve it I would need to add an additional term involving the load. A statistician would likely add a term in load squared, which is the next higher polynomial term. What does the physics of the problem tell me? A reasonable model would have to involve only terms that are odd in the load. The reason for this is that if I pull up on the beam, or add negative load in other words, the strain has to reverse sign. A term in load squared will not behave this way, but a term in load cubed will. Therefore, the next term I should consider adding to my model is load cubed. The revised model is

Y = bL(1+cL²)

I am not optimistic about success, but I build a new column of load cubed, and input this to my Excel regression analysis. The data look like...

Calibration Data
Mass (g)  Mass Cubed (g³)  Microstrain
--------------------------------------
  0              0               0
147.5      3209047             426
295       25672375             853
726       3.83E+08            2103
442.5     86644266            1277
179        5735339             524
  0              0               0
--------------------------------------

I'll not flog this dead horse much further, but just show the important part of the Excel output, which is the analysis of variance.

Analysis of Variance 
----------------------------------------------------------- 
Source      df      SS     MS       F      Significance F  
===========================================================
Regression   2   3399310 1699655 141213.5    2.01E-10  
Residual     4   48.144  12.04 
=========================================================== 
Total        6   3399358        
 
Coefficients         Standard Error     t-Stat    P-value   Lower 95%   Upper 95%   
---------------------------------------------------------------------------------
Load       2.886969    0.01225         235.6629   1.95E-09   2.852957   2.920982  
Load cubed 1.38E-08   2.27E-08         0.608132   0.575938   -4.9E-08   7.68E-08
---------------------------------------------------------------------------------
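The comparison between the two models can also be sketched in Python, assuming NumPy. Both fits below keep an intercept column; that is an inference on my part from the 4 residual degrees of freedom in the table above, and dropping the column of ones would force the fit through the origin instead. The helper name `residual_stats` is hypothetical.

```python
import numpy as np

mass = np.array([0.0, 147.5, 295.0, 726.0, 442.5, 179.0, 0.0])       # grams
strain = np.array([0.0, 426.0, 853.0, 2103.0, 1277.0, 524.0, 0.0])   # microstrain

def residual_stats(columns, y):
    """Least-squares fit of y on the given columns; returns (coef, SS, df, MS)."""
    X = np.column_stack(columns)
    # Scale each column to unit length before solving; mass**3 reaches ~4e8
    # and would otherwise dominate the conditioning of the problem.
    scale = np.linalg.norm(X, axis=0)
    coef_scaled, *_ = np.linalg.lstsq(X / scale, y, rcond=None)
    coef = coef_scaled / scale
    resid = y - X @ coef
    ss = float(resid @ resid)
    df = len(y) - X.shape[1]
    return coef, ss, df, ss / df

ones = np.ones_like(mass)
_, ss_lin, df_lin, ms_lin = residual_stats([ones, mass], strain)
coef_cub, ss_cub, df_cub, ms_cub = residual_stats([ones, mass, mass**3], strain)

# Partial F statistic for the single added term: the drop in residual SS
# divided by the residual mean square of the larger model.
f_added = (ss_lin - ss_cub) / ms_cub

print(f"linear: SS = {ss_lin:.2f}, df = {df_lin}, MS = {ms_lin:.2f}")
print(f"cubic:  SS = {ss_cub:.2f}, df = {df_cub}, MS = {ms_cub:.2f}")
print(f"F for the added load-cubed term = {f_added:.2f}")
```

An added term always lowers the residual sum of squares a little; the question is whether it lowers it by more than the degree of freedom it costs, and a partial F well below 1 says it does not.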

As I expected, adding another term did not help at all, as the reduction in the F statistic and the increase in its probability show. In fact, the residual mean square (MS) has risen, because adding this term reduced the degrees of freedom available to estimate noise. Not only is the coefficient of the cubic term very small, but the 95% confidence interval for its value brackets zero. We may as well remove it from the model.

I now have the best possible model of response of the load cell, and I can solve it for load as a function of strain, which is what I will use ordinarily.

Load = MicroStrain/2.894

I can find the uncertainty from propagation of error and the results of regression. The value of about 0.003 (0.0032 before rounding) is what Excel gave me for the standard error of the linear term after I forced the intercept to zero.

u_Load/Load = u_b/b = ±0.0032/2.894 = ±0.11%
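The propagation step can be reproduced from the calibration data with a short sketch in plain Python. It fits strain = b * load through the origin, takes the standard error of b from the residual scatter, and carries the relative uncertainty over to the load. The function name `load_from_strain` and the 1000-microstrain example reading are hypothetical, and the sketch accounts only for the uncertainty in b, not for noise in an individual strain reading.

```python
import math

mass = [0.0, 147.5, 295.0, 726.0, 442.5, 179.0, 0.0]          # grams
strain = [0.0, 426.0, 853.0, 2103.0, 1277.0, 524.0, 0.0]      # microstrain

sxx = sum(m * m for m in mass)
sxy = sum(m * s for m, s in zip(mass, strain))
b = sxy / sxx                                    # slope of a fit forced through zero

ss_resid = sum((s - b * m) ** 2 for m, s in zip(mass, strain))
ms_resid = ss_resid / (len(mass) - 1)            # only one fitted parameter
u_b = math.sqrt(ms_resid / sxx)                  # standard error of the slope

def load_from_strain(microstrain):
    """Invert the calibration; returns (load, uncertainty in load from u_b alone)."""
    load = microstrain / b
    return load, load * u_b / b

print(f"b = {b:.4f}, u_b = {u_b:.4f}, u_b/b = {100 * u_b / b:.2f} %")
print(load_from_strain(1000.0))
```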

It is not a wonderfully precise instrument, but I can't do any better until I can explain what happened to the 5th and 6th residuals and rectify the problem.

One final area of concern takes me back to the issue of the precision masses. Regression analysis is nearly always done by assuming that there is no uncertainty in the independent variable, but that only the dependent variable has any added noise or error. If this is not true, if there is uncertainty in the independent variable, then the analysis might become much more difficult. Obviously there is some uncertainty in the value of the precision masses in this case. Unfortunately the issues involved would take up a huge portion of several installments, and I'll bypass them for now; but do beware of them.

Example 2: Temperature calibration

Bob Bond had a really nice article on the Platinum Thermometer in the SAS Bulletin, and you may refer to his work for useful information, including links to Scientific American articles. My example shows a common method of calibration known as comparison calibration. Temperature calibration has one great advantage over mass and force calibration. The standards do not depend on artifacts like the precision masses, but rather depend on physical processes which anyone can duplicate anywhere, with proper attention to detail. I calibrate a thermometer by measuring its output when I hold it at the temperature of some well-established fixed points. I then build a model of thermometer behavior from this data. For example, the Callendar-Van Dusen model, having 3 coefficients plus a base resistance, is a common model for platinum resistance thermometers. It expresses thermometer resistance as a function of temperature, and looks like

R = R₀(1 + AT + BT² + C(T-100)T³) for (-200<T<0)
or
R = R₀(1 + AT + BT²) for (0<T<630)

where R is resistance (ohms), R₀ is the resistance at 0 degrees Celsius, T is temperature (degrees Celsius), and A, B, and C are coefficients determined through the calibration measurements.
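To make the two-range form of the model concrete, here is a minimal Python sketch of the resistance calculation. The coefficient values are the commonly tabulated ones for a nominal 100-ohm industrial platinum element and are included only for illustration; they are not from any calibration described here, and a calibrated thermometer would have its own R₀, A, B, and C. The function name is hypothetical.

```python
# Illustrative coefficients for a nominal 100-ohm platinum element.
# A real calibration replaces all four with fitted values.
R0 = 100.0        # ohms at 0 degrees C
A = 3.9083e-3     # 1/degC
B = -5.775e-7     # 1/degC^2
C = -4.183e-12    # 1/degC^4, applies only below 0 degrees C

def cvd_resistance(t_c, r0=R0, a=A, b=B, c=C):
    """Callendar-Van Dusen resistance (ohms) at temperature t_c (degrees C)."""
    if not -200.0 <= t_c <= 630.0:
        raise ValueError("temperature outside the range of the model")
    ratio = 1.0 + a * t_c + b * t_c**2
    if t_c < 0.0:
        ratio += c * (t_c - 100.0) * t_c**3      # extra term for the cold branch
    return r0 * ratio

for t in (-100.0, 0.0, 100.0):
    print(f"{t:7.1f} C -> {cvd_resistance(t):.4f} ohm")
```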

A model used often for thermistors is 1/T = A + B(ln R) + C(ln R)², with T in kelvins. More information about this is available in the Handbook of Modern Sensors by Jacob Fraden, AIP Press, 1997. The following table lists several fixed point temperatures of interest to amateurs. Amateurs who are interested in very low temperatures would need colder fixed points, of course.

Table of fixed points
---------------------------------------
Fixed point            Temperature (°C)
---------------------------------------
Melting of mercury              -38.836
Triple point of water             0.01
Boiling of water                100.00
Triple point of benzoic acid    122.37
Melting of tin                  231.968
---------------------------------------

These cover a broad range of commonly encountered temperatures. The melting points of mercury and tin, involving as they do metals, require only a rough measurement of pressure, which is a big advantage of using metallic fixed points. SAS sells an apparatus for the water triple point which is very accurate. I suggest that this same apparatus could be adapted easily to the benzoic acid triple point. The most troublesome fixed point on this short list is the boiling point of water. This fixed point depends substantially on pressure and requires an accurate measurement of, and correction for, atmospheric pressure at the time of measurement. The lab technique varies from one fixed point to another, and you can get a sense of how to produce each fixed point by reading publications of the NIST. Most of the procedures are a bother, and a person would hate to get involved in them too often. Therefore, a method of stratified calibration is useful. The strategy I would follow mixes absolute calibration with a comparison calibration and goes like this.

Once per year or so, I would calibrate three thermometers having a construction similar to that of the thermometers which I frequently calibrate. These become my three standard thermometers. I would calibrate them as accurately as possible according to NIST guidelines using fixed points. Then whenever I needed to calibrate some other thermometers, I would use the fixed point apparatus to achieve temperatures near the fixed points to compare the readings of my calibratee against one of my standard thermometers. It is much less trouble for me to use the fixed point apparatus to merely get close to a fixed point, than to get right on the fixed point.

The reason for using three standard thermometers is to resolve any failure of one. I would use a majority reading of 2 of the 3 to resolve a discrepancy in reading between just two of them. All that comparison calibration requires is reasonable control of temperature, a direct comparison against standards, and fitting model coefficients by least squares.

Even though comparison calibration is much simpler than fixed point calibration, there are still many factors to control. For instance, I need to eliminate any factor causing a gradient of temperature of more than 0.001 K across the apparatus, because this may cause my thermometers to reach different temperatures. The thermometers should be of similar construction so that they have similar bandwidth and reach the same temperature in the same amount of time. Temperature sensors which require a resistance measurement, like thermistors and RTDs, will self-heat because resistance measurement uses a current passing through the device. To prevent self-heating from biasing the calibration, I must use a very small current for measurement. Bob Bond suggested a means of estimating this. I can eliminate thermal EMFs in the measuring circuit by making a second measurement after reversing the current and averaging the two measurements.
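The current-reversal trick at the end of this paragraph amounts to a two-line calculation, sketched here in Python. The sketch assumes that the thermal EMF is the same in both readings and that the current simply changes sign; the function name and the numbers in the example are made up for illustration.

```python
def reversed_current_resistance(v_forward, v_reversed, current):
    """Resistance and thermal EMF from readings taken at +current and -current.

    Assumes a constant thermal EMF in series with the sensor:
        v_forward  =  current * R + emf
        v_reversed = -current * R + emf
    """
    resistance = (v_forward - v_reversed) / (2.0 * current)   # EMF cancels in the difference
    emf = (v_forward + v_reversed) / 2.0                      # IR drop cancels in the sum
    return resistance, emf

# Made-up readings: 100 microamps through about 110 ohms, with 5 microvolts of thermal EMF.
r, emf = reversed_current_resistance(0.011005, -0.010995, 100e-6)
print(f"R = {r:.3f} ohm, thermal EMF = {emf * 1e6:.1f} microvolts")
```

A single forward reading with these numbers would give 110.05 ohms, so the uncorrected EMF alone would bias the result by about 0.05%.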

Before I quit this installment I should make one final observation. The temperature sensors use measurement equations that involve quadratic and cubic terms. All of these equations have to be inverted in use, because we are not interested in the resistance of the sensor as a function of temperature, but rather in temperature as a function of resistance. A quadratic measurement equation, when it is inverted, will have two possible temperatures corresponding to a given resistance; in other words, the equation has several branches. Calibration is not finished until you figure out which branch of the calibration equation to use. The problem can become much worse for cubic and higher-order model equations.
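As an illustration of choosing a branch, here is a Python sketch that inverts the quadratic form of the Callendar-Van Dusen model, R = R₀(1 + AT + BT²), and keeps only the root that falls inside the calibrated range. The coefficients are the commonly tabulated values for a nominal 100-ohm platinum element, used purely for illustration, and the function names and the 0 to 630 degree range check are my own assumptions.

```python
import math

# Illustrative coefficients for a nominal 100-ohm platinum element.
R0 = 100.0
A = 3.9083e-3
B = -5.775e-7

def temperature_roots(resistance, r0=R0, a=A, b=B):
    """Both roots of R = R0*(1 + A*T + B*T^2), as (plus_branch, minus_branch)."""
    disc = a * a - 4.0 * b * (1.0 - resistance / r0)
    if disc < 0.0:
        raise ValueError("resistance exceeds the maximum the model can produce")
    root = math.sqrt(disc)
    return (-a + root) / (2.0 * b), (-a - root) / (2.0 * b)

def temperature_from_resistance(resistance, t_min=0.0, t_max=630.0):
    """Return the one root lying in the calibrated range, or raise if there is none."""
    tol = 1e-6    # small allowance for rounding at the edge of the range
    candidates = [t for t in temperature_roots(resistance)
                  if t_min - tol <= t <= t_max + tol]
    if len(candidates) != 1:
        raise ValueError("no unique temperature in the calibrated range")
    return candidates[0]

print(temperature_roots(138.5055))             # one root near 100 C, one far outside any real range
print(temperature_from_resistance(138.5055))
```

For this model the second branch lies thousands of degrees away, so the choice is easy; for other sensors or higher-order models the unwanted roots can fall much closer to the working range, which is why the range check is written explicitly.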
