Hornworm Experiment Statistical Errors

Submitted to American Biology Teacher


Rice (2004) said he did not give statistics in an American Biology Teacher article because "... most classes taught by readers of this journal would not be prepared for these analyses." Yet he contradicted himself in the same issue, in which Rice and Griffin (2004) encouraged students to analyze data statistically. Unfortunately, Rice and Griffin (2004) provided many examples of incorrect data analysis and presentation, as follows:

1. It is incorrect to report equation-derived means in the two tables rather than the experimental means. Data were taken on days 12 and 15, so it made no sense to use the fitted equations to estimate values for day 13 for the tables. The two types of means differed substantially: the mean maximum control masses were 7.39 g by experiment and 8.18 g by equation. Such differences indicate that the equations fitted the data poorly. The absurdity of using equation-derived means rather than experimental means is illustrated by the experimental control mean of 7.39 g falling outside the 95% confidence interval of the equation-derived control mean, 7.58 to 8.77 g, in Table 1. The conclusion on page 490 that Table 1 showed significant differences among all three treatments appears invalid. According to page 489, the only statistically significant difference was between the control and the late season leaf treatment.

Read from Figure 1, the 15 points on day 12 for the early season leaf treatment were approximately (in grams): 9.4, 8.9, 8.5, 8.1, 7.3, 4.0, 3.9, 3.4, 2.6, 2.3, 2.1, 1.9, 1.6, 1.1, 1.0. A web statistics calculator gave a mean of 4.4 g and a 95% confidence interval of 2.7 to 6.1 g, which is much wider than the 4.49 to 5.15 g interval in Table 1. This is another illustration that the equation-derived data are unrealistic compared to the experimental data.
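The interval above can be reproduced without a web calculator. The following is a minimal Python sketch, assuming the values read from Figure 1 are accurate. Because the Python standard library has no t distribution, the two-tailed 95% critical value of Student's t for 14 degrees of freedom (2.145) is copied from a standard t table.

```python
import math
import statistics

# Day-12 masses (g) for the early season leaf treatment, as read
# approximately from Figure 1.
masses = [9.4, 8.9, 8.5, 8.1, 7.3, 4.0, 3.9, 3.4, 2.6, 2.3, 2.1, 1.9, 1.6, 1.1, 1.0]

# Two-tailed 95% critical value of Student's t for df = n - 1 = 14,
# copied from a standard t table.
T_CRIT_DF14 = 2.145

n = len(masses)
mean = statistics.mean(masses)
sd = statistics.stdev(masses)          # sample standard deviation (divisor n - 1)
sem = sd / math.sqrt(n)                # standard error of the mean
half_width = T_CRIT_DF14 * sem
lower, upper = mean - half_width, mean + half_width

print(f"n = {n}, mean = {mean:.1f} g, 95% CI = {lower:.1f} to {upper:.1f} g")
# prints: n = 15, mean = 4.4 g, 95% CI = 2.7 to 6.1 g
```

The same calculation, applied to each treatment on each weighing day, gives the experimental means and intervals that could have been reported instead of equation-derived values.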

2. Figure 1 lacked the required control data, and the two treatments graphed were not significantly different (p = 0.751, page 489). Figure 1 should not have presented raw experimental data. It would have been better to present the data with statistics in a table, which would have eliminated the problems caused by the misuse of fitted curves.

If a graph were used, the treatment means should have been plotted, each with a bar indicating the 95% confidence interval or another measure of variability. Readers could then easily have seen whether treatment means were significantly different and how closely any fitted curves approached the experimental means.
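As a sketch of that kind of summary, the Python below turns raw masses into one row per treatment (n, mean, 95% confidence interval), the numbers that would go into a table or be plotted as means with bars. It also checks whether a fitted-curve value falls inside the experimental interval. Assumptions: the t critical values are copied from a standard t table for df 10 to 24 only; the function names are hypothetical; only the early season values from Figure 1 are available, and the fitted value 4.8 g is purely hypothetical.

```python
import math
import statistics

# Two-tailed 95% critical values of Student's t, copied from a standard
# t table and indexed by degrees of freedom (n - 1). Only df 10 to 24 are
# included, so other group sizes raise KeyError.
T_CRIT_95 = {10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145,
             15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093,
             20: 2.086, 21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064}


def summarize(masses):
    """Return (n, mean, lower, upper) using a t-based 95% confidence interval."""
    n = len(masses)
    mean = statistics.mean(masses)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(masses) / math.sqrt(n)
    return n, mean, mean - half_width, mean + half_width


def fitted_value_in_ci(fitted, masses):
    """True if a fitted-curve value lies inside the experimental 95% CI."""
    _, _, lower, upper = summarize(masses)
    return lower <= fitted <= upper


# Raw day-12 masses (g) per treatment. Only the early season values read
# from Figure 1 are known; the other treatments would be added here.
day12 = {
    "early season leaves": [9.4, 8.9, 8.5, 8.1, 7.3, 4.0, 3.9, 3.4,
                            2.6, 2.3, 2.1, 1.9, 1.6, 1.1, 1.0],
}

print("treatment             n   mean (g)   95% CI (g)")
for name, masses in day12.items():
    n, mean, lower, upper = summarize(masses)
    print(f"{name:<20} {n:>2}   {mean:8.1f}   {lower:.1f} to {upper:.1f}")

# Hypothetical fitted-curve value for the same treatment and day.
print(fitted_value_in_ci(4.8, day12["early season leaves"]))  # True
```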

3. It is not necessarily correct that a "t-test does not take differences in initial worm weight into account." It can, if a t-test is used before treatments begin. If significant differences are found among groups at that stage, the worms can be regrouped until there are no significant differences.
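A pre-treatment check of this kind might look like the minimal Python sketch below: Student's two-sample t statistic (equal variances assumed) on initial masses, compared with a tabled critical value. The function name and both lists of initial masses are hypothetical, not data from the study, and the critical value 2.145 (df = 14, two groups of 8) is copied from a standard t table. With more than two groups, a one-way analysis of variance would be the usual equivalent.

```python
import math
import statistics

# Two-tailed 95% critical value of Student's t for df = 14 (two groups
# of 8), copied from a standard t table.
T_CRIT_DF14 = 2.145


def pooled_t(a, b):
    """Student's two-sample t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical initial worm masses (g) for two groups; not data from the study.
group_a = [0.42, 0.55, 0.61, 0.38, 0.47, 0.52, 0.59, 0.44]
group_b = [0.40, 0.50, 0.66, 0.41, 0.49, 0.57, 0.53, 0.46]

t = pooled_t(group_a, group_b)
if abs(t) > T_CRIT_DF14:
    print(f"t = {t:.2f}: groups differ before treatment; regroup the worms")
else:
    print(f"t = {t:.2f}: no significant initial difference")
```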

When experimental organisms vary widely in size and treatment groups are relatively small, it is logical to match them visually at the start so that each treatment group has roughly the same size distribution. For example, each group might have four large, five medium, and three small individuals. This technique can be used even when initial masses cannot be determined (Hershey and Merritt 1987). Because an initial mass was recorded for each worm, this matching method could even be applied retroactively in the worm experiment.
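When initial masses are known, matching can be done mechanically. The minimal Python sketch below applies the randomized block idea: individuals are sorted by initial mass, taken in blocks of as many individuals as there are treatments, and each member of a block goes to a different, randomly chosen group. The function name, the seed, and the twelve masses in the usage example are hypothetical. Visual matching into large, medium, and small classes is the same idea with size classes in place of measured masses.

```python
import random


def matched_assignment(initial_masses, n_groups, seed=None):
    """Assign individuals to n_groups with similar size distributions.

    Individuals are sorted by initial mass and taken in blocks of
    n_groups; each member of a block is sent to a different, randomly
    chosen group (a randomized block assignment). Returns a list of
    groups, each a list of indices into initial_masses.
    """
    rng = random.Random(seed)
    order = sorted(range(len(initial_masses)), key=lambda i: initial_masses[i])
    groups = [[] for _ in range(n_groups)]
    for start in range(0, len(order), n_groups):
        block = order[start:start + n_groups]
        # Random distinct target groups; a short final block gives its
        # leftover individuals to randomly chosen groups.
        targets = rng.sample(range(n_groups), len(block))
        for g, idx in zip(targets, block):
            groups[g].append(idx)
    return groups


# Hypothetical initial masses (g) for 12 worms, split among 3 treatments.
worm_masses = [0.31, 0.58, 0.44, 0.72, 0.25, 0.66,
               0.49, 0.37, 0.81, 0.53, 0.29, 0.61]
groups = matched_assignment(worm_masses, 3, seed=0)
for k, group in enumerate(groups, start=1):
    masses = [worm_masses[i] for i in group]
    print(f"group {k}: n = {len(masses)}, "
          f"mean initial mass = {sum(masses) / len(masses):.2f} g")
```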

4. Table 1 has the upper and lower 95% confidence limits reversed; the upper limit must be greater than the lower limit.

5. One of the most serious problems with the data was the high worm mortality, yet it was not discussed. If many of the worms were sick or dying from nontreatment effects, the experimental results may be invalid, particularly if more worms died in some treatments than in others. Judging from the number of data points in Figure 1, that may have been the case: for day 12, there appeared to be 15 early season points but 20 late season points. Was mortality the same in all treatments? What was the overall mortality for the experiment, given that it was a whopping 23% after just 5 days?
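Whether mortality differed between two treatments can be tested with a chi-square test on survived and died counts. The following is a minimal Python sketch, assuming a 2 × 2 table and using Pearson's statistic without a continuity correction. The p-value formula used here, erfc(sqrt(chi2 / 2)), is exact only for 1 degree of freedom. The starting and surviving counts in the usage example are hypothetical placeholders, not figures from the study, since the starting number per treatment was not reported.

```python
import math


def mortality_chi_square(started_a, survived_a, started_b, survived_b):
    """Pearson chi-square test (no continuity correction) of whether
    mortality differs between two treatments.

    Returns (mortality_a, mortality_b, chi2, p), where p is the upper
    tail of the chi-square distribution with 1 degree of freedom.
    """
    observed = [[survived_a, started_a - survived_a],
                [survived_b, started_b - survived_b]]
    row_totals = [started_a, started_b]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = started_a + started_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return 1 - survived_a / started_a, 1 - survived_b / started_b, chi2, p


# Hypothetical counts: 30 worms started in each treatment, and 15 and 20
# were still alive on day 12.
ma, mb, chi2, p = mortality_chi_square(30, 15, 30, 20)
print(f"mortality {ma:.0%} vs {mb:.0%}, chi-square = {chi2:.2f}, p = {p:.2f}")
```

With more than two treatments, the same statistic is computed on a 2 × k table with k - 1 degrees of freedom, and the p-value then needs a general chi-square distribution. With expected counts below about 5, Fisher's exact test is the usual alternative.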

6. Tables 1 and 2 made the common error of using too many significant figures. The text used three significant figures for experimental worm masses, e.g. 7.39 g. If the worm masses were measured to three significant figures, a fitted equation cannot justify reporting masses to four significant figures.
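Rounding a fitted estimate back to the precision of the measurements is a one-line calculation. The Python below is a minimal sketch; the function name round_sig and the example values are hypothetical.

```python
import math


def round_sig(x, sig=3):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Number of decimal places that keeps `sig` significant figures.
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)


# Hypothetical four-figure value from a fitted equation, reported to
# match three-significant-figure measurements.
print(round_sig(8.1834))   # 8.18
```

Formatting for display is a separate step: round_sig(7.40) returns 7.4, so a table that should show 7.40 g needs its own formatting.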

7. Figure 1 made the common error of extrapolating beyond the data. The fitted curves should not have extended beyond the final data on day 15, because the curves' sharp drop after day 15 could not be confirmed. It is not biologically realistic that worm mass would drop to zero by day 21 for the late season treatment. Worm mass is expected to level off as the worms reach maturity.
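When a fitted curve is drawn, one way to avoid extrapolation is to generate curve points only between the first and last weighing days. The Python below is a minimal sketch of that restriction. The model is a made-up quadratic, not the equations of Rice and Griffin (2004), and the weighing schedule is hypothetical except for its final day, 15.

```python
import math


def curve_points(model, observed_days, step=0.5):
    """Evaluate a fitted growth model only between the first and last
    days on which worms were actually weighed (no extrapolation)."""
    first, last = min(observed_days), max(observed_days)
    # floor() keeps the last point at or before the final weighing day.
    n_steps = math.floor((last - first) / step + 1e-9)
    days = [first + k * step for k in range(n_steps + 1)]
    return [(day, model(day)) for day in days]


def hypothetical_model(day):
    # Made-up quadratic for illustration only.
    return 8.0 - 0.05 * (day - 13) ** 2


# Hypothetical weighing schedule; only days 12 and 15 are named in the text.
weigh_days = [0, 3, 6, 9, 12, 15]
points = curve_points(hypothetical_model, weigh_days)
print(points[-1])  # the last plotted point is on day 15, not day 21
```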

8. It is standard to state n = x in graphs or data tables, where x is the number of replicates per treatment (Harris et al. 2004). The experiment started with 146 worms, but n was not specified. There appeared to be five treatments, and 146 is not evenly divisible by five.
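The arithmetic behind the last point can be checked directly. The following Python sketch assumes five treatments, as inferred above, and the helper name equal_group_sizes is hypothetical. It shows that 146 worms cannot be split equally and what the most even split would be.

```python
def equal_group_sizes(total, n_groups):
    """Split `total` individuals as evenly as possible among n_groups."""
    base, extra = divmod(total, n_groups)
    return [base + 1 if i < extra else base for i in range(n_groups)]


print(divmod(146, 5))               # (29, 1): 146 is not a multiple of 5
print(equal_group_sizes(146, 5))    # [30, 29, 29, 29, 29]
```

Even the most even split leaves the groups unequal, which is one more reason n should have been reported for each treatment.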

In closing, it is extremely important to make students aware that a significant difference does not necessarily confirm that an experimental method is valid, contrary to the argument of Rice (2004): "If the method was invalid, such a distinction would not have emerged." Any significant or nonsignificant difference obtained from an invalid method is itself scientifically invalid, i.e. garbage in, garbage out.
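A small constructed example makes the point concrete. In the minimal Python sketch below, every hypothetical worm triples its initial mass, so no treatment has any effect at all. However, an invalid assignment (the eight smallest worms to one group, the eight largest to the other) still produces a t statistic far beyond the critical value. All numbers are hypothetical, and the critical value 2.145 (df = 14) is copied from a standard t table.

```python
import math
import statistics

# Two-tailed 95% critical value of Student's t for df = 14 (from a t table).
T_CRIT_DF14 = 2.145


def pooled_t(a, b):
    """Student's two-sample t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical worms with initial masses 1.0 to 2.5 g. Final mass depends
# only on initial mass, so there is no treatment effect.
initial = [1.0 + 0.1 * i for i in range(16)]
final = [3 * m for m in initial]

# Invalid method: the 8 smallest worms form the "treatment" group and the
# 8 largest form the "control" group.
treatment, control = final[:8], final[8:]
t = pooled_t(treatment, control)
print(f"t = {t:.1f}, significant: {abs(t) > T_CRIT_DF14}")
# prints: t = -6.5, significant: True
```

The "significant" result comes entirely from the flawed assignment, not from any treatment.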

David R. Hershey, Ph.D.

References

Harris, D.E., Hannum, L. and Gupta, S. (2004). Contributing factors to student success in anatomy and physiology: Lower outside workload and better preparation. American Biology Teacher, 66, 168-175.

Hershey, D.R. and Merritt, R.H. (1987). Influence of photoperiod on crop productivity and form of Begonia × semperflorens-cultorum grown as bedding plants. Journal of the American Society for Horticultural Science, 112, 252-256.

Rice, S.A. (2004). Response to Brine shrimp "bioassay" problems. American Biology Teacher, 66, 475.

Rice, S.A. and Griffin, J.R. (2004). The hornworm assay: Useful in mathematically-based biological investigations. American Biology Teacher, 66, 487-491.