**Botany online** 1996-2004. No further update, only

Generally, questions about the numerical capture, analysis, or representation of mass phenomena such as allele frequencies in populations are answered by statistics and probability calculus. Readings of experiments may be grouped around a mean (= mean value). To find out whether two series of readings represent essentially the same or significantly different values, the t-test is performed. It answers the question of how far the means of the two series differ from each other.

The chi^{2}-test is done to see if a result corresponds with the theoretically expected values. The smaller the chi^{2}-value, the more probable it is that a deviation is caused merely by chance.

MENDEL, his rediscoverers and the geneticists of this century never got exact but only approximate ratios in their crossings. Ratios like 3:1 or 1:1 are idealized values. Though the interpretation of the mechanism they are based on is plausible, several questions have to be asked not only by a mathematician but also by a practical geneticist:

How large is the deviation from the theoretically expected values allowed to be?

How many specimens have to be counted to regard the results as reliable?

Is there a way to reach the same results with less
bother?

Answers to these questions are given by
**statistics** or
**probability calculus**. A clear yes or no is
therefore never to be expected. Instead, the answers state with what
percentage probability an event corresponds with an assumption, or
whether there is a significant difference between two series of
readings. The geneticist is helped by several formulas that can be
applied to the measured values and by calculated standards documented
in reference tables. The decisive precondition for the use of these
mathematical approaches is the choice of the right formula. It has to
be clear whether one's own experimentally obtained data satisfy the
respective conditions. All values have to share the same dimensions;
absolute values cannot be mixed with relative values (in percent).
Further preconditions that have to be taken into account when
performing the respective statistical tests can be found in reviews
like that of ZAR (1984).

The **mean** of a series of values
is calculated as follows:

$$X = \frac{\sum_{i=1}^{n} x_{i}}{n}$$

where x_{i} represents the single values and n the number
of the values.
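As a minimal sketch, the mean can be computed as follows. The function name `mean` and the readings are made up for illustration and are not taken from the original text.

```python
def mean(values):
    """Arithmetic mean X = (sum of x_i) / n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / n


# Hypothetical readings, chosen only to illustrate the calculation.
readings = [4.1, 3.9, 4.3, 4.0, 4.2]
print(mean(readings))
```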

**Mean variation and variance:** When the readings are depicted in a
histogram, it is usually easy to see whether they are grouped around
a mean or not. If the readings stem from a **Gaussian normal
distribution**, the histogram will approach a bell-shaped curve as
the number n increases. Only if this is the case does it make sense
to go on working with the readings statistically as shown in the
following.

The picture (below) shows that the curve of a Gaussian normal
distribution can be described by the position of its maximum, which
corresponds to its mean X, and by its points of inflection. The
distance between X and one of the points of inflection is called mean
variation or **standard deviation**.
The square of the standard deviation is the variance. A series of
readings is always a more or less large sample (spot check) of a
population. Samples always have a relative error, the
**standard error** or standard
deviation of the mean, whose size depends on the number of
readings. It is proportional to 1 / *square root* n, that is, it
equals the standard deviation divided by *square root* n.

The readings first have to be standardized, and the type and degree of their divergence from a Gaussian normal distribution have to be taken into account. The standard deviation of a sample (s = square root of the mean squared deviation) can be calculated with the following formula:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - X)^{2}}{n - 1}}$$
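The calculation of s, and of the standard error of the mean derived from it, might be sketched as follows. The function names and the example readings are illustrative assumptions. The standard error is taken here as s divided by the square root of n.

```python
import math


def sample_std(values):
    """Sample standard deviation s with the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two readings are required")
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


def standard_error(values):
    """Standard error of the mean: s / sqrt(n)."""
    return sample_std(values) / math.sqrt(len(values))


# Hypothetical readings, chosen only to illustrate the calculation.
readings = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std(readings), standard_error(readings))
```

Python's standard library offers `statistics.stdev` for the same n - 1 formula. The explicit version above is only meant to mirror the equation term by term.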

By integrating the Gaussian normal distribution, the area
enclosed by the base line and the curve within ± 1
*sigma*, ± 2 *sigma*, ± 3 *sigma* etc. of the mean can be
calculated:

Parameter of a Gaussian distribution: The value P refers to the part of the area that is enclosed by the curve and the base line between the values + and - 1 *sigma* (light blue area), + and - 2 *sigma* (light blue + medium blue area), or + and - 3 *sigma* (light blue + medium blue + dark blue areas), respectively.

This means that 68.3 percent of all readings of an ideal
distribution lie within 1 *sigma* of the mean, 95.4 percent within 2
*sigma*, and 99.7 percent within 3 *sigma*.
These values are important since they are used as standards for most
statistical statements. It is thus important for practitioners to
collect and evaluate their own readings critically so that they
can legitimately be compared with such an ideal distribution.
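The three percentages can be checked numerically. For a normal distribution, the fraction of the area within k *sigma* of the mean equals erf(k / square root of 2). The short sketch below uses this identity, and the helper name `fraction_within` is an illustrative choice.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"{k} sigma: {100 * fraction_within(k):.1f} percent")
# prints 68.3, 95.4 and 99.7 percent for 1, 2 and 3 sigma
```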

A recurring question is whether two (or more) series of
readings represent significantly different results, or whether the
different means are caused merely by chance and thus differ only
as a result of 'errors'. To solve the problem, the difference between
the means of both series has to be compared with the standard
deviation. For the comparison of two series of readings, the t-test is
used. The aim of the comparison is to examine how far the means
X_{a} and X_{b} differ from each other. The measure
for this is the quantity t.
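The text does not state which variant of the t formula is meant. As a sketch under that caveat, the following code computes t for two independent series using the classical pooled-variance (Student) form, which assumes both series share the same underlying variance. The function name and the readings are illustrative assumptions.

```python
import math
import statistics


def pooled_t(series_a, series_b):
    """Return (t, degrees of freedom) for two independent series.

    Assumption: pooled-variance Student t; the source text does not
    specify which variant it uses.
    """
    n_a, n_b = len(series_a), len(series_b)
    var_a = statistics.variance(series_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(series_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(series_a) - statistics.mean(series_b)) / se_diff
    return t, n_a + n_b - 2


# Hypothetical readings, chosen only to illustrate the calculation.
series_a = [5.1, 4.9, 5.3, 5.0, 5.2]
series_b = [5.6, 5.8, 5.5, 5.9, 5.7]
t, df = pooled_t(series_a, series_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The resulting t is then compared with the tabulated value for the given degrees of freedom and the chosen probability P.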

The probability P that corresponds to a calculated t can be found
in probability tables. If X_{b} lies outside the range
X_{a} ± 3 *sigma*, the difference is called
significant. The probability that both values agree is
< 0.3%, meaning that the probability that both represent
distributions different from each other is > 99.7%. If
X_{b} lies outside X_{a} ± 2 *sigma* but
inside X_{a} ± 3 *sigma*, the difference is called
secure. In this case, the probability (P) of
agreement is about five percent, and the probability of a difference
is accordingly 95 percent. 3 *sigma* and 2 *sigma*
are also referred to as the one percent and the five percent
degree of confidence, respectively. In statistics it is common to use
fractions of the number 1 instead of percent, so P would be 0.01
and 0.05, respectively. Two further things become clear when looking
at the table:

- The larger the difference between two means, the fewer readings (n) are necessary to secure a statement.
- An (expected) small difference requires a drastic
increase in the number of readings to obtain a significant
statement.
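The two observations above can be made concrete with a rough sketch. It assumes two groups of n readings each with a common standard deviation s, so that t = difference / (s * square root of (2 / n)), and it uses a fixed threshold of t = 2 as a stand-in for the tabulated value. Both the simplification and the function names are assumptions made for illustration; a real analysis would look up the critical t for the actual degrees of freedom.

```python
import math


def t_for_equal_groups(diff, s, n):
    """t for two groups of n readings each, sharing standard deviation s."""
    return diff / (s * math.sqrt(2 / n))


def readings_needed(diff, s, t_threshold=2.0):
    """Smallest n per group at which t reaches t_threshold.

    Simplification: a fixed threshold ignores that the tabulated
    critical t depends on the degrees of freedom.
    """
    return math.ceil(2 * (t_threshold * s / diff) ** 2)


# Halving the difference between the means quadruples the readings needed.
for diff in (2.0, 1.0, 0.5, 0.25):
    print(diff, readings_needed(diff, s=1.0))
```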

In a previously shown table,
the segregation ratios obtained by MENDEL were depicted. He
extrapolated a 3:1 ratio. The
**chi^{2}-test** shows
whether this is permitted:

$$\chi^{2} = \sum \frac{d^{2}}{e}$$

d = deviation from the expected result, e = expectation. The
smaller the value of *chi*^{2}, the more probable it is
that only chance is responsible for a deviation. Only absolute
numbers (never percentages) can be used for the
*chi*^{2}-test. The test shows that the correspondence
of MENDEL's numbers with his expectations is very high. The
mathematically calculated reference values can also be
found in the respective tables. Later studies showed that even much
smaller amounts of data are sufficient to obtain significant
values.
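As an illustration, the test can be sketched for one of MENDEL's crosses. The counts used here (5474 round and 1850 wrinkled seeds) are the seed-shape figures from his 1866 paper and are supplied as an assumption, since the table referred to above is not reproduced in this section. The function names are illustrative. With two classes there is one degree of freedom, for which P can be computed directly from the complementary error function.

```python
import math


def chi_square(observed, ratio):
    """Return (chi^2, expected counts) for observed counts and an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


def p_value_one_df(chi2):
    """Probability of a chi^2 at least this large by chance (1 degree of freedom only)."""
    return math.erfc(math.sqrt(chi2 / 2))


# Assumed data: seed shape counts (round, wrinkled) from MENDEL's 1866 paper.
observed = [5474, 1850]
chi2, expected = chi_square(observed, [3, 1])
print(f"expected: {expected}")
print(f"chi^2 = {chi2:.3f}, P = {p_value_one_df(chi2):.2f}")
```

The resulting chi^{2} of about 0.26 lies far below the tabulated five percent limit for one degree of freedom (3.84), so the deviation from 3:1 is well within what chance alone would produce.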