Double sampling

Jonah Lester
Double sampling is used to learn more about a variable of a population

What is double sampling?

Double sampling is a technique used in inferential statistics to learn about a particular variable of a population in more detail and with greater certainty.

The second sample is generally taken after a first sample has been drawn and analyzed without yielding a statistically significant conclusion about the study variables.

For this reason, double sampling is also known in statistics as two-phase sampling (and is sometimes loosely called two-stage sampling). The second sample is useful because it allows ratios and regressions involving an auxiliary variable, identified through the analysis of the first sample, to be estimated with greater precision.

Another use of double sampling is to collect the information needed to carry out stratified sampling.

Examples

Several situations in which double sampling is warranted are described below.

Quality control in the manufacture of parts

The double sampling method is frequently used in industrial quality control and is usually done in two phases.

For example, consider an industrial machine that makes certain parts. No matter how carefully the machine is adjusted, no two parts are identical, since small variations occur in their dimensions and weight. The goal is to determine whether a batch of parts made by the machine meets the tolerance criteria, so that it can be accepted or rejected.

First, a random sample of parts is taken to check whether one of the variables, for example the length of the part, is within tolerance.

If the average length in this first sample is below or above the tolerance range for that variable, the batch is inferred to be defective and must be discarded. In this case no further samples are required.

Conversely, if the average value is within the tolerance range, but the sample standard deviation is large enough that adding it to or subtracting it from the average falls outside the range, then a second, larger sample needs to be collected.

This second sample must include the original sample. The calculations are then redone so that a final decision can be made about the variable under study, that is, whether the batch is defective or not.
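The decision rule just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the tolerance limits and the measurements are hypothetical, and the final decision on the combined sample is assumed to use the same "average plus or minus one standard deviation" check, which the text does not spell out.

```python
from statistics import mean, stdev


def first_stage_decision(lengths, lower, upper):
    """Classify a first sample as 'reject', 'accept' or 'resample'."""
    m = mean(lengths)
    s = stdev(lengths)  # sample standard deviation
    if m < lower or m > upper:
        # Average outside tolerance: the batch is discarded outright.
        return "reject"
    if m - s < lower or m + s > upper:
        # Average is fine, but the spread reaches outside the range.
        return "resample"
    return "accept"


def double_sampling_decision(first, second, lower, upper):
    """Decide on a batch, using the second sample only when needed."""
    decision = first_stage_decision(first, lower, upper)
    if decision != "resample":
        return decision
    # The enlarged sample includes the original measurements.
    combined = list(first) + list(second)
    m = mean(combined)
    s = stdev(combined)
    # Assumption: the same check is reused for the final decision.
    within = lower <= m - s and m + s <= upper
    return "accept" if within else "reject"


# Hypothetical lengths in millimetres, tolerance range 9.5 to 10.5.
first_sample = [9.4, 10.6, 10.0]
extra_parts = [10.0, 10.1, 9.9, 10.0, 10.05, 9.95, 10.0]
print(first_stage_decision(first_sample, 9.5, 10.5))
print(double_sampling_decision(first_sample, extra_parts, 9.5, 10.5))
```

In practice, acceptance sampling plans usually fix the sample sizes and acceptance thresholds in advance; the sketch only mirrors the simplified rule given here.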

Lower sampling costs

Often, information about one of the variables under study is difficult to obtain. But there may be an auxiliary variable for which data are easier to collect.

In this case two samples are taken: a large, less expensive one for the auxiliary variable, and a smaller one, contained within the larger sample, for the more expensive variable.

This method is applicable whenever a correlation is found between the two variables, which is generally a relationship of proportionality.

An example of this situation appears in forestry, where the aim is to determine the percentage of trees affected by a parasitic plant (ringworm).

As these are very extensive regions that are difficult to access, studying the complete tree population is not feasible in terms of time and cost. The following steps are then taken:

Step 1: taking samples

The preliminary sampling relies on aerial photography, with the forest subdivided into lots. A few lots are chosen at random and, by analyzing the images of those lots, the number of trees affected by ringworm is estimated, since the parasite changes the color of the trees.

Step 2: field work

But photographic analysis can be imprecise, so a few lots from the first sample are chosen, preferably at random, for field work.

Step 3: comparison

The field result is then compared with the photographic one for the lots common to both samples. This comparison can be made, for example, with a graph in which the horizontal axis shows the value obtained for each lot by photography and the vertical axis shows the value obtained for the same lot through field work.

This graphical method makes it possible to see whether the two results are correlated and to determine, through regression analysis, the coefficient of proportionality, or ratio, between them.
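As an illustration of the regression step, a proportionality coefficient can be obtained by fitting a line through the origin, Y = bX, by least squares, which gives b = sum(x*y) / sum(x*x). The sketch below assumes that a line through the origin is an adequate model; the function name and the per-lot counts are hypothetical and are not data from any real study.

```python
def slope_through_origin(photo_counts, field_counts):
    """Least-squares slope b of the model field = b * photo."""
    if len(photo_counts) != len(field_counts):
        raise ValueError("both lists must describe the same lots")
    sum_xy = sum(x * y for x, y in zip(photo_counts, field_counts))
    sum_xx = sum(x * x for x in photo_counts)
    if sum_xx == 0:
        raise ValueError("photo counts must not all be zero")
    return sum_xy / sum_xx


# Hypothetical counts of affected trees in four lots studied by both methods.
photo = [4, 8, 10, 12]   # estimated from aerial photographs
field = [5, 10, 12, 15]  # counted on the ground
b = slope_through_origin(photo, field)
print(f"proportionality coefficient: {b:.3f}")
```

A coefficient above 1 would mean that the photographs tend to undercount affected trees relative to the field count.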

From the largest sample, that is, the photographic one, the average number of infected trees and its standard deviation are computed. Since the proportionality coefficient and its error relative to the field samples have been determined, the result of the larger (photographic) sample can then be corrected.

This result can then be extrapolated to the entire tree population.

Advantages and disadvantages of double sampling

In the examples described the cost advantage is evident, since replacing a variable that is difficult to access with one that is easily accessible saves time and money.

A disadvantage is that, in double sampling for quality control, there is a risk of accepting as good some batches of products that are out of tolerance.

Exercise

We want to estimate the number of diseased trees in a 162-hectare forest. As the forest is very extensive, it is subdivided into 100 plots of equal area. 18 plots are chosen at random and a photographic study estimates that these 18 plots contain an average of 8.5 diseased trees per plot, with a standard error of plus or minus 4.5 trees.

From these 18 plots, 8 plots are chosen at random for the field study. For these eight plots the photographic study shows an average of 10 diseased trees per plot, with an error of plus or minus 5.3 trees.

For those same eight plots, the field study shows an average of 12.4 diseased trees per plot, with an error of plus or minus 6.3 trees.

Required:

  • a) Determine the proportionality coefficient between the field study and the photographic study by linear regression.
  • b) Estimate the number of diseased trees in the hundred plots using the photographic method.
  • c) Apply the correction with the proportionality coefficient obtained, to estimate the real number of diseased trees in the entire forest.

Solution

A graph is made of the photographic count versus the field count of diseased trees for the eight plots selected for both studies.

Photo count versus field count. Source: F. Zapata.

A trend line is fitted and its slope determined. In this case the coefficient of proportionality obtained is 1.23. That is, if X is the photographic count, the estimated field count is Y = 1.23 X.
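The per-plot data behind the trend line are not listed, so the fit itself cannot be reproduced here. As a rough plausibility check, which is not part of the original solution, the fitted slope can be compared with the ratio of the two averages reported for the eight plots. The ratio of means is a different estimator from the regression slope, so only approximate agreement should be expected; the variable names are illustrative.

```python
# Per-plot averages for the eight plots surveyed by both methods (from the exercise).
photo_mean_8 = 10.0
field_mean_8 = 12.4
reported_slope = 1.23  # slope of the fitted trend line, as stated in the solution

# Ratio-of-means estimate of the proportionality coefficient.
ratio_of_means = field_mean_8 / photo_mean_8
relative_gap = abs(ratio_of_means - reported_slope) / reported_slope

print(f"ratio of means: {ratio_of_means:.2f}")
print(f"relative gap to fitted slope: {relative_gap:.1%}")
```

The two values differ by less than one percent, which is consistent with a relationship that is close to proportional.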

The number of diseased trees according to the photographic count in the 18 selected plots is:

18 x 8.5 = 153

But since the entire forest was divided into 100 plots of equal area, the number of diseased trees estimated by the photographic method is: (100/18) x 153 = 850.

The correction factor obtained from the comparison between the field and photographic study is now applied:

Estimated actual number of diseased trees in the forest = 1.23 x 850 = 1045.5, or about 1046.
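The arithmetic of the solution can be retraced with a short script. This is a minimal sketch using only the figures given in the exercise; the function and parameter names are illustrative, and the uncertainties quoted in the statement are not propagated.

```python
def estimate_forest_total(mean_per_plot, plots_sampled, plots_total, coefficient):
    """Scale a photographic per-plot average to the forest and correct it."""
    # Diseased trees in the photographed plots (18 x 8.5 in the exercise).
    sample_total = plots_sampled * mean_per_plot
    # Extrapolation to all plots of equal area.
    photo_total = (plots_total / plots_sampled) * sample_total
    # Correction with the field-to-photo proportionality coefficient.
    corrected_total = coefficient * photo_total
    return sample_total, photo_total, corrected_total


sample_total, photo_total, corrected_total = estimate_forest_total(
    mean_per_plot=8.5, plots_sampled=18, plots_total=100, coefficient=1.23
)
print(f"photographic count in the sampled plots: {sample_total:.0f}")
print(f"photographic estimate for the forest: {photo_total:.0f}")
print(f"corrected estimate for the forest: {corrected_total:.1f}")
```

Because the plots have equal area, the extrapolation reduces to multiplying the per-plot average by the total number of plots (8.5 x 100 = 850).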

