Sampling is an accepted way of making estimates at much lower cost than measuring an entire population. When sampling is done properly, statistics can quantify the likelihood that the results obtained from the sample are representative of the entire population from which the sample was drawn. This article is a primer on the factors that anyone planning a sample will need to evaluate. We include interactive calculators that determine how large the sample will need to be in light of these key considerations.
The first step is to determine the type of information being sought. There are two types of information, each with its own sampling method:
1. Attribute Sampling – Attribute sampling, also known as proportional sampling, estimates how often discrete outcomes occur, that is, the proportion of the population that has a given attribute. Examples of discrete outcomes that could be measured with attribute sampling include:
   - Correct vs. incorrect
   - Win vs. lose
   - Yes vs. no
   - Overcharged vs. not overcharged
   - Male vs. female
   - Heavier than 150 pounds vs. 150 pounds or lighter
2. Variable Sampling – Variable sampling measures amounts. For example, variable sampling could be used to measure how much someone was overcharged, or how heavy something is.
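To make the distinction concrete, the sketch below applies both approaches to the same sampled invoices. The overcharge amounts are hypothetical data invented for illustration: attribute sampling reduces each invoice to a yes/no outcome and estimates a proportion, while variable sampling keeps the dollar amount and estimates an average.

```python
# Hypothetical sample: overcharge amount (in dollars) found on each sampled invoice.
# An amount of 0.0 means the invoice was not overcharged.
overcharges = [0.0, 12.50, 0.0, 4.00, 0.0, 0.0, 30.25, 0.0, 7.75, 0.0]

# Attribute sampling: each observation becomes a yes/no outcome,
# and the estimate is the proportion of "yes" outcomes.
overcharged_flags = [amount > 0 for amount in overcharges]
proportion_overcharged = sum(overcharged_flags) / len(overcharged_flags)

# Variable sampling: the amount itself is measured, and the estimate is an average.
mean_overcharge = sum(overcharges) / len(overcharges)

print(f"Proportion overcharged: {proportion_overcharged:.0%}")
print(f"Average overcharge per invoice: ${mean_overcharge:.2f}")
```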
Sample Size Determinants
Because the whole point of using a sample (rather than measuring the entire population) is to save money, those paying for the statistical work prefer the smallest sample that will meet their objectives. Sample size is determined by the following factors, which the person conducting the sample gets to determine (at least initially):
- Precision – How close do you want your estimate to be to the true population value? Precision is the margin of error, such as ±5 percentage points or ±$10. The more precise you want the estimate to be, the larger the sample will need to be.
- Confidence – How confident do you want to be that the true value lies within the precision range described above? The higher the level of confidence, the larger the sample will need to be. In scientific work, confidence levels of 95% or 99% are customary, but legal standards of proof may not impose this heavy a burden on the party making an assertion.
- Standard deviation – How much dispersion (variability) exists in the population? The more dispersed the data, the more the results vary from one sample to the next, so the larger the sample will need to be. See the next section for information on estimating the standard deviation for each of the two types of sampling.
- One-tail vs. two-tail test – Are we concerned with both understatements AND overstatements of our estimate, or only with errors in one direction? If we are interested in only one of these two possibilities (a one-tail test), all of the allowed risk of error is placed on one side, so the conclusion is easier to reach and the sample can be smaller. Most commonly, the estimator is interested in both overstatements and understatements, so a two-tail test is performed.
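Under the common normal-approximation approach (an assumption here; the calculators on this page may use a different method), the confidence level and the choice of one or two tails combine into a single critical value z, and the required sample size grows with z squared. The sketch below uses Python's standard `statistics.NormalDist`; the function name `critical_z` is illustrative.

```python
from statistics import NormalDist

def critical_z(confidence: float, two_tailed: bool = True) -> float:
    """Standard normal critical value for a given confidence level.

    A two-tailed test splits the (1 - confidence) risk between both tails;
    a one-tailed test puts all of it in one tail, which gives a smaller z.
    """
    alpha = 1.0 - confidence
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: two-tailed z = {critical_z(conf):.3f}, "
          f"one-tailed z = {critical_z(conf, two_tailed=False):.3f}")
```

At 95% confidence this gives about 1.960 for a two-tailed test and 1.645 for a one-tailed test, which is why a one-tailed test needs a smaller sample.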
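Note that population size is not included in the above list. For any population large enough to warrant sampling, population size has little effect on the required sample size; it matters only when the sample would be a substantial fraction of the population.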
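One way to check why population size rarely matters is the textbook finite population correction, n = n0 / (1 + (n0 − 1) / N), where n0 is the sample size computed as if the population were infinite and N is the population size. This formula is a standard adjustment, not something this article specifies; the function name below is illustrative. The starting value of 385 is roughly what a 95% confidence, ±5 percentage point attribute sample needs when the expected proportion is 50%.

```python
import math

def fpc_adjusted_size(n0: float, population: int) -> int:
    """Apply the finite population correction to an infinite-population sample size n0."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

n0 = 385  # sample size computed as if the population were infinite
for population in (1_000, 10_000, 100_000, 1_000_000):
    print(population, fpc_adjusted_size(n0, population))
```

The adjusted size is 279 for a population of 1,000 but 371 for 10,000 and 385 for 1,000,000: once the population is much larger than the sample, its size barely changes the answer.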
Determining Standard Deviation
Of the factors above, the standard deviation is often the most difficult to estimate. If little is known about the standard deviation, a pilot test can be useful. The pilot is itself a sample, although much smaller than the one needed to make the final estimate.
For attribute sampling, estimating the standard deviation is simple. As noted above, attribute sampling involves discrete possibilities such as on or off. Accordingly, take a small pilot test and calculate the percentage of the pilot sample that has the desired attribute. If that proportion is p, the standard deviation is the square root of p(1 − p). In the sample size calculator provided below, this percentage is called the “probability of success”.
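A minimal sketch of this step, using hypothetical pilot results coded as 1 (has the attribute) or 0 (does not): the pilot proportion p becomes the "probability of success", and the standard deviation follows as the square root of p(1 − p). That value is largest at p = 0.5, so assuming 50% is the conservative choice when no pilot is available.

```python
import math

# Hypothetical pilot test: 1 = has the attribute (e.g., overcharged), 0 = does not.
pilot = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]

p = sum(pilot) / len(pilot)        # "probability of success" estimated from the pilot
std_dev = math.sqrt(p * (1 - p))   # standard deviation of a yes/no attribute

print(f"p = {p:.2f}, standard deviation = {std_dev:.3f}")
```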
For variable sampling, the following calculator computes the standard deviation of a set of observations, such as the results of a pilot test. If the observations are entered in size (or time) order, the calculator will also plot the best-fit trend line for the data.
Sometimes the variability within a population is predictable; that is, there is a pattern to it (for example, large invoices may vary more in amount than small ones). When this occurs, one can break the population into two or more groups and treat each group as its own population. This has the advantage of reducing the standard deviation within each group. If we treat these groups, or strata, as separate subpopulations, draw a separate sample from each stratum, and combine the results, our sampling method is called stratified sampling. To ensure that the stratified sample is representative of the entire population, the number of observations drawn from each stratum is proportional to that stratum's size in the entire population. Stated otherwise, the proportion of the sample taken from each stratum should equal the proportion of the entire population that the stratum represents.
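Proportional allocation across strata can be sketched as follows. The strata and their sizes are hypothetical, and the function name is illustrative. Because sample counts must be whole numbers, the sketch rounds with the largest-remainder method (one reasonable choice among several) so that the per-stratum counts still add up to the total sample size.

```python
import math

def proportional_allocation(stratum_sizes: dict[str, int], total_sample: int) -> dict[str, int]:
    """Split total_sample across strata in proportion to each stratum's population size.

    Uses the largest-remainder method so the rounded allocations add up to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {name: total_sample * size / population for name, size in stratum_sizes.items()}
    alloc = {name: math.floor(share) for name, share in exact.items()}
    shortfall = total_sample - sum(alloc.values())
    # Give any leftover observations to the strata with the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:shortfall]:
        alloc[name] += 1
    return alloc

# Hypothetical population of 10,000 invoices split by dollar amount.
strata = {"under $1,000": 6_000, "$1,000 to $10,000": 3_000, "over $10,000": 1_000}
print(proportional_allocation(strata, 250))
```

Here 60% of the population falls in the first stratum, so 60% of a 250-item sample (150 invoices) is drawn from it, with 75 and 25 drawn from the other two.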
Sample Size Calculators
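With both of the following calculators, you can change any of the four inputs and immediately see the effect on the sample size.
The following calculator determines the sample size for attribute sampling. As noted in “Sample Size Determinants” above, you will need four inputs: precision, confidence, the probability of success (which takes the place of the standard deviation), and whether the test is one-tail or two-tail.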
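For readers without access to the interactive calculator, the following is a sketch of one common way to compute an attribute sample size from the same four inputs: the normal-approximation formula n = z² × p(1 − p) / E², where E is the precision. It is not necessarily the formula the calculator uses (an exact binomial method, for instance, can give different answers for small samples or extreme proportions), and the function name is illustrative.

```python
import math
from statistics import NormalDist

def attribute_sample_size(precision: float, confidence: float,
                          probability_of_success: float, two_tailed: bool = True) -> int:
    """Sample size for estimating a proportion, using the normal approximation.

    precision: margin of error as a fraction (0.05 means +/- 5 percentage points)
    confidence: e.g. 0.95
    probability_of_success: expected proportion p, e.g. from a pilot test (0.5 if unknown)
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - (alpha / 2 if two_tailed else alpha))
    p = probability_of_success
    return math.ceil(z * z * p * (1 - p) / precision ** 2)

print(attribute_sample_size(0.05, 0.95, 0.5))
print(attribute_sample_size(0.05, 0.95, 0.5, two_tailed=False))
```

With p = 0.5 (the most conservative assumption), a precision of ±5 percentage points at 95% confidence requires 385 observations for a two-tailed test and 271 for a one-tailed test.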
The following similar calculator determines the sample size for variable sampling.
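A comparable sketch for variable sampling uses n = (z × σ / E)², where σ is the standard deviation (for example, from a pilot test) and E is the precision in the same units. This normal-approximation formula is an assumption about how such a calculator might work; for small samples a t-distribution is more exact. The function name and the dollar figures are hypothetical.

```python
import math
from statistics import NormalDist

def variable_sample_size(precision: float, confidence: float,
                         std_dev: float, two_tailed: bool = True) -> int:
    """Sample size for estimating a mean to within +/- precision (same units as std_dev)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - (alpha / 2 if two_tailed else alpha))
    return math.ceil((z * std_dev / precision) ** 2)

# Hypothetical inputs: pilot standard deviation of $40, average wanted within +/- $5 at 95% confidence.
print(variable_sample_size(precision=5.0, confidence=0.95, std_dev=40.0))
```

This yields 246 observations. Because precision enters the formula squared, halving the allowed error roughly quadruples the required sample.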
Fulcrum Inquiry assists lawyers with damages analysis, data analysis, and statistical calculations. Please call if you need help applying these calculators or establishing other statistical parameters.