Statisticians consider two events or characteristics independent when the occurrence of one makes the other neither more nor less probable. The chi-square test described below is one of the most widely used tests of independence between variables, particularly when the number of observations or variables is large. This article focuses on testing for employment discrimination, but chi-square tests have numerous other applications.
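Stated as a formula, this is the standard textbook definition of independence. The symbols A and B are introduced here for illustration only (for example, A could be "the employee is retained" and B "the employee is a woman"); they do not appear in the original article.

```latex
% A and B are independent when the joint probability factors:
P(A \cap B) = P(A)\,P(B)
% Equivalently, when P(B) > 0, knowing B does not change the probability of A:
P(A \mid B) = P(A)
```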
How the U.S. Government Defines and Tests for Employment Discrimination
The “Uniform Guidelines on Employee Selection Procedures” published by the U.S. Equal Employment Opportunity Commission (EEOC) describe their purpose as follows:
“These guidelines incorporate a single set of principles which are designed to assist employers, labor organizations, employment agencies, and licensing and certification boards to comply with requirements of Federal law prohibiting employment practices which discriminate on grounds of race, color, religion, sex, and national origin. They are designed to provide a framework for determining the proper use of tests and other selection procedures.”
The EEOC’s guidelines define adverse impact as:
“A substantially different rate of selection in hiring, promotion, or other employment decision which works to the disadvantage of members of a race, sex, or ethnic group.”
The U.S. Department of Labor (DOL) provides a similar definition, as follows:
“Disparate impact is a theory of employment discrimination based on the disproportionate effect of a racially neutral criterion/ process. The theory refers to the discriminatory effects of uniformly applied employment criteria/processes that are neutral on their face but which more harshly affect minorities or women and cannot be justified by business necessity or job relatedness. Because the disparate impact analysis addresses the effects of a particular requirement on groups of people, it is generally a statistical proof.”
The EEOC’s guidelines describe a simple ratio test for adverse impact, as follows:
“A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact. … Greater differences in selection rate may not constitute adverse impact where the differences are based on small numbers and are not statistically significant …”
The DOL similarly defines the same 80% test as follows:
“A substantially different rate of selection in hiring, promotion, transfer, training or in other employment decisions which works to the disadvantage of minorities or women. If such rate is less than 80 percent of the selection rate of the race, sex, or ethnic group with the highest rate of selection, this will generally be regarded as evidence of adverse impact. Adverse impact analyses based on the 80% rule may be buttressed by a test of statistical significance.”
For additional background information, see RIF Statistical Audits Reduce Discrimination Risks.
The following illustrates the EEOC’s and DOL’s adverse impact calculation using the four-fifths or 80% rule. Assume that there are 700 employees (consisting of 400 men and 300 women) in a company that is facing a layoff or reduction in force (RIF) of 200 people. The employer decides to lay off 100 men and 100 women. The four-fifths or 80% rule would be calculated as follows:
| |Before the RIF|After the RIF|% Not RIF|
|Men|400|300|300/400 = 75%|
|Women|300|200|200/300 = 67%|
|Total|700|500|500/700 = 71%|
|Adverse Impact Ratio (calculated as 67%/75%)| | |89%|
The layoff passes the EEOC’s and DOL’s 80% test because the adverse impact ratio of 89% exceeds 80%. However, alternative statistical tests may reach a different conclusion.
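The four-fifths calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than an official EEOC or DOL procedure: the function name `adverse_impact_ratio` and the `groups` dictionary are hypothetical names chosen for this example, and the headcounts are the ones from the RIF scenario above.

```python
def adverse_impact_ratio(groups):
    """Return (ratio, rates) for a dict of {group: (before, after)} headcounts.

    The "selection rate" here is the share of each group that is NOT laid off.
    The ratio divides the lowest group rate by the highest group rate.
    Hypothetical helper for illustration; not an official agency formula.
    """
    rates = {name: after / before for name, (before, after) in groups.items()}
    ratio = min(rates.values()) / max(rates.values())
    return ratio, rates


# Headcounts from the example: (before the RIF, after the RIF)
groups = {"Men": (400, 300), "Women": (300, 200)}

ratio, rates = adverse_impact_ratio(groups)

# Under the four-fifths rule, a ratio below 0.80 is generally regarded
# as evidence of adverse impact.
passes_80_percent_rule = ratio >= 0.80

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} retained")
print(f"Adverse impact ratio: {ratio:.0%}")
print(f"Passes 80% rule: {passes_80_percent_rule}")
```

Computing the ratio from the unrounded rates, instead of from the rounded 67% and 75%, avoids small rounding differences when a result falls close to the 80% threshold.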
Inferential Statistics May Provide a More Complete Answer
The classic tool for teaching probability involves placing balls of different colors into a container and then removing them at random. The objective is to determine the probability that a ball of a certain color will be removed. If everything were to occur “perfectly”, the balls removed would always match the starting percentage of each color in the container. Of course, this cannot occur because (i) only a whole ball (not a percentage of a ball) is removed at a time, and (ii) random chance could cause one color to be removed more frequently. With small sample sizes, the outcome of a single removal dramatically affects the percentage of a particular color that has been pulled. One could calculate the probability of every possible permutation of balls being pulled. This is possible when small sample sizes are involved; the possibi