Why Electronic Discovery is Expensive
Actually, discovery using electronic tools is not expensive at all. The high cost of electronic discovery is instead driven by the need to review the enormous volume of electronic records (largely email) that businesses create and keep. The ever-declining cost of electronic storage is the culprit. Currently, a gigabyte of storage costs less than twenty cents; in 1990, the same storage cost at least 100,000 times more. Because current storage is almost free, many businesses make the mistake of keeping information far beyond what is needed or useful.
The false economy of storing additional information becomes apparent once litigation and its related discovery obligations occur. Then, the data kept on twenty cents of storage will cost upward of $50,000 for a junior lawyer to review.
Technology solutions that speed lawyer review have understandably attracted attention. One popular software product takes its name from the claim that lawyers can review ten times the data in the same time. But faster review of enormous volumes does not really solve the problem. Reviewing thousands of gigabytes of data (a common volume in a moderate-sized company) at one-tenth the cost, or $5,000 per gigabyte, still costs far too much!
Solving the Electronic Discovery Cost Problem
To avoid excess discovery costs before a subpoena arrives, follow the proactive advice located at Electronic Storage Best Practices. Fulcrum’s personnel can assist with addressing these issues.
The second best way of reducing cost is to eliminate documents from consideration without any human review. This can be automated with keyword searches. As part of the normal discussions regarding the scope of discovery obligations, counsel should work together to create a list of keywords whose presence will deem a document potentially relevant. Both producing and receiving parties benefit from such arrangements, since both incur costs reviewing whatever ends up being produced. As with all discovery, the Court or a discovery referee can get involved to address instances where either party is being unreasonable.
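To make the mechanics concrete, here is a minimal sketch of applying a negotiated keyword list automatically. The keywords and documents are hypothetical illustrations, not any particular review platform's method:

```python
# Minimal sketch: partition documents using a negotiated keyword list.
# KEYWORDS and the sample documents are hypothetical illustrations.
KEYWORDS = {"merger", "warranty", "recall"}

def is_potentially_relevant(text: str) -> bool:
    """A document is flagged if it contains any agreed keyword."""
    words = set(text.lower().split())
    return not KEYWORDS.isdisjoint(words)

documents = [
    "Board discussion of the proposed merger terms",
    "Lunch menu for the company picnic",
    "Product recall notice draft",
]

# Only flagged documents proceed to (costly) human review.
for_review = [d for d in documents if is_potentially_relevant(d)]
set_aside = [d for d in documents if not is_potentially_relevant(d)]
```

In practice real platforms support Boolean operators, stemming, and proximity searches, but the cost-saving principle is the same: documents in `set_aside` are never touched by a human reviewer.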
The obvious problem with keyword searches is the concern that a relevant document may be missed simply because it does not contain the words used in the search criteria. To address this issue, statistical sampling of the presumptively irrelevant documents should be used. Such sampling is used throughout the sciences and in industrial applications to estimate the rates of error that exist in a large population. For example, in manufacturing quality control, statistical testing is routinely used to determine whether a batch of manufactured product meets required specifications. If the sample shows an unacceptable deficiency, the entire lot is rejected.
Statistical sampling can be used in the same way to test documents that are tentatively not going to be produced because they did not meet the Boolean search criteria. These presumptively irrelevant documents should be sampled, and the sample manually reviewed, to determine how effectively the search criteria identified relevant records. As with testing manufactured product, even a relatively small number of missed relevant records tells the parties that the process needs to be reevaluated and remedial action taken. After making adjustments, statistical sampling is performed again to verify that the revised process identifies a sufficient percentage of the desired documents.
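The audit step described above can be sketched as follows. The sample size, the acceptance threshold, and the `manually_review` callback (which stands in for a human reviewer's judgment) are all illustrative assumptions:

```python
import random

# Sketch of auditing the set-aside (presumptively irrelevant) documents.
# SAMPLE_SIZE and the acceptance threshold are illustrative assumptions.
SAMPLE_SIZE = 400
MAX_ACCEPTABLE_MISS_RATE = 0.01  # tolerate at most 1% missed relevant docs

def audit_excluded_documents(excluded_docs, manually_review):
    """Randomly sample the excluded set; a human reviewer flags any
    relevant documents the keyword search missed. Returns whether the
    search criteria passed the audit, plus the observed miss rate."""
    sample = random.sample(excluded_docs, min(SAMPLE_SIZE, len(excluded_docs)))
    missed = sum(1 for doc in sample if manually_review(doc))
    miss_rate = missed / len(sample)
    # If too many relevant documents slipped through, the keyword
    # criteria must be revised and the sampling repeated.
    return miss_rate <= MAX_ACCEPTABLE_MISS_RATE, miss_rate
```

A failed audit corresponds to rejecting the "lot" in the manufacturing analogy: the parties revise the keyword criteria, re-run the search, and sample again.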
The landmark electronic discovery case, Zubulake v. UBS Warburg, used sampling to reduce discovery cost and to establish whether additional discovery was necessary. Similarly, courts have accepted statistical sampling in a wide variety of other endeavors as a means of scientifically estimating results in a population too large to be examined in its entirety.
Use of keywords and related statistical testing should not create undue risk for the producing party, particularly when the selection process and the related testing are disclosed. Anyone who follows electronic discovery cases can quickly recall examples where severe sanctions befell a party that failed to produce relevant records; see, for example, (i) Morgan Stanley’s botched discovery re: Sunbeam, and (ii) our own article regarding lawyer sanctions re: Qualcomm, Inc. v. Broadcom. In these and other cases, however, sanctions were imposed because the parties either were grossly negligent or actively misrepresented the truth. The cases do not show a pattern of sanctions or other punishment where a party conscientiously applied a disclosed process to locate relevant documents within a large morass of irrelevant records.
Statistical Sampling Guidance
Since the whole point of using a sample (rather than inspecting the entire population) is to save money, those using this approach naturally prefer the smallest sample that will meet their objectives. Sample size is determined by the following factors, which the person conducting the sample gets to choose (at least initially):
- Precision – How close do you want your estimate to be? The more precise the estimate, the larger the sample must be. In this context, precision refers to the percentage of relevant documents that can tolerably be missed.
- Confidence – How confident do you want to be that the estimate falls within the precision range described above? The higher the level of confidence, the larger the sample must be. In scientific work, confidence is routinely expressed as either 99% or 95%. For purposes of ensuring that relevant documents have been produced, one would normally want that high a level of confidence.
- Expected deviation – How many exceptions do we expect in the population? Because dispersed results are harder to measure, the more dispersed the data, the larger the sample must be. In electronic discovery sampling, we anticipate practically no exceptions (i.e., documents that should have been produced but were not identified), so the sample size can be smaller. Of course, if the actual results do not confirm this expectation, it is back to the drawing board; but that is exactly what the testing was designed to determine.
Notably, the size of the population has no meaningful impact on the sample size.
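The three factors above map directly onto the standard formula for sizing a sample to estimate a proportion, n = z²·p(1−p)/e², where z is the standard-normal score for the chosen confidence level, p the expected deviation, and e the precision. A sketch, including the finite population correction that shows why population size barely matters:

```python
import math

# Standard-normal z-scores for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def sample_size(confidence, precision, expected_deviation):
    """Sample size for estimating a proportion:
    n = z^2 * p * (1 - p) / e^2, rounded up."""
    z = Z_SCORES[confidence]
    p = expected_deviation
    e = precision
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

def adjusted_for_population(n, population):
    """Finite population correction: n / (1 + (n - 1) / N)."""
    return math.ceil(n / (1 + (n - 1) / population))

# 95% confidence, +/- 2% precision, 5% expected exception rate:
n = sample_size(0.95, 0.02, 0.05)   # 457 documents

# A lower expected deviation shrinks the sample, as the bullet above
# describes; a higher confidence level enlarges it:
smaller = sample_size(0.95, 0.02, 0.01)   # 96 documents
larger = sample_size(0.99, 0.02, 0.05)    # 788 documents

# Population size has little effect once the population is large:
# adjusted_for_population(457, 100_000) is still about 455.
```

Note how a review population of 100,000 documents reduces the required sample by only a couple of documents, which is why the population's size has no meaningful impact.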