NACDL - Making Sense of Pretrial Risk Assessments

Making Sense of Pretrial Risk Assessments

Pretrial risk assessment tools have emerged as the favored reform in the movement to lower pretrial jailing and limit the abuses of money bail. Critics have voiced concern, however, over the threat that risk assessments perpetuate racial bias inherent in the criminal justice system. When a risk assessment tool recommends that a client be detained or released with conditions — particularly a client of color — lawyers must be prepared to point out the tool’s limitations and biases.

Access to The Champion archive is one of many exclusive member benefits. It’s normally restricted to just NACDL members. However, this content, and others like it, is available to everyone in order to educate the public on why criminal justice reform is a necessity.

Actuarial risk assessment tools have come to dominate the critical debate over reforming the nation’s pretrial systems. The spread of risk assessments in recent years has been breathtaking. Over 60 jurisdictions, including several states, now employ a risk assessment.1 Twenty-five percent of people in this country live in a jurisdiction that uses a pretrial risk assessment.2 

The tools, particularly those that use algorithms to forecast pretrial outcomes, have risen to prominence for several interrelated reasons. The first is an emerging and overwhelming consensus that too many arrestees are being held in jail prior to trial. One of every five of the 2.3 million people in U.S. jails and prisons is a person awaiting trial.3 These pretrial detainees account for 76 percent of those held by local jails. There is increasing bipartisan agreement that states must reduce the fiscal strains imposed by such extensive pretrial detention.

Recognition of the scope of pretrial detention has been paired with an increased appreciation of its disastrous consequences for both the arrestee and society. Arrestees jailed for 48 hours can lose their employment, housing, and custody of their children, the economic effects of which ripple well beyond the arrestee’s family. Unsurprisingly, study after study has demonstrated that being jailed directly increases an individual’s likelihood of being convicted, and, once convicted, her likelihood of a harsher sentence.4 

Another factor is the growing rejection of jailing based on wealth. Ninety percent of felony defendants could be released if they could afford bail.5 Bail schedules, which prescribe an amount of bail based on the charged offense, are a major culprit in this widespread financial detention. Nearly 70 percent of counties rely to some degree on bail schedules.6 Bail schedules offer efficiency in a pretrial system in which judges often make release decisions in a matter of seconds. Proponents assert that bail schedules also prevent disparities resulting from biased judges. But they are inherently unfair for lower-income people and those who disproportionately occupy this social stratum, such as people of color and women.

In the movement to both lower pretrial jailing overall and limit the abuses of money bail in particular, risk assessments have emerged as the favored reform. Summed up in slogans such as “Moneyballing Criminal Justice” or “replacing wealth with risk,” the general idea is that risk assessments offer gains in efficiency and uniformity comparable to those of bail schedules, but with an objective tool that is more accurate and less biased than judicial determinations of risk.

Increasingly, however, enthusiasm for risk assessments has been met by apprehension. Much of the controversy centers on the fact that the tools, while perhaps capable of facilitating greater pretrial release, may increase pretrial detention or still encourage excessive detention.

Critics have also voiced concern over the threat that risk assessments perpetuate racial bias inherent in the criminal justice system. The publication ProPublica recently declared that a popular tool was “biased against Blacks.” The article sparked a contentious debate over ProPublica’s analysis and, more fundamentally, what it means for a statistical tool to be “race neutral.”

Since the ProPublica debate, there has been a wave of scholarly articles exploring the technical, legal, and moral implications of risk assessment tools. The discussion suggests that the rapid spread of risk assessments has perhaps outpaced our understanding of both their operation and proper place in pretrial systems. Yet it also offers an opportunity to address these outstanding issues.

Risk Assessment Basics

Pretrial risk assessment tools are basically mathematical recipes to group people into “risk categories.” There are two types of tools: actuarial algorithms (the primary focus of this article) and clinical interviews. Some of the better-known algorithmic instruments in use are the widely used Laura & John Arnold Foundation Public Safety Assessment (“PSA”), the Federal Pretrial Risk Assessment Instrument (“PTRA”), and private company Northpointe’s COMPAS.

Algorithmic risk assessment tools forecast outcomes based on historical data. Their objective is to find patterns in past data that correlate with some definition of success.7 In the pretrial context, tools define “success” as when an arrestee does not fail to appear (“FTA”) for court and/or is not re-arrested8 during the pretrial period.9 

The tools then assign a range of scores to individuals based on their risk profile. Those scores are then grouped into risk bins ordered from “low” to “high.” For example, individuals with a score of 1-5 may be placed into a low-risk bin; those with a score of 6-10 may be placed into a medium-risk bin; and those with a score of 11-15 may be placed into the high-risk bin. The designation of risk bins is somewhat arbitrary. What qualifies as low or high depends on the thresholds set by tool designers, and merely denotes the risk a group presents relative to other risk bins. To illustrate, between 8.6 and 11 percent10 of those flagged for “new violent criminal activity” (“NVCA”) by the PSA were re-arrested for a violent charge within six months of their release. That means roughly 89 to 91 percent of people flagged were not arrested for a violent crime.
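
The binning step can be sketched in a few lines of Python. This is only an illustration: the cut points are the hypothetical ones from the example above (1-5, 6-10, 11-15), not those of any actual tool, and the function names are invented for this sketch. The last lines redo the arithmetic on the PSA violence flag.

```python
# Hypothetical cut points taken from the illustrative example in the text;
# real tools set their own thresholds.
RISK_BINS = [
    (1, 5, "low"),
    (6, 10, "medium"),
    (11, 15, "high"),
]


def risk_bin(score: int) -> str:
    """Return the label of the bin whose range contains the score."""
    for lowest, highest, label in RISK_BINS:
        if lowest <= score <= highest:
            return label
    raise ValueError(f"score {score} is outside the 1-15 range")


def share_not_rearrested(rearrest_rate_percent: float) -> float:
    """Complement of a re-arrest rate, in percent, rounded to one decimal."""
    return round(100.0 - rearrest_rate_percent, 1)


if __name__ == "__main__":
    for score in (3, 6, 15):
        print(score, risk_bin(score))
    # Re-arrest rates reported for the PSA violence flag (8.6 and 11 percent).
    for rate in (8.6, 11.0):
        print(rate, share_not_rearrested(rate))
```

Moving a single cut point changes who is labeled “high risk” without changing anyone's underlying score, which is the sense in which the bins are somewhat arbitrary.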

Relatedly, though often touted as capable of predicting an individual’s likelihood of pretrial success, the tools are actually incapable of making individualized predictions. They instead study data from many individuals and then forecast aggregate group risk. A risk score therefore indicates that a person shares traits with a group who succeeded or failed at a certain rate. But the score provides no information about how a specific individual will behave if released.

Risk assessment tools assess several types of risk, and determining what specific risks a tool measures is critical to understanding its operation. Some tools, like the Arnold PSA, separately predict the risk of FTA, risk of arrest, and risk of arrest for a violent crime (referred to as the “violence flag”). However, many do not, and instead collapse these probabilities into one composite “risk” score, representing risk of either an FTA or any arrest. Many other tools do not distinguish between arrest for a crime of violence and general arrest.
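
A rough sketch of why a composite score hides the specific risk: if a tool reports only the chance of “FTA or arrest,” that single number is consistent with very different underlying profiles. The probabilities below are invented for illustration, and the function name is hypothetical. The bounds themselves are a basic property of probabilities, not a feature of any particular tool.

```python
def composite_risk_bounds(p_fta: float, p_arrest: float) -> tuple:
    """Range of P(FTA or arrest) consistent with the two separate risks."""
    # Lowest possible value: one outcome only ever occurs alongside the other.
    lower = max(p_fta, p_arrest)
    # Highest possible value: the two outcomes never occur together.
    upper = min(1.0, p_fta + p_arrest)
    return lower, upper


if __name__ == "__main__":
    # Two invented profiles with the same bounds but opposite risks.
    print(composite_risk_bounds(0.20, 0.02))  # mostly missed court dates
    print(composite_risk_bounds(0.02, 0.20))  # mostly new arrests
```

Both profiles produce the same range for the composite, so a judge who sees only the composite cannot tell which concern, if either, the score reflects.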

Regardless of whether the tools combine or disaggregate crime predictions, they almost uniformly provide the probability of a new arrest, rather than the probability that the individual will actually engage in criminal activity. Since a host of factors influence whether police detect offenses and make arrests, it is perhaps more accurate to describe the tools as predicting the behavior of law enforcement.

Another recurring policy issue is the value of a tool being “validated.” But the term “validated” has no standard definition.11 Broadly speaking, validation assesses the extent to which a tool measures what it is intended to measure, typically court appearance and new arrest. While some validation processes review a tool for possible race or gender disparities,12 many do not.

Even where tools are validated, designers usually do not disclose critical information about the tools’ design. For example, validation studies generally do not reveal how various data points have been weighted,13 or how the tool designates certain risk levels as “high,” “medium,” or “low.” Further, citing privacy or logistical concerns, designers have not made the raw data used to develop the tools available to the public, largely because the data sets that designers create to build a tool are not themselves subject to public records requests. Finally, the decision making of tool designers in constructing the assessment is often considered proprietary. These hurdles have thus far prevented any widespread independent auditing of the tools.

Moreover, a validated tool is not necessarily a highly accurate tool. Like their human creators, the tools make errors. For example, a recent study found that the recidivism predictions generated by the COMPAS tool were no more accurate than those of ordinary people recruited through a crowdsourcing site and provided with short descriptions of arrestees.14 A tool’s predictive accuracy is commonly summarized by its “Area Under the Curve” or “AUC” statistic. For practical purposes, the AUC value ranges from .50 to 1.00, with .50 being no better than chance and 1.00 meaning perfect prediction. By current industry standards, a tool with an AUC of .60 to .70 is considered acceptable, and an AUC of .70 or higher is considered good.15 In other words, a tool in the acceptable range still ranks a person who succeeded as riskier than a person who failed in 30 to 40 percent of such comparisons.
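
For readers who want to see what an AUC measures, the sketch below computes it the long way: as the share of pairs (one person who failed, one who succeeded) in which the person who failed received the higher score. The scores are made up for illustration and the function name is an assumption of this sketch; validation studies typically compute the same quantity with statistical software.

```python
from itertools import product


def auc(scores_failed, scores_succeeded):
    """Share of (failed, succeeded) pairs in which the person who failed
    received the higher risk score; ties count as half."""
    pairs = list(product(scores_failed, scores_succeeded))
    if not pairs:
        raise ValueError("need at least one person in each group")
    wins = sum(
        1.0 if f > s else 0.5 if f == s else 0.0
        for f, s in pairs
    )
    return wins / len(pairs)


if __name__ == "__main__":
    # Made-up risk scores, for illustration only.
    failed = [4, 5, 6, 3]
    succeeded = [1, 2, 5, 4, 3, 6]
    print(round(auc(failed, succeeded), 2))
```

With these invented scores the result falls in the “acceptable” band, even though several people who succeeded scored as high as or higher than people who failed.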

Risk assessment tools also — by design — over-predict risk for several reasons. First, tools typically rely on data collected prior to reform. Thus, these predictions fail to account for interventions that might reduce risk, such as court date reminders, transportation assistance, or check-ins with a pretrial services agency. This leads to what some experts refer to as “zombie predictions,”16 which may lead to unnecessary detention or unnecessarily intrusive release conditions. Further, because the tools are developed on data only from people who were released, any data points from detained persons who would otherwise have been successful are omitted from the calculus. Finally, the tools are prone to over-predict risk because pretrial flight and violence are relatively rare.17 These factors gradually skew the universe of information on which the tool operates toward increased pretrial failure, thus resulting in the over-prediction of risk.
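
The effect of rarity can be illustrated with Bayes' rule. The sketch below is a simplification under stated assumptions: the 70 percent sensitivity and specificity figures are hypothetical, and the 2 percent base rate is chosen to sit in the range the cited studies report for pretrial violent re-arrest. It is not a description of how any particular tool performs.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Bayes' rule: share of people flagged as high risk who actually fail."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical tool that catches 70 percent of failures and clears
    # 70 percent of successes, applied to a rare and a common outcome.
    for base_rate in (0.02, 0.50):
        ppv = positive_predictive_value(base_rate, 0.70, 0.70)
        print(f"base rate {base_rate:.0%}: {ppv:.1%} of flagged people fail")
```

Under these assumptions, the same tool that is right about most flagged people when the outcome is common is wrong about the overwhelming majority of them when the outcome is rare.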

Additionally, tools arguably misrepresent “risks.” Most risk assessment tools report their outputs either in the form of a recommendation to a judge (e.g., “not recommended for OR release”) or with a seemingly normative label (e.g., “high risk”) without disclosing the actual likelihood of a given outcome (e.g., “8.5 percent chance of arrest for a violent crime within nine months”18). Given this fact, early evidence on judicial overrides of tool recommendations warrants concern: judges are much more likely to override a release recommendation than a detention recommendation.19 This dynamic threatens a one-way ratchet effect where judicial decisions to over-detain influence the tool to recommend detention more often.20 

False Positives, False Negatives, and Pretrial Constitutional Principles

Given what we know and do not know about risk assessments, the extent to which they may properly inform a detention decision amounts to a debate about risk tolerance, or the degree of error society must accept in releasing an arrestee. Any system of prediction must wrestle with the errors it will inevitably produce, along with the relative harms those errors will inflict. The most common endorsements of risk assessments focus on their presumed superiority to the subjective assessments of risk made by judges, a position not yet supported by research. But this focus on the superiority of risk assessments to judges obscures a critical question: How certain must we be about a defendant’s risks in order to detain? Depending on the answer, deferring detention decisions to risk assessments could be a significant improvement over judges yet still fundamentally unfair.

A useful frame for assessing the appropriate role of risk assessments in the detention decision is to consider the relative “costliness” of false negatives versus false positives. In the pretrial context, a false negative is a situation in which a tool forecasts that an individual can be released successfully, i.e., is “low risk,” yet that individual either does not return to court or commits a violent offense on release. A false positive, by contrast, refers to situations in which the tool predicts that an individual will either flee or offend, i.e., is “high risk,” but the individual complies with all legal requirements.

The immediate empirical challenge with comparing these groups is that someone predicted to succeed will likely be released, while a person predicted to fail will most often be detained. Thus, while the universe of released “false negatives” may be reliably assessed based on their subsequent conduct, most detained “false positives” will never be detected, as we cannot discern whether they could have been safely released.
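
The observability problem can be made concrete with a small tally. The records below are invented, and the field layout and function name are assumptions of this sketch. The point is only that a detained person's outcome is never recorded, so a detained false positive cannot be counted.

```python
from collections import Counter

# Each record: (tool_prediction, released, failed_pretrial).
# failed_pretrial is None for detained people: their outcome is never observed.
RECORDS = [
    ("low", True, False),    # true negative
    ("low", True, True),     # false negative
    ("high", True, False),   # false positive, visible only because released
    ("high", True, True),    # true positive
    ("high", False, None),   # detained: might be a false positive, unknowable
    ("high", False, None),   # detained: might be a false positive, unknowable
]


def classify(prediction, released, failed):
    """Label one record, or mark it unobserved if the person was detained."""
    if not released:
        return "unobserved"
    if prediction == "high":
        return "true positive" if failed else "false positive"
    return "false negative" if failed else "true negative"


def tally(records):
    """Count how many records fall into each category."""
    return Counter(classify(*record) for record in records)


if __name__ == "__main__":
    for category, count in sorted(tally(RECORDS).items()):
        print(category, count)
```

Every false negative in this invented data set is visible, while the only false positive that can be counted is the one person released despite a “high” label.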

Notwithstanding this limitation, which at least one study has attempted to address,21 the debate over the relative weight of false positives and false negatives has enormous consequences for pretrial law and policy. Public fears of pretrial crime are a major motivation behind expansions of preventive detention.22 Judges and law enforcement officials therefore will predictably be more concerned about false negatives, since they face severe public backlash if they release individuals who commit offenses.23 But these officials rarely pay any political price when they wrongly detain someone.

Defendants obviously have a different perspective. The prospect that they may be detained due to sharing a risk score with a group deemed more likely to be deviant seems inherently unfair. Indeed, this is precisely the objection raised against racial profiling. Stereotypes — whether by people or algorithms — are necessarily overbroad and intentionally blind to the individual. Being wrongly labeled “high risk” under these circumstances is particularly unsettling since, if detained, these arrestees likely can never prove that they are false positives.

Deciding which of these interests should take precedence is a question neither risk assessments nor their designers can answer. It is a legal and moral question left to the legal system and society. For defense attorneys representing arrestees who face detention based to any degree on a tool’s prediction, this raises the critical question of how to contest a label of “risky,” both in court and as a matter of policy.

Guidance on this question must begin with the legal standards governing pretrial release.

The Eighth Amendment, the only constitutional provision that explicitly addresses bail, prohibits “excessive bail.” The Supreme Court in Stack v. Boyle held that bail is excessive when set higher than what is reasonably necessary to achieve the purpose of bail. The Supreme Court later clarified in United States v. Salerno that the Eighth Amendment does not guarantee a right to bail, i.e., that an individual may be held without bail under certain circumstances. Nonetheless, for those entitled to release on bail, Stack has generally been interpreted to guarantee a right to an individualized bail determination.

Overreliance on risk assessments is in direct tension with the Eighth Amendment right against excessive bail. Risk assessments, as discussed, provide group-based projections, often from a limited number of factors. Defendants may therefore confidently claim an Eighth Amendment right to introduce evidence that contradicts a risk assessment’s non-individualized recommendation of detention.

The Supreme Court has also recognized that the right to pretrial release is fundamental. As the Court expressed in Salerno, “liberty is the norm, and detention prior to trial or without trial is the carefully limited exception.” This holding places two additional constraints on the use of risk assessments to detain. The first is that detention is only authorized if the government identifies a compelling state interest in denying liberty. It is generally accepted that preventing pretrial flight and violent crime are compelling interests that might justify detention.24 But there remains an open question of whether the lesser concerns of preventing missed court appearances or minor offenses would ever qualify as compelling. At a minimum, tools that collapse these risks into aggregate scores arguably fail to establish a sufficient justification for detention, even for “high risk” arrestees.

The second potential constraint is that after the government identifies a compelling interest, it must demonstrate that pretrial detention is necessary to achieve that interest. Detention is therefore illegal if a release condition such as pretrial support would serve the government’s interests. As under the Eighth Amendment, a higher risk score cannot, on its own, justify detention. Courts are instead obligated to examine whether monetary and nonmonetary release conditions might mitigate whatever risk a tool identifies (assuming the tool identifies a specific risk).

However, risk assessments are of limited use in identifying these release conditions. The common assumption in pretrial policy is that higher risk scores warrant more intrusive supervisory requirements. But the tools only forecast the risk that an arrestee may fail pretrial release if the court does not impose release conditions. Though a higher score might suggest that some sort of intervention is needed, the tools are incapable of identifying how amenable an individual’s risk might be to any particular release condition.

The final constitutional norm is the presumption of innocence, expressed most famously as the (paraphrased) admonition that it is better to let 10 guilty defendants go free than to convict one innocent individual.25 The Court has recognized that “[u]nless this right to bail before trial is preserved, the presumption of innocence, secured only after centuries of struggle, would lose its meaning.”26 But the Court has also characterized the presumption as merely a burden-shifting device for the prosecution to demonstrate guilt beyond a reasonable doubt at trial.27 

Even if viewed as a burden-shifting mechanism, the presumption still arguably enforces a moral imperative that the criminal justice system must minimize the risk of erroneously detaining individuals. Requiring an exacting standard like proof beyond a reasonable doubt at trial forces the state to assume a greater risk of acquitting a factually guilty person than would more forgiving standards like “clear and convincing” or “preponderance.” It stands to reason that the state must assume a comparable risk when it attempts to detain an individual, not based on an offense the individual is accused of already committing, but based on an offense the individual may commit in the future.

Taken together, the right to an individualized bail determination, the fundamental right to pretrial release, and the presumption of innocence appear to require that pretrial systems treat false positives as costlier than false negatives. At a minimum, they likely impose a higher degree of risk tolerance than is commonly acknowledged in the pretrial reform debate, which typically focuses on releasing more defendants on the condition that crime rates remain stable. Release always involves risk, but “that is a calculated risk which the law takes as the price of our system of justice.”28 

Given the profound liberty interests at stake, the legal and constitutional rights implicated, and the tendency of tools to over-predict and misrepresent “risk,” serious thought must be given to eliminating or strictly circumscribing the influence of current tools on any detention decision. Evaluating these tools through a racial justice lens only heightens these concerns.

Proponents of risk assessment herald the tools as “race neutral,” “objective,” and a promising step away from the more opaque and biased decision-making of judges. However, as illustrated by the 2016 ProPublica article and the ensuing debate about that article,29 there are multiple ways to define a tool as being “race neutral.”

The ProPublica article “Machine Bias” used data from cases in Broward County, Florida, in which the COMPAS risk assessment tool30 was used on pretrial arrestees. The authors found that the algorithm “was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.” Moreover, COMPAS misclassified white arrestees as lower risk more often than it did people of color.

In the vigorous scholarly debate that ensued, commentators focused on the different forms of algorithmic “fairness” by which to evaluate algorithmic tools. The standard metric is referred to as “predictive parity,” and means that a given risk score (e.g., a “4” on the PSA) is correlated with the same likelihood of pretrial success (e.g., “83 percent chance of appearing for court”) regardless of race. ProPublica’s metric of “fairness” is known as the “false positive error rate,” and requires equalizing the rates at which people of different races are falsely classified as high risk.

No tool can satisfy both types of fairness at once.31 Yet, despite this “fairness” debate, algorithmic experts and designers hold steadfast to “predictive parity.” The implicit justification for this position is that the higher rate of false positives among African Americans results from the fact that African Americans commit more offenses, i.e., have a higher “base rate” of offending. While scholars increasingly recommend limiting the tool’s input data to less biased sources, like arrests for violent offenses rather than for less serious offenses, most maintain that predictive parity offers a more accurate diagnostic for pretrial systems.
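
The tension between the two metrics can be seen in a simplified, binary (“high” versus “low”) version of the problem. The sketch below uses a standard algebraic identity relating a group's base rate, the share of flagged people who actually fail (the positive predictive value, standing in here for predictive parity), and the false negative rate. All numbers are hypothetical and the function name is invented for this sketch.

```python
def false_positive_rate(base_rate, ppv, fnr):
    """False positive rate implied by a group's base rate, the share of
    flagged people who actually fail (ppv), and the false negative rate.

    Follows from the definitions: true positives = (1 - fnr) * base_rate,
    false positives = true positives * (1 - ppv) / ppv, and the false
    positive rate divides false positives by (1 - base_rate).
    """
    return base_rate / (1.0 - base_rate) * (1.0 - ppv) / ppv * (1.0 - fnr)


if __name__ == "__main__":
    # Same hypothetical tool behavior in both groups: a "high" label means a
    # 60 percent failure rate, and 40 percent of failures are missed.
    # Only the recorded base rate of failure differs between the groups.
    for name, base_rate in (("group A", 0.30), ("group B", 0.50)):
        fpr = false_positive_rate(base_rate, ppv=0.60, fnr=0.40)
        print(f"{name}: false positive rate {fpr:.0%}")
```

Holding the meaning of a “high” label equal across the two groups, the group with the higher recorded base rate ends up with more than twice the false positive rate. If arrest data overstate that group's base rate, the disparity is built in.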

But none of these scholars has attempted to square this position with the Constitution’s arguable concern with false positives in the pretrial context. And though they may recognize that African Americans’ higher incidence of serious offenses is inextricably linked with their historical and systemic oppression, none grapples with the implications of that history for allowing African Americans en masse to be labeled as “riskier” than they truly are. This analysis is critical to informing the choice of fairness metrics.

Practical Approaches to Pretrial Risk Assessments

Given the tensions discussed earlier, defense counsel has a daunting task in navigating risk assessment scores for individual clients. Above all else, counsel must understand and effectively communicate that the tools are just that: tools. Even at their best, the tools alone cannot “fix” pretrial systems; they depend on first establishing a healthy pretrial system, and their use can still potentially undermine reform. With this context, counsel must be prepared to convince judges to follow a tool’s recommendations for release, while simultaneously asserting the tool’s limitations and biases in forming any part of a decision to release a client with conditions or to detain the client, particularly a client of color. The discussion below offers approaches to manage this tension.

Building on the observation that risk assessments evaluate group rather than individual risk, their most appropriate application is in making group-level decisions. A policy approach gaining currency is to use the tools primarily to sort defendants into lower-risk groups that may be released without a hearing; moderate-risk groups that may be released with a hearing to determine appropriate release conditions; and higher-risk groups that may be eligible for a hearing where detention is considered. This approach accomplishes the sorting intended by bail schedules, except risk groups are identified by risk scores rather than access to money.

Critically, this model merely authorizes the state to seek detention for higher risk groups. For defendants with higher risk scores, defense counsel can emphasize that, according to the tool, these clients are nonetheless overwhelmingly likely to succeed if released. Thus, while the score may suggest the need for some condition on liberty, it does not necessarily indicate the need for an intrusive condition. To prevail in the detention hearing, the state must still be required to present specific facts, separate from the offense charged or other information already accounted for by the tool, demonstrating that the individual cannot be released because no condition could reasonably manage the risk.

While this approach may mitigate concerns over racial bias by reducing the overall number of people, including people of color, detained prior to trial, it does not eliminate these concerns. Even the best-designed tools will over-recommend white defendants for release, while disproportionately funneling black defendants into detention hearings. Since a major justification for risk assessments is replacing the biased decisions of judges, this is a serious concern, as black defendants will still disproportionately face wrongful detention.

Another worthy approach is attacking and altering the entire language around risk reporting. As discussed above, the term “high risk” can be gravely misleading. For the types of pretrial risk people are likely most concerned about — flight and violence — high risk may not really be high risk at all from a normative perspective. Accordingly, reformers are increasingly calling to alter how tools report their risk bins, such as reporting only the probability of success associated with a risk category, rather than the risk category.

Defense counsel need not and must not wait for these reforms in their advocacy. They should instead be prepared to confront judges with the misleading character of labeling a defendant “high risk.” Especially where the tool collapses risk categories or measures only the risk of re-arrest for any offense, this confrontation might include asserting that the tool’s risk nomenclature is unduly prejudicial, or even an encroachment on the right to an individualized bail determination.

Counsel must also be prepared to contest the application of tools to their client’s specific information. Where possible, attorneys should explain mitigating factors in their client’s case, for example, that several failures to appear were due to unforeseen illness. Noting where existing criminal legal data systems may have also made errors, for example by not accounting for expungement or dismissal, or simply including incorrect data points, may help advocates challenge the client’s risk score.

Conclusion

Perhaps the most important insight from the risk assessment debate is that, despite recent enthusiasm for the tools, on their own they cannot reform pretrial justice systems. And they may critically undermine reform if not used with the utmost care. Unless a jurisdiction commits to releasing more pretrial arrestees — either by mandating that judges release certain categories of arrestees or by achieving a culture that nurtures release among prosecutors, defense counsel, and judges — risk assessments will be of little moment. Realization of this basic fact is critical to the continued success of the bail reform movement.

The authors’ views in this article do not necessarily represent those of the ACLU or its affiliates. 

Notes

  1. Sarah Picard-Fritsche et al., Demystifying Risk Assessment, Center for Court Innovation, 1, available at https://www.courtinnovation.org/sites/default/files/documents/Monograph_March2017_Demystifying%20Risk%20Assessment_1.pdf.
  2. Pretrial Justice Institute, The State of Pretrial Justice in America, 13 (Nov. 2017) (“The good news is that this analysis shows 25 percent of people living in the United States now reside in a jurisdiction that uses a validated evidence-based pretrial assessment.”).
  3. See Prison Policy Initiative, Mass Incarceration: The Whole Pie 2017, available at https://www.prisonpolicy.org/reports/pie2017.html.
  4. See, e.g., Mary T. Phillips, Pretrial Detention and Case Outcomes, Part 1: Nonfelony Cases, New York Criminal Justice Agency, Inc. (Nov. 2007); Christopher T. Lowenkamp, Marie VanNostrand & Alexander Holsinger, Investigating the Impact of Pretrial Detention on Sentencing Outcomes, at 10-11 (Laura & John Arnold Found. 2013); Christopher T. Lowenkamp, Marie VanNostrand & Alexander Holsinger, The Hidden Costs of Pretrial Detention (Laura & John Arnold Found. 2013); Paul Heaton, Sandra Mayson & Megan Stevenson, The Downstream Consequences of Misdemeanor Pretrial Detention (July 2016), available at https://www.law.upenn.edu/live/files/5693-harriscountybail.
  5. Brian A. Reaves, Bureau of Justice Statistics, U.S. Dep’t of Justice, Felony Defendants in Large Urban Counties, 2009—Statistical Tables 1, 15 (2013).
  6. Pretrial Justice Inst., Pretrial Justice in America: A Survey of County Pretrial Release Policies, Practices and Outcomes, 7 (2009).
  7. Thanks to Cathy O’Neil for providing this definition.
  8. While there is significant overlap in the factors that various tools use, each tool varies somewhat as to which factors it considers, or how much weight is given to a certain factor in generating a risk score. The Arnold Foundation has publicized its factors as well as the weight each factor is given, though other tools do not make their weighting system public. See http://www.arnoldfoundation.org/wp-content/uploads/PSA-Risk-Factors-and-Formula.pdf (“PSA Formula”). As one noteworthy difference between factors that various tools employ, the Ohio and Indiana tools consider an arrestee’s “age at first arrest” while the Arnold and Virginia tools do not. See http://www.ocjs.ohio.gov/ORAS_FinalReport.pdf at Appx A; https://university.pretrial.org/HigherLogic/System/DownloadDocumentFile.ashx?DocumentFileKey=2773499f-10a4-210d-5017-7c3d57ece01d&forceDialog=0 at 1-1; https://www.dcjs.virginia.gov/sites/dcjs.virginia.gov/files/publications/corrections/race-and-gender-neutral-pretrial-risk-assessment-release-recommendations-and-supervision.pdf at 4; PSA Formula.
  9. For example, it appears that the PSA forecasts the likelihood of a given pretrial outcome during a six-month period, while the COMPAS tool measures the likelihood of a given outcome over one year. See http://criminology.fsu.edu/wp-content/uploads/Validation-of-the-COMPAS-Risk-Assessment-Classification-Instrument.pdf at 30.
  10. See, e.g., Laura and John Arnold Foundation, Results from First Six Months of the Public Safety Assessment - Court in Kentucky, 3 (2014), http://www.arnoldfoundation.org/wp-content/uploads/2014/02/PSA-Court-Kentucky-6-Month-Report.pdf (indicating that the NVCA is associated with an 8.6 percent likelihood of arrest for a violent charge); The New Jersey Pretrial Justice Manual, 11 (2016), https://www.nacdl.org/NJPretrial (indicating that, in New Jersey, the NVCA is associated with an 11 percent likelihood of arrest for a violent charge).
  11. See Cynthia Mamalian, The State of the Science of Pretrial Risk Assessment (March 2011), at 19. “With respect to validation, survey results indicate that 48 percent of pretrial programs have never validated their instruments, a statistic that has remained unchanged from 2001 to 2009. One concern, however, is that there is no standard method being used for the validation of a risk assessment instrument.”
  12. Mona J.E. Danner, Marie VanNostrand & Lisa M. Spruance, Race and Gender Neutral Pretrial Risk Assessment, Release Recommendations, and Supervision (Nov. 2016).
  13. See Danner, et al., supra note 12 at 8. An often-cited validation study found that the Virginia Pretrial Risk Assessment Instrument was “race neutral” only when the tool’s risk factors were “weighted, summed, and collapsed” but without any public disclosure as to how the factors were weighted, summed, and collapsed. The study otherwise found “a difference in the predictive ability of … risk factors for People of Color and for Whites, with the model performing better for Whites. The AUC-ROC for Whites (.686) is higher than the AUC-ROC for People of Color (.645) and the difference is statistically significant (AUCDIFF = -.041, p= .002).” (emphasis added).
  14. Julia Dressel & Hany Farid, The Accuracy, Fairness, and Limits of Predicting Recidivism, 4(1) Sci. Advances (Jan. 17, 2018).
  15. Picard-Fritsche, supra note 1, at 10.
  16. John Logan Koepke & David G. Robinson, Danger Ahead: Risk Assessment and the Future of Bail Reform (Feb. 19, 2018). Washington Law Review, Forthcoming, available at https://ssrn.com/abstract=3041622.
  17. See Thomas H. Cohen & Brian A. Reaves, Bureau of Justice Statistics, U.S. Dep’t of Justice, Pretrial Release of Felony Defendants in State Courts (Nov. 2007), available at https://www.bjs.gov/content/pub/pdf/prfdsc.pdf (noting the rarity of true flight; only six percent of all released felony defendants still had not appeared after one year); Marie VanNostrand, Ph.D. & Gena Keebler, Pretrial Risk Assessment in the Federal Court (April 14, 2009) at 22–23 (data from the federal system showing that only 3.6 percent of released persons across risk level had a “pretrial outcome” constituting “danger to [the] community”); Shima Baradaran Baughman & Frank McIntyre, Predicting Violence, 90 Tex. L. Rev. 497, 557 (2012), available at https://ssrn.com/abstract=1756506 (While arrest rates are an imperfect proxy for actual commission of violence, a study using a data set across 16 years and 75 counties found “for almost all crimes, the average rearrest rates are only about 1%–2% for a pretrial violent crime.”).
  18. This appears to be one of the correlations between “high risk” and arrest for a violent crime under the COMPAS tool. See http://criminology.fsu.edu/wp-content/uploads/Validation-of-the-COMPAS-Risk-Assessment-Classification-Instrument.pdf at 51.
  19. Santa Cruz County Probation Dep’t, Alternatives to Custody Report 2 (2015).
  20. Bernard E. Harcourt, Against Prediction: Sentencing, Policing, and Punishing in an Actuarial Age, 27–28, University of Chicago Public Law & Legal Theory Working Paper, No. 94 (2005).
  21. Shima Baradaran & Frank L. McIntyre, Predicting Violence, 90 Tex. L. Rev. 497 (2012).
  22. United States v. Salerno, 481 U.S. 739, 742 (1987) (noting that the Bail Reform Act of 1984 was passed in response to “the alarming problem of crimes committed by persons on release”) (citing S. Rep. No. 98-225, at 3 (1983)). The “alarming” rate of pretrial crime noted in passing the Bail Reform Act was the arrest of one in six defendants. S. Rep. No. 98-225, at 6 (1983).
  23. See Samuel R. Wiseman, Fixing Bail, 84 Geo. Wash. L. Rev. 417, 428–32 (2016).
  24. See generally Stack v. Boyle, 342 U.S. 1 (1951) and United States v. Salerno, 481 U.S. 739 (1987).
  25. See 1 Blackstone, Commentaries on the Laws of England (1765).
  26. Stack v. Boyle, 342 U.S. 1, 4 (1951).
  27. E.g., Bell v. Wolfish, 441 U.S. 520, 533 (1979).
  28. Stack v. Boyle, 342 U.S. 1, 8 (1951) (Jackson, J., concurring).
  29. Julia Angwin et al., Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks, ProPublica (May 23, 2016), available at https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
  30. The COMPAS tool was designed for sentencing determinations, though many jurisdictions use it pretrial.
  31. Avi Feller et al., A Computer Program Used for Bail and Sentencing Decisions Was Labeled Biased Against Blacks. It’s Actually Not That Clear, Wash. Post (Oct. 17, 2016), available at https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas/?utm_term=.c9281bfbbf97 (determining the incompatibility between the two fairness metrics is “mathematically guaranteed”).
About the Authors

Brandon Buskey is the Deputy Director of Smart Justice Litigation at the American Civil Liberties Union.

Brandon J. Buskey
American Civil Liberties Union
New York, NY
212-607-3300
bbuskey@aclu.org 

Andrea Woods is a Staff Attorney with the American Civil Liberties Union’s Criminal Law Reform Project. She was previously an Equal Justice Works Fellow at the ACLU.

Andrea Woods
American Civil Liberties Union
New York, NY
212-607-3300
awoods@aclu.org