AI Can Assist Human Judges, But It Can’t Replace Them (Yet)
Perhaps courts could try using AI in minor cases—after obtaining consent from all parties—to get a better sense of its possibilities and limitations.
Welcome to Original Jurisdiction, the latest legal publication by me, David Lat. You can learn more about Original Jurisdiction by reading its About page, and you can email me at davidlat@substack.com. This is a reader-supported publication; you can subscribe by clicking here.
A version of this article originally appeared on Bloomberg Law, part of Bloomberg Industry Group, Inc. (800-372-1033), and is reproduced here with permission.
Over the past few years, I’ve read many articles and attended numerous panels about whether artificial intelligence could replace lawyers. But what about judges—could AI replace them?
A 2013 paper by two Oxford University researchers, Carl Benedikt Frey and Michael Osborne, ranked 702 occupations by their “probability of computerization,” from #1, the least computerizable job (recreational therapist), down to #702, the most computerizable (telemarketer). Interestingly enough, judges came in at #271, while lawyers ranked #115. Because a higher number means a more automatable job, the ranking suggests it might be easier for AI to replace judges than lawyers.
Judges themselves don’t seem too worried about losing their jobs to AI. In his 2023 Year-End Report on the Federal Judiciary, Chief Justice John Roberts noted that in light of “breathless predictions about the future of Artificial Intelligence,” some observers “may wonder whether judges are about to become obsolete.” His view: “I am sure we are not.”
“I predict that human judges will be around for a while,” Roberts wrote later on in the report. And he’s far from alone in holding this opinion.
Replacing human judges with AI is more complicated and riskier than it might seem. That was my main takeaway from an event I attended last week, “The Future of Adjudication,” part of the Law2050 event series sponsored by Texas A&M University School of Law. It featured two AI experts: former federal judge Katherine Forrest, now a partner at Paul Weiss, and Cornell University law professor Frank Pasquale.
Pasquale noted that although AI tools generate work product that resembles the result of human reasoning, they don’t engage in reasoning themselves. And they certainly can’t feel, which reduces the appeal of replacing human judges with AI ones—at least if we want our judges to have empathy. For example, Pasquale said, an AI platform can’t appreciate the gravity of a process such as sentencing.
All this resonated with me. At least some research suggests that people view human judges as fairer than AI judges—the so-called human-AI fairness gap.
Forrest echoed Pasquale’s point, reminding the audience that artificial intelligence isn’t human intelligence—it “is of us, but not actually us.” The moral code that informs our legal codes emanates from a complex set of rules that humans have agreed upon over time, Forrest explained. Many developers try to design AI tools that reflect these human values, but not all developers are so conscientious.
Forrest also raised the problem of “model drift.” Humans might develop an AI model to work in line with our values, but the model can evolve in ways that its original designers might find deeply problematic.
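For the technically curious, here’s a minimal sketch of what guarding against drift can look like in practice. To be clear, this is my own illustration, not anything Forrest described, and the score data is invented: it compares a model’s recent output scores against a frozen baseline sample using a two-sample Kolmogorov-Smirnov test and flags drift when the two distributions diverge.

```python
# Minimal drift-monitoring sketch (illustrative only; the data is invented).
# Compare a model's recent output scores against a frozen baseline sample
# and flag drift when the two distributions diverge significantly.
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(baseline: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test: a small p-value means the recent
    outputs no longer resemble the baseline, i.e., the model has drifted."""
    _, p_value = ks_2samp(baseline, recent)
    return p_value < alpha

# Toy usage: baseline scores center on 0.5; recent scores have shifted higher.
rng = np.random.default_rng(seed=42)
baseline = rng.normal(loc=0.50, scale=0.10, size=1_000)
recent = rng.normal(loc=0.65, scale=0.10, size=1_000)
print(drift_detected(baseline, recent))  # True: the output distribution shifted
```

Real-world drift monitoring is far more involved, but even this toy version illustrates Forrest’s underlying point: a model’s behavior has to be watched over time, not just validated once at launch.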
The idea of AI as a replacement for human judgment is tempting, according to Pasquale. But the more closely one scrutinizes it, the more concerns emerge.
Despite those risks and challenges, Pasquale and Forrest expressed optimism about how AI could assist and enhance human judges’ decision-making. For example, AI tools can help judges process large amounts of information more quickly, especially in complex cases, much as law clerks do.
They can also serve as sounding boards for judges, another role that law clerks play. It’s not uncommon for judges to solicit the views of their law clerks before deciding how to sentence a defendant. One could imagine an AI tool offering similar sentencing recommendations, and researchers are already exploring the possibility.
Courts are experimenting with using AI for routine administrative tasks, such as reviewing filings for conformity with court rules. But as the technology continues to improve, it could be worthwhile for courts to explore use cases for AI that involve actual judicial decision-making. Perhaps courts could try using AI in minor cases—after obtaining consent from all parties—to get a better sense of the technology’s possibilities and limitations.
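To make the routine use case above concrete, here’s a toy sketch of what an automated conformity check might look like. The word limit and required sections are invented for illustration, not drawn from any court’s actual rules:

```python
# Toy filing-conformity checker (hypothetical rules, invented for illustration).
# Flags problems for a clerk to review rather than rejecting filings outright.
WORD_LIMIT = 13_000
REQUIRED_SECTIONS = ("TABLE OF CONTENTS", "TABLE OF AUTHORITIES", "CERTIFICATE OF SERVICE")

def check_filing(text: str) -> list[str]:
    """Return a list of human-readable rule violations (empty if compliant)."""
    problems = []
    word_count = len(text.split())
    if word_count > WORD_LIMIT:
        problems.append(f"Word count {word_count} exceeds the {WORD_LIMIT:,}-word limit.")
    upper = text.upper()
    for section in REQUIRED_SECTIONS:
        if section not in upper:
            problems.append(f"Missing required section: {section.title()}.")
    return problems

# Toy usage with a deliberately deficient "brief."
sample_brief = "INTRODUCTION\nThe court should reverse.\nTABLE OF CONTENTS\n..."
for problem in check_filing(sample_brief):
    print(problem)
# Missing required section: Table Of Authorities.
# Missing required section: Certificate Of Service.
```

The design point of a sketch like this is triage: the tool surfaces possible problems, and a human still decides what to do about them.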
Here’s one idea. I’m going to trial in traffic court next week to contest a ticket I received in January. The stakes are low, and I’m contesting the ticket more as a matter of principle. If the court had told me back in January that I could make my case to a “judge-bot” and get an immediate ruling, I would have accepted.
Sure, submitting my case to an AI judge would risk getting a decision I’d view as unfair. But the same is true of appearing before human judges, who have their own biases and blind spots. And if we still prefer human judges, maybe a court could develop a pilot program where the parties could consent to the initial use of an AI judge but have a right to appeal its ruling to a human judge.
AI tools aren’t perfect, but neither are humans. And if AI judges’ decisions can be roughly as fair as those of human judges, but the AI rulings arrive more quickly and cheaply, that’s worth exploring.
Thanks for reading Original Jurisdiction, and thanks to my paid subscribers for making this publication possible. Subscribers get (1) access to Judicial Notice, my time-saving weekly roundup of the most notable news in the legal world; (2) additional stories reserved for paid subscribers; (3) transcripts of podcast interviews; and (4) the ability to comment on posts. You can email me at davidlat@substack.com with questions or comments, and you can share this post or subscribe using the buttons below.
Thank you for surfacing this issue. I am currently clerking, and this question has been my preoccupation for the past several months. AI can write, sure. But can AI match humans in appreciating nuance? Fairness? As a side project to familiarize myself and my peers with AI’s power and limits, I created a word puzzle game, Quadralgame.com, that pits humans against AI, with a separate iteration of the AI serving as judge. The challenge is to find the word that best fits four prompts at once, and in particular to come up with a “more perfect” word than the AI does.
Quadral has already been illuminating: for example, the AI understands irony but struggles with negation. Its judging usually makes rough sense, but it is moody. The judging feels subjective, and in fact it is (just compare its scores for near-identical words like “Swerve” and “Swerves”). But isn’t it always?