Next week, a veteran New York lawyer of 30 years’ standing will face a disciplinary hearing over a novel kind of misdemeanour: including bogus AI-generated content in a legal brief.
Steven Schwartz, from the firm Levidow, Levidow & Oberman, had submitted a 10-page document to a New York court as part of a personal injury claim against Avianca airlines. The trouble was that, as the judge discovered on closer reading, the submission contained entirely fictional judicial decisions and citations that the generative AI model ChatGPT had “hallucinated”.
In an affidavit, the mortified Schwartz admitted he had used OpenAI’s chatbot to help research the case. The generative AI model had even reassured him the legal precedents it cited were real. But he acknowledged that ChatGPT had proved to be an unreliable source. Greatly regretting his over-reliance on the computer-generated content, he added that he would never use it again “without absolute verification of its authenticity”. One only hopes we can all profit from his “learning experience” — as teachers nowadays call mistakes.
As many millions of people have discovered, ChatGPT can create extremely plausible, but highly fallible, content. When generative AI companies trumpet their models’ ability to pass legal and medical exams, it is not unreasonable for users to believe those models are smarter than they really are.
However, in the polite words of the computational linguist Emily Bender, these models are nothing more than “stochastic parrots”, mimicking machines designed to produce the most statistically probable — not the most accurate — answer, without any concept of meaning. Or, in the less polite words of one tech executive, they are world-class bullshit generators, as Schwartz has discovered to his cost.
Our naive faith in technology has a long history. We have a tendency to over-trust the computer, sometimes with fatal results, as with “death by GPS syndrome”. Ignoring the evidence of their own eyes, car drivers have blindly followed errant GPS navigation systems on to highway exit ramps or into the scorching heat of California’s Death Valley. A 2017 research paper identified 158 catastrophic incidents involving GPS devices, leading to 52 deaths. But, as the paper noted, what goes unrecorded are the countless cases in which drivers are saved by those same devices.
Unless we are careful, future researchers might one day write papers on “death by GPT syndrome”. How far, for example, will users and healthcare staff unwisely rely on a chatbot for medical advice? Warnings posted on OpenAI’s site tell users that ChatGPT can produce incorrect or misleading information and is not intended to give advice. But the World Health Organization has already seen fit to warn about errors caused by the precipitous adoption of untested generative AI systems — even if it remains enthusiastic about the technology’s longer-term potential for improving healthcare.
In spite of their alarming technological glitches, it is clear that generative AI models will have a massive impact on the legal and health professions, and many others. Smaller, domain-specific, open-source models are proliferating, threatening to automate away much routine knowledge work.
The professional services firm PwC has signed a 12-month contract with the legal tech start-up Harvey to assist its 4,000 lawyers. Harvey’s software, based on OpenAI’s latest GPT-4 model, will be used to analyse contracts and conduct due diligence. But PwC insists that the start-up will not provide direct legal advice or replace lawyers.
Another such model is run by the start-up Scissero, founded by Mathias Strasser, a former counsel at the US law firm Sullivan & Cromwell. Scissero has launched a chatbot called Mike (also the name of a fellow lawyer of Harvey’s in the television series Suits), which has been trained on real-world legal scenarios to draft emails and mark up legal documents.
Strasser argues that the core competence of lawyers is reading, interpreting and writing language. That is also the core competence of generative AI models. “The legal industry is based on words. Lawyers are outsourced word processors,” Strasser tells me. “With GPT, you can call on a whole army of paralegals.”
But just as senior lawyers should always take responsibility for the briefs written by their over-caffeinated human paralegals at 3am, so they must critically scrutinise the output of generative AI models and be aware of their flaws. Things can, and do, go wrong. Just ask the unfortunate Schwartz.