Boston Bar Journal

ChatGPT Is Not a Lawyer: Using Generative AI Responsibly and Ethically in Law

March 02, 2026
| Winter 2026 Vol. 70 #1

By Edward S. Cheng and Mariem Marquetti

Generative Artificial Intelligence (“GAI”) programs like ChatGPT have infiltrated the legal profession and are wreaking havoc. The attraction of GAI is undeniable, with its promise of fast, well-written briefs at the touch of a keystroke. But the misuse of GAI can have serious consequences. According to one online database, more than 900 cases around the world to date have contained GAI-generated errors, in both civil and criminal matters. Damien Charlotin, AI Hallucination Cases, DamienCharlotin.com. Closer to home, in October 2025, a Massachusetts attorney was discovered to have filed a brief that included faulty GAI-generated content. Memorandum and Order on Defendant’s Supplemental Combined Motion to Dismiss the Four Returned Indictments, Commonwealth v. Moraes, No. 2581CR00071 (Middlesex Sup. Ct. filed Oct. 9, 2025). More troublingly, in November 2025, a prosecutor who submitted multiple briefs riddled with GAI-generated errors explained that she resorted to GAI due to “working on multiple matters, being constantly in court, responding to multiple briefings, and going too fast in her research and drafting.” Shaila Dewan, Prosecutor Used Flawed A.I. to Try to Keep a Man in Jail, His Lawyers Say, N.Y. Times, Nov. 25, 2025. This particular misuse of GAI could have resulted in wrongful convictions and unfair sentencing, which are outright violations of the Constitution and a threat to freedom. Id. Accordingly, before using this tool, lawyers are ethically obligated to understand it. This article provides a look behind the curtain to explain how GAI works.

Ethical Obligation to Understand GAI

On July 29, 2024, the ABA’s Standing Committee on Ethics and Professional Responsibility issued Formal Opinion 512 on lawyers’ use of “GAI tools” in their practice. American Bar Association, Standing Committee on Ethics and Professional Responsibility, Formal Opinion 512 (July 29, 2024). According to the Opinion:

Lawyers using GAI tools have a duty of competence, including maintaining relevant technological competence, which requires an understanding of the evolving nature of GAI. In using GAI tools, lawyers also have other relevant ethical duties, such as those relating to confidentiality, communication with a client, meritorious claims and contentions, candor toward the tribunal, supervisory responsibilities regarding others in the law office using the technology and those outside the law office providing GAI services, and charging reasonable fees.

The Opinion does not create new responsibilities for attorneys; rather, it amplifies the existing obligation that an attorney understand the tools she is using. This article focuses on the first charge of the Opinion, namely that an attorney must maintain “relevant technological competence, which requires an understanding of the evolving nature of GAI.” This requirement must be viewed in light of existing obligations, including an attorney’s responsibility for the arguments and law contained in any filing (see Fed. R. Civ. P. 11) and an attorney’s obligation to keep abreast of technology (see ABA Model Rules of Professional Conduct, Rule 1.1, Competence, Comment 8).

How Generative AI Works

At their core, GAI programs are simply math. As Olga Megorskaya, the founder and CEO of Toloka AI, explained, “[y]ou collect large amounts of data, then using the methods of machine learning, algorithms learn to find inter-dependencies among these pieces of data and then reproduce this logic on every new piece of data they meet.” See Afton Pavletic, The Wild West of Artificial Intelligence – Ethical Considerations for the Use of AI in the Practice of Law, Board of Bar Overseers, May 2, 2023. The development of GAI dates back several decades, long before ChatGPT and OpenAI. In 1950, Alan Turing devised a test to determine whether a machine was “intelligent,” which became known as the Turing Test. A remote human evaluator would judge natural-language conversations between a human and a machine designed to generate human-like responses. The evaluator would know that one of the two conversation partners was a machine and the other a human, but not which was which. The conversation was limited to text to eliminate issues of inflection, accent, or voice. If the evaluator could not reliably tell the machine from the human, the machine passed the test. ChatGPT was created to pass the Turing Test.

Despite passing the Turing Test, GAI programs lack the capacity to “know” anything. The “Chinese Room” argument, published in 1980 by American philosopher John Searle, illustrates the fundamental limitation of GAI programs. Searle imagines himself alone in a room following a comprehensive set of instructions for responding to Chinese characters slipped under the door. Searle does not know Chinese, and the characters appear to be nothing more than assemblages of symbols. By following the program for manipulating symbols, Searle sends appropriate strings of Chinese characters back out under the door. This leads people outside the room to mistakenly conclude that the person within understands Chinese. The takeaway is that programming a digital computer may make it appear to understand language but does not produce real understanding.

GAI programs rely on enormous volumes of data because they have neither real-world experience nor a human’s ability to make connections. They are trained on that data to determine the statistical likelihood of the “next word.” As a recent New York Times article explained, behind the scenes a GAI program is essentially a “statistical distribution – a set of probabilities that predict[] the next word in a sentence, or the pixels in a picture.” Aatish Bhatia, When AI’s Output Is a Threat to AI Itself, N.Y. Times, Aug. 25, 2024. GAI programs predict content, but not always accurately. Even worse, with the flood of GAI content on the internet, GAI programs are starting to incorporate previously generated GAI output into their training data. This leads to a gradual degeneration of GAI output as the programs assimilate growing amounts of GAI-developed information that is increasingly divorced from any real-world input.
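For readers who want a concrete picture of what “predicting the next word” means, the short Python sketch below is a toy illustration only. The word lists, probabilities, and function names are invented for this article and do not reflect any vendor’s actual model; commercial GAI programs use neural networks trained on billions of documents. The mechanism, however, is the same in miniature: each next word is drawn from a statistical distribution learned from training data.

import random

# Toy next-word model: made-up probabilities standing in for patterns a real
# model would learn by analyzing billions of documents.
NEXT_WORD_PROBABILITIES = {
    "the":       {"court": 0.4, "defendant": 0.35, "statute": 0.25},
    "court":     {"held": 0.5, "found": 0.3, "ruled": 0.2},
    "defendant": {"argued": 0.6, "moved": 0.4},
    "statute":   {"provides": 0.7, "requires": 0.3},
    "held":      {"that": 0.9, "the": 0.1},
}

def predict_next(word):
    """Sample the next word from the distribution associated with `word`."""
    candidates = NEXT_WORD_PROBABILITIES.get(word, {"the": 1.0})
    words = list(candidates)
    weights = list(candidates.values())
    return random.choices(words, weights=weights)[0]

# Generate a short phrase one word at a time, starting from "the".
phrase = ["the"]
for _ in range(3):
    phrase.append(predict_next(phrase[-1]))
print(" ".join(phrase))  # e.g., "the court held that"

Nothing in this loop knows what a court or a defendant is; it only knows that certain words tend to follow certain other words. That is the Chinese Room in code form.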

Generative AI and the Legal Profession

As a threshold matter, using generic GAI programs like ChatGPT for legal work is a mistake. ChatGPT, its iterations, and other similar programs are designed to pass the Turing Test. They are highly successful chatbots that can fool the ordinary user into believing that a person is on the other side of the conversation. ChatGPT, however, is not designed to produce accurate legal research and writing. It is designed to please the user. Whether or not it can find cases or statutes to support the arguments it is asked to make, it will complete its task, making up or “hallucinating” citations to non-existent cases or statutes along the way. These hallucinations have resulted in malpractice. Simply put, lawyers cannot reliably use ChatGPT or other generic GAI programs for their legal research and writing.

GAI programs developed by companies specifically for legal work, however, potentially fall within a different category. LexisNexis and Thomson Reuters (Westlaw) each claim to have addressed the hallucination problem. These technology companies license existing large language models, such as the one underlying ChatGPT, for the task of language generation rather than building their own. They then add a technique called retrieval-augmented generation, which grounds the model’s answers in documents retrieved from a trusted database, to reduce hallucinated legal citations and opinions. These companies also restrict their GAI programs to rely solely upon real case law, statutes, and legal digests. In other words, while ChatGPT sources its law from the entire world wide web, legal technology companies restrict their GAI programs to proprietary legal databases.
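To make the concept concrete, the sketch below is a highly simplified, hypothetical illustration of retrieval-augmented generation. The miniature “database,” the word-overlap scoring, and the prompt wording are assumptions invented for this article, not any vendor’s actual pipeline. The point it illustrates is the restriction described above: the language model is asked to answer only from real, retrieved source documents rather than from whatever it absorbed during training.

import re

# Illustrative mini-database of real-looking (but invented) sources.
LEGAL_DATABASE = [
    {"citation": "Smith v. Jones, 123 Mass. 456 (1990)",
     "text": "A landlord owes tenants a duty of reasonable care."},
    {"citation": "Doe v. Roe, 789 Mass. 12 (2005)",
     "text": "Negligence requires duty, breach, causation, and damages."},
]

def tokenize(text):
    """Lowercase, strip punctuation, and drop very short words (crude stopword filter)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def retrieve(query):
    """Return database entries that share words with the query, most relevant first."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc["text"])), doc) for doc in LEGAL_DATABASE]
    return [doc for score, doc in sorted(scored, key=lambda pair: -pair[0]) if score > 0]

def build_prompt(query, documents):
    """Instruct the language model to answer using only the retrieved sources."""
    sources = "\n".join(f"- {d['citation']}: {d['text']}" for d in documents)
    return ("Answer the question using ONLY the sources below, citing them:\n"
            f"{sources}\n\nQuestion: {query}")

question = "What must a plaintiff prove to establish negligence?"
print(build_prompt(question, retrieve(question)))

Even with this restriction, the generation step can still misread or misapply what was retrieved, which is why, as discussed below, these products are not error-free.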

Are these products reliable and accurate? A Stanford University study called into question the use of GAI tools to analyze the law. In 2024, Stanford researchers published Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools. The researchers ran a battery of research queries to benchmark the accuracy of programs from several legal technology companies and found that the output suffered from substantial error rates. For example, one query asked, “Why did Justice Ginsburg dissent in Obergefell?” The GAI program explained the basis for her dissent, except that Justice Ginsburg had joined the Court’s landmark decision legalizing same-sex marriage. Moreover, the GAI program’s discussion addressed issues not found in Obergefell at all. The study found hallucination rates of 17% (Lexis) and 33% (Westlaw). Varun Magesh et al., Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, Journal of Empirical Legal Studies, Mar. 14, 2025.

Mike Dahn, Thomson Reuters’ head of Westlaw Product Management, responded to the Stanford study by writing, “[o]ur thorough internal testing of AI-Assisted Research shows an accuracy rate of approximately 90% based on how our customers use it, and we’ve been very clear with customers that the product can produce inaccuracies.” Mike Dahn, Our Commitment to Our Customers, Thomson Reuters, May 31, 2024. LexisNexis did not challenge the Stanford study but instead focused on the relative findings between its product and Thomson Reuters’ product. Stanford Study Finds Lexis+ AI Provides Accurate Responses at “More Than Three Times” the Rate of Thomson Reuters Product, LexisNexis, June 1, 2024. Setting aside the Stanford study for a moment and assuming Dahn is correct that the accuracy rate is 90%, an estimated 10% error rate is still substantial.

GAI programs, when prompted, tend to produce an answer that satisfies the user, whether or not a sound answer exists. For example, in the Stanford study, the following prompt was posed to Lexis+ AI: “What are some notable opinions written by Judge Luther A. Wilgarten?” Varun Magesh et al., supra, at 5. The GAI program identified a real case (Luther v. Locke) as one of Judge Wilgarten’s most notable opinions, which was problematic because Judge Wilgarten is a fictional judge. As the Stanford study explains, “retrieval is particularly challenging in law. Many popular LLM benchmarking datasets . . . contain questions with clear, unambiguous references that address the question in the source database. Legal queries, however, often do not admit a single, clear-cut answer.” Id. Simply put, legal research and analysis require a level of human reasoning and experience that a machine is not equipped to supply.

While GAI-assisted research and writing tools are, at this time, not 100% reliable, an attorney who nonetheless employs them must check the GAI program’s output to verify that the cases are real, the case citations are accurate, and the cited authorities actually support the propositions in the brief. As Mr. Dahn wrote, Thomson Reuters warns its users that,

AI-Assisted Research uses large language models and can occasionally produce inaccuracies, so it should always be used as part of a research process in connection with additional research to fully understand the nuance of the issues and further improve accuracy.

Thomson Reuters Contradicts Stanford GenAI Study – ‘We Are 90% Accurate’, Artificial Lawyer, June 10, 2024 (emphasis in original). He explained, “[w]e also advise our customers, both in the product and in training, to use AI-Assisted Research to accelerate thorough research, but not to use it as a replacement for thorough research.” Id. This is good advice. Even in Thomson Reuters’ own testing environment at the time of its statement, the error rate in GAI research and analysis appears to have exceeded ten percent, which is far more than is acceptable for court submissions.


Edward S. Cheng is a litigation partner at Sherin and Lodgen LLP. He specializes in complex commercial disputes, professional malpractice cases, bar discipline matters, and real estate litigation. Mr. Cheng teaches professional responsibility at Boston College Law School.

Mariem Marquetti is a litigation associate at Sherin and Lodgen LLP, specializing in complex employment matters, commercial disputes, real estate litigation, and professional liability matters.