The case appeared routine: a land dispute in Andhra Pradesh, a commissioner assigned to survey the property, and objections to be decided. The judge resolved it by citing four legal precedents. The problem: none of those precedents existed.
All four had been generated by an AI tool — plausible‑sounding judgments with case names and legal reasoning conjured out of thin air. The error emerged on appeal and reached India’s Supreme Court. In late February, a bench said a ruling based on fabricated AI citations was not merely “an error in the decision‑making process.” It was “misconduct.” Notices were sent to the attorney general, the solicitor general and the Bar Council of India, which licenses roughly 1.8 million lawyers.
“It is not a question of whether we should integrate AI or not but it is the question of how far the due diligence should be,” said Sindoora VNL, a lawyer for the defendants. “The court indicated this might be a question of misconduct. Now we have to see how far they are willing to take it.”
India is not alone in confronting such dilemmas. Courts worldwide — from well‑resourced to underfunded systems — are quietly adopting AI tools faster than governance frameworks can keep up. In 2023 a Colombian judge included a ChatGPT transcript in a ruling about medical treatment for an autistic child, saying the tool had “assist[ed], not replace[d]” his reasoning. Months earlier, two New York lawyers were sanctioned after filing a brief that cited six cases invented by the chatbot.
India’s most notable AI moment was not a scandal but a candid admission. In March 2023 a judge of the Punjab and Haryana High Court paused a bail hearing in a murder case to type a question into ChatGPT, seeking wider context on bail jurisprudence in cases where an assault involved cruelty. The judge denied bail and disclosed that he had consulted the chatbot. That transparency became the story: advocates warned that ChatGPT can invent facts and reflect biases from its training data.
“AI cannot replace human conscience in justice delivery,” said Mimansa Ambastha, founder of Starlex Consultants and a strategic counsel on AI and cybersecurity. “The danger is that the balance between assistance and deference can slip. And when it slips in a bail hearing, a person’s liberty is at stake.”
Bail in India is not a formality. Hundreds of thousands of people are held as undertrial prisoners, accused but not convicted, often spending years behind bars while their cases crawl forward. To grasp the pressure driving AI experimentation: roughly 55 million cases are pending across India’s judiciary, from the Supreme Court to district courts where a single judge may manage hundreds of active files. More than 180,000 cases have remained unresolved after more than 30 years at trial. Last year the Uttar Pradesh High Court acquitted three men who had spent 38 years in prison for a 1982 murder. A 2018 government paper estimated that, at then‑current rates, it would take 324 years to clear the backlog.
AI’s promise of speed is alluring in this crisis. But Ambastha cautions that these conditions foster unchecked adoption. “The judiciary must always choose surety over speed,” she told DW. Even top judges recognize the complications: Chief Justice Surya Kant observed that AI is paradoxically adding work, as court staff must now verify whether AI‑generated citations actually exist before proceedings can continue.
Beyond hallucinated citations lies a deeper worry: AI can inherit and amplify biases embedded in legal data. Legal datasets built from decades of judgments, police records and filings reflect societal inequalities. Models trained on these materials can reproduce those patterns in new outputs.
“AI systems do not create bias out of thin air. They replicate what they are trained on,” Ambastha said. “If historical data contains discrimination, the model will absorb it and present the output as if it were objective.”
That danger is acute in criminal cases, where algorithmic assessments might influence bail, sentencing or perceived recidivism risk. India’s prison data already reflects stark disparities: according to the National Crime Records Bureau, Muslims make up about 14.2% of the population but roughly 18.7% of undertrial prisoners; Dalits account for about 16.6% of the population but around 21% of undertrials. If predictive systems are trained on such historical policing or incarceration data, they may treat those disparities as indicators of risk rather than evidence of structural inequality.
Research on large language models used in India has found that the systems can reproduce caste and religious stereotypes present in their training data. Matheus Puppe, a Brazilian lawyer and researcher of AI and law, warns that algorithmic outputs can appear unduly authoritative, leading judges and lawyers to treat machine analysis as neutral simply because it is computational. “The concern is that AI may reproduce structural distortions embedded in legal systems,” Puppe said. “Once those patterns are translated into algorithms, they gain a veneer of scientific legitimacy.”
Brazil illustrates both benefits and risks. The country has integrated AI to manage massive caseloads — grouping similar petitions, identifying litigation patterns and automating routine steps — particularly effective where cases are high‑volume and predictable. But Puppe and other researchers emphasize that such tools should be operational aids, not decision‑makers.
Despite the risks, courts are experimenting with assistive AI rather than banning it. Sudipto Ghosh, developer of the judicial large language model InLegalLLaMA, says his model is trained on Indian statutes, judgments and procedural law to retrieve relevant case law, generate summaries and help draft basic arguments. “The system is trained to understand the structure of Indian law,” Ghosh told DW. “It can map a query to applicable statutes and precedents, which is where much of the time in litigation is actually spent.”
India’s Supreme Court e‑committee has developed SUPACE, an AI research assistant intended to help judges sift large volumes of case law, extract relevant passages and present them accessibly. Officials stress SUPACE does not make recommendations or decisions but serves as a backend aid to improve efficiency.
In Brazil, Ricardo Augusto Ferreira e Silva, who studies AI in the country’s courts, said such systems help the judiciary handle scale but must remain operational, not decisional.
Even proponents stress verification. Ghosh acknowledges models like InLegalLLaMA can still produce confident but incorrect outputs if used without checks. “You can reduce time, you can improve access,” he said. “But you cannot outsource judgment.”
The Indian judiciary’s response to AI missteps — from the Supreme Court’s finding of “misconduct” to growing discussion among judges, lawyers and regulators — reflects a wider global challenge: how to harness AI’s efficiencies without surrendering accountability, fairness or the moral judgments central to justice.
This article was supported by the Tarbell Center for AI Journalism.
Edited by: Srinivas Mazumdaru