Stellar or so-so? ChatGPT bar exam performance sparks differing opinions

By Reuters May 31, 2023 | 07:35 pm PT

The logo of OpenAI is displayed near a response by its AI chatbot ChatGPT on its website, in this illustration picture taken February 9, 2023. Photo by Reuters/Florence Lo

When OpenAI rolled out the latest version of ChatGPT in March, one particular statistic put the legal profession on its heels: The artificial intelligence technology outperformed nine out of 10 human test-takers on the bar exam.

But a Ph.D. candidate at the Massachusetts Institute of Technology now says that GPT-4’s reported 90th-percentile bar exam performance has likely been overstated, and that the chatbot actually lands in the neighborhood of the 68th percentile of real test-takers, a conclusion the original researchers reject.

The percentile debate centers on how the researchers who first looked at GPT-4’s bar exam performance calculated its score percentile, wrote Eric Martinez, the MIT doctoral candidate, in a new paper titled "Re-Evaluating GPT-4’s Bar Exam Performance."

GPT-4’s Uniform Bar Exam score of 297 would have landed the program in the 90th percentile among those who took Illinois’ February 2019 bar exam, the benchmark cited by the original researchers. But Illinois’ July bar exam would have yielded a more accurate comparison, Martinez wrote, because the February exam typically draws a larger share of people who are retaking the exam after failing and who tend to have lower scores.

Measured against a recent July exam in Illinois, GPT-4 would have scored in the 68th percentile, he concluded.
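To see why the choice of comparison group matters, here is a minimal Python sketch of a percentile-rank calculation. The two cohorts are made-up toy numbers chosen only for illustration; they are not real bar exam data, and the function name `percentile_rank` and the "share of scores strictly below" definition are assumptions, not the method used by either set of researchers.

```python
from bisect import bisect_left


def percentile_rank(score, cohort_scores):
    """Return the percentage of cohort scores strictly below `score`."""
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100.0 * below / len(ordered)


# Hypothetical toy cohorts, NOT real exam results.
# The "weaker" cohort stands in for a sitting with more repeat takers.
weaker_cohort = [240, 250, 255, 260, 265, 270, 275, 280, 290, 300]
stronger_cohort = [260, 270, 280, 285, 290, 295, 300, 305, 310, 320]

score = 297  # GPT-4's reported Uniform Bar Exam score

print(percentile_rank(score, weaker_cohort))    # 90.0
print(percentile_rank(score, stronger_cohort))  # 60.0
```

The same score of 297 ranks very differently depending on which group it is measured against, which is the core of the disagreement.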

"The fact that GPT-4’s reported ‘90th percentile’ capabilities were so widely publicized might pose some concerns that lawyers and non-lawyers may use GPT-4 for complex legal tasks for which it is incapable of adequately performing," Martinez wrote.

Chicago-Kent law professor Daniel Martin Katz and Michigan State law professor Michael James Bommarito, who conducted the original research alongside two others from legal AI company Casetext, said this week that they stand by their conclusions and the 90th percentile finding.

However, Katz and Bommarito said they plan to "correct points of confusion and misunderstanding that have arisen in public discourse" in the upcoming final version of their research paper. The draft version published in March focuses on GPT-4’s overall score, with the percentile conversion only appearing in a footnote.

OpenAI did not immediately respond to requests for comment Wednesday.

The differing pass rates on the February and July bar exams can be dramatic. For example, the pass rate on Illinois' most recent July exam was 68%, compared with 43% for February.

Martinez, Bommarito and Katz all agree that converting GPT-4’s Uniform Bar Exam score into a percentile is complicated by the fact that the National Conference of Bar Examiners, which designs the exam, does not publicly release score distributions, nor do states on a regular or consistent basis.

Katz and Bommarito said that their 90th percentile conclusion is conservative, because they threw out GPT-4’s high essay scores and because they used results from before the Covid-19 pandemic for comparison. Anecdotal evidence suggests that law student learning suffered amid the pandemic, they said.
