Shanghai Jiao Tong University Researchers Develop SciEval, a Comprehensive Benchmark for Evaluating Large Language Models in Science

August 28, 2023

Researchers at Shanghai Jiao Tong University have developed SciEval, a benchmark designed to address the limitations of existing approaches to evaluating the scientific capabilities of Large Language Models (LLMs). The work was carried out by Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, and Kai Yu, who detail their findings in a paper titled “SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research.”

In the paper, the authors highlight the shortcomings of existing benchmarks: they are restricted to specific scientific disciplines, lack evaluation systems dedicated to assessing scientific capabilities, rely exclusively on objective questions, and are susceptible to data leakage. To overcome these limitations, the authors developed SciEval to provide a comprehensive, multi-disciplinary evaluation of LLMs. SciEval covers four dimensions: basic knowledge, knowledge application, scientific calculation, and research ability. It comprises roughly 18,000 challenging scientific questions spanning chemistry, physics, and biology, with each field further divided into multiple sub-topics. Notably, SciEval includes both objective and subjective questions, a feature that sets it apart from other benchmarks. The authors also implemented dynamic data generation, which mitigates potential data leakage and thereby preserves the fairness and credibility of the evaluation results.
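
To make the multi-dimensional structure concrete, the following is a minimal, hypothetical sketch of how an evaluation over a SciEval-style benchmark might be organized. The field names, the exact-match scoring, and the evaluate helper are illustrative assumptions only; they do not reflect the official SciEval data format or tooling.

```python
# Hypothetical sketch of an evaluation loop over a SciEval-style benchmark.
# The data layout and scoring below are illustrative assumptions, not the
# official SciEval format or API.
from dataclasses import dataclass
from collections import defaultdict
from typing import Callable

@dataclass
class Question:
    subject: str      # e.g. "chemistry", "physics", or "biology"
    dimension: str    # e.g. "basic knowledge", "knowledge application",
                      # "scientific calculation", or "research ability"
    prompt: str
    answer: str       # gold answer for an objective question

def evaluate(questions: list[Question], model: Callable[[str], str]) -> dict:
    """Score a model per (subject, dimension) pair with exact-match accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        key = (q.subject, q.dimension)
        total[key] += 1
        if model(q.prompt).strip().lower() == q.answer.strip().lower():
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}

if __name__ == "__main__":
    sample = [
        Question("chemistry", "basic knowledge",
                 "What is the chemical symbol for gold?", "Au"),
        Question("physics", "scientific calculation",
                 "What distance is covered at 2 m/s over 3 s?", "6 m"),
    ]
    dummy_model = lambda prompt: "Au"  # placeholder model that always answers "Au"
    print(evaluate(sample, dummy_model))
```

Subjective questions and dynamically generated items would require model-based or human grading rather than the exact-match check shown here.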

The experiments conducted by the authors yielded insightful findings. While GPT-4 emerged as the strongest model among those evaluated, the results indicate substantial room for improvement across all models, particularly in the physics domain and in the analysis of experimental results. These findings underscore the need for continued research and development in this area. The authors express their hope that SciEval will serve as an effective, widely adopted benchmark for assessing the scientific capabilities of LLMs, ultimately promoting their broader application in the scientific community. Those interested can access the resources via the following link: SciEval on arXiv.

