The AGI News

GPT-4’s Performance in Educational Assessment Benchmarked Against Specialized Models

September 19, 2023

The research focused on the application of AI in educational assessment, a domain attracting increasing interest for its potential to revolutionize large-enrollment courses. The study examined how GPT-4, a general-purpose, pre-trained Large Language Model (LLM), fares against specialized models at Automated Short Answer Grading (ASAG), that is, grading students' short-answer responses.

The study used two benchmark datasets: SciEntsBank, which encompasses general science questions for grades 3 to 6, and Beetle, which covers questions on basic electricity and electronics. The research examined GPT-4's ability to grade both with alignment to a reference answer and, intriguingly, without one. The latter setting required GPT-4 to draw upon its extensive training to independently judge the correctness of a student's response.
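
The paper's exact prompts are not reproduced here, but the two grading conditions can be pictured with a minimal sketch such as the one below. The prompt wording and the `build_prompt`, `grade`, and `llm` names are illustrative assumptions, not the study's actual protocol.

```python
from typing import Callable, Optional

def build_prompt(question: str, student_answer: str,
                 reference_answer: Optional[str] = None) -> str:
    """Build a grading prompt, with or without a reference answer (illustrative wording)."""
    parts = [
        "You are grading a student's short answer.",
        f"Question: {question}",
        f"Student answer: {student_answer}",
    ]
    if reference_answer is not None:
        # With-reference condition: judge the student answer against a gold answer.
        parts.append(f"Reference answer: {reference_answer}")
        parts.append("Reply 'correct' or 'incorrect' relative to the reference answer.")
    else:
        # Reference-free condition: the model must rely on its own knowledge.
        parts.append("Reply 'correct' or 'incorrect' based on your own knowledge.")
    return "\n".join(parts)

def grade(question: str, student_answer: str,
          llm: Callable[[str], str],
          reference_answer: Optional[str] = None) -> str:
    """Send the prompt to an LLM client; `llm` is a placeholder callable standing in
    for whatever chat-completion API is actually used."""
    return llm(build_prompt(question, student_answer, reference_answer))
```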

The findings revealed that GPT-4's performance was robust. On the SciEntsBank dataset, it achieved its best results on the 2-way (correct vs. incorrect) task, with an F1 score of 0.744. The Beetle dataset, however, presented an unexpected outcome: GPT-4 performed better when the reference answer was withheld, achieving an F1 score of 0.651.
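
For a sense of the metric being cited, F1 over the 2-way labels can be computed with scikit-learn as below. The toy labels are invented for illustration, and macro averaging is an assumption about how the reported scores are aggregated.

```python
from sklearn.metrics import f1_score

# Toy labels, not benchmark data: 1 = correct, 0 = incorrect.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Macro-averaged F1 weights both classes equally.
print(f1_score(y_true, y_pred, average="macro"))
```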

Compared with specialized ASAG models, GPT-4's performance was reminiscent of hand-engineered systems from half a decade ago. Models from the BERT family, which undergo both pre-training and task-specific training, still outpace GPT-4. The research highlights the phenomenal advances in deep-learning models for ASAG over the last five years: while GPT-4's capabilities are impressive, especially because it needs no reference answers, the BERT family's models showcase the benefits of task-specific training.
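
To make "task-specific training" concrete, a BERT-family grader is typically framed as a sentence-pair classifier and then fine-tuned on labeled ASAG data. The snippet below is a minimal, assumed illustration of that setup; the checkpoint name, example sentences, and pairing scheme are not the paper's, and the classification head only becomes meaningful after fine-tuning.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed setup: a generic BERT checkpoint with a 2-way head (correct / incorrect).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Encode (reference answer, student answer) as one sentence pair.
inputs = tokenizer(
    "A circuit must be closed for current to flow.",   # reference answer (made up)
    "The bulb lights because the loop is complete.",   # student answer (made up)
    return_tensors="pt",
    truncation=True,
)

with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()  # 0 or 1; meaningful only after fine-tuning
```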

One significant takeaway from Dr. Kortemeyer's study is GPT-4's potential in higher education. Preliminary indications suggest that automated grading of comprehensive content, extending beyond short answers, is achievable. However, concerns around data security and privacy with cloud-based models like GPT-4 persist. Alternatives such as Llama 2, which can be run locally, are being explored, though they currently lag behind GPT-4 in performance.

As AI continues its foray into educational assessment, the balance between performance, adaptability, and data security remains a pivotal point of discussion. Only time will tell which models will emerge as frontrunners in this evolving landscape. Read the full paper.


© 2023 AGI News All Rights Reserved.

Contact: community@superagi.com
