Researchers from the Ubiquitous Knowledge Processing Lab (UKP Lab) at the Technical University of Darmstadt have unveiled an approach to improve the performance of smaller language models (SLMs) in the field of extractive question answering.
The team, led by Rachneet Sachdeva, Martin Tutek, and Iryna Gurevych, explored the potential of large language models (LLMs) for data augmentation. Their findings, published in the recent paper titled “CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration,” suggest that training SLMs with counterfactual (CF) instances – minimally altered input data – can significantly boost their out-of-domain (OOD) performance.
Counterfactual instances are a pivotal focus of the research. The team employed LLMs to automatically generate these instances to augment the training data of SLMs. The paper underscores that such data augmentation consistently heightens OOD performance and refines model calibration.
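To make the idea concrete, here is a minimal sketch of CF data augmentation for extractive QA. The example data and the `augment` helper are hypothetical illustrations; in the paper, an LLM generates the counterfactual context–answer pairs automatically.

```python
# Illustrative sketch of counterfactual (CF) augmentation for extractive QA.
# The instances and helper below are hypothetical, not the paper's pipeline.

original = {
    "question": "Who wrote Hamlet?",
    "context": "Hamlet is a tragedy written by William Shakespeare.",
    "answer": "William Shakespeare",
}

# A CF instance minimally alters the input so that the correct answer changes.
counterfactual = {
    "question": "Who wrote Hamlet?",
    "context": "Hamlet is a tragedy written by Christopher Marlowe.",
    "answer": "Christopher Marlowe",
}

def augment(train_set, cf_instances):
    """Append CF instances to the original training set."""
    return train_set + cf_instances

train_set = augment([original], [counterfactual])
print(len(train_set))  # 2
```

Because the edited context and answer change together, a model trained on both instances cannot rely on a memorized context–answer shortcut and must actually read the passage.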
The research shows a clear correlation between the diversity of CF instances and the resulting performance improvements. Diversity, in terms of both surface form and semantic content, stands out as a defining factor for achieving robustness against spurious correlations and bridging data distribution gaps.
The methodology, revolving around the Retrieve-Generate-Filter (RGF) approach, is also detailed in the paper. The researchers have further introduced techniques named Solo-QAG and Duo-QAG (Dual-Phase Question-Answer Generation) for efficient counterfactual generation. These methodologies, coupled with filtering steps for quality assurance, ensure high-standard, diverse, and relevant CF instances.

In addition to the enhancement in OOD performance, the team has shed light on the realm of model calibration.
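A filtering step of the kind mentioned above might look like the following sketch, which keeps only CF candidates whose generated answer a QA model can recover from the edited context (a round-trip consistency check). Both the criterion and the `qa_model` stand-in are assumptions for illustration, not the paper's exact filter.

```python
# Hypothetical sketch of a quality filter over generated CF instances.
# The round-trip consistency criterion and toy QA "model" are assumptions.

def round_trip_filter(cf_instances, qa_model):
    """Keep only CF instances whose answer the QA model recovers."""
    kept = []
    for inst in cf_instances:
        predicted = qa_model(inst["question"], inst["context"])
        if predicted.strip().lower() == inst["answer"].strip().lower():
            kept.append(inst)
    return kept

# Toy QA "model": answers with the last two words of the context.
toy_model = lambda q, c: " ".join(c.rstrip(".").split()[-2:])

candidates = [
    {"question": "Who wrote Hamlet?",
     "context": "Hamlet was written by Christopher Marlowe.",
     "answer": "Christopher Marlowe"},   # consistent: kept
    {"question": "Who wrote Hamlet?",
     "context": "Hamlet was written by Christopher Marlowe.",
     "answer": "Ben Jonson"},            # inconsistent: filtered out
]
filtered = round_trip_filter(candidates, toy_model)
print(len(filtered))  # 1
```

Filters of this kind discard generations where the LLM's proposed answer does not actually follow from the edited context, which is one way to keep augmented data relevant and high-quality.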
The research illustrates that models trained with CF augmented data exhibit better-calibrated prediction probabilities, thus further ensuring their reliability in real-world scenarios. One of the intriguing findings emphasizes the rationale-augmented calibrator models’ preference for concise explanations over comprehensive ones. The team believes this inclination stems from the diverse yet overlapping nature of Solo- and Duo-QAG-generated counterfactual instances, forcing models to discern key input features.
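Calibration can be quantified with expected calibration error (ECE), a standard metric that compares a model's confidence with its accuracy; the paper's exact evaluation setup may differ, so this is a generic sketch.

```python
# Sketch of expected calibration error (ECE): bin predictions by
# confidence, then average the |confidence - accuracy| gap per bin,
# weighted by bin size. A well-calibrated model has ECE near zero.

def expected_calibration_error(confidences, correct, n_bins=10):
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - acc)
    return ece

# Toy example of perfect calibration: 80% confidence, 80% accuracy.
confs = [0.8] * 5
hits = [1, 1, 1, 1, 0]
print(round(expected_calibration_error(confs, hits), 4))  # 0.0
```

A model whose 80%-confidence predictions are right about 80% of the time is well calibrated; "better-calibrated prediction probabilities" means the gap measured this way shrinks.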
As the technology world rapidly evolves with advancements in artificial intelligence and machine learning, such breakthroughs from esteemed institutions like the Technical University of Darmstadt provide a beacon of progress and a glimpse into the future of computational linguistics.