Researchers from Norwegian University of Science and Technology, Nanjing, have presented a pioneering approach aimed at addressing the vulnerabilities in auto-completed smart contract code. Their focus was primarily on Ethereum Blockchain smart contracts, given the stringent security requirements associated with these digital contracts.
Code auto-completion, while immensely beneficial in software development, has its drawbacks. Recent studies show that a significant proportion of synthesized code contains vulnerabilities. One analysis of auto-completed Python and C programs found that approximately 40% of the synthesized code was vulnerable.
The researchers’ methodology, dubbed “vulnerability-constrained decoding,” attempts to reduce the generation of vulnerable code. The approach relies on a curated dataset of previously identified vulnerable code lines. Using this data, the technique fine-tunes a state-of-the-art large language model (LLM) not only to recognize these vulnerabilities but also to avoid them during auto-completion.
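The general idea of constraining a decoder can be sketched in a few lines of Python. This is a minimal toy illustration, not the researchers' implementation. It assumes that the fine-tuned model signals a vulnerable line by emitting a special marker token, and that decoding simply forbids that token. The marker name `<VULN>`, the function `constrained_greedy_decode`, and the stand-in `toy_model` with its made-up scores are all hypothetical.

```python
import math

VULN_TOKEN = "<VULN>"  # hypothetical marker token; the name is an assumption
EOS_TOKEN = "<EOS>"    # end-of-sequence token for this toy example


def constrained_greedy_decode(next_logits, prompt, banned, max_steps=20):
    """Greedy decoding that never emits a token listed in `banned`."""
    tokens = list(prompt)
    for _ in range(max_steps):
        logits = dict(next_logits(tokens))
        # Constraint step: give banned tokens a score of -inf so they cannot win.
        for tok in banned:
            if tok in logits:
                logits[tok] = -math.inf
        best = max(logits, key=logits.get)
        if best == EOS_TOKEN:
            break
        tokens.append(best)
    return tokens


def toy_model(tokens):
    """Stand-in for a fine-tuned LLM; returns made-up scores per candidate token."""
    if tokens[-1] == "function withdraw() {":
        return {VULN_TOKEN: 2.0, "require(ok);": 1.5, EOS_TOKEN: 0.0}
    return {EOS_TOKEN: 3.0, "require(ok);": 0.1}


prompt = ["function withdraw() {"]
unconstrained = constrained_greedy_decode(toy_model, prompt, banned=set())
constrained = constrained_greedy_decode(toy_model, prompt, banned={VULN_TOKEN})
print(unconstrained)
print(constrained)
```

Without the constraint, the toy model's highest-scoring continuation is the marker token. With the marker banned, the decoder falls back to the next-best candidate. A real system would apply the same masking to the logits of an actual language model at each decoding step.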
A notable feature of the team’s research was the efficiency of the model’s fine-tuning process. Traditional methods, which often involve re-training these complex models, could take upwards of a week, even with powerful computational resources. The team’s new approach completed fine-tuning in about an hour without sacrificing efficacy.
In subsequent evaluations, the modified model generated code that was notably less susceptible to vulnerabilities. In tests involving Ethereum smart contracts, vulnerabilities were reduced by 30%.
Such advancements in the realm of secure code generation are timely and critical, especially considering the escalating importance of digital security in today’s tech-driven landscape. The team’s research not only contributes a valuable methodology to the field but also lays the groundwork for future studies aiming at further enhancing security in code generation.
The researchers, acknowledging the potential and significance of their findings, have indicated that they plan to refine their model further. They also aim to explore the broader applicability of their approach across different technological domains, with the goal of a safer and more robust coding environment.