Recent research has spotlighted a new approach to data transformation challenges in the building sector. Because buildings consume a large share of energy (40% of total US energy use in 2022), the study emphasizes the need for efficient harmonization of data drawn from varied sources.
Enter SQLMorpher, a proposed tool that uses large language models (LLMs) to generate SQL code and bridge this data transformation gap. It draws on the well-documented strengths of LLMs in reasoning, coding, and zero-shot learning to generate SQL code that transforms source datasets into target datasets, and then applies that code.
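To make "generating and applying SQL code from source to target datasets" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, prompt wording, and the SQL statement standing in for model output are all hypothetical; the statement is hard-coded rather than produced by a real LLM, and this is not SQLMorpher's actual prompt or code.

```python
import sqlite3

# Hypothetical source dataset: meter readings in watt-hours,
# with the timestamp split across two text columns.
SOURCE_DDL = """
CREATE TABLE source_readings (
    meter_id TEXT,
    read_date TEXT,
    read_time TEXT,
    energy_wh REAL
)
"""

# Hypothetical target schema the data must be harmonized into.
TARGET_DDL = """
CREATE TABLE target_readings (
    device TEXT,
    timestamp TEXT,
    energy_kwh REAL
)
"""

# Stand-in for SQL an LLM might generate from the two schemas
# (hard-coded here; no model is called).
GENERATED_SQL = """
INSERT INTO target_readings (device, timestamp, energy_kwh)
SELECT meter_id, read_date || ' ' || read_time, energy_wh / 1000.0
FROM source_readings
"""


def build_prompt(source_ddl: str, target_ddl: str) -> str:
    """Assemble a simple zero-shot prompt describing the task (illustrative only)."""
    return (
        "Write a SQL statement that transforms data from the source table "
        "into the target table.\n"
        f"Source schema:\n{source_ddl}\n"
        f"Target schema:\n{target_ddl}\n"
    )


def apply_transformation(sql: str) -> list[tuple]:
    """Load sample source rows into an in-memory DB, run the SQL, return target rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SOURCE_DDL)
    conn.execute(TARGET_DDL)
    conn.executemany(
        "INSERT INTO source_readings VALUES (?, ?, ?, ?)",
        [("m1", "2022-01-01", "00:00", 1500.0), ("m2", "2022-01-01", "00:15", 250.0)],
    )
    conn.execute(sql)
    rows = conn.execute("SELECT * FROM target_readings ORDER BY device").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_prompt(SOURCE_DDL, TARGET_DDL))
    print(apply_transformation(GENERATED_SQL))
```

The transformation here combines three typical harmonization steps in one statement: renaming columns, merging split date and time fields, and converting units.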
However, applying LLMs to this task is not without challenges: schemas change, a single unified prompt must work across many scenarios, and the generated code may be inaccurate. To address these issues, SQLMorpher runs an iterative loop made up of a prompt generator, a SQL execution engine, and an optimization component, so that generated SQL queries can be refined for accuracy over successive iterations.
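The loop described above can be sketched as follows. This is a simplified illustration, not SQLMorpher's implementation: the "optimization" step here only feeds the previous execution error back into the next prompt, and `fake_llm`, `make_prompt`, `run_sql`, and `transform_loop` are hypothetical names. The real system's optimizer may do considerably more.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical stand-in for an LLM: takes a prompt, returns a SQL string.
LLM = Callable[[str], str]


def make_prompt(source_ddl: str, target_ddl: str, feedback: Optional[str]) -> str:
    """Prompt generator: task description plus the previous error, if any."""
    prompt = (
        "Write one SQL statement that inserts the source rows into the target table.\n"
        f"Source schema: {source_ddl}\nTarget schema: {target_ddl}\n"
    )
    if feedback is not None:
        prompt += f"The previous query failed with: {feedback}\nPlease fix it.\n"
    return prompt


def run_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """SQL execution engine: None on success, otherwise the error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def transform_loop(llm: LLM, conn: sqlite3.Connection, source_ddl: str,
                   target_ddl: str, max_iters: int = 3) -> tuple[Optional[str], int]:
    """Generate, execute, and feed errors back until a query runs or we give up."""
    feedback: Optional[str] = None
    for iteration in range(1, max_iters + 1):
        sql = llm(make_prompt(source_ddl, target_ddl, feedback))
        feedback = run_sql(conn, sql)
        if feedback is None:
            return sql, iteration
    return None, max_iters


def fake_llm(prompt: str) -> str:
    """Hypothetical model: wrong column name first, corrected after feedback."""
    if "failed with" not in prompt:
        return "INSERT INTO target (device, kwh) SELECT meter, wh / 1000.0 FROM source"
    return "INSERT INTO target (device, kwh) SELECT meter_id, wh / 1000.0 FROM source"


def demo() -> tuple[Optional[str], int, list[tuple]]:
    """Run the loop once on a tiny in-memory dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (meter_id TEXT, wh REAL)")
    conn.execute("CREATE TABLE target (device TEXT, kwh REAL)")
    conn.execute("INSERT INTO source VALUES ('m1', 1500.0)")
    sql, iterations = transform_loop(
        fake_llm, conn, "source(meter_id TEXT, wh REAL)", "target(device TEXT, kwh REAL)"
    )
    rows = conn.execute("SELECT * FROM target").fetchall()
    conn.close()
    return sql, iterations, rows


if __name__ == "__main__":
    print(demo())
```

In this sketch the first query fails with "no such column: meter", the error is folded into the second prompt, and the corrected query succeeds on iteration 2. Note that the loop only checks that the SQL executes; checking the output against a ground-truth target, as an accuracy-oriented refinement step would, requires comparison metrics like those used in the evaluation.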
For validation, the researchers benchmarked SQLMorpher on 105 real-world data transformation scenarios from the smart building domain, sourced from several energy companies. They also tested the tool on benchmarks outside the building sector to assess how well it generalizes.
Evaluated on execution accuracy, column similarity, and the number of iterations required, SQLMorpher shows promise as a tool for automating data transformation in the building sector and beyond.
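As a rough illustration of how such metrics might be computed, the sketch below treats execution accuracy as an order-insensitive exact match between transformed rows and ground truth, and column similarity as the fraction of target columns whose values match. Both definitions are assumptions for illustration; the paper's exact formulations may differ. The iteration count is simply the loop counter recorded per scenario.

```python
from collections import Counter

Row = tuple


def execution_accuracy(predicted: list[Row], expected: list[Row]) -> bool:
    """True if the transformed rows equal the ground truth, ignoring row order."""
    return Counter(predicted) == Counter(expected)


def column_similarity(predicted: list[Row], expected: list[Row]) -> float:
    """Fraction of ground-truth columns whose values match as multisets.

    Assumed definition for illustration only.
    """
    if not expected:
        return 1.0 if not predicted else 0.0
    n_cols = len(expected[0])
    matches = 0
    for col in range(n_cols):
        pred_values = Counter(row[col] for row in predicted if len(row) > col)
        true_values = Counter(row[col] for row in expected)
        if pred_values == true_values:
            matches += 1
    return matches / n_cols


def summarize(results: list[tuple[list[Row], list[Row], int]]) -> dict[str, float]:
    """Aggregate per-scenario (predicted, expected, iterations) triples."""
    n = len(results)
    return {
        "execution_accuracy": sum(execution_accuracy(p, e) for p, e, _ in results) / n,
        "mean_column_similarity": sum(column_similarity(p, e) for p, e, _ in results) / n,
        "mean_iterations": sum(i for _, _, i in results) / n,
    }


expected = [("m1", "2022-01-01 00:00", 1.5), ("m2", "2022-01-01 00:15", 0.25)]
# A prediction that forgot the Wh -> kWh conversion, in a different row order.
predicted = [("m2", "2022-01-01 00:15", 250.0), ("m1", "2022-01-01 00:00", 1500.0)]

print(execution_accuracy(predicted, expected))  # False
print(column_similarity(predicted, expected))   # 2 of 3 columns match
print(summarize([(predicted, expected, 2), (expected, expected, 1)]))
```

The example shows why a partial-credit metric is useful alongside exact match: a query that renames and merges columns correctly but skips a unit conversion scores zero on execution accuracy yet still matches two of the three target columns.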