Conference Agenda

Session

Technical Session 4: Generative AI and LLM Applications in Building Energy Modeling

Time:

Wednesday, 20/May/2026:

1:30pm - 3:00pm

Session Chair: Cary Faulkner

Location: Lake Minnetonka

Hyatt Regency Minneapolis 4th floor

Session Topics:

Generative AI

This session qualifies for AIA continuing education credits. Please confirm your attendance by completing the form here.

Presentations

1:30pm - 1:45pm

Eplusout Model Context Protocol (MCP): Large Language Model (LLM)-Enabled Simulation Results

Michael Sweeney¹, Ryan Meyer², Supriya Goel¹

1: Pacific Northwest National Laboratory, United States of America; 2: 2050 Partners

EnergyPlus simulations produce diverse outputs across CSV, HTML, text logs, and SQLite databases, making cross-format analysis challenging and time-consuming for practitioners. Large language models (LLMs), when paired with structured tool interfaces, offer a new way to streamline this process. This paper introduces EPlusOut-MCP, a Model Context Protocol (MCP) server that enables natural language queries across EnergyPlus outputs. By exposing structured access to simulation results and documentation, the system supports tasks such as time-series extraction, sizing validation, and quality control. Case studies demonstrate how EPlusOut-MCP improves transparency, accelerates QA/QC workflows, and lowers barriers to advanced analysis for both researchers and practitioners.
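As a rough illustration of the kind of structured access to simulation results described above, the following minimal sketch runs a parameterized time-series query against a small in-memory SQLite database. The table names (`variables`, `readings`), the column names, and the function `get_time_series` are hypothetical and simplified for illustration; they are not the actual EnergyPlus SQLite schema and not the EPlusOut-MCP interface.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE variables (
            id INTEGER PRIMARY KEY, key_value TEXT, name TEXT, units TEXT
        );
        CREATE TABLE readings (variable_id INTEGER, hour INTEGER, value REAL);
        """
    )
    conn.execute(
        "INSERT INTO variables VALUES (1, 'ZONE ONE', 'Zone Air Temperature', 'C')"
    )
    conn.executemany(
        "INSERT INTO readings VALUES (1, ?, ?)",
        [(1, 20.5), (2, 21.0), (3, 22.5)],
    )
    conn.commit()
    return conn


def get_time_series(conn: sqlite3.Connection, key_value: str, name: str):
    """Return (hour, value) rows for one output variable, ordered by hour."""
    return conn.execute(
        """
        SELECT r.hour, r.value
        FROM readings AS r
        JOIN variables AS v ON v.id = r.variable_id
        WHERE v.key_value = ? AND v.name = ?
        ORDER BY r.hour
        """,
        (key_value, name),  # bound parameters, never string formatting
    ).fetchall()


conn = build_demo_db()
series = get_time_series(conn, "ZONE ONE", "Zone Air Temperature")
print(series)
```

A tool server could expose a function like this to an LLM so that a natural-language request is answered from queried data rather than from the model's guess.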

1:45pm - 2:00pm

LLM-Informed Efficient Bayesian Calibration of Building Energy Simulation Models under Limited Data

Yangxi Bai, Endong Wang

State University of New York College of Environmental Science and Forestry, United States of America

Efficient Bayesian calibration of energy simulation models under limited data remains challenging, particularly in identifying key parameters and defining priors. This study integrates Fractional Factorial Design (FFD) with LLM-informed priors within a Bayesian calibration framework. FFD efficiently identifies crucial parameters, while the LLM generates physics-informed priors. Using three years of monthly utility data from a campus building, LLM-informed priors improved modeling accuracy by 22% and achieved 2.4 times faster convergence than literature-based priors. A comparison with matched informativeness indicates that these gains primarily arise from improved prior location. The proposed approach supports practical calibration when only monthly utility data are available without sub-metering.
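To illustrate why prior location can matter when informativeness is matched, here is a minimal sketch using a textbook normal-normal conjugate update with known observation noise. This is not the calibration framework of the study; the parameter values, the observations, and the function `posterior_normal` are made up for illustration.

```python
def posterior_normal(prior_mean, prior_sd, obs, obs_sd):
    """Posterior (mean, sd) for a normal prior and normal likelihood
    with known observation standard deviation."""
    n = len(obs)
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / obs_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(obs) / n
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * sample_mean
    )
    return post_mean, post_var**0.5


# Hypothetical observations of a parameter whose true value is about 0.5.
obs = [0.48, 0.52, 0.50]

# Two priors with the SAME standard deviation (matched informativeness)
# but different locations.
well_located = posterior_normal(0.5, 0.1, obs, obs_sd=0.2)
poorly_located = posterior_normal(1.0, 0.1, obs, obs_sd=0.2)

print("well-located prior:  ", well_located)
print("poorly located prior:", poorly_located)
```

With only three observations, both posteriors have the same spread, but the poorly located prior pulls the posterior mean well away from the data, which is the limited-data situation where a better prior location helps most.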

2:00pm - 2:15pm

Evaluating Prompt Engineering in Large Language Models (LLMs) for Transforming Occupant Behavior Modeling

Shundong Li, Nan Ma

Worcester Polytechnic Institute, United States of America

Social media contains user-generated narratives that reveal timing, duration, and context of occupancy, providing behavioral information unavailable from conventional sensing systems. This study applies prompt-engineered large language models (LLMs) to extract spatiotemporal occupant behavior from social media text. Six prompt designs across three few-shot configurations and three LLMs were tested, with GPT-4.1 using structured reasoning and prompt specificity achieving the highest accuracy (F₁ = 0.649). When incorporated into building energy simulations, LLM-derived, socially informed schedules show that DOE prototype schedules underestimate energy use by ~3.5% and overlook extended occupancy periods and load timing, indicating potential grid stress beyond typical operating hours.
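For readers unfamiliar with the F₁ metric reported above, the sketch below computes it for a set of extracted items against a labeled reference set. The tuple layout (post, place, time of day) and the example data are hypothetical, and the study may aggregate scores differently (for example, per class or per field).

```python
def f1_score(predicted: set, actual: set) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (post_id, place, time_of_day) tuples.
predicted = {
    ("p1", "gym", "evening"),
    ("p2", "office", "morning"),
    ("p3", "cafe", "noon"),
}
actual = {
    ("p1", "gym", "evening"),
    ("p2", "office", "afternoon"),
    ("p3", "cafe", "noon"),
    ("p4", "library", "night"),
}

score = f1_score(predicted, actual)
print(round(score, 3))
```

Here two of three predictions are correct (precision 2/3) and two of four reference items are found (recall 1/2), giving F₁ = 4/7.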

2:15pm - 2:30pm

BEMEval-Doc2Schema: Benchmarking Large Language Models for Structured Data Extraction in Building Energy Modeling

Yiyuan Jia¹, Xiaoqin Fu², Liang Zhang²

1: Independent; 2: University of Arizona

Recent advances in foundation models, including large language models (LLMs), have created new opportunities to automate building energy modeling (BEM). However, systematic evaluation has remained challenging due to the absence of publicly available, task-specific datasets and standardized performance metrics. We present BEMEval, a benchmark framework designed to assess LLM performance across BEM tasks. The first benchmark in this suite, BEMEval-Res, focuses on structured data extraction from residential building documentation, a foundational step toward automated BEM processes.

BEMEval-Res introduces the Key–Value Overlap Rate (KVOR), a metric that quantifies the alignment between LLM-generated structured outputs and ground-truth schema references. Using this framework, we evaluate two leading models (GPT-5 and Gemini 2.5) under zero-shot and few-shot prompting strategies across three datasets: HERS L100, NREL iUnit, and NIST NZERTF. Results show that Gemini 2.5 consistently outperforms GPT-5, and that few-shot prompting improves accuracy for both models. Performance also varies by schema: the EPC schema yields significantly higher KVOR scores than HPXML, reflecting its simpler structure and reduced hierarchical depth.

By combining curated datasets, reproducible metrics, and cross-model comparisons, BEMEval establishes the first community-driven benchmark for evaluating LLMs in performing building energy modeling tasks, laying the groundwork for future research on AI-assisted BEM workflows.
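The abstract does not give the KVOR formula. As one plausible reading of a key-value overlap measure, the sketch below flattens nested outputs into dotted key paths and reports the fraction of reference pairs reproduced exactly. The function names, the exact-match rule, and the example records are assumptions, not the benchmark's definition.

```python
_MISSING = object()  # sentinel so a missing key never equals a real value


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"dotted.path": leaf_value}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def overlap_rate(predicted: dict, reference: dict) -> float:
    """Fraction of reference key-value pairs matched exactly by the prediction."""
    ref = flatten(reference)
    if not ref:
        return 0.0
    pred = flatten(predicted)
    matched = sum(
        1 for path, value in ref.items() if pred.get(path, _MISSING) == value
    )
    return matched / len(ref)


# Hypothetical ground truth and model output for one building document.
reference = {
    "walls": {"r_value": 13, "area_ft2": 1200},
    "hvac": {"type": "heat pump"},
    "stories": 2,
}
predicted = {
    "walls": {"r_value": 13, "area_ft2": 1150},
    "hvac": {"type": "heat pump"},
    "stories": 2,
}

rate = overlap_rate(predicted, reference)
print(rate)
```

Under this reading, a deeper schema produces more and longer key paths that must all match, which is consistent with the abstract's observation that a simpler, shallower schema scores higher.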

2:30pm - 2:37pm

Eppy-LLM: An Agentic Workflow for Language-Driven Building Energy Modeling And Optimization Using EnergyPlus

Huiwen Zhou, Liang Zhang

University of Arizona, United States of America

This study introduces Eppy-LLM, a lightweight multi-agent framework that integrates large language models (LLMs) with EnergyPlus to support interpretable, reproducible, and adaptive analysis based on building energy modeling. The system translates natural-language design objectives into deterministic simulation workflows through a structured sequence of three agents: a semantic interpreter, a rule-based orchestrator, and an analytical feedback module. Validated using NREL’s iUnit building model, the framework achieved 100% syntax-valid simulations and demonstrated adaptive parameter selection across diverse goals, including cooling load reduction, daylight optimization, and total energy minimization. These results underscore the feasibility of bridging natural-language reasoning with physics-based simulation, paving the way for human-centered and explainable design automation.

2:37pm - 2:45pm

Combining Generative Modeling and Advanced Control for Building Scenario Generation

Dylan Wald, Rawad El Kontar, Deepthi Vaidhynathan

National Renewable Energy Laboratory, United States of America

Buildings make up a large portion of energy consumption in the U.S. today. Understanding their energy consumption patterns can improve their efficiency, but requires detailed models that rely on incomplete or unknown information. Previous work has shown that artificial intelligence (AI) can be used to predict missing information and even suggest upgrades to improve building efficiency. However, building upgrades may require undesirable upfront costs. In contrast, advanced control could improve building efficiency with negligible upfront cost. To explore the tradeoffs between these two approaches, in this work we propose a workflow to compute optimal temperature setpoint schedules to minimize energy consumption and operational cost. Results show that modifying the temperature setpoints in a building using model predictive control (MPC) can effectively reduce its energy consumption and operational cost. This optimal operation alone cannot fully meet a desired goal. However, we show that by considering MPC in addition to component upgrades, a desired goal can be met with significantly lower upfront costs.

2:45pm - 2:52pm

Generalized and Localized AI Models for Urban Energy Characterization: A Comparative Analysis for UBEM Inputs Inference

Rawad El Kontar¹, Maryam Almaian², Dylan Wald¹, Deepthi Vaidhynathan¹, Ryan King¹

1: NLR, United States of America; 2: GaTech, United States of America

Urban Building Energy Models (UBEMs) rely on complete and accurate data; however, real-world datasets are often sparse. This paper presents a comparative analysis of localized and generalized data-driven models for inferring missing UBEM inputs. Using ResStock data and open spatial information, we evaluate multimodal neural networks and diffusion-based generative models for predicting building characteristics across categories such as envelope, equipment, and usage. Results show that localized models achieve higher accuracy but require retraining for each configuration, increasing computational cost, whereas generalized and generative models enable scalable, transferable, and adaptive characterization workflows. The proposed framework supports automated UBEM generation and scenario analysis to accelerate urban energy planning.

2:52pm - 3:00pm

Extracting Data from Construction Drawings with Multimodal Generative AI

Ryan Dubois

Columbia University, United States of America

Automated information extraction from construction drawings is a critical yet challenging task. While traditional approaches, from image processing to deep learning, have made great progress, they often suffer from inconsistent performance, extensive task-specific training, and narrow scope. Large Multimodal Models (LMMs) offer a promising alternative due to their vast knowledge bases and instruction-following capabilities; however, they struggle with spatial reasoning and dense high-fidelity images. This preliminary study evaluates the performance of two off-the-shelf Multimodal Generative AI (GenAI) workflows against two specialized architectures: an image-based Retrieval Augmented Generation (RAG) system and an image-based Multi-Agent workflow. By evaluating these workflows against a targeted set of benchmarking questions, this work provides a preliminary assessment of current GenAI capabilities in drawing interpretation and highlights specific areas for improvement in developing more robust automated workflows.