Conference Agenda
Overview and details of the sessions of this conference.
Session Overview
Session 3-a: 3DGeoInfo - Semantic Enrichment of 3D City Modeling
Presentations
Evaluating and Enhancing Georeferencing Accuracy in BIM and 3D GIS Models for Built Environment Digital Twins
1 TU Dublin, Ireland; 2 SOLAS, Ireland

The development of digital twins in built environment projects facilitates decision making during both the planning and execution phases, while also contributing to the digitalization of the built environment. Open-standard models such as Industry Foundation Classes (IFC) and CityGML, which represent Building Information Modelling (BIM) and Geographic Information Systems (GIS) respectively, are essential components in the creation of comprehensive digital twins. Accurate georeferencing of these models is critical for performing reliable geospatial analysis, including distance and area measurements. This factor has a major effect on projects that employ a custom Coordinate Reference System (CRS), or in contexts where distortion due to map projection is significant. Although a considerable number of studies have addressed georeferencing in BIM-GIS data integration, limited attention has been given to the assignment of a custom CRS to both IFC and CityGML models. Additionally, the effect of modelling and georeferencing methodologies on the final positional accuracy of the models remains insufficiently explored, and most existing studies remain at the conceptual level. This study aims to address this gap by proposing a methodology to assign a custom CRS to both IFC and CityGML models, while also practically examining how the BIM modelling and georeferencing methodology impacts the overall georeferencing accuracy of the models. The results demonstrate the effectiveness of the proposed approach in controlling the georeferencing accuracy of IFC and CityGML models employed in the generation of digital twins. The proposed approach can be used as a guideline for achieving accurate georeferencing when creating digital twins in built environment projects.
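As an illustration of the kind of coordinate handling this abstract describes, the following Python sketch uses pyproj to define a hypothetical site-local CRS and transform model coordinates into a national projected CRS before the georeferencing information is written to an IFC or CityGML model. It is not the authors' implementation; the projection parameters, EPSG code, and sample coordinates are assumptions for illustration only.

```python
# Minimal sketch (assumed parameters, not values from the study): defining a
# custom site CRS with pyproj and transforming model coordinates into a
# national projected CRS prior to georeferencing an IFC/CityGML model.
from pyproj import CRS, Transformer

# Hypothetical site-local CRS: Transverse Mercator centred on the project
# site, with a scale factor chosen to minimise projection distortion on site.
site_crs = CRS.from_proj4(
    "+proj=tmerc +lat_0=53.35 +lon_0=-6.26 +k=1.000013 "
    "+x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs"
)

# Assumed target CRS for the GIS/CityGML side: Irish Transverse Mercator.
target_crs = CRS.from_epsg(2157)

transformer = Transformer.from_crs(site_crs, target_crs, always_xy=True)

# Example survey control point expressed in site coordinates (metres).
x_site, y_site, z = 1250.000, 830.500, 12.30
x_itm, y_itm = transformer.transform(x_site, y_site)
print(f"ITM easting/northing: {x_itm:.3f}, {y_itm:.3f}, height: {z:.3f}")

# In an IFC model the corresponding georeferencing would typically be stored
# via IfcProjectedCRS / IfcMapConversion, and in CityGML via the srsName of
# the gml:Envelope; deriving both from the same transformation is what keeps
# the two models' georeferencing consistent.
```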
Robust Joint Instance-Semantic Segmentation for Semantic Enrichment of 3D Roof Reconstruction from Noisy Labels
1 State Mapping and Surveying Office Lower Saxony; 2 Technical University Berlin, Germany; 3 Reimagine Spaces

1. Introduction

Three-dimensional (3D) building models are essential components of digital city representations, supporting applications in urban planning, environmental analysis, and disaster management. Traditional reconstruction methods rely on manually crafted geometric rules and intensive human intervention, resulting in high costs and infrequent updates. Recent advances in deep learning (DL), particularly using Convolutional Neural Networks (CNNs), have shown potential for automating 3D reconstruction. However, these methods often remain limited to research settings due to the lack of large, accurately annotated datasets capturing real-world variability.

Several studies have proposed DL-based methods for 3D building reconstruction. PolyGNN (Chen et al., 2024) employs a graph neural network for polyhedron-based reconstruction using a synthetic dataset supplemented with real-world data, while Point2Roof (Li et al., 2022) adopts a two-stage graph-based approach trained on RoofN3D (Wichmann et al., 2019), a relatively simple dataset. Although these methods have achieved promising results in geometric reconstruction tasks, they often exhibit limited generalization when applied to real-world data due to significant domain gaps.

Public 3D building models, such as CityGML LoD2 datasets, offer opportunities to derive large-scale annotations. However, these annotations are inherently noisy due to geometric simplifications and limited semantic validation during model generation. This noise affects training quality and necessitates robust handling. Previous work (Wang et al., 2023) uses self-supervision to mitigate annotation errors and focuses mainly on geometry; weakly supervised learning and training with noisy labels have not yet been explored in this context.

Semantic segmentation is a powerful means of enhancing scene understanding by structuring urban data and guiding interpretation. Current methods focus on building instance segmentation, treating buildings as monolithic entities or extracting roofs as planar primitives, thereby missing higher-level semantics. This restricts their ability to effectively capture the structure of complex roofs. The specific task of roof-part instance segmentation (partitioning roofs into semantically meaningful components corresponding to canonical primitives such as gables, hips, or flats) has not been previously addressed. Nevertheless, these more detailed semantics of roof structures can serve as prior knowledge that effectively guides and constrains geometric processing steps, making them more targeted and reliable. Addressing these finer-grained semantic structures is also critical for enabling detailed urban analysis, supporting more accurate physical simulations (e.g., wind loads, solar potential), and improving the interpretability of digital twins.

This task is, however, particularly challenging. Roof structures often feature complex compositions of adjacent, interleaved primitives with subtle geometric transitions. These factors complicate the reliable identification and segmentation of roof parts and require methods capable of learning robust geometric and semantic patterns. To overcome these limitations, we propose a novel approach that jointly addresses semantic and instance segmentation by explicitly targeting detailed roof structural components termed "Roof-Part Instances" (RPIs).
We define an RPI as the connected subset of roof-labeled points in the 3D point cloud corresponding uniquely to a canonical roof primitive (e.g., gable, hip, flat) from a fixed taxonomy, terminating at geometric discontinuities such as ridges, valleys, eaves, or height offsets. This granularity enables more accurate semantic modeling of roof structures and supports targeted geometric processing and interpretation tasks.

Semantic segmentation assigns categorical labels to each point, while instance segmentation differentiates individual object instances. Both tasks are closely interconnected, as each object instance inherently belongs to a single semantic category. Multi-task learning (MTL) exploits this relationship by utilizing a shared encoder that generates common representations, coupled with task-specific decoder branches, thereby enabling mutual improvement and more effective generalization across tasks (X. Wang et al., 2019).

A distinctive contribution of our proposed method is the integration of a bidirectional, multi-scale cross-attention mechanism, significantly enhancing dynamic interaction between the semantic and instance decoders. Unlike previously applied pointwise fusion methods (C. Zhang & Fan, 2022), cross-attention allows each decoder branch to adaptively select and weigh relevant features from its counterpart, promoting consistency and coherence between semantic predictions and instance embeddings. This mechanism effectively addresses boundary ambiguities and inconsistencies, particularly in scenarios involving densely arranged instances belonging to identical semantic classes.

2. Method and Dataset

Our method leverages typical architectures used for multi-task learning (X. Wang et al., 2019). We adopt an encoder-decoder architecture utilizing a shared encoder for unified feature representation, paired with dedicated semantic and instance decoder branches. Our architecture specifically employs ConvPoint (Boulch, 2020) to extract hierarchical, fine-grained geometric details from local point neighborhoods. Additionally, we introduce a novel multi-scale cross-attention mechanism to dynamically integrate features from the semantic and instance branches, effectively resolving boundary ambiguities and maintaining consistency across closely positioned instances of the same semantic class.

To support multi-task learning, we design a multi-objective loss function that combines the Generalized Cross-Entropy loss (Z. Zhang & Sabuncu, 2018) for semantic segmentation with a discriminative loss for learning instance embeddings. This formulation allows the model to simultaneously optimize for accurate classification and effective instance distinction. Training is conducted using the AdamW optimizer with a learning rate of 0.001 for 120 epochs. Given the inherent annotation noise of the automatically derived labels, our training approach incorporates robust, noise-resilient strategies: we employ self-supervised masked-autoencoder pre-training and robust methods for training with noisy labels to improve model generalization, significantly reducing the detrimental impact of noisy annotations.

For evaluation, we created a dataset derived from publicly available LoD2 data from Lower Saxony, Germany, focusing on complex residential roof structures and excluding overly simplified industrial or historical examples. Our training set comprises approximately 40,000 point cloud patches with two to eleven roof-part instances across eight semantic categories.
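The PyTorch sketch below illustrates, in simplified form, the two components described above: bidirectional cross-attention between the semantic and instance decoder branches, and a noise-robust multi-task loss combining the Generalized Cross-Entropy loss of Zhang & Sabuncu (2018), L_q = (1 - p_y^q)/q, with the pull term of a discriminative embedding loss. This is not the authors' code; the feature dimension, single attention scale, loss weights, and the q value are assumptions.

```python
# Simplified sketch with assumed hyperparameters (dim 128, one attention
# scale, unit loss weights); not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BidirectionalCrossAttention(nn.Module):
    """Each decoder branch attends to the other's features and fuses the
    result with a residual connection."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.sem_from_inst = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inst_from_sem = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, f_sem: torch.Tensor, f_inst: torch.Tensor):
        # f_sem, f_inst: (batch, n_points, dim)
        sem_ctx, _ = self.sem_from_inst(f_sem, f_inst, f_inst)   # semantic queries instance
        inst_ctx, _ = self.inst_from_sem(f_inst, f_sem, f_sem)   # instance queries semantic
        return f_sem + sem_ctx, f_inst + inst_ctx


def generalized_cross_entropy(logits, labels, q: float = 0.7):
    """GCE loss (Zhang & Sabuncu, 2018): L_q = (1 - p_y^q) / q."""
    probs = F.softmax(logits, dim=-1)
    p_y = probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1).clamp_min(1e-7)
    return ((1.0 - p_y.pow(q)) / q).mean()


def instance_pull_loss(embeddings, instance_ids, margin: float = 0.1):
    """Pull term of a discriminative embedding loss: points of one instance
    are drawn towards their instance mean (push and regularisation terms
    omitted for brevity)."""
    loss, count = embeddings.new_zeros(()), 0
    for inst in instance_ids.unique():
        pts = embeddings[instance_ids == inst]
        dist = (pts - pts.mean(dim=0)).norm(dim=-1)
        loss = loss + F.relu(dist - margin).pow(2).mean()
        count += 1
    return loss / max(count, 1)


def multitask_loss(sem_logits, sem_labels, inst_embed, inst_ids):
    # Combined multi-objective loss; equal weighting is an assumption here.
    return generalized_cross_entropy(sem_logits, sem_labels) \
        + instance_pull_loss(inst_embed, inst_ids)
```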
3. Evaluation and Discussion

Our SemRoof3D dataset is based on 3D building models (LoD2) from Lower Saxony, Germany, and focuses on residential buildings with complex roof structures. It comprises around 40,000 training patches, each containing between two and eleven roof parts across eight semantic classes. So far, we have manually annotated 500 patches for testing purposes. We report Average Precision (AP), mean Intersection-over-Union (mIoU), and Panoptic Quality (PQ), achieving scores of 0.83, 0.67, and 0.78, respectively. Figure 1 illustrates the predicted semantic segmentation, instance masks, and corresponding confidence map, highlighting the model's ability to delineate structurally meaningful roof components despite annotation noise. Future work will focus on refining annotation strategies and expanding the semantic categorization and the dataset to additional geographic areas, thereby scaling up the model's applicability for comprehensive real-world deployment.

4. References

Boulch, A. (2020). ConvPoint: Continuous convolutions for point cloud processing. Computers and Graphics, 88, 24–34. https://doi.org/10.1016/j.cag.2020.02.005
Chen, Z., Shi, Y., Nan, L., Xiong, Z., & Xiang, X. (2024). PolyGNN: Polyhedron-based graph neural network for 3D building reconstruction from point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 218, 693–706.
Li, L., Song, N., Sun, F., Liu, X., Wang, R., Yao, J., & Cao, S. (2022). Point2Roof: End-to-end 3D building roof modeling from airborne LiDAR point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 193, 17–28. https://doi.org/10.1016/j.isprsjprs.2022.08.027
Wang, R., Huang, S., & Yang, H. (2023). Building3D: An urban-scale dataset and benchmarks for learning roof structures from point clouds. Proceedings of the IEEE International Conference on Computer Vision, 20019–20029. https://doi.org/10.1109/ICCV51070.2023.01837
Wang, X., Liu, S., Shen, X., Shen, C., & Jia, J. (2019). Associatively segmenting instances and semantics in point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4091–4100. https://doi.org/10.1109/CVPR.2019.00422
Zhang, C., & Fan, H. (2022). An improved multi-task pointwise network for segmentation of building roofs in airborne laser scanning point clouds. The Photogrammetric Record, 37(179), 260–284. https://doi.org/10.1111/phor.12420
Zhang, Z., & Sabuncu, M. R. (2018). Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems, 8778–8788.

Predicting Building Height from Footprint and Urban Planning Information for Digital Twin Generation
The University of Tokyo, Japan

CM2LoD3: Reconstructing LoD3 Building Models Using Semantic Conflict Maps
Technical University of Munich, Germany
Detailed 3D building models are crucial for urban planning, digital twins, and disaster management applications.