Conference Agenda
Overview and details of the sessions of this conference.
Session Overview
Session 3-a: 3DGeoInfo - Semantic Enrichment of 3D City Modeling
Presentations
Evaluating and Enhancing Georeferencing Accuracy in BIM and 3D GIS Models for Built Environment Digital Twins
1 TU Dublin, Ireland; 2 SOLAS, Ireland

The development of digital twins in built environment projects facilitates decision making during both the planning and execution phases, while also contributing to the digitalization of the built environment. Open-standard models such as Industry Foundation Classes (IFC) and CityGML, which represent Building Information Modelling (BIM) and Geographic Information Systems (GIS) respectively, are essential components in the creation of comprehensive digital twins. Accurate georeferencing of these models is critical for performing reliable geospatial analysis, including distance and area measurements. This factor has a major effect on projects that employ a custom Coordinate Reference System (CRS), or in contexts where distortion due to map projection is significant. Although a considerable number of studies have addressed georeferencing in BIM-GIS data integration, limited attention has been given to the assignment of a custom CRS to both IFC and CityGML models. Additionally, the effect of modelling and georeferencing methodologies on the final positional accuracy of the models remains insufficiently explored, and most existing studies remain at the conceptual level. This study aims to address this gap by proposing a methodology to assign a custom CRS to both IFC and CityGML models, while also practically examining how the BIM modelling and georeferencing methodology impacts the overall georeferencing accuracy of the models. The results demonstrate the effectiveness of the proposed approach in controlling the georeferencing accuracy of IFC and CityGML models employed in the generation of digital twins. The proposed approach can be used as a guideline for achieving accurate georeferencing when creating digital twins in built environment projects.
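As an illustration of the kind of coordinate handling this abstract describes, the following Python sketch uses pyproj to define a hypothetical site-local CRS and transform model coordinates into a national projected CRS before the georeferencing information is written to an IFC or CityGML model. It is not the authors' implementation; the projection parameters, EPSG code, and sample coordinates are assumptions for illustration only.

```python
# Minimal sketch (assumed parameters, not values from the study): defining a
# custom site CRS with pyproj and transforming model coordinates into a
# national projected CRS prior to georeferencing an IFC/CityGML model.
from pyproj import CRS, Transformer

# Hypothetical site-local CRS: Transverse Mercator centred on the project
# site, with a scale factor chosen to minimise projection distortion on site.
site_crs = CRS.from_proj4(
    "+proj=tmerc +lat_0=53.35 +lon_0=-6.26 +k=1.000013 "
    "+x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs"
)

# Assumed target CRS for the GIS/CityGML side: Irish Transverse Mercator.
target_crs = CRS.from_epsg(2157)

transformer = Transformer.from_crs(site_crs, target_crs, always_xy=True)

# Example survey control point expressed in site coordinates (metres).
x_site, y_site, z = 1250.000, 830.500, 12.30
x_itm, y_itm = transformer.transform(x_site, y_site)
print(f"ITM easting/northing: {x_itm:.3f}, {y_itm:.3f}, height: {z:.3f}")

# In an IFC model the corresponding georeferencing would typically be stored
# via IfcProjectedCRS / IfcMapConversion, and in CityGML via the srsName of
# the gml:Envelope; deriving both from the same transformation is what keeps
# the two models' georeferencing consistent.
```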
Robust Joint Instance-Semantic Segmentation for Semantic Enrichment of 3D Roof Reconstruction from Noisy Labels
1 State Mapping and Surveying Office Lower Saxony; 2 Technical University Berlin, Germany; 3 Reimagine Spaces

1. Introduction

Three-dimensional (3D) building models are essential components of digital city representations, supporting applications in urban planning, environmental analysis, and disaster management. Traditional reconstruction methods rely on manually crafted geometric rules and intensive human intervention, resulting in high costs and infrequent updates. Recent advances in deep learning (DL), particularly using Convolutional Neural Networks (CNNs), have shown potential for automating 3D reconstruction. However, these methods often remain limited to research settings due to the lack of large, accurately annotated datasets capturing real-world variability.

Several studies have proposed DL-based methods for 3D building reconstruction. PolyGNN (Chen et al., 2024) employs a graph neural network for polyhedron-based reconstruction using a synthetic dataset supplemented with real-world data, while Point2Roof (Li et al., 2022) adopts a two-stage graph-based approach trained on RoofN3D (Wichmann et al., 2019), a relatively simple dataset. Although these methods have achieved promising results in geometric reconstruction tasks, they often exhibit limited generalization when applied to real-world data due to significant domain gaps.

Public 3D building models, such as CityGML LoD2 datasets, offer opportunities to derive large-scale annotations. However, these annotations are inherently noisy due to geometric simplifications and limited semantic validation during model generation. This noise affects training quality and necessitates robust handling. Previous work (Wang et al., 2023) uses self-supervision to mitigate annotation errors and focuses mainly on geometry; weakly supervised learning and training with noisy labels have not yet been explored in this context.

Semantic segmentation is a powerful means of enhancing scene understanding by structuring urban data and guiding interpretation. Current methods focus on building instance segmentation, treating buildings as monolithic entities or extracting roofs as planar primitives, thereby missing higher-level semantics. This restricts their ability to effectively capture the structure of complex roofs. The specific task of roof-part instance segmentation (partitioning roofs into semantically meaningful components corresponding to canonical primitives such as gables, hips, or flats) has not been previously addressed. Nevertheless, these more detailed semantics of roof structures can serve as prior knowledge that effectively guides and constrains geometric processing steps, making them more targeted and reliable. Addressing these finer-grained semantic structures is also critical for enabling detailed urban analysis, supporting more accurate physical simulations (e.g., wind loads, solar potential), and improving the interpretability of digital twins.

This task is, however, particularly challenging. Roof structures often feature complex compositions of adjacent, interleaved primitives with subtle geometric transitions. These factors complicate the reliable identification and segmentation of roof parts and require methods capable of learning robust geometric and semantic patterns. To overcome these limitations, we propose a novel approach that jointly addresses semantic and instance segmentation by explicitly targeting detailed roof structural components termed "Roof-Part Instances" (RPIs).
We define an RPI as the connected subset of roof-labeled points in the 3D point cloud corresponding uniquely to a canonical roof primitive (e.g., gable, hip, flat) from a fixed taxonomy, terminating at geometric discontinuities such as ridges, valleys, eaves, or height offsets. This granularity enables more accurate semantic modeling of roof structures and supports targeted geometric processing and interpretation tasks.

Semantic segmentation assigns categorical labels to each point, while instance segmentation differentiates individual object instances. Both tasks are closely interconnected, as each object instance inherently belongs to a single semantic category. Multi-task learning (MTL) exploits this relationship by utilizing a shared encoder that generates common representations, coupled with task-specific decoder branches, thereby enabling mutual improvement and more effective generalization across tasks (X. Wang et al., 2019).

A distinctive contribution of our proposed method is the integration of a bidirectional, multi-scale cross-attention mechanism, significantly enhancing dynamic interaction between the semantic and instance decoders. Unlike previously applied pointwise fusion methods (C. Zhang & Fan, 2022), cross-attention allows each decoder branch to adaptively select and weigh relevant features from its counterpart, promoting consistency and coherence between semantic predictions and instance embeddings. This mechanism effectively addresses boundary ambiguities and inconsistencies, particularly in scenarios involving densely arranged instances belonging to identical semantic classes.

2. Method and Dataset

Our method leverages typical architectures used for multi-task learning (X. Wang et al., 2019). We adopt an encoder-decoder architecture utilizing a shared encoder for unified feature representation, paired with dedicated semantic and instance decoder branches. Our architecture specifically employs ConvPoint (Boulch, 2020) to extract hierarchical, fine-grained geometric details from local point neighborhoods. Additionally, we introduce a novel multi-scale cross-attention mechanism to dynamically integrate features from the semantic and instance branches, effectively resolving boundary ambiguities and maintaining consistency across closely positioned instances of the same semantic class.

To support multi-task learning, we design a multi-objective loss function that combines the Generalized Cross-Entropy loss (Z. Zhang & Sabuncu, 2018) for semantic segmentation with a discriminative loss for learning instance embeddings. This formulation allows the model to simultaneously optimize for accurate classification and effective instance distinction. Training is conducted using the AdamW optimizer with a learning rate of 0.001 for 120 epochs. Given the inherent annotation noise of the automatically derived labels, our training approach incorporates robust, noise-resilient strategies: we employ self-supervised masked-autoencoder pre-training and robust methods for training with noisy labels to improve model generalization, significantly reducing the detrimental impact of noisy annotations.

For evaluation, we created a dataset derived from publicly available LoD2 data from Lower Saxony, Germany, focusing on complex residential roof structures and excluding overly simplified industrial or historical examples. Our training set comprises approximately 40,000 point cloud patches with two to eleven roof-part instances across eight semantic categories.
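The PyTorch sketch below illustrates, in simplified form, the two components described above: bidirectional cross-attention between the semantic and instance decoder branches, and a noise-robust multi-task loss combining the Generalized Cross-Entropy loss of Zhang & Sabuncu (2018), L_q = (1 - p_y^q)/q, with the pull term of a discriminative embedding loss. This is not the authors' code; the feature dimension, single attention scale, loss weights, and the q value are assumptions.

```python
# Simplified sketch with assumed hyperparameters (dim 128, one attention
# scale, unit loss weights); not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BidirectionalCrossAttention(nn.Module):
    """Each decoder branch attends to the other's features and fuses the
    result with a residual connection."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.sem_from_inst = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inst_from_sem = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, f_sem: torch.Tensor, f_inst: torch.Tensor):
        # f_sem, f_inst: (batch, n_points, dim)
        sem_ctx, _ = self.sem_from_inst(f_sem, f_inst, f_inst)   # semantic queries instance
        inst_ctx, _ = self.inst_from_sem(f_inst, f_sem, f_sem)   # instance queries semantic
        return f_sem + sem_ctx, f_inst + inst_ctx


def generalized_cross_entropy(logits, labels, q: float = 0.7):
    """GCE loss (Zhang & Sabuncu, 2018): L_q = (1 - p_y^q) / q."""
    probs = F.softmax(logits, dim=-1)
    p_y = probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1).clamp_min(1e-7)
    return ((1.0 - p_y.pow(q)) / q).mean()


def instance_pull_loss(embeddings, instance_ids, margin: float = 0.1):
    """Pull term of a discriminative embedding loss: points of one instance
    are drawn towards their instance mean (push and regularisation terms
    omitted for brevity)."""
    loss, count = embeddings.new_zeros(()), 0
    for inst in instance_ids.unique():
        pts = embeddings[instance_ids == inst]
        dist = (pts - pts.mean(dim=0)).norm(dim=-1)
        loss = loss + F.relu(dist - margin).pow(2).mean()
        count += 1
    return loss / max(count, 1)


def multitask_loss(sem_logits, sem_labels, inst_embed, inst_ids):
    # Combined multi-objective loss; equal weighting is an assumption here.
    return generalized_cross_entropy(sem_logits, sem_labels) \
        + instance_pull_loss(inst_embed, inst_ids)
```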
3. Evaluation and Discussion

Our SemRoof3D dataset is based on 3D building models (LoD2) from Lower Saxony, Germany, and focuses on residential buildings with complex roof structures. It comprises around 40,000 training patches, each containing between two and eleven roof parts across eight semantic classes. So far, we have manually annotated 500 patches for testing purposes. We report Average Precision (AP), mean Intersection-over-Union (mIoU), and Panoptic Quality (PQ), achieving scores of 0.83, 0.67, and 0.78, respectively. Figure 1 illustrates the predicted semantic segmentation, instance masks, and corresponding confidence map, highlighting the model's ability to delineate structurally meaningful roof components despite annotation noise. Future work will focus on refining annotation strategies and expanding the semantic categorization and the dataset to additional geographic areas, thereby scaling up the model's applicability for comprehensive real-world deployment.

4. References

Boulch, A. (2020). ConvPoint: Continuous convolutions for point cloud processing. Computers and Graphics, 88, 24–34. https://doi.org/10.1016/j.cag.2020.02.005
Chen, Z., Shi, Y., Nan, L., Xiong, Z., & Xiang, X. (2024). PolyGNN: Polyhedron-based graph neural network for 3D building reconstruction from point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 218, 693–706.
Li, L., Song, N., Sun, F., Liu, X., Wang, R., Yao, J., & Cao, S. (2022). Point2Roof: End-to-end 3D building roof modeling from airborne LiDAR point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 193, 17–28. https://doi.org/10.1016/j.isprsjprs.2022.08.027
Wang, R., Huang, S., & Yang, H. (2023). Building3D: An urban-scale dataset and benchmarks for learning roof structures from point clouds. Proceedings of the IEEE International Conference on Computer Vision, 20019–20029. https://doi.org/10.1109/ICCV51070.2023.01837
Wang, X., Liu, S., Shen, X., Shen, C., & Jia, J. (2019). Associatively segmenting instances and semantics in point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4091–4100. https://doi.org/10.1109/CVPR.2019.00422
Zhang, C., & Fan, H. (2022). An improved multi-task pointwise network for segmentation of building roofs in airborne laser scanning point clouds. The Photogrammetric Record, 37(179), 260–284. https://doi.org/10.1111/phor.12420
Zhang, Z., & Sabuncu, M. R. (2018). Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems, 8778–8788.

Predicting Building Height from Footprint and Urban Planning Information for Digital Twin Generation
The University of Tokyo, Japan

CM2LoD3: Reconstructing LoD3 Building Models Using Semantic Conflict Maps
Technical University of Munich, Germany
Detailed 3D building models are crucial for urban planning, digital twins, and disaster management applications.