Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions held on that day or at that location. Select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session 24: Coffee Break & Posters Session 3: Computer Vision and AI Applications
Time: Thursday, 30/Nov/2023, 10:00am - 11:00am

Location: Polivalente


Presentations

Replay-Based Online Adaptation for Unsupervised Deep Visual Odometry

Yevhen Kuznietsov1, Marc Proesmans1, Luc Van Gool1,2,3

1KU Leuven, Belgium; 2ETH Zurich, Switzerland; 3INSAIT Sofia, Bulgaria

Online adaptation is a promising paradigm that enables dynamic adaptation to new environments. In recent years, there has been growing interest in exploring online adaptation for various problems, including visual odometry, a crucial task in robotics, autonomous systems, and driver assistance applications. In this work, we leverage experience replay, a potent technique for enhancing online adaptation, to explore replay-based online adaptation for unsupervised deep visual odometry. Our experiments reveal a remarkable performance boost compared to the non-adapted model. Furthermore, we conduct a comparative analysis against established methods, demonstrating competitive results that showcase the potential of online adaptation in advancing visual odometry.
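
As a rough illustration of the replay mechanism described above, the sketch below mixes each incoming sample with a few randomly replayed past samples at every adaptation step. Here `model`, `loss_fn` (e.g. a photometric reconstruction loss), and the FIFO buffer policy are placeholder assumptions, not the authors' exact design.

```python
# Minimal sketch of replay-based online adaptation; `loss_fn(model, sample)`
# is assumed to return a differentiable self-supervised loss (e.g. photometric).
import random

def adapt_online(model, optimizer, loss_fn, stream, buffer_size=100, replay_k=3):
    buffer = []  # experience replay buffer of past samples (e.g. frame pairs)
    for sample in stream:  # consecutive-frame samples from the video stream
        replayed = random.sample(buffer, min(replay_k, len(buffer)))
        optimizer.zero_grad()
        batch = [sample] + replayed
        loss = sum(loss_fn(model, s) for s in batch) / len(batch)
        loss.backward()
        optimizer.step()
        buffer.append(sample)
        if len(buffer) > buffer_size:
            buffer.pop(0)  # FIFO eviction; reservoir sampling is a common alternative
```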



IR-guided energy optimization framework for depth enhancement in Time of Flight imaging

Amina Achaibou1,2, Filiberto Pla1, Javier Calpe2

1Universitat Jaume I, Spain; 2Analog Devices

This paper introduces an energy optimization framework based on infrared (IR) guidance to improve depth consistency in Time of Flight (ToF) imaging systems. The primary objective is to formulate the problem as an image energy optimization task aimed at maximizing the coherence between the depth map (D) and the corresponding IR image, both captured simultaneously by the same ToF sensor. The notion of depth consistency relies on the underlying hypothesis of a correlation between depth maps and their corresponding IR images.

The proposed optimization framework adopts a weighted approach, leveraging an iterative estimator. The image energy is characterized by introducing spatial conditional entropy as a correlation measure and a spatial error term as image regularization. To address the issue of missing depth values, a preprocessing step is first applied, using a depth completion method based on IR-guided belief propagation proposed in previous work.

Subsequently, the proposed framework is employed to regularize and enhance the inpainted depth. The experimental results demonstrate a range of qualitative improvements in depth map reconstruction, with a particular emphasis on the sharpness and continuity of edges.
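
To make the correlation term concrete: the sketch below computes a conditional entropy H(D | IR) from a joint histogram of the two images, one plausible reading of the "spatial conditional entropy" measure named in the abstract. The bin count and histogram-based estimator are our assumptions, not the authors' exact formulation.

```python
# Sketch: conditional entropy H(D | IR) of a depth map given its IR image.
import numpy as np

def conditional_entropy(depth, ir, bins=64):
    joint, _, _ = np.histogram2d(ir.ravel(), depth.ravel(), bins=bins)
    p_joint = joint / joint.sum()                # joint distribution P(IR, D)
    p_ir = p_joint.sum(axis=1, keepdims=True)    # marginal P(IR)
    with np.errstate(divide="ignore", invalid="ignore"):
        p_cond = np.where(p_ir > 0, p_joint / p_ir, 0.0)       # P(D | IR)
        log_term = np.where(p_cond > 0, np.log2(p_cond), 0.0)
    return -np.sum(p_joint * log_term)           # H(D | IR) in bits
```

Minimizing such an entropy term, together with a spatial regularizer, is one way the iterative estimator described above could drive depth-IR coherence.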



Development and Testing of an MRI-Compatible Immobilization Device for Head and Neck Imaging

Francisco Zagalo1,2, Susete Fetal3,4, Paulo Fonte3,4, Antero Abrunhosa2, Sónia Afonso2, Luís Lopes4, Miguel Castelo-Branco2

1Department of Physics, University of Coimbra, 3004-516 Coimbra, Portugal; 2Institute for Nuclear Sciences Applied to Health (ICNAS), University of Coimbra, 3000-548 Coimbra, Portugal; 3Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal; 4LIP - Laboratory of Instrumentation and Experimental Particle Physics, 3004-516 Coimbra, Portugal

MRI with long acquisition times is prone to motion artifacts that can compromise image quality and lead to misinterpretation.

Aiming to address this challenge at the sub-millimeter level, we developed and evaluated a maxilla immobilization approach, which is known to have better performance than other non-invasive techniques, using a personalized mouthpiece connected to an external MRI-compatible frame.

The effectiveness of the device was evaluated by analyzing MRI imagery obtained under different immobilization conditions on a human volunteer. The SURF and Block Matching algorithms were assessed, supplemented by custom software.

Compared with simple cushioning, the immobilizer reduced the amplitudes of involuntary slow-drift movements of the head by more than a factor of two in the axial plane, with final values of 0.25 mm and 0.060 degrees. Faster involuntary motions, including those caused by breathing (which were identifiable), were also suppressed, with final standard deviation values below 0.045 mm and 0.025 degrees.

Intentional movements were also strongly restricted, both translationally and angularly, by factors between 4.6 and 7.8, with final values of 0.5 mm and 0.2 degrees under moderate forcing.
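
For readers who want to reproduce this kind of measurement, the sketch below estimates in-plane rigid motion (translation in pixels, rotation in degrees) between two MRI slices. It uses OpenCV's ECC registration as a readily available stand-in for the SURF and Block Matching pipeline mentioned above; all parameter choices are illustrative.

```python
# Sketch: Euclidean (rotation + translation) registration of two MRI slices
# with OpenCV's ECC algorithm, as a stand-in for the paper's SURF / Block
# Matching analysis.
import cv2
import numpy as np

def rigid_motion(ref_slice, mov_slice):
    ref = ref_slice.astype(np.float32)
    mov = mov_slice.astype(np.float32)
    warp = np.eye(2, 3, dtype=np.float32)  # initial identity warp
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    _, warp = cv2.findTransformECC(ref, mov, warp, cv2.MOTION_EUCLIDEAN, criteria)
    angle_deg = np.degrees(np.arctan2(warp[1, 0], warp[0, 0]))  # in-plane rotation
    tx, ty = warp[0, 2], warp[1, 2]                             # translation in pixels
    return tx, ty, angle_deg
```

Pixel translations would then be converted to millimeters via the scan's voxel spacing before comparing against the values reported above.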



Bipartite Graph Coarsening For Text Classification Using Graph Neural Networks

Nícolas Roque dos Santos1, Diego Minatel1, Alan Demétrius Baria Valejo2, Alneu de Andrade Lopes1

1University of São Paulo, Brazil; 2Federal University of São Carlos, Brazil

Text classification is a fundamental task in Text Mining (TM), with applications ranging from spam detection to sentiment analysis. One of the current approaches to this task is the Graph Neural Network (GNN), primarily used to deal with complex and unstructured data. However, the scalability of GNNs is a significant challenge when dealing with large-scale graphs. Multilevel optimization is prominent among the methods proposed to tackle the issues that arise in such a scenario. This approach uses a hierarchical coarsening technique to reduce a graph, applies a target algorithm to the coarsest graph, and projects the output back to the original graph. Here, we propose a novel approach for text classification using GNNs. We build a bipartite graph from the input corpus and then apply the coarsening technique from multilevel optimization to generate ten contracted graphs, analyzing the GNN's performance, training time, and memory consumption as the graph is gradually reduced. Although we conducted experiments on text classification, we emphasize that the proposed method is not bound to a specific task and can thus be generalized to different problems modeled as bipartite graphs. Experiments on datasets of various domains and sizes show that our approach reduces memory consumption and training time without significant loss of performance.
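
A minimal sketch of one coarsening level, assuming a document-word bipartite graph stored as sets of word ids per document: document vertices with the largest word overlap are greedily contracted into super-vertices. The matching heuristic is our illustrative choice; the paper's contraction rule may differ.

```python
# Sketch of one coarsening level on the document side of a bipartite graph.
# Input: {doc_id: set_of_word_ids}; output: contracted super-vertices.
def coarsen_documents(doc_words):
    merged, used = {}, set()
    docs = list(doc_words)
    for d in docs:
        if d in used:
            continue
        # pick the unused partner with maximal word overlap (greedy matching)
        best, best_olap = None, 0
        for e in docs:
            if e != d and e not in used:
                olap = len(doc_words[d] & doc_words[e])
                if olap > best_olap:
                    best, best_olap = e, olap
        if best is not None:
            merged[(d, best)] = doc_words[d] | doc_words[best]  # contract pair
            used.update({d, best})
        else:
            merged[(d,)] = doc_words[d]  # no overlapping partner: keep singleton
            used.add(d)
    return merged
```

Applying this step repeatedly yields the hierarchy of progressively smaller graphs on which the GNN's performance, training time, and memory use can be tracked.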



Novelty Detection in Human-Machine Interaction Through a Multimodal Approach

José Salas-Cáceres, Javier Lorenzo-Navarro, Modesto Castrillón-Santana, David Freire-Obregón

Universidad de Las Palmas de Gran Canaria, Spain

As the interest in robots continues to grow across various domains, including healthcare, construction and education, it becomes crucial to prioritize improving user experience and fostering seamless interaction.

These human-machine interactions (HMI) are often impersonal. Our proposal, built upon previous work in the field, aims to use individuals' biometric data to detect whether a person has been encountered before. Since many models depend on a preset decision threshold, an optimization method using a genetic algorithm is proposed. Novelty detection is performed through a multimodal approach using both voice and facial images of the individuals, although unimodal approaches using each single cue were also tested. To assess the effectiveness of the proposed system, we conducted comprehensive experiments on three diverse datasets, namely VoxCeleb, Mobio, and AveRobot, each possessing distinct characteristics and complexities. By examining the impact of data quality on model performance, we gained valuable insights into the effectiveness of the proposed solution. Our approach outperformed several conventional novelty detection methods, yielding superior and promising results.
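
As a sketch of the threshold optimization step, the toy genetic algorithm below evolves a scalar decision threshold that maximizes accuracy on held-out similarity scores, where `labels` marks previously seen individuals. Population size, selection scheme, and mutation scale are illustrative choices, not the authors' settings.

```python
# Sketch: GA search for a novelty-decision threshold over similarity scores.
# labels[i] == 1 means person i was seen before (score expected to be high).
import numpy as np

def ga_threshold(scores, labels, pop=30, gens=50, seed=0):
    rng = np.random.default_rng(seed)

    def fitness(t):  # accuracy of the rule "score >= t -> known person"
        return np.mean((scores >= t) == labels)

    population = rng.uniform(scores.min(), scores.max(), pop)
    for _ in range(gens):
        fit = np.array([fitness(t) for t in population])
        parents = population[np.argsort(fit)][-pop // 2:]        # keep the fittest half
        children = rng.choice(parents, pop - len(parents))       # resample parents
        children = children + rng.normal(0, scores.std() * 0.05, len(children))  # mutate
        population = np.concatenate([parents, children])
    return population[np.argmax([fitness(t) for t in population])]
```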



Self-Supervised Monocular Depth Estimation on Unseen Synthetic Cameras

Cecilia Diana Albelda1, Juan Ignacio Bravo Pérez-Villar1,2, Javier Montalvo1, Álvaro García Martín1, Jesús Bescós Cano1

1Video Processing and Understanding Lab, Univ. Autónoma de Madrid, 28049 Madrid, Spain; 2Deimos Space, 28760 Madrid, Spain

Monocular depth estimation is a critical task in computer vision, and self-supervised deep learning methods have achieved remarkable results in recent years. However, these models often struggle with camera generalization, i.e., with sequences captured by unseen cameras. To address this challenge, we present a new public custom dataset created using the CARLA simulator, consisting of three video sequences recorded by five different cameras with varying focal distances. This dataset was created due to the absence of public datasets containing identical sequences captured by different cameras. Additionally, this paper proposes the use of adversarial training to improve the models' robustness to changes in intrinsic camera parameters, enabling accurate depth estimation regardless of the recording camera. The results of our proposed architecture are compared with a baseline model to evaluate the effectiveness of adversarial training and demonstrate its potential benefits, both on our synthetic dataset and on the KITTI benchmark as the reference dataset for depth estimation.
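
One common way to implement such adversarial training is a gradient reversal layer: a camera classifier tries to predict which camera produced a feature map, and the reversed gradient pushes the depth encoder toward camera-invariant features. The PyTorch sketch below reflects our reading of the abstract, not the paper's exact architecture.

```python
# Sketch: gradient reversal for camera-invariant depth features; `classifier`
# is a hypothetical head predicting the camera id from encoder features.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # reversed gradient into the encoder

def camera_adversarial_loss(features, cam_ids, classifier, lam=1.0):
    logits = classifier(GradReverse.apply(features, lam))
    return F.cross_entropy(logits, cam_ids)
```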



Presumably Correct Undersampling

Gonzalo Nápoles1, Isel Grau2

1Tilburg University, The Netherlands; 2Eindhoven University of Technology, The Netherlands

This paper presents a data pre-processing algorithm to tackle class imbalance in classification problems by undersampling the majority class. It relies on a formalism termed Presumably Correct Decision Sets aimed at isolating easy (presumably correct) and difficult (presumably incorrect) instances in a classification problem. The former are instances with neighbors that largely share their class label, while the latter have neighbors that mostly belong to a different decision class. The proposed algorithm replaces the presumably correct instances belonging to the majority decision class with prototypes, and it operates under the assumption that removing these instances does not change the boundaries of the decision space. Note that this strategy opposes other methods that remove pairs of instances from different classes that are each other's closest neighbors. We argue that the training and test data should have similar distribution and complexity and that making the decision classes more separable in the training data would only increase the risks of overfitting. The experiments show that our method improves the generalization capabilities of a baseline classifier, while outperforming other undersampling algorithms reported in the literature.
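
A minimal sketch of the idea, under our own assumptions for the pieces the abstract leaves open (the neighbor-agreement cutoff, the neighborhood size, and a k-means prototype step):

```python
# Sketch of Presumably Correct Undersampling: majority-class instances whose
# neighbors largely share their label are treated as "presumably correct" and
# replaced by k-means prototypes; k, the 0.5 cutoff, and the prototype count
# are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def pc_undersample(X, y, majority, k=5, n_prototypes=50):
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    agreement = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)  # skip self at idx[:, 0]
    easy = (agreement >= 0.5) & (y == majority)             # presumably correct majority
    n_protos = min(n_prototypes, int(easy.sum()))
    protos = KMeans(n_clusters=n_protos, n_init=10).fit(X[easy]).cluster_centers_
    X_new = np.vstack([X[~easy], protos])
    y_new = np.concatenate([y[~easy], np.full(n_protos, majority)])
    return X_new, y_new
```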



Time Distributed Multiview Representation for Speech Emotion Recognition

Marcelo Eduardo Pellenz, Flávia Letícia de Mattos, Alceu de Souza Britto Jr.

Pontifical Catholic University of Paraná (PUCPR), Brazil

In recent years, speech emotion recognition (SER) techniques have gained importance, mainly in human-computer interaction studies and applications. This research area has different challenges, including developing new and efficient detection methods, efficient extraction of audio features, and time preprocessing strategies. This paper proposes a new multiview model to detect speech emotion in raw audio data. The proposed method uses mel-spectrogram features extracted from audio files and combines deep learning algorithms to improve detection performance. This combination relies on the following algorithms: CNN (Convolutional Neural Network), VGG (Visual Geometry Group), ResNet (Residual Neural Network), and LSTM (Long Short-Term Memory). The role of the CNN is to extract the characteristics present in the mel-spectrogram images applied as input to the method. These characteristics are combined with those of the VGG and ResNet networks, which are pre-trained models. Finally, the LSTM receives all this combined information to identify the predefined emotions. The proposed method was developed using the RAVDESS database, considering eight emotions. The results show an increase of up to 12% in accuracy compared to strategies in the literature that use raw data processing.
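
The front end of such a pipeline can be sketched as follows: raw audio is converted to a log mel-spectrogram and sliced into overlapping windows, one "view" per time step for the downstream CNN/LSTM stack. The sampling rate, mel resolution, and window/hop sizes below are illustrative, and the file path is a placeholder.

```python
# Sketch: log mel-spectrogram windows as time-distributed inputs for a
# CNN + LSTM emotion model; parameter values are illustrative.
import librosa
import numpy as np

def mel_windows(path, n_mels=128, win_frames=128, hop_frames=64):
    y, sr = librosa.load(path, sr=22050)                   # e.g. a RAVDESS wav file
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)          # log-scaled spectrogram
    # overlapping slices -> shape (time_steps, n_mels, win_frames)
    starts = range(0, mel_db.shape[1] - win_frames + 1, hop_frames)
    return np.stack([mel_db[:, s:s + win_frames] for s in starts])
```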



But that's not why: Inference adjustment by interactive prototype revision

Michael Gerstenberger1, Thomas Wiegand1, Peter Eisert1,2, Sebastian Bosse1

1Fraunhofer Heinrich-Hertz Institut (HHI), Germany; 2Humboldt University Berlin

Prototypical part networks predict not only the class of an image but also explain why it was chosen. In some cases, however, the detected features do not relate to the depicted objects. This is especially relevant in prototypical part networks, as prototypes are meant to code for high-level concepts such as semantic parts of objects. This raises the question of how the inference of these networks can be improved. Here, we suggest enabling the user to give hints and interactively correct the model's reasoning. We show that even correct classifications can rely on unreasonable or spurious prototypes that result from confounding variables in a dataset. Hence, we introduce simple yet effective interaction schemes for inference adjustment that enable the user to interactively revise the prototypes chosen by the model. Spurious prototypes can be removed, or altered to become sensitive to object features, by a novel mode of training. Interactive prototype revision allows machine-learning-naïve users to adjust the logic of reasoning and change the way prototypical part networks make a decision.
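
The two revision operations we read from the abstract can be sketched as follows, assuming a ProtoPNet-style prototype tensor and final classification layer; the names, shapes, and re-anchoring rule are our assumptions, not the paper's training procedure.

```python
# Sketch: interactive prototype revision on a ProtoPNet-style model, where
# `prototypes` has shape (P, C, 1, 1) and `last_layer_w` shape (classes, P).
import torch

def remove_prototype(prototypes, last_layer_w, j):
    """Delete spurious prototype j and its classification-layer weights."""
    keep = [i for i in range(prototypes.shape[0]) if i != j]
    return prototypes[keep], last_layer_w[:, keep]

def reanchor_prototype(prototypes, j, patch_feature):
    """Re-anchor prototype j onto a user-marked object-feature patch."""
    with torch.no_grad():
        prototypes[j] = patch_feature  # further fine-tuning would refine it
    return prototypes
```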



Impact of Synthetic Images on Morphing Attack Detection Using a Siamese Network

Juan Tapia, Christoph Busch

Hochschule Darmstadt, Germany

This paper evaluated the impact of synthetic images on Morphing Attack Detection (MAD) using a Siamese network with a semi-hard-loss function. Intra- and cross-dataset evaluations were performed to measure the generalisation capabilities of synthetic images. Three different pre-trained networks were used as feature extractors: the traditional MobileNetV2, MobileNetV3, and EfficientNetB0. Our results show that MAD trained on EfficientNetB0 with FERET, FRGCv2, and FRLL can reach a lower error rate in comparison with the SOTA. Conversely, worse performance was obtained when the system was trained only on synthetic images. A mixed (synthetic + digital) database may help to improve MAD and reduce the error rate. These findings show that continued effort is needed to include synthetic images in the training process.
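
The semi-hard loss named above is commonly defined FaceNet-style: for each anchor, pick a negative that is farther away than the positive but still within the margin. The PyTorch sketch below follows that standard definition; the batch construction and margin value are our assumptions.

```python
# Sketch: semi-hard triplet loss over embedding batches; anchor/positive are
# (B, D) paired rows, negatives is an (N, D) pool of bona fide / morph embeddings.
import torch

def semi_hard_triplet_loss(anchor, positive, negatives, margin=0.2):
    d_ap = torch.norm(anchor - positive, dim=1)       # anchor-positive distances
    d_an = torch.cdist(anchor, negatives)             # anchor-negative distances (B, N)
    # semi-hard: farther than the positive, but still inside the margin
    mask = (d_an > d_ap[:, None]) & (d_an < d_ap[:, None] + margin)
    d_semi = torch.where(mask, d_an, torch.full_like(d_an, float("inf")))
    d_n = d_semi.min(dim=1).values                    # hardest semi-hard negative
    valid = torch.isfinite(d_n)                       # anchors with such a negative
    if not valid.any():
        return torch.zeros((), device=anchor.device)
    return torch.relu(d_ap[valid] - d_n[valid] + margin).mean()
```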



 