Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only the sessions on that day or at that location. Select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session 8: Second Poster Session
Time: Friday, 27/Oct/2023, 3:30pm - 4:30pm

Location: Polivalente


Session on Computer Vision and Deep Learning

Presentations

Steganography Applications of StyleGAN: A Short Analytical Investigation from Hiding Message in Face Images

Farhad Shadmand1,2, Nuno Gonçalves1,2,3, Luiz Schirmer1,2

1Instituto de Sistemas e Robótica - Polo de Coimbra, Portugal; 2University of Coimbra; 3INCM Lab Portuguese Mint and Official Printed Office Lisbon, Portugal

In this investigation, we examine the latent codes w of both original and encoded images, obtained by projecting them through StyleGAN, a generative adversarial network renowned for high-quality image synthesis. We use a StyleGAN model pretrained on the Flickr-Faces-HQ (FFHQ) dataset [6]. Message embedding into the encoded images is performed by CodeFace, a steganography model. By measuring the average disparities between the latent codes of the original and encoded images, we identify the channels best suited for concealing information. Precisely targeted manipulation of these channels then allows us to generate novel encoded images.
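As a toy illustration of the channel-selection step the abstract describes, the mean latent-code disparity can be computed and ranked with NumPy. All names and values here are hypothetical sketches, not the paper's code:

```python
import numpy as np

def find_hiding_channels(w_orig, w_enc, top_k=10):
    """Rank latent channels by the mean absolute disparity between
    original and encoded latent codes (shape: [n_images, dim])."""
    disparity = np.abs(w_orig - w_enc).mean(axis=0)
    return np.argsort(disparity)[::-1][:top_k]

# toy example: 100 latent codes of dimension 512, with a few channels
# deliberately perturbed to mimic message embedding
rng = np.random.default_rng(0)
w_orig = rng.normal(size=(100, 512))
w_enc = w_orig.copy()
w_enc[:, [3, 42, 311]] += 0.5  # hypothetical "message" channels
print(sorted(find_hiding_channels(w_orig, w_enc, top_k=3)))  # → [3, 42, 311]
```

In the paper's setting, w_orig and w_enc would come from projecting real and CodeFace-encoded faces through the StyleGAN encoder rather than from synthetic noise.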

RecPAD_116.pdf


Automatic Multi-View Pose Estimation in Focused Cardiac Ultrasound

João Freitas1,2, Jaime C. Fonseca3, Sandro Queirós1,2

1Life and Health Sciences Research Institute, School of Medicine, University of Minho, Portugal; 2ICVS/3B’s - PT Government Associate Laboratory, Braga/Guimarães, Portugal; 3Center Algoritmi, School of Engineering, University of Minho, Portugal

Focused cardiac ultrasound (FoCUS) has emerged as a valuable point-of-care technique for assessing cardiovascular structure and function. Nevertheless, its applicability is limited by equipment constraints and operator proficiency, which leads to mostly qualitative evaluations. This study presents a novel framework that automatically estimates the 3D spatial relationship between standard FoCUS views. The proposed framework uses a multi-view U-Net-like convolutional neural network to regress line-based heatmaps representing the most likely areas of intersection between input images. The lines that best fit the regressed heatmaps are then extracted, and a system of nonlinear equations is solved to determine the relative 3D pose between all input views. The feasibility and accuracy of the proposed pipeline were validated on a novel realistic in silico FoCUS dataset, revealing auspicious outcomes that suggest its potential value in clinical contexts. By estimating the 3D pose, this framework could help enable comprehensive 3D quantitative assessment of FoCUS examinations, enhancing diagnostic capabilities, especially in urgent and high-care scenarios where swift and precise evaluations are paramount.

RecPAD_117.pdf


Impact of miscalibration and camera noise in an active stereo-based LiDAR in 3D object detection

Roberto Oliveira Graça1,2, Miguel Vidal Drummond1,2, Paulo Miguel Nepomuceno Pereira Monteiro1,2

1University of Aveiro, Portugal; 2Instituto de Telecomunicações, Aveiro

LiDAR is a powerful technology capable of capturing high-resolution 3D point clouds of the surroundings. However, because it is expensive and immature, LiDAR is not yet a viable option for mass production by car manufacturers.

We propose an alternative LiDAR sensor made of affordable, mass-produced components, which works like an active stereo system but with physically detached cameras.

Such a setup, however, is prone to miscalibration and is also affected by camera noise. The objective of this paper is to assess whether camera miscalibration and noise are harmful to 3D object detection.

RecPAD_119.pdf


Dealing with Overfitting in the Context of Liveness Detection using FeatherNets with RGB images

Miguel Leão1, Nuno Gonçalves1,2

1Instituto de Sistemas e Robótica - Coimbra, Portugal; 2Imprensa Nacional-Casa da Moeda SA, Portugal

The increased use of machine learning for liveness detection brings shortcomings such as overfitting, where the model adapts perfectly to the training set but becomes unusable on the testing set, defeating the purpose of machine learning. This paper proposes approaches to overfitting that leave the model itself unchanged, focusing instead on its input and output. The input approaches concern the information obtained from the different modalities present in the datasets used, as well as how varied these datasets are, not only in the number of spoof types but also in the ambient conditions under which the videos were captured. The output approaches focus on the loss function, which drives the actual "learning" (it is computed from the model's output and then propagated backwards), and on the interpretation of that output to decide which predictions are considered bona fide or spoof. Throughout this work, we were able to reduce the overfitting effect, lowering the difference between the best epoch and the average of the last fifty epochs from 36.57% to 3.63%.

RecPAD_120.pdf


Fingerprint Recognition and Matching

Nuno Daniel Martins1,2, Jose Silvestre Silva3,4, Alexandre Bernardino5,2

1Military Academy, Lisbon, Portugal; 2Instituto Superior Técnico, Universidade de Lisboa, Portugal; 3Military Academy & CINAMIL, Lisbon, Portugal; 4LIBPhys-UC & LA-Real, Universidade de Coimbra, Portugal; 5Institute for Systems and Robotics (ISR), Lisbon, Portugal

Fingerprints are unique patterns that have gained prominence as a biometric key for diverse applications, so recognition systems play a crucial role. Despite advancements, challenges persist in accurately matching fingerprint minutiae, especially when dealing with large databases and image transformations (rotations and translations). This paper addresses alignment dependence and sensitivity to geometric changes by proposing a fast and robust fingerprint matching methodology that includes pre-processing techniques, minutiae extraction, the creation of a polygonal template representing the minutiae, and minutiae matching through the polygon's features. The proposed approach alleviates alignment concerns, making strides towards more reliable and accurate fingerprint matching systems.

RecPAD_122.pdf


The Impact of Large Receptive Fields on Grad-CAM

Rui Santos1,2, João Pedrosa1,2, Ana Maria Mendonça1,2, Aurélio Campilho1,2

1INESC TEC, Portugal; 2FEUP, Portugal

Deep learning models have been widely used in recent years for a variety of applications, progressively achieving better results due to an increase in complexity. More complexity leads to a decrease in interpretability, which demands explanations regarding model reasoning. These explanations can be obtained with methods like Grad-CAM, which computes the gradients up to the last convolutional layer to form an importance map relative to a specific class. This is followed by an upsampling operation that matches the size of the importance map to the size of the input. However, this step is based on the assumption that the spatial organization of the features is maintained throughout the model, which may not be the case. We hypothesize that the spatial organization of the features is not kept during the forward pass for models with large receptive fields, which may render the importance map devoid of any meaning. This also applies to any Grad-CAM variant using the same upsampling step. The obtained results show a significant dispersion of the spatial information, which goes against the implicit assumption of Grad-CAM, and that explainability maps suffer from this dispersion. Altogether, this work addresses a key limitation of Grad-CAM which may go unnoticed by common users, taking one step further in the pursuit of more reliable explainability methods.
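For readers unfamiliar with the mechanism under scrutiny, here is a minimal NumPy sketch of Grad-CAM's weighting, ReLU, and upsampling steps. The nearest-neighbour upsample is an illustrative assumption; real implementations typically use bilinear interpolation on framework tensors:

```python
import numpy as np

def grad_cam(activations, gradients, out_size):
    """Grad-CAM: weight each feature map by its average gradient,
    sum over channels, apply ReLU, then upsample to the input size --
    the step whose spatial-correspondence assumption the paper questions.
    activations, gradients: arrays of shape [C, H, W] from the last conv layer."""
    weights = gradients.mean(axis=(1, 2))                              # [C]
    cam = np.maximum((weights[:, None, None] * activations).sum(0), 0) # [H, W]
    # nearest-neighbour upsample: only meaningful if the feature map's
    # spatial layout still corresponds to the input's
    scale_h = out_size[0] // cam.shape[0]
    scale_w = out_size[1] // cam.shape[1]
    return np.kron(cam, np.ones((scale_h, scale_w)))

# toy 8-channel, 7x7 feature maps upsampled to a 224x224 input
acts = np.random.rand(8, 7, 7)
grads = np.random.rand(8, 7, 7)
heatmap = grad_cam(acts, grads, (224, 224))
print(heatmap.shape)  # → (224, 224)
```

The paper's argument is precisely that, for large receptive fields, the final line's implicit mapping from feature-map coordinates back to input coordinates breaks down.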

RecPAD_124.pdf


A study on the role of feature selection for malware detection on Android applications

Catarina Rodrigues Palma1, Artur Jorge Ferreira1,2, Mário Figueiredo2,3

1Instituto Superior de Engenharia de Lisboa (ISEL); 2Instituto de Telecomunicações (IT); 3Instituto Superior Técnico (IST)

The presence of malicious software (malware) in Android applications (apps) has harmful or irreparable consequences for the user and/or the device. Despite the protections provided by app stores, malware keeps growing in both sophistication and diffusion. This paper explores the use of machine learning (ML) and feature selection (FS) approaches to detect malware in Android applications using public domain datasets. We resort to the relevance-redundancy FS (RRFS) filter method using the unsupervised mean-median (MM) and the supervised Fisher ratio (FR) relevance measures. Our approach reduces the dimensionality of the data, improves on the experimental results of the baseline model, and identifies the most decisive features for classifying an app as malware.
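A minimal sketch of a relevance-redundancy filter in the spirit of RRFS, assuming the Fisher-ratio relevance measure and absolute correlation as the redundancy measure (the paper's exact measures and thresholds may differ):

```python
import numpy as np

def fisher_ratio(X, y):
    """Supervised relevance per feature: between-class separation over
    within-class spread, for binary labels y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(0) - X1.mean(0)) ** 2
    den = X0.var(0) + X1.var(0) + 1e-12
    return num / den

def rrfs(X, y, max_sim=0.8):
    """Relevance-redundancy filter: visit features by decreasing
    relevance and keep a feature only if its absolute correlation
    with every already-kept feature stays below max_sim."""
    order = np.argsort(fisher_ratio(X, y))[::-1]
    kept = []
    for j in order:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_sim for k in kept):
            kept.append(int(j))
    return kept

# toy data: one relevant feature, one redundant copy, one noise feature
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.1 * rng.normal(size=100),  # relevant
                     np.zeros(100),                   # filled below
                     rng.normal(size=100)])           # irrelevant noise
X[:, 1] = 2 * X[:, 0]                                 # redundant copy
selected = rrfs(X, y)
print(selected)
```

The redundant copy is discarded because it correlates perfectly with an already-selected feature, while the noise feature survives the redundancy check despite ranking last in relevance.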

RecPAD_127.pdf


Computed Tomography Slice Reconstruction using Deep Learning

Margarida Fontes Pereira1, Rúben Silva1, Fábio Nunes3,4, Jennifer Mâncio3, Ricardo Fontes-Carvalho3,4, João Pedrosa1,2

1Faculty of Engineering of the University of Porto (FEUP), Portugal; 2Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Porto, Portugal; 3Faculty of Medicine of the University of Porto (FMUP), Portugal; 4Centro Hospitalar de Vila Nova de Gaia e Espinho, Portugal

Coronary Artery Disease (CAD) is a prevalent and life-threatening condition, necessitating accurate diagnosis due to its substantial impact on morbidity and mortality. This study explores the potential of an autoencoder architecture for cardiac CT image reconstruction, a critical aspect of CAD diagnosis and treatment planning.

A dataset of 20 patients' CT scans was utilized, partitioned into training, validation, and test sets. The model, combining U-Net and autoencoder principles, demonstrated effective learning during training. However, some challenges in generalization were observed, with higher errors in unseen data, possibly due to overfitting. Future work should focus on enhancing the model's adaptability to new data. This research lays the groundwork for advancing cardiac CT image reconstruction, promising improved cardiac disease diagnosis.

RecPAD_131.pdf


Fine-Grained Fish Species Image Classification

Ricardo Jorge Martins Veiga, João Miguel Fernandes Rodrigues

LARSyS & ISE, Universidade do Algarve

Fine-grained fish species classification is especially important for ecological studies and fisheries management, as it helps with ecosystem evaluation, environmental monitoring, and biodiversity conservation. Manual fish species identification, historically the primary method, is difficult and time-consuming, while deep learning techniques offer the possibility of automation and improvements in both efficiency and accuracy. This work investigates the use of the Swin Transformer in conjunction with a novel plug-in module for fine-grained fish identification (classification), with promising results. While the present method does not yet outperform the state-of-the-art directly, it obtains 96.20% accuracy on the Croatian fish dataset and 100% accuracy when the top three predictions are considered.

RecPAD_134.pdf


Multimodal Deep Learning for Synchronous Heart Sounds and Electrocardiogram Classification

Bruno Filipe Oliveira, Miguel Tavares Coimbra, Francesco Renna

INESC-TEC and FCUP, Portugal

We propose a multimodal model for the binary classification (normal/abnormal) of synchronous heart sounds and electrocardiogram (ECG). The preliminary results show an improvement from using both signals for classification rather than heart sounds or ECG alone (e.g., F1-score of 0.86 vs. 0.79 and 0.84, respectively), which is useful for the detection of heart abnormalities in low-income countries with the help of a multimodal stethoscope.

RecPAD_137.pdf


A Data-Centric Approach for Detecting a Neutral Facial Expression using Deep Learning

Lúcia Maria Sousa1, Daniel Canedo1, Miguel Drummond2, João Ferreira3, António Neves1

1The Institute of Electronics and Informatics Engineering of Aveiro (IEETA), Portugal; 2The Institute of Telecommunication Aveiro, PT; 3Vision-Box Lisboa, PT

Neutral facial expression recognition is of great importance in various domains and applications. This study introduces a data-centric approach for neutral facial expression recognition, presenting a comprehensive study that explores different methodologies, techniques, and challenges in the field to foster a deeper understanding. The results show that data augmentation plays a crucial role in improving dataset performance. Additionally, the study investigates different model architectures and training techniques to identify the most effective approach, with the InceptionV3 model achieving the highest accuracy of 72%. Furthermore, the research examines the influence of preprocessing methods on the performance of both InceptionV3 and a simplified CNN model. Interestingly, the results indicate that preprocessing techniques positively affect the performance of the simpler CNN model but negatively impact the InceptionV3 model. The implemented system, used to evaluate the findings, demonstrates promising results, correctly classifying 77% of neutral expressions. However, there are still areas for improvement: creating a specialized dataset that includes both neutral and non-neutral expressions would greatly enhance the accuracy of the system. By addressing these limitations and implementing the suggested improvements, neutral facial expression recognition can be significantly enhanced, leading to more effective and accurate results.

RecPAD_141.pdf


An Analysis of Data-Centric Artificial Intelligence in Computer Vision Applications

Daniel Duarte Canedo, Petia Georgieva, António Neves

University of Aveiro, Portugal

Deep learning is witnessing rapid advancements, majorly impacting the computer vision field. However, as the complexity of these algorithms increases, their demand for data grows exponentially. As a result, there is an increasing emphasis on data-centric artificial intelligence in deep learning. In computer vision, data is primarily comprised of images and videos, forming datasets that are crucial inputs for deep learning algorithms during the learning process. However, these datasets can often be limited in size, biased, inadequate, and lacking proper labeling, particularly in the domain of computer vision where data collection, storage, labeling, and processing require substantial infrastructure and human resources. Consequently, researching how to tackle data collection, data quality, data generation, and data processing is of utmost importance in this field. This work explores an application of data-centric artificial intelligence in three distinct domains within computer vision: facial expression recognition, dirt detection in the context of intelligent robotics, and archaeological site detection. Given the distinct nature of the data involved with these applications, the objective of this work is to provide an analysis on how to conduct data management depending on the computer vision application.

RecPAD_144.pdf


Automatic Detection of Abandoned Vineyards Using Aerial Imagery

Igor Teixeira1, Danilo Leite1, Joaquim J. Sousa1,2, António Cunha1,2

1Universidade de Trás-os-Montes e Alto Douro, Vila Real, Portugal; 2INESC Technology and Science (INESC-TEC), Porto, Portugal

The European Union (EU) has established, through the Common Agricultural Policy (CAP), a system of aid and subsidies for farmers cultivating vineyards. Eligible areas must be monitored and registered in Geographic Information Systems. The agencies providing this support must verify that the parcels are engaged in agricultural activity through on-site checks or the analysis of aerial or satellite images; abandonment leads to the cancellation of aid payments. In the Douro Demarcated Region of Portugal, inspections are conducted according to methods defined by the EU. However, due to the vast size of the region, the time required for analysis and the specialized human resources needed for these inspections are significant. In this study, a dataset was created to train convolutional neural networks (CNNs), and pre-trained VGG models were fine-tuned to classify vineyards as abandoned or non-abandoned. The model achieved an accuracy of 95.1% on the test dataset, while the top-performing model achieved an overall accuracy and F1 score of 99% for both classes.

RecPAD_146.pdf


Enhancing Lifelog Retrieval through Automatic Image Annotation and Computer Vision Techniques

Luísa Amaral, Ricardo Ribeiro, António Neves

Universidade de Aveiro, Portugal

Lifelogging has recently gained significant popularity as individuals increasingly capture and document their daily experiences through different devices. This generates a vast digital collection of lifelogs, which hold valuable insights into the lifelogger's behaviors and patterns. Visual data is one of the most valuable sources of information in lifelogs, and automatic image annotation plays a vital role in extracting information from lifelog images, enabling a comprehensive understanding of lifelog data and facilitating effective retrieval.

This paper presents an exploration of different computer vision techniques that can be employed to extract annotations from lifelog images. Additionally, a practical implementation of a lifelog annotation pipeline by incorporating state-of-the-art techniques from diverse computer vision tasks is also described. Furthermore, this work introduces research efforts focused on reducing redundancy in annotations and presents the classification of their importance. The use of computer vision techniques allows the extraction of rich insights from lifelog images, leading to enhanced efficiency and accuracy when retrieving them. By leveraging these techniques, lifeloggers can gain deeper insights into their experiences, enabling a better understanding of their captured memories.

RecPAD_148.pdf


A Deep Learning Ensemble for Object Detection and Classification in Retail Stores

Miguel Fernandes, Simão Paredes, Ana Alves, Francisco Pereira

Politécnico de Coimbra, Instituto Superior de Engenharia de Coimbra

This paper presents a deep learning pipeline for object detection and classification of retail items. The aim is to develop an automated system to accurately identify and categorize products on store shelves. The proposed framework employs YOLOv7 for object detection, followed by classification using either convolutional neural networks (CNNs) or CLIP, a vision transformer model. Comparative experiments are conducted on a dataset of grocery product images. Metrics indicate that the YOLOv7 network effectively localizes individual items; for classification, CNNs exhibit reasonable but limited accuracy, whereas CLIP demonstrates stronger zero-shot generalization. The work demonstrates the feasibility of a robust vision-based product recognition system, while highlighting opportunities for performance optimization.

RecPAD_154.pdf


Detection of Traffic Signals and Segmentation of Brain Tumors in Magnetic Resonance Images - Computer Vision Applications

Filipe Silva Carvalho1, Pedro Caridade2,3, Emília Bigotte de Almeida1, Verónica Vasconcelos1

1Polytechnic of Coimbra, Portugal; 2University of Coimbra; 3Primelayer

Computer Vision is a branch of Artificial Intelligence with the potential to offer a wide range of benefits in various fields. As presented in this article, one project was undertaken for traffic sign detection, capable of detecting signs in images, and another was developed for brain tumor segmentation in magnetic resonance images.

With the model, we achieved tumor segmentations consistent with the diagnosis made by a specialist.

RecPAD_158.pdf


Optimisation of Deep Neural Networks using a Genetic Algorithm: A Comparative Study

Tiago Gonçalves1,2, Leonardo Capozzi1,2, Ana Rebelo3, Jaime S. Cardoso1,2

1Faculdade de Engenharia, Universidade do Porto, Porto, Portugal; 2INESC TEC, Porto, Portugal; 3Accenture Portugal, Lisboa, Portugal

Deep learning algorithms have been challenging human performance in several tasks. Currently, most methods for designing the architectures of these models and selecting the training hyper-parameters are still based on trial-and-error strategies. However, practitioners recognise the need for tools and frameworks that can achieve high-performing models almost automatically. We addressed this challenge using a meta-heuristics approach: we implemented a genetic algorithm with a variable-length chromosome and evaluated it on three different benchmark data sets. In this comparative study, we vary the architectures of the convolutional and fully-connected layers and the learning rate. The best models achieve accuracy values of 98.73%, 90.81% and 54.71% on MNIST, Fashion-MNIST and CIFAR-10, respectively.
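A compact sketch of such a search, with a toy surrogate fitness standing in for actual network training; the chromosome encoding and genetic operators here are illustrative assumptions, not the paper's implementation:

```python
import random

def random_chromosome(rng):
    """Variable-length chromosome: a list of layer widths plus a
    learning rate (both evolved, mirroring the study's search space)."""
    return {"layers": [rng.choice([16, 32, 64, 128])
                       for _ in range(rng.randint(1, 4))],
            "lr": 10 ** rng.uniform(-4, -1)}

def crossover(a, b, rng):
    """One-point crossover on the layer lists; variable length means
    cut points are chosen independently per parent."""
    cut_a = rng.randint(0, len(a["layers"]))
    cut_b = rng.randint(0, len(b["layers"]))
    return {"layers": (a["layers"][:cut_a] + b["layers"][cut_b:]) or [32],
            "lr": rng.choice([a["lr"], b["lr"]])}

def mutate(ch, rng):
    if rng.random() < 0.3:  # resample one layer width
        ch["layers"][rng.randrange(len(ch["layers"]))] = rng.choice([16, 32, 64, 128])
    if rng.random() < 0.3:  # jitter the learning rate on a log scale
        ch["lr"] *= 10 ** rng.uniform(-0.5, 0.5)
    return ch

def evolve(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, elitist
        pop = parents + [mutate(crossover(rng.choice(parents),
                                          rng.choice(parents), rng), rng)
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)

# surrogate fitness; in the paper this is the validation accuracy of
# the network encoded by the chromosome after training
toy = lambda ch: -abs(sum(ch["layers"]) - 192) - 100 * abs(ch["lr"] - 0.01)
best = evolve(toy)
print(best)
```

In real use, evaluating `fitness` dominates the cost, which is why such searches are typically run with small populations and aggressive early stopping.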

RecPAD_159.pdf


3D object tracking for self-driving vehicles using synthetically-generated data

Diogo Emanuel Nascimento Mendonça1,2, Miguel Drummond3, Petia Georgieva2,4

1Universidade de Aveiro, Portugal; 2Departamento de Eletrónica, Telecomunicações e Informática (DETI); 3Instituto de Telecomunicações, Aveiro; 4Institute of Electronics and Informatics Engineering of Aveiro (IEETA)

In this work, we propose a strategy to generate synthetic data for 3D object tracking using the Precise-Synthetic Image and LiDAR (PreSIL) source code.

We tested the highest-ranked LiDAR-only tracking algorithm on the generated data and compared it with the baseline performance on the KITTI object tracking benchmark in order to verify the fidelity of our data.

This is the first publicly available 3D object tracking dataset using an idealized next-generation LiDAR sensor, permitting the study of LiDAR object tracking now aided by radial velocity.

RecPAD_161.pdf


3D pose estimation in a multi-camera scene: Main approaches and What is the Future?

Ana Filipa Rodrigues Nogueira1,2, Hélder P. Oliveira1,3, Luís F. Teixeira1,2

1INESC TEC - Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, R. Dr. Roberto Frias, Porto, Portugal; 2FEUP - Faculdade de Engenharia da Universidade do Porto, R. Dr. Roberto Frias, Porto, Portugal; 3FCUP - Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre 1021 1055, Porto, Portugal

3D pose estimation is a crucial task for numerous real-world applications. However, several obstacles, such as the lack of 3D annotated datasets, occlusions, and variations in human appearance, have been limiting the efficiency and reliability of existing 3D pose estimation models and restraining their deployment in real-world settings. Multi-view machine learning solutions have been increasingly explored to overcome these obstacles, owing to the greater representational power of deep neural networks and the increasing availability of environments with several cameras.

This paper focuses on deep learning methods for 3D pose estimation in a multi-camera setting. After comparing the various methodologies, it was possible to conclude that the best method depends on the intended application. Thus, future research should focus on finding a solution that allows a fast inference of a highly accurate 3D pose while keeping a low computational complexity. To this end, techniques like active learning, selection of views and multi-modal approaches should be further explored.

RecPAD_163.pdf


Protecting Biometric Data Privacy in Facial Recognition and Authentication: The Polyprotect Algorithm Solution

Jose Silva1, Nuno Gonçalves1,2

1Institute of Systems and Robotics - University of Coimbra, Portugal; 2INCM Lab - Portuguese Mint and Official Printed Office, Lisbon, Portugal

Facial recognition and authentication are increasingly used to guarantee the reliability and privacy of personal data. They have practical applications in the most varied areas of everyday life, namely the use of data in official documents such as passports and ID cards; authentication on computing devices (smartphones, laptops, desktops); and the recognition of official documents by national institutions (social security, finance).

With this goal in mind, we used the face recognition model of Medvedev et al. to extract a numerical representation of the face (face embedding), protecting this representation with the Polyprotect algorithm of Hahn and Marcel.

Our theoretical and practical contribution to the field of facial recognition and authentication, using Hahn and Marcel's Polyprotect algorithm, associated with Medvedev et al.'s facial authentication model, is to ensure enhanced privacy of biometric data.

RecPAD_164.pdf


Deep vision quality control - an example application in the casting industry

Pedro Rocha1, Nuno Martins2, Fernando Lopes1,2, Luis Cruz2

1Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Coimbra, PT; 2Instituto de Telecomunicações, Coimbra, PT

Ensuring the quality of individual parts during manufacturing is crucial for upholding the integrity of the final products and fostering confidence among customers in the global market.

In this paper, we explore the potential of applying deep learning techniques in a computer vision task aiming to classify manufactured parts as either defective or non-defective based on the processing of grayscale images of these parts. We conducted experiments on a publicly available dataset comprising 1300 images of casted submersible pump impellers, utilizing a custom CNN implemented with the Keras framework. We attained a recall rate of 94% for the defective class and a precision of 89% for the non-defective class. Additionally, we discuss qualitative insights gained through the application of Grad-CAM (Gradient-weighted Class Activation Mapping) for better understanding of the model’s decision-making process.

RecPAD_165.pdf


Neural architecture search for deepfake detection

José Nave, Vasco Lopes, João Neves

Universidade da Beira Interior and NOVA LINCS, Portugal

As the popularity of deepfakes increases, so does their quality, and it has become an absolute necessity to be able to distinguish between synthetic and real footage. The research community has dedicated substantial efforts to addressing this threat, leading to an explosion in the number of deepfake-related papers in recent years; the most common techniques today involve deep learning and neural networks to detect deepfakes. One solution that has already been applied to other problems, many of them also image-related, but has only recently been applied to deepfake detection, is Neural Architecture Search (NAS).

Because this topic is starting to gain attention, we surveyed the NAS-based methods that have already been experimented with in the deepfake detection context.

RecPAD_169.pdf


Low-Resolution Retinal Images: Detection and Mosaicing using Deep Learning Methods

Tales Correia1, António Cunha3,4, Paulo Coelho1,2

1ESTG: Polytechnic Institute of Leiria, Portugal; 2INESCC: Institute for Systems Engineering and Computers at Coimbra; 3UTAD: University of Trás-os-Montes and Alto Douro; 4INESC-TEC: Institute for Systems and Computer Engineering, Technology and Science

Glaucoma is a severe eye disease that is asymptomatic in the initial stages and can lead to blindness due to its degenerative nature. This paper presents a framework for detecting the retinal fundus in lower-resolution images taken with a smartphone equipped with a D-EYE lens. A private dataset was assembled, annotated, and used to evaluate several versions of the well-known YOLO object detector. Furthermore, some mosaicing techniques were evaluated and applied to the lower-resolution frames to verify their usefulness as a video summarization tool. Both YOLO v5 and v8 had similar performances, with over 98% mAP(0.5) and 92.2% mAP(0.5:0.95).

RecPAD_171.pdf


 
Conference: RECPAD 2023