Conference Agenda

Overview and details of the sessions of this conference.

 
 
Session Overview
Session 19: Oral Session 3: Applications of Deep Learning
Time: Wednesday, 29/Nov/2023, 2:00pm - 3:20pm

Session Chair: Yandre M. G. Costa
Location: Auditorium


Presentations

Active Supervision: Human in the Loop

Ricardo P. M. Cruz1,2, ASM Shihavuddin3, Hasan Maruf3, Jaime S. Cardoso1,2

1INESC TEC, Porto, Portugal; 2Faculty of Engineering, University of Porto, Portugal; 3Green University of Bangladesh, Dhaka, Bangladesh

After the learning process, certain types of images may not be modeled correctly because they were not well represented in the training set. These failures can then be compensated for by collecting more images from the real world and incorporating them into the learning process, an expensive procedure known as "active learning". The proposed twist, called active supervision, uses the model itself to perturb existing images in the direction where the decision boundary is less defined and asks the user how the new image should be labeled. Experiments in the context of class imbalance show that the technique increases model performance on rare classes. Active human supervision thus provides crucial information during training that the training set alone lacks.
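The core loop described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: it assumes a simple linear classifier with known weights, nudges a confidently classified sample toward the decision boundary by following the model's own gradient, and would then hand the synthesized sample to a human annotator for labeling.

```python
import numpy as np

# Hypothetical linear decision boundary (stand-in for a trained model).
w, b = np.array([1.5, -2.0]), 0.3

def confidence(x):
    """Sigmoid of the margin: the model's score for the positive class."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def nudge_toward_boundary(x, step=0.1, iters=25):
    """Move x along the gradient that pushes its score toward 0.5,
    i.e. into the region where the boundary is least defined."""
    for _ in range(iters):
        p = confidence(x)
        grad = p * (1 - p) * w            # derivative of sigmoid w.r.t. x
        x = x - step * np.sign(p - 0.5) * grad
    return x

x = np.array([2.0, 1.0])                  # a confidently classified sample
x_new = nudge_toward_boundary(x)
# In the actual workflow, x_new would now be shown to a human annotator,
# and the returned label added to the training set.
```

The synthesized sample sits near the boundary (score close to 0.5), which is exactly where a human label is most informative for the model.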



An end-to-end Deep Learning approach for video captioning through mobile devices

Rafael Jeferson Pezzuto Damaceno, Roberto Marcondes Cesar Junior

University of São Paulo, Brazil

Video captioning is a computer vision task that aims at generating a description of video content. This can be achieved using deep learning approaches that leverage image and audio data. In this work, we have developed two strategies to tackle this task in the context of resource-constrained devices: (i) generating one caption per frame combined with audio classification, and (ii) generating one caption for a set of frames combined with audio classification. In these strategies, we have utilized one architecture for the image data and another for the audio data. We have developed an application tailored for resource-constrained devices, where the image sensor captures images at a rate of two frames per second and the audio data is captured from a microphone in five-second clips. Our application combines the results from both modalities to create a comprehensive description. The main contribution of this work is the introduction of a new end-to-end application that can utilize the developed strategies and be beneficial for environment monitoring. Our method has been implemented on a low-resource computer, which poses a significant challenge.
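The fusion step of strategy (i) can be sketched as follows. The captioning and audio models are stubbed out here (the paper does not specify them in this abstract); the sketch only shows how per-frame captions and an audio class might be combined into one description.

```python
from collections import Counter

def caption_frame(frame):
    """Stand-in for the on-device image captioning model."""
    return f"a scene with {frame}"

def classify_audio(clip):
    """Stand-in for the on-device audio classification model."""
    return "speech"

def describe(frames, audio_clip):
    """Fuse the most frequent frame caption with the audio class."""
    captions = [caption_frame(f) for f in frames]
    top_caption, _ = Counter(captions).most_common(1)[0]
    return f"{top_caption}; audio: {classify_audio(audio_clip)}"

# 10 frames ~ 5 seconds at 2 fps, matching the capture rates described above.
frames = ["a person"] * 7 + ["a car"] * 3
print(describe(frames, audio_clip=None))  # → a scene with a person; audio: speech
```

Taking the most frequent caption over a window is only one plausible fusion rule; the point is that both modalities contribute to the final description.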



Leveraging Longitudinal Data for Cardiomegaly and Change Detection in Chest Radiography

Raquel Belo1, Joana Rocha1,2, João Pedrosa1,2

1Faculty of Engineering, University of Porto, R. Dr. Roberto Frias s/n, 4200-465 Porto, Portugal; 2INESC TEC

Chest radiography has been widely analyzed automatically through deep learning (DL) techniques. In the manual analysis of these scans, however, comparison with images from previous time points is common practice, establishing a longitudinal reference. Longitudinal information is rarely used in automatic analysis, yet it may provide relevant information for the desired output. In this work, the use of longitudinal information for detecting cardiomegaly and change in pairs of CXR images was studied. Multiple experiments were performed in which longitudinal information was included at the feature level and at the input level. The impact of aligning the image pairs (through a developed method) was also studied. Using aligned images improved the final metrics for both pathology and change detection compared to a standard multi-label classifier baseline. The model using concatenated image features outperformed the rest, with an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.858 for change detection and 0.897 for pathology detection, showing that pathology features can be used to predict the comparison between images more efficiently. To further improve the developed methods, data augmentation techniques were studied. These showed that increasing the representation of minority classes introduces more noise into the dataset, and that neglecting the temporal order of the images can be an advantageous augmentation technique in longitudinal change studies.
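The best-performing setup, concatenating features from the current and prior radiograph before classification, can be sketched as below. The feature extractor and classifier here are random linear stand-ins, not the paper's trained networks; only the data flow (two images in, concatenated features, two output scores) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(42)
W_feat = rng.normal(size=(16, 64))       # stand-in feature extractor weights
W_clf = rng.normal(size=(2 * 16, 2))     # two heads: [cardiomegaly, change]

def extract(image):
    """Stand-in CNN feature extractor: 64-d image vector -> 16-d features."""
    return np.tanh(W_feat @ image)

def predict(current, prior):
    """Concatenate longitudinal features and output two sigmoid scores."""
    z = np.concatenate([extract(current), extract(prior)])
    return 1.0 / (1.0 + np.exp(-(z @ W_clf)))

# Flattened stand-ins for an aligned (current, prior) radiograph pair.
scores = predict(rng.normal(size=64), rng.normal(size=64))
```

In the study, the prior image would first be aligned to the current one; the abstract reports that this alignment step improves both detection tasks.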



Unveiling the Influence of Image Super-Resolution on Aerial Scene Classification

Mohamed Ramzy Ibrahim1,2, Robert Benavente2, Daniel Ponsa2, Felipe Lumbreras2

1Computer Engineering Department, Arab Academy for Science, Technology and Maritime Transport, Alexandria, Egypt; 2Computer Vision Center & Computer Science Department, Universitat Autònoma de Barcelona, Spain

Deep learning has made significant advances in recent years, and as a result, it is now in a stage where it can achieve outstanding results in tasks requiring visual understanding of scenes. However, its performance tends to decline when dealing with low-quality images. The advent of super-resolution (SR) techniques has started to have an impact on the field of remote sensing by enabling the restoration of fine details and enhancing image quality, which could help to increase performance in other vision tasks. However, in previous works, contradictory results for scene visual understanding were achieved when SR techniques were applied. In this paper, we present an experimental study on the impact of SR on enhancing aerial scene classification. Through the analysis of different state-of-the-art SR algorithms, including traditional methods and deep learning-based approaches, we unveil the transformative potential of SR in overcoming the limitations of low-resolution (LR) aerial imagery. By enhancing spatial resolution, more fine details are captured, opening the door for an improvement in scene understanding. We also discuss the effect of different image scales on the quality of SR and its effect on aerial scene classification. Our experimental work demonstrates the significant impact of SR on enhancing aerial scene classification compared to LR images, opening new avenues for improved remote sensing applications.
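The evaluation pipeline described above, super-resolve first, then classify, can be sketched as follows. The SR model is replaced by simple nearest-neighbour upsampling and the classifier by a trivial stub; the actual study compares traditional and deep-learning-based SR algorithms in front of real scene classifiers.

```python
import numpy as np

def upscale(img, scale=4):
    """Nearest-neighbour x4 upsampling (stand-in for a real SR model)."""
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def classify(img):
    """Stub aerial-scene classifier with hypothetical class names."""
    return "urban" if img.mean() > 0.5 else "rural"

lr = np.full((16, 16), 0.7)   # stand-in low-resolution aerial image
sr = upscale(lr)              # 64x64 after x4 super-resolution
label = classify(sr)
```

Swapping `upscale` for different SR algorithms while holding the classifier fixed is the kind of controlled comparison the experimental study performs.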



 
Contact and Legal Notice · Privacy Statement
Conference: CIARP 2023
Conference Software: ConfTool Pro 2.6.149
© 2001–2024 by Dr. H. Weinreich, Hamburg, Germany