8:00am - 8:20am Enhancing Sentiment Analysis Using Formal Linguistic Tools
Mario Monteleone1,2
1Dipartimento di Scienze Politiche e della Comunicazione, Università degli Studi di Salerno, Italy; 2X23 Science In Society, Bergamo (Italy)
Generative Artificial Intelligence (GAI) text production is crucial to research fields such as Data Science (DS) and Network Textual Data Analysis (NTDA). The main purpose of GAI is to simulate human language production, exploiting both Machine Learning (ML) and Large Language Models (LLMs).
However, as pre-trained probabilistic models, LLMs are biased when built on imperfectly balanced data with respect to retrieval sources, taxonomy, ontology interconnections, and linguistic inference. This is particularly relevant to DS and NTDA, as on social media it can contribute to the spread of fake news, conspiracy theories, counterproductive narratives, and online hate speech. Equally relevant is GAI's lack of a formal model of reality (Pearl and Mackenzie, 2018), which leaves GAI without ethics, as it cannot identify and correct its own inaccuracies. As a result, LLMs and GAI suffer from effectiveness and reliability issues, showing a tendency to produce incorrect and discriminatory information, as well as hallucinations.
The emerging field of Neuro-Symbolic Artificial Intelligence (NSAI) tries to cope with these issues by building elementary ontologies that integrate human symbolic reasoning principles with ML and Artificial Neural Networks (ANNs). Here we will demonstrate that better results come from also integrating formalized morphosyntactic and semantic information, such as that relating to the grammar of Italian negation. Therefore, to tackle online hate speech, we propose a method of Sentiment Analysis (SA) that uses the NooJ software (Silberztein 2016) to build formal ontologies and syntactic grammars within graphs representing finite-state automata/transducers. While the ontologies will conceptualize sets of words with contiguous contextualized meanings, the syntactic grammars will parse texts using formalized Italian morphosyntax and semantics.
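As a rough illustration of why formalized negation matters for sentiment scoring, the following minimal sketch scans a sentence with a two-state rule that flips the polarity of the first polar word after a negator. It is not NooJ and not the author's grammar: the lexicon, the negator list, and the function name `score` are hypothetical, and real Italian negation requires far richer morphosyntactic rules.

```python
# Hypothetical mini-lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"buono": 1, "bello": 1, "cattivo": -1, "odioso": -1}
# Hypothetical list of Italian negators.
NEGATORS = {"non", "mai"}


def score(sentence: str) -> int:
    """Sum word polarities, flipping the first polar word after a negator."""
    negated = False  # state of the two-state scan
    total = 0
    for token in sentence.lower().split():
        word = token.strip(".,;:!?")
        if word in NEGATORS:
            negated = True
        elif word in LEXICON:
            polarity = LEXICON[word]
            total += -polarity if negated else polarity
            negated = False  # the negation is consumed by this polar word
    return total


print(score("Questo film è bello"))      # plain positive sentence
print(score("Questo film non è bello"))  # same sentence under negation
```

A lexicon-only scorer would give both sentences the same positive score; the negation state is what separates them.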
8:20am - 8:40am Considerations and Challenges in Dealing with Online Italian Content Related to Social Issues: Constructing Datasets of Online Opinions for Human Annotation
Alex Cucco1, Emiliano del Gobbo2, Lara Fontanella1, Sara Fontanella3, Luigi Ippoliti1
1University G. d'Annunzio Chieti-Pescara; 2University of Foggia; 3Imperial College London
Ensuring a diverse representation of opinions, sentiments, and topics in social discourse is essential when curating data for machine learning and statistical models, particularly in contexts requiring explainability. Online comments offer a rich source of public opinion; however, they often exhibit an imbalanced distribution of perspectives, amplifying specific viewpoints while underrepresenting others. Such biases can lead to unfair models that reinforce stereotypes and reduce the reliability of analytical outcomes.
To address this challenge, we propose an approach able to capture a wide spectrum of sentiments and topics, facilitating the creation of a balanced dataset for human annotation and further analysis. Our approach focuses on targeted sampling strategies that leverage network analysis and node sampling techniques to ensure comprehensive topic and sentiment representation.
We illustrate the effectiveness of this method through a simulated case study and an application analyzing online discourse on migration, leveraging social media data. We introduce a refined sampling technique aimed at improving coverage across different viewpoints. By adopting this approach, we seek to support the development of fair and transparent models capable of accurately interpreting complex social debates.
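The idea of sampling for balance rather than in proportion to volume can be sketched as follows. This is only an assumption-laden illustration, not the authors' technique: it presumes each comment has already been assigned to a group (for instance a community or topic found by network analysis), and the function name `balanced_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(labels, per_group, seed=0):
    """Draw up to `per_group` items from each group.

    `labels` maps an item id (e.g. a comment) to a group id (e.g. a
    community or topic assumed to be known beforehand).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    groups = defaultdict(list)
    for item, group in labels.items():
        groups[group].append(item)

    sample = []
    for group in sorted(groups):
        members = sorted(groups[group])
        k = min(per_group, len(members))  # small groups contribute all they have
        sample.extend(rng.sample(members, k))
    return sample


# Eight comments in a dominant group "A", two in a minority group "B".
labels = {f"a{i}": "A" for i in range(8)}
labels.update({"b0": "B", "b1": "B"})
picked = balanced_sample(labels, per_group=2)
print(picked)
```

With a uniform random draw the minority group would often be missed entirely; here both groups contribute the same number of comments to the annotation set.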
8:40am - 9:00am Enhancing Sentiment Analysis Using Formal Linguistic Tools
Mario Monteleone
Dipartimento di Scienze Politiche e della Comunicazione, Università degli Studi di Salerno, Italy
9:00am - 9:20am Exploring Semantic Networks to Assess Latent Attitudes Toward Migrants
Alex Cucco1, Lara Fontanella1, Giuseppe Giordano2, Michelangelo Misuraca2, Annalina Sarra1
1University "G. d'Annunzio" of Chieti-Pescara, Italy; 2University of Salerno
The growing influence of social media platforms has provided an unprecedented opportunity to assess public sentiment and attitudes toward various social issues, including migration. While traditional methods, such as questionnaires, are commonly used to retrieve latent traits about attitudes, the increasing volume of free text on social media presents a dynamic, alternative data source. Questionnaires that include both open-ended responses and scales such as the Semantic Differential and the Bogardus Social Distance Scale allow for the measurement of respondents’ attitudes and their similarity-based semantic expressions in free text regarding migration. These latent traits, estimated through models like the Graded Response Model (GRM) within Item Response Theory (IRT), offer valuable insights into public perceptions. However, social media comments provide an additional layer of data, capturing spontaneous expressions and shifting sentiments in real time.
This study aims to explore the connection between latent attitudes derived from questionnaire responses and the language used in social media posts. By evaluating the semantic networks within public comments, the study investigates whether the latent traits of social media users can be inferred from their online discourse. This approach leverages publicly available data to assess migration-related attitudes, an area traditionally reliant on structured surveys. Specifically, the study examines the potential of textual network analysis for this purpose and evaluates a semi-supervised approach to improve the assessment of online latent traits.
9:20am - 9:40am How to Trigger Public Figures’ Engagement on Social Media
Shahar Lavian1, Gilad Ravid1, Alon Bartal2
1Industrial Engineering and Management Department, Ben Gurion University of the Negev, Israel; 2The School of Business Administration, Bar-Ilan University
Public figures such as celebrities, politicians, and influencers who post online attract numerous replies from users but respond only to selected users. The factors influencing public figures’ selective engagement are largely unknown. We analyzed a dynamic network of interactions between public figures and the users who replied to their posts. These networks are sparse, since most user replies to a public figure's original posts remain unaddressed. Given a user who replied to an original post of a public figure, our goal is to predict whether the public figure will engage with that reply. To define the population of public figures, we employed a filtering methodology using ranking lists from reputable sources, such as Forbes and TIME, alongside an American filter. This approach ensures that the selected public figures hold significant influence and visibility, making their engagement behavior on social media particularly relevant for the study.
We analyzed 250,000 user replies to posts originated by public figures on X, collected between 2022 and 2024. Each user reply is labeled as ‘engaged by the public figure’ (1) or not (0), allowing a systematic examination of engagement patterns. To explore potential homophily in digital discourse, we constructed a multi-dimensional user similarity graph incorporating linguistic features, emotion intensity, and temporal engagement patterns. We applied k-nearest neighbors (k=50) to link users who communicate in similar ways, filtering edges based on cosine similarity (<0.3). Our network analysis reveals a high assortativity coefficient (0.6799), suggesting strong homophily: users with similar emotional tone, linguistic complexity, and response timing tend to receive similar levels of engagement from public figures.
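The construction of a k-nearest-neighbor similarity graph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature vectors are invented, the names `cosine` and `knn_edges` are hypothetical, and the sketch assumes that the threshold is used to drop edges whose cosine similarity falls below it, which the abstract does not state explicitly.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_edges(features, k, min_sim):
    """Link each user to its k most similar users.

    Assumption: edges with similarity below `min_sim` are discarded.
    Returns a set of undirected edges as sorted (user, user) pairs.
    """
    edges = set()
    for u, fu in features.items():
        sims = sorted(
            ((cosine(fu, fv), v) for v, fv in features.items() if v != u),
            reverse=True,
        )
        for sim, v in sims[:k]:
            if sim >= min_sim:
                edges.add((min(u, v), max(u, v)))
    return edges


# Invented two-dimensional feature vectors for three users.
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(knn_edges(features, k=1, min_sim=0.3))
```

In this toy input, users `a` and `b` are linked, while `c` stays isolated because its closest neighbor falls below the threshold.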
To predict whether a public figure will respond to a user reply, we trained three classifiers: XGBoost, Random Forest, and a Hybrid Siamese Convolutional Network (HSCNN). Each incorporated network-based attributes, emotion-based attributes, and the time between a post and a reply. XGBoost outperformed the other models with an ROC-AUC score of 0.96. The most important predictive factors include the time interval between an original post and a reply, the intensity of anger expressed in the reply (dominant anger levels), and the complexity of the language used (lexical diversity). We find that public figure engagement is shaped by systematic patterns in user communication styles and response behaviors.
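For readers unfamiliar with the ROC-AUC metric reported above, it can be read as the probability that a randomly chosen engaged reply receives a higher predicted score than a randomly chosen non-engaged one, with ties counting as half. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy inputs are illustrative and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 (engaged) or 0 (not engaged); `scores` holds the
    classifier's predicted scores. Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the metric depends only on how replies are ranked, it remains informative for sparse engagement data, where almost all labels are 0 and plain accuracy would be misleading.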
By integrating social network analysis with predictive modeling, this research advances our understanding of the selective engagement of public figures in online discourse. Future work should explore temporal evolution in engagement homophily and examine cross-platform variations in reply behaviors.