8:00am - 8:20am On species uniqueness in ecological networks
Wei-chung Liu
Academia Sinica, Taiwan
Species are embedded in an intricate web of interactions known as a food web. A food web is the most fundamental network representation of an ecosystem. It is therefore natural to assess species importance from a network perspective. Past literature emphasizes the use of centrality measures to quantify species importance. Recent advances in species importance research have proposed the concept of species uniqueness as a complementary measure to species importance. In this presentation, we discuss what uniqueness is and review its recent developments. We start with the concept of a species' trophic field, which is the set of species a focal species can strongly affect, and show how it can be applied to measure species uniqueness. This trophic field-overlap approach can be extended to consider both strong and weak interactors of a focal species, providing a more complete view of species uniqueness. We then show how this extended approach can be simplified by using a matrix that represents the interaction structure of a food web. All the above approaches consider how a species can affect all others, but we argue that information on how a species is affected by all others can also be used to measure species uniqueness. This new concept spurs us to develop yet another species uniqueness measure that simultaneously considers the effects exerted and received by a species. We analyze 92 food webs to show the relationship between past approaches and our new approach.
8:20am - 8:40am A regression framework for studying relationships among attributes under network interference
Michael Schweinberger, Cornelius Fritz, Subhankar Bhadra, David Hunter
The Pennsylvania State University, United States of America
To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021.
8:40am - 9:00am Analysis of Word Co-occurrence Networks from Paper Abstracts in Semantic Scholar Database
Yoonjin Lee1, Frederick Kin Hing Phoa2, Hohyun Jung1
1Sungshin Women's University, South Korea; 2Academia Sinica, Taiwan
The abstract is a crucial frontmatter element that provides readers with key insights into a manuscript's core ideas and subject categories. Identifying the most important words in abstracts can offer valuable clues about the central themes and evolving trends within a particular subject area. This work introduces a novel analysis method to determine the importance of words within a subject category over time, based on various centrality measures in a word co-occurrence network. The network is constructed from words extracted from the abstracts of manuscripts within a specific scientific subject. We demonstrate the effectiveness of this method using a subset of the Semantic Scholar database, focusing on the field of Statistics from 2019 to 2023.
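As a rough illustration of the idea described above, the following sketch builds a word co-occurrence network from a handful of abstracts and ranks words by degree centrality. It is not the authors' method: the tokenization (lowercased whitespace splitting), the choice of degree centrality, the function names, and the toy abstracts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(abstracts):
    """Count how many abstracts each unordered word pair appears in together."""
    edges = Counter()
    for text in abstracts:
        # Assumption: naive tokenization; a real pipeline would remove
        # stop words and punctuation and probably lemmatize.
        words = sorted(set(text.lower().split()))
        for u, v in combinations(words, 2):
            edges[(u, v)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other words that each word co-occurs with."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    if n < 2:
        return {w: 0.0 for w in neighbours}
    return {w: len(nb) / (n - 1) for w, nb in neighbours.items()}


# Hypothetical toy abstracts, not taken from Semantic Scholar.
toy_abstracts = [
    "network model estimation",
    "network centrality",
    "model selection",
]
centrality = degree_centrality(cooccurrence_edges(toy_abstracts))
ranking = sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))
```

Repeating the calculation on abstracts grouped by publication year would give one ranking per year, which is one simple way to follow a word's importance over time.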
9:00am - 9:20am Extending the Event Subpopulation Model: Estimating Personal Network Size with Inbreeding Bias
Ryuhei Tsuji
Kindai University, Japan
This study estimates the size of personal networks (the number of friends and acquaintances) using the event subpopulation model (Bernard, Johnsen, Killworth, and Robinson, 1989), with individuals infected with COVID-19 and those who died in the Great East Japan Earthquake (GEJE) as the event subpopulations. Initial estimates varied significantly depending on the population definition: (a) the whole of Japan, (b) urban or (c) rural prefectures for COVID-19, and (b) inside or (c) outside tsunami-affected prefectures for GEJE. To improve accuracy, we refined the estimation model by incorporating inbreeding bias within a biased network framework (Fararo, 1981).
The estimation method is based on the proportion of respondents who know an affected individual. The basic formula is:
c = t p / e ...(1)
where e is the number of event participants, p is the proportion of respondents who know a participant, and t is the total population. Adjusting for inbreeding bias:
c = t p / (e (1 - tau)) ...(2)
where 0 <= tau <= 1. Further incorporating binomial variance correction leads to:
c = (t p / (e (1 - tau))) * (1 + (p (1 - p) / e)) ... (3)
which is always larger than the uncorrected estimate.
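The three formulas can be checked numerically with a small sketch such as the one below. The function name `network_size` and the example inputs are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the study. The sketch also requires tau < 1, since formulas (2) and (3) are undefined at tau = 1.

```python
def network_size(t, e, p, tau=0.0, variance_correction=False):
    """Estimate personal network size c from formulas (1) to (3).

    t   : total population
    e   : number of event participants
    p   : proportion of respondents who know a participant
    tau : inbreeding bias; tau = 0 reproduces formula (1)
    """
    if t <= 0 or e <= 0:
        raise ValueError("t and e must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 <= tau < 1.0:
        # Assumption of this sketch: tau = 1 is excluded because the
        # denominator e * (1 - tau) would be zero.
        raise ValueError("tau must lie in [0, 1)")
    c = t * p / (e * (1.0 - tau))          # formulas (1) and (2)
    if variance_correction:
        c *= 1.0 + p * (1.0 - p) / e       # extra factor in formula (3)
    return c


# Hypothetical inputs: 1,000,000 people, 1,000 participants, p = 0.1.
c1 = network_size(1_000_000, 1_000, 0.1)                    # formula (1)
c2 = network_size(1_000_000, 1_000, 0.1, tau=0.5)           # formula (2)
c3 = network_size(1_000_000, 1_000, 0.1, tau=0.5,
                  variance_correction=True)                 # formula (3)
```

With these inputs, formula (1) gives 100, a bias of tau = 0.5 doubles it to 200, and the correction factor in (3) adds a further 0.009 percent, which illustrates how small that correction is when e is large.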
When we applied these models to Japan, we still observed significant regional variations. Although the refined models improve plausibility, the exact value of tau remains uncertain. Results suggest that estimates from natural disasters, where information spreads widely, may be more reliable than those from infectious disease events, where social stigma limits disclosure.
9:20am - 9:40am Model-based edge clustering for weighted networks with a noise component
Daniel Sewell1, Haomin Li2
1University of Iowa, United States of America; 2Merck & Co., Inc.
Clustering is a fundamental task in network analysis, essential for uncovering hidden structures within complex systems. Edge clustering, which focuses on relationships between nodes rather than the nodes themselves, has gained increased attention in recent years. This provides benefits in terms of (1) understanding the network in terms of the environments or events leading to edge formation, and (2) computational feasibility, as the computational cost of edge clustering is, for sparse networks, linear in the number of network nodes. However, existing edge clustering algorithms often overlook the significance of edge weights, which can represent the strength or capacity of connections, and fail to account for noisy edges—connections that obscure the true structure of the network. To address these challenges, the Weighted Edge Clustering Adjusting for Noise (WECAN) model is introduced. This novel algorithm integrates edge weights into the clustering process and includes a noise component that filters out spurious edges. WECAN offers a data-driven approach to distinguishing between meaningful and noisy edges, avoiding the arbitrary thresholding commonly used in network analysis. Its effectiveness is demonstrated through simulation studies and applications to real-world datasets, showing significant improvements over traditional clustering methods. Additionally, the R package "WECAN" (https://github.com/HaominLi7/WECAN) has been developed to facilitate its practical implementation.
9:40am - 10:00am Profiling Paper Influence Pattern in Citation Networks: Depth and Diversity
Wei-Chu Chiang1, Frederick Kin Hing Phoa1, Yu-Shin Lin1, Hiroka Hamada2
1Institute of Statistical Science, Academia Sinica, Taipei City 11529, Taiwan; 2Department of Statistical Modeling, Institute of Statistical Mathematics, Tokyo 190-8562, Japan
A paper citation network is a network constructed from citation relationships among papers. Such networks are directed acyclic graphs (DAGs) and can be seen as knowledge propagation systems. We borrow the ecological concept of functional diversity, which has already been applied to the analysis of food webs (also DAGs), to capture the complexity of functional roles in a system. From the viewpoint of knowledge propagation, each paper has its “influence area”: its direct and indirect influence on later knowledge. We focus on two dimensions of the influence area, “depth” and “diversity”, to capture the variety of paper influence patterns. Depth is a new index for evaluating a paper’s total impact that takes indirect influence into account. Diversity, based on depth, captures the extent to which influence is dispersed over different subjects, with calculations weighted by the dissimilarity among subjects. By comparing the direct and indirect parts of depth and of diversity, we developed two additional indexes that further describe the structure of the influence area. In total, four indexes form the paper influence pattern profile, providing new perspectives for evaluating a paper's impact and functional roles within a knowledge propagation system. These indexes were applied to real paper citation data from Semantic Scholar as a demonstration.
10:00am - 10:20am Reddit Users Unleashed - Understanding User Behaviour and Their Impact on Meme Stocks
Simon Trimborn
University of Amsterdam, The Netherlands
In this study we investigate the drivers of, and changes in, users' posting probability on social networks via a sparse network model and a change point detection framework. With the model, we examine the impact of user behaviour on the Reddit forum Wallstreetbets on markets. Results show that changes in users' behaviour significantly predicted returns, integrated volatility, and jump volatility, even when controlling for network activity and established metrics measuring influential user impact. Adding changes in Reddit users' behaviour to models that explain market movements yields an adjusted R^2 of up to 0.45 for returns and 0.8 for jumps, vastly outperforming the competing models. Studies often focus on influential users in networks, but we show that changes in the behaviour of less important users explain a larger part of returns, integrated variance and jump volatility than those of important users do.
10:20am - 10:40am Understanding Volatility in Infodemic Risk Index: A Twitter-Based Analysis Across Countries
Anna Bertani1,2, Riccardo Gallotti1
1Fondazione Bruno Kessler, Italy; 2University of Trento
During highly contentious and polarized events, such as the COVID-19 Pandemic, the vast amount of information circulating online increases the risk of an infodemic. Gallotti et al. (2020) introduced the Infodemic Risk Index (IRI), a novel metric to quantify the impact of this phenomenon which assesses the user’s exposure to unreliable content based on their number of followers. However, despite its practicality, the IRI exhibits significant volatility over time, particularly in certain countries where it fluctuates sharply.
In this study, we aim to investigate the causes behind these fluctuations, identifying key factors contributing to IRI instability. We analyzed the Twitter data presented in Gallotti et al. (2020), spanning February 2020 to May 2022, and measured the index volatility by calculating its standard deviation over time for a total of 50 countries. Our findings reveal two key contributors to this instability. On one hand, volatility is partially correlated with the unequal distribution of followers, indicating that countries with highly followed users experience greater fluctuations. On the other hand, drawing from the concept of the news media diet (Bertani, 2024), we measured the uncorrelated entropy to assess the diversity of media consumption. We found a tendency toward a negative correlation between the average media entropy and the IRI volatility, suggesting that limited media diversity contributes to index instability. This result was tested by considering each news media source separately, finding that news sources classified as fake or political show the same behaviour with a higher level of significance. This emphasizes the important role that news media outlets play in catalyzing public attention during polarized events. Finally, further analyses of the way they attract attention might offer insight into how to contain the spread of misinformation.
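The two quantities used above can be sketched as follows. This is only an illustration under stated assumptions: volatility is taken as the population standard deviation of an index time series, and uncorrelated entropy as the Shannon entropy (in bits) of the share of each news source, ignoring the order in which sources were consumed. The study's exact estimator, log base, and aggregation level are not given in the abstract, and the input values below are made up.

```python
import math
from collections import Counter
from statistics import pstdev


def volatility(index_series):
    """Standard deviation of an index over time (population version)."""
    return pstdev(index_series)


def uncorrelated_entropy(sources):
    """Shannon entropy, in bits, of the frequency of each news source.

    The temporal order of the sources is ignored, which is the
    assumption this sketch makes about the term "uncorrelated".
    """
    counts = Counter(sources)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical inputs for one country.
iri_over_time = [0.2, 0.4, 0.2, 0.4]
varied_diet = ["outlet_a", "outlet_b", "outlet_a", "outlet_b"]
narrow_diet = ["outlet_a"] * 4

v = volatility(iri_over_time)
h_varied = uncorrelated_entropy(varied_diet)
h_narrow = uncorrelated_entropy(narrow_diet)
```

A diet split evenly between two outlets has an entropy of 1 bit, while a diet drawn from a single outlet has an entropy of 0, so lower values correspond to less diverse media consumption.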