8:00am - 8:20am
Forgetting Friends and Foes: Self-Reported Errors in Sociocentrically Mapped Face-to-Face Networks
Karina Raygoza Cortez, Marios Papamichalis, Nicholas A. Christakis, Ana Lucia Rodriguez de la Rosa
Yale University, United States of America
Measuring social networks in community settings is complex. Surveys are the most common instruments for collecting face-to-face social ties, especially when assessing real-life friends and foes in defined communities (Stoebenau & Valente, 2003; Perkins et al., 2015; Shakya et al., 2017; Offer, 2021). This is typically accomplished by asking “name generator” questions in which a respondent (an ego) is prompted to recall their social contacts (their alters). Most social network analysis studies assume that participants’ responses are generally accurate, often equating the absence of a reported tie with its non-existence (Marin, 2007; Marsden, 1990, 2005). However, empirical evidence shows non-trivial rates of participant error with this approach, which, depending on instrument design, can even yield more forgotten than nominated ties (Sudman, 1985; Brewer, 2000). Here, we empirically analyze self-reported forgotten ties in 82 sociocentric signed network graphs from isolated rural Honduran villages. Respondents nominated their positive and negative relationships (ties) in 2019 and again in 2022. Immediately after completing the same survey in 2022, they were queried about relationships they had included in their 2019 network report but omitted in 2022, distinguishing between genuine relational changes and still-present relationships that were omitted in error (for both friends and foes). We analyze individual-, tie-, and village-level correlates of such “forgotten” ties. Friends (64.5%) are forgotten more often than adversaries (11.6%). Reporter and tie characteristics (e.g., age, education, gender, religion), network position (e.g., degree of the dyad), and village attributes (e.g., size, isolation) correlate with the probability of forgetting ties. Our findings highlight the multi-level considerations that can affect respondent error (including context, network, and participant characteristics).
We highlight the differential error rates for positive and negative ties, confirming the need for more research on the methodological challenges of negative name generators, as appraisal recommendations for positive networks may not extend to the collection of negative-tie data (Rodriguez De La Rosa et al., 2024). Finally, we show that reporting errors have implications for the computation of network metrics and might therefore affect the analyses of studies that rely on them.
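The claim that forgotten ties distort network metrics can be illustrated with a minimal sketch. This is a hypothetical 6-person network of our own construction, not the authors' Honduran data: we delete two "forgotten" ties and compare density and mean degree before and after.

```python
def density(nodes, edges):
    # Share of possible undirected ties that are present.
    possible = len(nodes) * (len(nodes) - 1) / 2
    return len(edges) / possible

def mean_degree(nodes, edges):
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(deg.values()) / len(nodes)

# Hypothetical 6-person village network; two present ties are "forgotten".
nodes = list(range(6))
true_edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)}
forgotten = {(2, 3), (1, 4)}      # ties omitted by respondent error
observed = true_edges - forgotten

print(f"density:     true {density(nodes, true_edges):.3f}  observed {density(nodes, observed):.3f}")
print(f"mean degree: true {mean_degree(nodes, true_edges):.3f}  observed {mean_degree(nodes, observed):.3f}")
```

Even two forgotten ties in a six-person network shift both metrics noticeably; on survey data the abstract's point is that such omissions propagate into any downstream analysis built on these quantities.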
8:20am - 8:40am
Bayesian estimation of ERGMs with not-at-random missingness in covert networks
Jonathan Januar1, H Colin Gallagher1, Johan Koskinen1,2
1University of Melbourne, Australia; 2Stockholm University, Sweden
Missing data methods generally start from the missing at random (MAR) assumption. However, this assumption may not hold for missing network data because of dependencies in the true network. MAR is particularly implausible when working with covert network data collated from various sources. In previous work, we proposed exponential random graph models (ERGMs) as a flexible way of modelling how covert networks are observed, i.e., the ERGM is used to model the processes that cause network tie-variables to be missing. Using this framework, we also demonstrated that plausible missing not at random (MNAR) mechanisms in covert network settings can have drastic effects on the observed network, depending on the specification of the missingness mechanism. To estimate ERGMs for networks whose missing data are MNAR, we integrate the ERGM model of the missingness mechanism with the estimation of ERGMs for covert networks using Bayesian data augmentation. We evaluate the proposed inference scheme with a variety of missingness mechanisms and estimation models. We also provide examples of sensitivity analyses of the estimated parameters with respect to their corresponding missingness mechanisms.
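The core MNAR intuition can be shown with a toy simulation of our own (not the authors' ERGM-based model): in a covert network, present ties are more likely to go unobserved than absent ones, so treating missing tie-variables as absent biases density downward. All parameters below (network size, tie probability, missingness rates) are illustrative assumptions.

```python
import random

random.seed(1)
n = 60
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
# True covert network: each dyad has a tie with probability 0.10.
true_tie = {p: random.random() < 0.10 for p in pairs}

# MNAR mechanism: hide 50% of present ties but only 5% of absent
# tie-variables, so whether a value is missing depends on the value itself.
def observe(p):
    miss_prob = 0.50 if true_tie[p] else 0.05
    return None if random.random() < miss_prob else true_tie[p]

obs = {p: observe(p) for p in pairs}

true_density = sum(true_tie.values()) / len(pairs)
# Naive analysis: treat missing (None) as absent.
naive_density = sum(1 for v in obs.values() if v) / len(pairs)
print(f"true density {true_density:.3f}  naive estimate {naive_density:.3f}")
```

The naive estimate falls well below the truth; correcting it requires modelling the missingness mechanism jointly with the network, which is what the proposed Bayesian data-augmentation scheme does in the full ERGM setting.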
8:40am - 9:00am
Extending respondent-driven sampling to allow modeling of social networks with application to people who inject drugs
Amirhossein Alvandi1, Pavel N. Krivitsky2, Krista Gile1
1University of Massachusetts Amherst, US; 2University of New South Wales, Australia
Respondent-driven sampling (RDS) is often used to sample hard-to-reach human populations, especially those at risk for transmissible diseases such as HIV and HCV. RDS collects samples over the social network, leaving a tantalising trace of the network in the dataset and raising the question of whether this incidental network information can be used to make inferences about the underlying social network that may relate to the transmission of infection. A key limitation of this pursuit is that RDS network information is structurally limited to tree-structured data: there are no cross-ties and no way to infer endogenous clustering, a key component of disease transmission. In this study, we introduce the augmentation of RDS data with the distribution of tokens to provide a sample of cross-ties, and we introduce a method to use these data to make inferences about the underlying social network.
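Why RDS data are necessarily tree-structured can be seen in a short simulation (our own sketch with an assumed Erdős–Rényi network and a 3-coupon design, not the paper's token-augmented protocol): each participant is recruited by exactly one coupon, so the recruitment graph has one fewer edge than it has nodes and can never contain a cross-tie or cycle.

```python
import random
from collections import deque

random.seed(7)
n = 200
# Assumed underlying network: Erdos-Renyi with tie probability 0.04.
adj = {v: set() for v in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < 0.04:
            adj[i].add(j)
            adj[j].add(i)

# RDS: each recruit passes up to 3 coupons to not-yet-recruited neighbours.
seed_node = 0
recruited = {seed_node}
tree_edges = []
queue = deque([seed_node])
while queue:
    ego = queue.popleft()
    eligible = [a for a in adj[ego] if a not in recruited]
    for alter in random.sample(eligible, min(3, len(eligible))):
        recruited.add(alter)
        tree_edges.append((ego, alter))
        queue.append(alter)

# A tree by construction: |edges| = |recruited| - 1, so ties among
# respondents that are not recruitment ties are never directly observed.
print(len(recruited), len(tree_edges))
```

Many cross-ties exist among the recruited respondents in the underlying `adj`, but none of them appear in `tree_edges`; this is the structural gap the token-based augmentation is designed to fill.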
9:00am - 9:20am
Sampled datasets risk substantial bias in the identification of political polarization on social media
Gabriele Di Bona1,2, Emma Fraxanet3, Björn Komander4,5, Andrea Lo Sasso6,7,8, Virginia Morini9, Antoine Vendeville10,11,12, Max Falkenberg13, Alessandro Galeazzi14
1CNRS, GEMASS, 59 rue Pouchet, F-75017, Paris, France; 2Sony Computer Science Laboratories Rome, Joint Initiative CREF-Sony, Centro Ricerche Enrico Fermi, Via Panisperna 89/A, I-00184, Rome, Italy; 3Department of Information and Communication Technologies, Universitat Pompeu Fabra, Tànger 122-140, 08018, Barcelona, Spain; 4IIIA-CSIC, Campus UAB, 08193 Cerdanyola, Spain; 5School of Computing Technologies, RMIT University; 6Universita degli Studi di Bari Aldo Moro, Dipartimento Interateneo di Fisica, Bari, I-70125, Italy; 7Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, I-70125, Italy; 8Predict S.r.l., Viale Adriatico - Fiera del Levante - Pad. 105, I-70132 Bari, Italy; 9KDD Lab, CNR-ISTI, 56126 Pisa, Italy; 10médialab, Sciences Po, 75007 Paris, France; 11Complex Systems Institute of Paris Île-de-France (ISC-PIF) CNRS, 75013 Paris, France; 12Learning Planet Institute, Research Unit Learning Transitions (UR LT, joint unit with CY Cergy Paris University), F-75004 Paris, France; 13Department of Network and Data Science, Central European University, Vienna, Austria; 14Department of Mathematics, University of Padova, Italy
Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research on social and political phenomena online, which is critical given the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we demonstrate that keyword-based samples can be representative if keywords are selected with great care, but that poorly selected keywords can result in substantial political bias in the sampled data. Our findings demonstrate that polarization cannot be measured reliably with small, sampled datasets, highlighting why the current lack of research data is so problematic and providing insight into the practical implementation of the European Union's Digital Services Act, which aims to improve researchers' access to social media data.
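The small-sample failure mode can be reproduced on a toy network (our own construction, not the paper's Polish Twitter data): two equally sized political camps, a crude polarization score defined as the share of within-camp edges, and repeated random edge samples of different sizes. All sizes and probabilities below are illustrative assumptions.

```python
import random

random.seed(3)
# Two camps of 100 users each; ties are dense within camps, sparse across.
group = {v: v // 100 for v in range(200)}
edges = []
for i in range(200):
    for j in range(i + 1, 200):
        p = 0.05 if group[i] == group[j] else 0.005
        if random.random() < p:
            edges.append((i, j))

def within_share(es):
    # Share of ties that stay inside one camp: a crude structural
    # polarization score for this sketch.
    return sum(group[u] == group[v] for u, v in es) / len(es)

def spread(k, reps=200):
    # Range of the estimate across repeated random edge samples of size k.
    ests = [within_share(random.sample(edges, k)) for _ in range(reps)]
    return max(ests) - min(ests)

true_score = within_share(edges)
spread_small = spread(10)
spread_large = spread(len(edges) // 2)
print(f"true {true_score:.3f}  spread(k=10) {spread_small:.3f}  "
      f"spread(k=|E|/2) {spread_large:.3f}")
```

Tiny samples swing the score over a wide range while half-sized samples pin it down, mirroring the abstract's finding that only sufficiently large samples reflect the true structure of polarization.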
9:20am - 9:40am
Sampling error in social networks
Judith Gilsbach1,2, Alyssa Smith3, Termeh Shafie2, David Lazer3
1GESIS Leibniz Institute for the Social Sciences, Germany; 2University of Konstanz, Germany; 3Network Science Institute, Northeastern University, Boston, MA, USA
Sampling error has been well defined for classic population surveys (e.g., in the Total Survey Error framework; Groves et al. 2011). A vast body of literature also exists on sampling error in social network data; it is among the most extensively studied error types in social network data collection. However, sampling error in social network analysis appears to be polymorphic in nature and has not yet been clearly defined. As the ideal “random sample” does not exist for social network data, owing to its relational nature, many definitions of sampling error, sampling procedures, and assessment strategies have been proposed. Drawing on a systematic literature review as well as qualitative interviews with researchers, this work identifies five types of cross-sectional and longitudinal sampling error in social network data: “within network sampling error”, i.e., biases that occur when a subset of nodes or edges is drawn from a whole network; “between network sampling error”, i.e., biases that occur when a sample of whole networks is drawn, e.g., a set of school classes; “observation time error”, i.e., the network is observed before or after the event that is supposed to be studied; “observation span error”, i.e., noise in observations due to overly long observation periods being aggregated into an equilibrium; and “observation distance error”, i.e., the time between observations is inadequate. Further, “within network sampling error” and “observation time error” are empirically investigated using a large social media dataset as a case study, comparing different sampling approaches.
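A minimal sketch of the first error type, "within network sampling error" (our own toy with an assumed Erdős–Rényi network, not the study's social media case): taking the induced subgraph of a random node sample drops every tie with an endpoint outside the sample, so mean degree is systematically underestimated (roughly in proportion to the sampled fraction).

```python
import random

random.seed(5)
n = 300
# Hypothetical whole network: Erdos-Renyi stand-in for a full census.
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < 0.02]

def mean_degree(nodes, es):
    deg = {v: 0 for v in nodes}
    for u, v in es:
        if u in deg and v in deg:  # keep only ties inside the node sample
            deg[u] += 1
            deg[v] += 1
    return sum(deg.values()) / len(nodes)

full_mean = mean_degree(set(range(n)), edges)
half_mean = mean_degree(set(random.sample(range(n), n // 2)), edges)
print(f"full network {full_mean:.2f}  induced half-sample {half_mean:.2f}")
```

The half-sample estimate lands near half the true value: an error driven purely by the sampling design, not by measurement, which is exactly the kind of bias the proposed taxonomy separates from the time-related error types.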