Observing the work of interviewers: how the quality of the data collection is constructed
French Institute for Demographic Studies (INED), France
Between February and November 2015, 27,000 people aged 20-69 were interviewed by telephone about violence experienced in different life contexts (family, couple, work, public space...). Conducted by the French Institute for Demographic Studies (INED), the survey mobilized 110 interviewers, hired by the survey institute in charge of the data collection.
Present at the call center on a daily basis, the INED team kept a field diary, reporting on call monitoring, individual or collective debriefings, interactions, and important events. Its 350 pages constitute rich ethnographic material, supplemented by the numerous sheets on which interviewers manually recorded the problems encountered during the calls and by the evaluation questionnaires they filled out after completing a questionnaire.
Based on this material, the paper studies how the working conditions of the interviewers affect response behavior and data quality. Working conditions cover remuneration, the precarious and flexible employment status, as well as physical conditions (rooms, computer equipment, etc.) and temporalities (schedules, progress of data collection, special periods such as Ramadan, etc.). The paper takes into account the interactions between the various actors involved: interviewers, respondents, the INED team and supervisors from the survey institute, bearing in mind that the last two did not necessarily share the same objectives. Finally, it is meant to provide insight into participants' response to the survey and to shed light on how interviewers led them to engage with the questionnaire. It thus aims to better understand not only the conditions under which the data are produced but also the meaning given to their "quality".
Identification of interviewer effects in real and in falsified survey data
University of Kaiserslautern, Germany
In face-to-face interviews the interviewer has an important impact on the quality of survey data, but there is also the risk of interviewer effects. Even more problematically, the interviewer may deliberately falsify interviews or parts of them. We ask whether falsified data show “interviewer effects” similar to those in real data, or whether “interviewer effects” are stronger in falsified data and may therefore serve as an indication that data have been contaminated by interviewer falsifications. We use experimental data, collected in the project “Identification of Falsifications in Survey Data (IFiS)” by GESIS, and apply intraclass correlations as well as multi-level regression models to compare interviewer effects in real survey data and in data falsified by interviewers. Our main results are: 1) In the real data we do not find evidence for interviewer effects. 2) In the falsified data we find strong interviewer effects. 3) In the falsified data we find significant effects of the interviewer’s own answer to the same question of the questionnaire. Thus, in order to detect falsifications, we recommend collecting as much information as possible about the interviewers. For example, the interviewers could answer the survey questionnaire as part of the interviewer training. Based on this information, datasets or suspicious cases can be checked for interviewer effects.
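As an illustration of the intraclass correlation mentioned above, the following is a minimal sketch, not the authors' actual procedure: it estimates a one-way random-effects ICC from (interviewer, answer) pairs using the ANOVA mean squares. The function name `icc_oneway` and the toy data are assumptions made for this example only.

```python
from collections import defaultdict


def icc_oneway(answers):
    """One-way random-effects ICC from (interviewer_id, numeric_answer) pairs.

    Uses the ANOVA estimator (MSB - MSW) / (MSB + (k0 - 1) * MSW), where k0
    is the adjusted average number of interviews per interviewer.
    """
    groups = defaultdict(list)
    for interviewer, value in answers:
        groups[interviewer].append(float(value))

    a = len(groups)                                   # number of interviewers
    n_total = sum(len(v) for v in groups.values())    # number of interviews
    if a < 2 or n_total <= a:
        raise ValueError("need at least 2 interviewers and more interviews than interviewers")

    grand_mean = sum(sum(v) for v in groups.values()) / n_total
    means = {g: sum(v) / len(v) for g, v in groups.items()}

    # Between-interviewer and within-interviewer sums of squares.
    ssb = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ssw = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)

    msb = ssb / (a - 1)
    msw = ssw / (n_total - a)
    k0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) / (a - 1)

    denom = msb + (k0 - 1) * msw
    if denom == 0:
        raise ValueError("no variance in the data")
    return (msb - msw) / denom


if __name__ == "__main__":
    # Hypothetical toy data: answers cluster strongly within each interviewer.
    toy = [("A", 1), ("A", 2), ("A", 1), ("B", 5), ("B", 4), ("B", 5)]
    print(icc_oneway(toy))
```

Under this estimator, values near zero (or negative) suggest that answers do not cluster by interviewer, while values near one suggest that most of the variance lies between interviewers. A multi-level regression would additionally allow respondent and interviewer covariates to be controlled for.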
The flexible verbal interaction coding
Sapienza Università di Roma, Italy
Verbal interaction coding is a quantitative technique used to encode the verbal interaction between interviewer and respondent. The technique is employed both to pretest survey questions, locating those which cause the interviewer and the respondent to deviate verbally from the ideal standardized sequence, and, during data collection, to monitor and evaluate interviewers’ behavior against the standard guidelines. Some scholars point out that the main limitation of this technique is that it treats all deviations “as negative”. In fact, two types of deviation exist: deviations “harmful” to data quality (an indicator of inaccurate question wording or of interviewer-related error) and "virtuous" ones, in which interviewer and respondent interact actively to understand each other. In this paper the authors present a study in which they tested a more qualitative form of verbal interaction coding, designed to overcome this limitation. With this approach, during the questionnaire pretest, questions were reformulated only when many harmful deviations clustered around them, and questions that attracted virtuous deviations were left unchanged. After the pretest, the researchers trained the interviewers and recommended that they adopt the virtuous behaviors, in order to conduct the interview flexibly and meet the respondents’ interpretative needs. During the fieldwork, the interviewers were monitored live, and those who conducted interviews in a strictly standardized way or showed "harmful deviant behaviors" were given a second training. In short, the authors argue that a more qualitative verbal interaction coding helps to overcome the technique's inherent limitation because it makes it possible to distinguish between bad and good deviant behaviors.
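The pretest rule described above (reformulate a question only when many harmful deviations cluster around it) can be sketched roughly as follows. This is an illustrative sketch only: the code labels `"harmful"` and `"virtuous"`, the function name `questions_to_revise`, and the threshold are assumptions, since the abstract does not state how "many" was operationalized.

```python
from collections import Counter


def questions_to_revise(coded_turns, threshold):
    """Return question ids with at least `threshold` harmful deviations.

    `coded_turns` is an iterable of (question_id, code) pairs, where code is
    a hypothetical label such as "harmful" or "virtuous". Virtuous deviations
    are deliberately ignored, so they never trigger a reformulation.
    """
    harmful = Counter(q for q, code in coded_turns if code == "harmful")
    return sorted(q for q, n in harmful.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical coded pretest exchanges.
    turns = [
        ("Q1", "harmful"), ("Q1", "harmful"), ("Q1", "virtuous"),
        ("Q2", "virtuous"), ("Q2", "virtuous"),
        ("Q3", "harmful"),
    ]
    print(questions_to_revise(turns, threshold=2))
```

Counting only the harmful codes is the design choice that separates this flexible variant from a coding scheme in which every deviation counts against the question.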
Weighing the moral worth of actions: a factorial survey approach to measuring the ordinary normative evaluations of altruistic actions
Higher School of Economics, Russian Federation; Institute of Sociology of the Russian Academy of Sciences
For I. Kant, the genuine moral worth of human actions, including those praiseworthy actions commonly called altruistic, depends on one’s determination to act dutifully, guided only by the moral law itself (Kant, 1785). So even actions motivated by sympathy for other human beings or by happiness, let alone by reciprocation and self-interest, do not truly express a good will and, therefore, cannot be considered morally worthy. Philosophical controversies surrounding Kant’s approach are mirrored in recent disputes on different species of altruism (e.g., Piliavin, 2009). However, no matter what moral philosophers and social scientists think about the relative worth and “purity” of different motives for altruistic behavior, it is still worth investigating which social factors determine ordinary people’s judgments about the comparative moral worth of different prosocial actions (Simpson & Willer, 2015). But how can these lay normative judgments and post hoc evaluations be studied in general and, specifically, when they relate to norms that support costly social actions with positive externalities for others? We conducted a full-factorial vignette experiment aimed at examining the role that factors such as relatedness, donation size, and reciprocity (conceptualized as history of interactions and probability of meeting in the future) play in third-person normative judgments about the moral worth of altruistic actions. We found significant main effects of relatedness, donation size and probability of meeting in the future and, generally in line with Kant’s view, higher ratings for larger donations to non-relatives with low chances of meeting in the future (regardless of the history of past interactions).
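To make the full-factorial design concrete, the following minimal sketch enumerates every combination of vignette dimensions. The four factor names follow the abstract, but the number of levels and their labels are assumptions chosen for illustration; the abstract does not report the actual levels.

```python
from itertools import product

# Hypothetical two-level operationalization of the four vignette dimensions.
FACTORS = {
    "relatedness": ["relative", "non-relative"],
    "donation_size": ["small", "large"],
    "past_interaction": ["no", "yes"],
    "chance_to_meet_again": ["low", "high"],
}


def full_factorial(factors):
    """Return one dict per vignette, covering every combination of levels."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


if __name__ == "__main__":
    vignettes = full_factorial(FACTORS)
    print(len(vignettes))  # 2 * 2 * 2 * 2 combinations
    print(vignettes[0])
```

Because every combination of levels appears in the vignette universe, the main effect of each factor can be estimated independently of the others, provided the vignettes are assigned to respondents in a balanced way.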
What do we measure with unequal or equal number of scale points? The midpoint problem and the left-right scale.
Heinrich-Heine-Universität Düsseldorf, Germany
According to a multitrait-multimethod experiment, the 11-point scale produces the highest validity of left-right data (Kroh 2007). Although the odd-numbered (“unequal”) 11-point scale achieves better results, the claim that “it will lead to both overestimates of the degree of opinionation in the population and the ordinality assumption that researchers typically invoke when analysing bipolar response scales” (Roberts & Smith 2014) remains without validation. In this paper, we used a category follow-up probe in a population survey, administered to respondents who initially selected point 5 (metric center) on the 9-point scale and to those who selected point 5 (imagined center) on the 10-point left-right scale, to determine whether they chose this alternative in order to indicate a distinct opinion or to indicate that they do not have an opinion on the issue (position). The midpoint problem raises the question of whether even-numbered or odd-numbered scales work better. On the one hand, there is evidence that the midpoint encourages people without information (indifference) to admit it. On the other hand, offering or excluding the midpoint option does not substantially influence results.
In our CATI survey, conducted in 2016, we find that the vast majority of responses turn out to be what respondents initially selected. About one in four respondents reallocate their response (e.g. to no answer or don't know). About one in three can be assigned a position on the left-right scale. The share of respondents remaining in the center (which may reflect genuine neutrality or endorsement of the status quo) is moderate. Against this background, a different approach, possibly a multistage measure, should be taken into consideration.