Large language models (“LLMs”) can produce broadly convincing argumentation about a variety of academic subjects, including those in the humanities. Interestingly, an LLM can also be “trained” on one’s own academic writing: one may submit one’s own works to it and have it thereby come to learn one’s research style, interests, and substantive commitments in one’s field. This training can be supplemented by ongoing conversations with the chatbot. Thus, the LLM can come to know you and your academic research very intimately, if you let it (and sometimes even if you don’t).
Suppose that an academic researcher in the humanities (for example, a philosopher or literary critic) trains an LLM on two key “datasets”: (1) the subject-matter literature the researcher wishes to write about; and (2) the researcher’s own academic style, works, interests, goals, and orientations, in the form of written works and real-time chat conversations. The LLM may already know the subject matter well, but in this way it also comes to know the researcher very well. Now the researcher prompts the LLM to produce an original manuscript based on those two datasets, then submits the manuscript, under their own name, to a journal or conference, without explicitly acknowledging the LLM’s role in producing it.
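To make the scenario concrete, here is a minimal sketch of how such a setup might be implemented today. The scare quotes around “trained” matter: the sketch conditions an off-the-shelf model on the two “datasets” through its prompt context (in-context learning) rather than literally fine-tuning its weights. The OpenAI client, model name, and file paths are illustrative assumptions, not details drawn from the scenario above.

```python
# Hypothetical sketch of the two-"dataset" setup: conditioning a model on
# (1) subject-matter literature and (2) the researcher's own writings via
# the prompt context. Assumes the OpenAI Python client (openai >= 1.0);
# the model name and directory paths are placeholders.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def load_corpus(directory: str) -> str:
    """Concatenate a directory of plain-text files into one context string."""
    return "\n\n".join(
        p.read_text(encoding="utf-8")
        for p in sorted(Path(directory).glob("*.txt"))
    )


subject_literature = load_corpus("corpus/subject_literature")  # dataset (1)
researcher_writings = load_corpus("corpus/my_published_work")  # dataset (2)

system_prompt = (
    "You assist an academic researcher in the humanities.\n\n"
    "Subject-matter literature:\n" + subject_literature + "\n\n"
    "The researcher's own published work, illustrating their style, "
    "interests, and substantive commitments:\n" + researcher_writings
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": (
                "Drawing on the literature above, write an original manuscript "
                "in my voice, advancing a thesis consistent with my prior work."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```

In practice the two corpora would far exceed any model’s context window, so a retrieval step that selects only the passages relevant to a given prompt would stand in for the wholesale concatenation shown here; the philosophical situation is the same either way.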
This scenario raises several philosophically interesting questions, some of which have not yet been addressed in the literature. First, we can ask whether the researcher has committed any kind of research misconduct or plagiarism. But we might also ask whether we should hope or dread that such practices become common in the academy.
I will argue that the researcher has not necessarily committed any serious research misconduct. The researcher’s approach is not fundamentally different from current research practices in the humanities, and it bears a sufficiently intimate connection to the researcher’s identity that it does not qualify as any kind of plagiarism.
Yet what are the advantages and disadvantages of an academy in which this “research” method becomes common? I suggest that the chief advantage will be greater research output, informed by a deeper kind of familiarity with the existing academic literature. The disadvantages will stem from the fact that producing academic research becomes considerably easier.
This ease may produce several problems, but at least one is philosophically interesting: such a system would weaken the intimate, identity-based connection between authors and the works they produce, undermining a kind of valuable vulnerability that has thus far been present in academic research. If it becomes much easier to produce academic research, and authors associate the resulting work less closely with their own identities, then there will be less motivation to ensure the academic and even moral quality of that research.
I respond to objections; provide recommendations about how to cultivate some of the benefits of LLM-assisted research in the humanities while avoiding some of these problems; and briefly draw connections to related areas of the philosophy of technology.
Acknowledgments:
No part of this text or of the underlying research was generated by AI.