When Machines Pollute the Study of Humans: The ELLIS-ELLIOT Reading Group Confronts a Methodological Crisis
The ELLIS-ELLIOT Reading Group on Human-Centric Machine Learning convenes researchers across Europe to examine how algorithmic and human decisions shape each other. The upcoming May 29 session tackles LLM Pollution, a phenomenon threatening the validity of online behavioral research as participants increasingly delegate tasks to AI. This student-led initiative, bridging the ELLIS PhD Program and the ELLIOT Young Researcher Group, brings together a new generation of European AI researchers grappling with questions that matter beyond the lab.
These questions about human-AI collaboration, research integrity, and the boundaries between human and machine cognition are precisely what Human x AI Europe will explore on May 19 in Vienna. If the intersection of AI systems and human agency concerns you, that room is where the conversation continues.
The Scene: A Virtual Room, A Continental Problem
Picture a virtual seminar room on May 29, 2026, at 10:00 CEST. Researchers scattered across European time zones log in to discuss a paper with an unsettling premise: the very act of studying human behavior online may now be compromised by the tools designed to assist us.
The ELLIS-ELLIOT Reading Group on Human-Centric Machine Learning has chosen to examine Recognising, Anticipating, and Mitigating LLM Pollution of Online Behavioural Research, a paper by researchers at the Max Planck Institute for Human Development. The choice is deliberate. This reading group, organized by PhD students in the ELLIS (European Laboratory for Learning and Intelligent Systems) network and the ELLIOT Young Researcher Group, exists precisely to confront the uncomfortable questions about how algorithmic and human decisions influence each other.
The paper's findings are stark. According to the research published on arXiv, pilot studies found that up to 45% of participants in online behavioral research showed evidence of LLM involvement in their responses. Some responses were obviously machine-generated, featuring phrases like "I don't experience confusion in the same way humans do" in open-ended survey questions. Others were subtler: overly fluent language, suspiciously comprehensive summaries of instructions, responses that felt polished in ways human spontaneity rarely achieves.
Three Variants of Contamination
The paper identifies three distinct ways LLM Pollution manifests, each with different implications for research validity.
Partial LLM Mediation occurs when participants use language models for specific aspects of a task. A non-native English speaker might run their response through ChatGPT for fluency. A participant might ask an AI for advice on how to interpret a question. The human is still present, still deciding, but the output carries machine fingerprints. Researchers then face an interpretive problem: whose cognition are they measuring?
Full LLM Delegation represents a more fundamental breach. Here, agentic AI systems complete studies with minimal human oversight. Tools like OpenAI's Operator or open-source browser automation can navigate experiments, interpret screenshots, and generate responses without human involvement. The central premise of human-subject research collapses entirely.
LLM Spillover is perhaps the most philosophically interesting variant. Even when no AI is involved, human participants may alter their behavior because they anticipate LLM presence in online studies. The mere possibility of machine involvement changes human conduct. This is second-order reactivity: the research environment is transformed by the shadow of AI, whether or not AI is actually present.
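The taxonomy translates readily into an audit workflow. As a minimal sketch, assuming a hypothetical annotation schema rather than anything proposed in the paper, a research team might label each flagged response with the suspected variant and a note on the evidence:

```python
# Hypothetical annotation schema for flagged responses; an assumption for
# illustration, not a scheme proposed in the paper.
from dataclasses import dataclass
from enum import Enum

class PollutionVariant(Enum):
    PARTIAL_MEDIATION = "partial"   # participant used an LLM for part of the task
    FULL_DELEGATION = "full"        # an agentic system completed the study itself
    SPILLOVER = "spillover"         # behavior shifted because AI presence was anticipated

@dataclass
class FlaggedResponse:
    participant_id: str
    variant: PollutionVariant
    evidence: str                   # short note on what triggered the flag

flagged = FlaggedResponse(
    participant_id="p_0042",
    variant=PollutionVariant.PARTIAL_MEDIATION,
    evidence="open-ended answer pasted in a single paste event",
)
print(flagged.variant.value)        # -> "partial"
```

Keeping the evidence note alongside the label matters because, as the paper stresses, the three variants call for different responses from researchers.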
Why This Matters Beyond the Lab
The reading group's focus on this paper reflects a broader concern within European AI research. The ELLIS Human-Centric Machine Learning Program, which hosts its annual workshop in Alicante on May 27-28, 2026, has made trustworthy AI its central theme. The workshop brings together researchers from ELLIS, ELIAS, ELLIOT, ELSA, and ELLE to examine robustness, safety, fairness, transparency, and interpretability of large-scale AI systems.
The LLM Pollution problem sits at the intersection of all these concerns. If behavioral research cannot reliably capture human cognition, then the datasets used to train and evaluate AI systems become suspect. The models built on contaminated data may encode machine patterns mistaken for human ones. The feedback loop tightens: AI trained on AI-influenced human behavior, deployed to influence human behavior further.
Professor Iyad Rahwan, Director of the Center for Humans and Machines at the Max Planck Institute for Human Development and a co-author of the paper, has published related research in Nature showing that delegation to AI can increase dishonest behavior. Across 13 studies involving more than 8,000 participants, researchers found that people were significantly more likely to cheat when they could delegate the task to AI agents. With goal-oriented interfaces, only 12-16% of participants remained honest, compared to 95% when doing the tasks themselves.
The implications extend beyond academic research. Policymakers relying on behavioral studies to inform regulation, companies using survey data to understand customers, public health officials tracking attitudes and behaviors: all face the same contamination risk.
The Reading Group as Institutional Form
The ELLIS-ELLIOT Reading Group represents something worth noticing: a student-led initiative that has become a regular feature of European AI research infrastructure. The group's archive reveals a sustained engagement with human-centric machine learning topics. Previous sessions have examined vision-language models' failure to understand negation, personality traits in large language models, privacy risks of algorithmic fairness, and the alignment gap between LLMs as chatbots versus LLMs as browser agents.
The format is simple: one paper, one presenter, one hour of discussion. But the cumulative effect is a community of researchers developing shared vocabulary and shared concerns. When Mohammad-Amin Charusaie presented work on Learn-to-Defer systems in February 2026, the discussion connected to broader questions about human-AI collaboration that the group had been building toward for months.
Learn-to-Defer (L2D) represents one technical response to the human-AI collaboration challenge. As recent benchmarking research published in Scientific Data explains, L2D algorithms improve human-AI collaboration by deferring decisions to human experts when they are likely to be more accurate than the AI model. The approach acknowledges that neither humans nor machines are universally superior; the question is how to allocate decisions optimally.
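A minimal sketch can make the deferral idea concrete. The snippet below is illustrative only, assuming a classifier that reports a per-instance confidence and a fixed estimate of expert accuracy; it is not code from the cited benchmarking work:

```python
# Illustrative deferral rule in the spirit of Learn-to-Defer; an assumed,
# simplified sketch rather than code from the cited benchmarking paper.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    label: Optional[int]   # the model's prediction, or None if deferred
    deferred: bool         # True when the case is routed to a human expert

def route(confidence: float, prediction: int,
          expert_accuracy: float = 0.90) -> Decision:
    """Defer whenever the model is likely less accurate than the human expert."""
    if confidence < expert_accuracy:
        return Decision(label=None, deferred=True)
    return Decision(label=prediction, deferred=False)

# A model that is 72% confident on a case where the expert is assumed to be
# right 90% of the time hands the case to the human.
print(route(confidence=0.72, prediction=1))   # Decision(label=None, deferred=True)
```

Published L2D methods go further, learning the deferral rule jointly with the classifier rather than thresholding against a fixed constant, which is exactly where the data requirements discussed next begin to bite.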
But L2D faces its own limitations. Research from Feedzai has identified fundamental challenges: L2D requires predictions from every considered human expert for every training instance, which is often infeasible in real-world applications. In practice, teams are staffed for regular operations, cover only a subset of cases, and assign a single human to each decision. Nor can the method update itself with new data in dynamic environments, because complete expert predictions will not be available during regular operations.
The European Context
The reading group operates within a specific institutional landscape. ELLIS now comprises over 2,000 members across 44 sites in 17 countries, with 16 cross-border research programs and more than 500 PhD students and postdocs. The ELLIOT project, launched in July 2025 with €25 million in Horizon Europe funding, brings together 30 research and industry partners to develop open, trustworthy, multi-modal foundation models.
This infrastructure matters. European AI research has made a strategic bet that human-centric approaches, trustworthy systems, and open science practices can differentiate European AI from alternatives developed under different value systems. The reading group's focus on LLM Pollution is not merely academic curiosity; it is quality control for the research enterprise itself.
ELLIS Institute Finland, launched in 2025, explicitly focuses on making machine learning more readily and widely applicable, and better at working with people. The institute's emphasis on data-efficient probabilistic machine learning, interactive and cooperative machine learning, and transformative multidisciplinary collaboration reflects the same concerns that animate the reading group's discussions.
What Gets Naturalized
The reading group's May 29 session will be presented by Aditya Gulati. The discussion will likely range beyond the paper's specific findings to broader questions about research methodology in an age of ubiquitous AI assistance.
One question deserves particular attention: what happens when the contamination becomes invisible? The paper notes that LLM responses are increasingly indistinguishable from those written by humans. Detection methods exist, but they are imperfect and may trigger an arms race between detection and evasion. More fundamentally, if researchers cannot reliably distinguish human from machine-influenced responses, the category distinction itself becomes unstable.
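To see why detection stays imperfect, consider the simplest possible screen: a keyword check for machine-typical disclaimers. The sketch below is a deliberately naive illustration with an assumed phrase list, not a method from the paper; a participant who lightly rewords pasted output evades it entirely:

```python
# Deliberately naive screen for machine-typical phrasing; the phrase list is an
# assumption for illustration, not a detection method from the paper. Light
# paraphrasing slips straight past it, which is the arms-race problem in miniature.
AI_TELLTALE_PHRASES = (
    "as an ai language model",
    "i don't experience",
    "i do not have personal",
)

def looks_machine_written(response: str) -> bool:
    """Flag a response only if it contains a verbatim telltale phrase."""
    text = response.lower()
    return any(phrase in text for phrase in AI_TELLTALE_PHRASES)

print(looks_machine_written("I don't experience confusion the way humans do."))  # True
print(looks_machine_written("Honestly, that question just confused me."))        # False
```

More sophisticated detectors exist, but the underlying dynamic is the same: each new signal invites a new evasion.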
This is not merely a technical problem. It is a question about what counts as human cognition, human behavior, human decision-making. The reading group's sustained engagement with these questions suggests that a generation of European AI researchers understands the stakes.
The paper proposes a multi-layered response spanning researcher practices, platform accountability, and community efforts. But the authors acknowledge that coordinated adaptation will be essential to safeguard methodological integrity. The reading group itself is one form of such coordination: researchers across institutions and countries developing shared understanding of shared problems.
The Artifact Remembers
There is something worth preserving in the reading group's archive. Each session leaves a trace: a paper discussed, a presenter named, a date recorded. These traces accumulate into a record of what European AI researchers found worth thinking about in 2025 and 2026.
The choice to examine LLM Pollution in May 2026 will be remembered as a moment when the research community confronted a methodological crisis in real time. The contamination was not a future risk to be anticipated; it was a present reality to be managed. The reading group's response, characteristically, was to read carefully, discuss openly, and build shared understanding.
This is what human-centric machine learning looks like in practice: not a set of technical solutions, but a community of researchers asking difficult questions about the relationship between human and machine cognition. The answers remain uncertain. The questions, at least, are becoming clearer.
Frequently Asked Questions
Q: What is the ELLIS-ELLIOT Reading Group on Human-Centric Machine Learning?
A: The reading group is a student-led initiative organized by PhD students in the ELLIS (European Laboratory for Learning and Intelligent Systems) network and the ELLIOT Young Researcher Group. It meets virtually to discuss papers on how algorithmic and human decisions influence each other, with sessions typically lasting one hour.
Q: What is LLM Pollution in online behavioral research?
A: LLM Pollution refers to the contamination of online research data when participants use large language models for advice, translation, or task delegation. The phenomenon manifests in three variants: Partial LLM Mediation (selective AI use), Full LLM Delegation (complete automation), and LLM Spillover (behavioral changes due to anticipated AI presence).
Q: How prevalent is LLM Pollution in research studies?
A: According to research from the Max Planck Institute for Human Development, pilot studies found that up to 45% of participants showed evidence of LLM involvement in their responses, including copying and pasting behavior and distinctly non-human phrases in open-ended questions.
Q: What is Learn-to-Defer (L2D) in human-AI collaboration?
A: Learn-to-Defer is a machine learning paradigm that enables AI systems to choose between making autonomous predictions and deferring decisions to human experts. The system learns when humans are likely to be more accurate than the AI model, optimizing overall team performance.
Q: When and where is the next ELLIS HCML Workshop?
A: The ELLIS Human-Centric Machine Learning Workshop 2026 takes place on May 27-28, 2026, in Alicante, Spain, with a working dinner and panel on May 27 and the main workshop on May 28. Hybrid attendance is available.
Q: How can researchers join the ELLIS-ELLIOT Reading Group?
A: Researchers and students can join the reading group's Google Group to receive communications about session details, suggested papers, and important links. The group maintains a code of conduct and welcomes participants interested in human-centric machine learning topics.