Between Theory and Reality: How Schools Grapple with Heterogeneity and Where AI Fits
TL;DR: Rising classroom heterogeneity and workload make AI in schools inevitable; with FACET we explore how evidence-based, teacher-centered AI can support meaningful differentiation and AI literacy without replacing human judgment.
Written by Jana Gonnermann-Müller.
Recognizing the Need for AI in Schools
On November 26, school principals from primary and secondary schools gathered with regional authorities for the annual KI-Fachtag der Schulen. The theme this year: Artificial Intelligence in Education.
The spotlight was firmly on concrete use cases: how artificial intelligence (AI) can support schools today, but also on the fear that AI is used without verification or cross-checking, potentially leading to a loss of skill and knowledge. At the center of the discussion was a dual question. On the one hand, how do we ensure that AI use becomes meaningful, so that it augments, rather than replaces, core competencies? This includes fostering AI literacy, critical thinking, and verification skills so that students learn not simply to consume AI outputs, but to evaluate, challenge, and integrate them in ways that support long-term skill development. On the other hand, how can AI help schools address structural pressures such as rising workload due to classroom heterogeneity and teacher shortages?
As part of the event, our team from the IOL Lab at the Zuse Institute Berlin was invited to give a talk and to lead two hands-on workshops. In the keynote, we outlined the structural pressures that make AI integration in schools necessary: demographic change, Germany’s lag in technological adoption, and the growing information overload that increasingly requires students to critically evaluate and interpret information sources. In such a context, AI literacy and critical thinking become foundational competences. The workshop discussions quickly revealed just how acute these pressures have become. School leaders described rising heterogeneity in their classrooms, the growing need for differentiated instruction, and the challenges of ensuring meaningful and responsible AI use in everyday teaching. Their questions and concerns underscored the urgency of developing approaches that are not only technically feasible but also pedagogically sound. Yet meaningful AI integration depends on supporting teachers with practical, research-grounded tools that address, for example, diverse learning needs without adding to their workload.
This post situates these discussions within a broader research context, where we see our work as one puzzle piece in a much larger effort: working directly with schools to understand their needs, translate them into researchable questions, and offer evidence-based opportunities to address them. We therefore collaborate with schools to empirically examine where AI can support teaching, such as in differentiation under increasing heterogeneity, where it cannot, and how it must be designed to strengthen rather than undermine students’ skill acquisition. By co-developing and rigorously evaluating an AI-supported tool with practitioners, we aim to move the conversation away from emotion-driven expectations and fears and toward an evidence-informed understanding of what works in real classroom conditions.
The Bigger Picture: What ‘AI and School’ Encompasses
Schools constitute the first structured environment in which young people interact with broader social and technological systems. As such, they are expected to cultivate foundational competencies, such as critical thinking, judgment, and collaboration, while simultaneously preparing students for rapidly evolving technological conditions. Contemporary education systems therefore face a multi-layered mandate: to enable engagement with AI, support cognitive and socio-emotional development, and contribute to the reduction rather than the reproduction of socio-economic disparities. Within this broader mandate, AI emerges not as an optional add-on but as an integral part of the technological landscape and decision-making contexts that students will have to navigate. The central question is thus not whether AI should be present in schools, but how it can be integrated in ways that reinforce, rather than erode, core cognitive and analytical skills.
We outline that ‘AI in schools’ comprises two interdependent domains. The first is education about AI, encompassing digital literacy, data literacy, and increasingly AI literacy. These competencies enable students to interpret uncertainty, understand model behavior, and critically evaluate algorithmic outputs—skills that underpin agency in AI-mediated environments.
The second is education with AI, referring to the use of AI tools within teaching and learning processes. In this domain, AI can help address structural challenges such as teacher shortages, the need for differentiated instruction, and unequal access to tutoring. Practical examples include teacher-facing support for generating differentiated materials, student-facing tutoring systems that may mitigate socio-economic disparities, and in-class assistants that scaffold reasoning without displacing human pedagogical judgment.
From this perspective, the objective is not AI adoption per se, but competence-oriented integration, ensuring that students develop AI literacy, critical analysis, and robust domain skills, while teachers receive effective, research-grounded support to manage rising workload and heterogeneity without compromising didactical quality.
While public discussions about AI in education often remain abstract, empirical research highlights several concrete structural challenges. One emerging issue is the need to handle the massive increase of (AI-generated) information, which requires students to learn verification and critical evaluation of information. A second, persistent challenge is rising heterogeneity within classrooms [Siepmann et al. (2023). Attention to diversity in German CLIL classrooms: multi-perspective research on students’ and teachers’ perceptions. International Journal of Bilingual Education and Bilingualism]. Students in schools differ substantially in prior knowledge, linguistic background, cognitive profiles, motivational orientations, and emotional needs, patterns documented widely in international research and in German data from the IQB Bildungstrend 2024. Many classrooms include both high-achieving students and learners requiring significant support, including those with reading and spelling difficulties or ADHD, whose prevalence has increased in recent years [Pearson (2025). ADHD diagnoses are growing. What’s going on?. Nature]. Educational theory has long shown that addressing such diversity requires differentiated instruction [Tomlinson (2014). The Differentiated Classroom: Responding to the Needs of All Learners. 2nd Edition, ASCD, Alexandria]. This entails providing tasks at varying levels of complexity, offering scaffolded support, giving individualized hints and stepwise explanations, and supplying feedback aligned with learners’ needs. Motivational research further demonstrates that effective instruction must integrate cognitive challenge with emotional support, self-efficacy building, and relevance cues, as cognitive and affective processes are tightly intertwined [Pietsch (2010). Evaluation von Unterrichtsstandards. Zeitschrift für Erziehungswissenschaft].
Yet teachers face what research describes as an implementation gap: the discrepancy between pedagogical requirements and what is feasible given limited time, class sizes, and workload [Jude (2025). Deutsches Schulbarometer Lehrkräfte 2025]. Although differentiated instruction is theoretically well understood, its practical implementation is difficult. Creating multiple versions of tasks, adjusting scaffolds, and providing targeted feedback for diverse learner profiles is time-intensive, and most available materials still assume an ‘average learner’, a construct increasingly disconnected from classroom reality.
Other countries are already responding to these pressures in structured and systematic ways. In the United States, Khan Academy’s Khanmigo uses large language models (LLMs) to provide individualized guidance, task variation, and adaptive hints. In China, Squirrel AI uses diagnostic engines that map knowledge gaps and generate highly personalized learning paths for learners. Singapore integrates adaptive learning systems directly into its national Student Learning Space under the EdTech Masterplan 2030, enabling teachers to deliver levelled tasks and automated feedback aligned with curriculum structures.
Our approach: The FACET Framework
Against the backdrop of rising heterogeneity, motivational disparities, and persistent teacher shortages, our work on FACET aims to contribute an evidence-based component to the debate about AI integration in German schools. FACET is a research framework designed to systematically examine how AI can support the described need for differentiation under real classroom constraints. Its overarching goal is to help teachers create differentiated teaching materials for diverse learner groups and, at the same time, to generate empirical insights into when AI meaningfully supports teaching and learning, where it falls short, and how it must be designed so that it strengthens rather than undermines skill acquisition.
FACET is implemented as a multi-agent system with four interconnected layers:
- Learner agents simulate student behavior based on profiles that teachers themselves can define and instantiate, reflecting varying prior knowledge, low motivation, reading and writing difficulties, ADHD-related challenges, or other characteristics observed in their classes. These agents attempt tasks, reveal misconceptions, and produce reasoning traces and emotional cues. As tasks, teachers can upload their own materials or rely on tasks predefined by the curriculum.
- The assessment agent analyzes how these simulated learners interact with instructional materials, whether uploaded by the teacher or prescribed by the curriculum. It evaluates both their reasoning processes and their affective responses to provide the basis for adapting the materials.
- The generator agent creates differentiated teaching materials based on these diagnostics. This includes levelled tasks aligned with the curriculum’s ‘areas of competence’, scaffolded steps, hints, and motivational feedback tailored to each simulated learner profile. This layer integrates curriculum structures as well as diagnostic and didactical concepts, allowing the system to identify where learners with different profiles are likely to experience cognitive or motivational difficulties.
- The evaluator agent reviews the generated output along dimensions such as didactical coherence, clarity, creativity, and suitability for the specified learners. Teachers can then inspect, adjust, or reject the materials as they see fit. They can also download the finalized materials as Word, PDF, or LaTeX documents.
The FACET architecture is not intended to replace teachers. Instead, it provides structured starting points for differentiation, aiming to reduce workload while preserving pedagogical control.
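To make the flow between the four layers more concrete, the following is a minimal sketch in Python. It is not FACET's actual implementation: every class, function, field name, and threshold here is a hypothetical stand-in, and simple fixed rules replace the language-model agents. It only illustrates the order of the stages (simulate, assess, generate, evaluate) and that materials remain drafts until a teacher approves them.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerProfile:
    """Teacher-defined profile. All field names are hypothetical."""
    name: str
    prior_knowledge: float  # 0.0 (none) to 1.0 (strong)
    motivation: float       # 0.0 (low) to 1.0 (high)
    reading_difficulty: bool = False


@dataclass
class Attempt:
    profile: LearnerProfile
    solved: bool
    frustrated: bool


@dataclass
class Diagnosis:
    profile: LearnerProfile
    needs_scaffold: bool
    needs_simpler_text: bool
    needs_encouragement: bool


@dataclass
class Material:
    profile_name: str
    level: str
    supports: list[str] = field(default_factory=list)
    approved_by_teacher: bool = False  # stays False until a teacher reviews it


def learner_agent(profile: LearnerProfile, task_difficulty: float) -> Attempt:
    # Stand-in for a simulated learner: a fixed rule, not a language model.
    solved = profile.prior_knowledge >= task_difficulty
    frustrated = (not solved) and profile.motivation < 0.5
    return Attempt(profile, solved, frustrated)


def assessment_agent(attempt: Attempt) -> Diagnosis:
    # Combines a reasoning signal (solved) with an affective one (frustrated).
    return Diagnosis(
        profile=attempt.profile,
        needs_scaffold=not attempt.solved,
        needs_simpler_text=attempt.profile.reading_difficulty,
        needs_encouragement=attempt.frustrated,
    )


def generator_agent(diagnosis: Diagnosis) -> Material:
    # Chooses a task level and collects the supports the diagnosis calls for.
    level = "basic" if diagnosis.needs_scaffold else "advanced"
    supports = []
    if diagnosis.needs_scaffold:
        supports.append("stepwise hints")
    if diagnosis.needs_simpler_text:
        supports.append("simplified wording")
    if diagnosis.needs_encouragement:
        supports.append("motivational feedback")
    return Material(diagnosis.profile.name, level, supports)


def evaluator_agent(material: Material) -> bool:
    # Placeholder quality check: basic-level material needs at least one support.
    return material.level != "basic" or len(material.supports) > 0


def run_pipeline(profiles: list[LearnerProfile], task_difficulty: float) -> list[Material]:
    drafts = []
    for profile in profiles:
        attempt = learner_agent(profile, task_difficulty)
        diagnosis = assessment_agent(attempt)
        material = generator_agent(diagnosis)
        if evaluator_agent(material):
            drafts.append(material)
    return drafts


# Two illustrative profiles working on the same task.
profiles = [
    LearnerProfile("Learner A", prior_knowledge=0.9, motivation=0.8),
    LearnerProfile("Learner B", prior_knowledge=0.3, motivation=0.2,
                   reading_difficulty=True),
]
drafts = run_pipeline(profiles, task_difficulty=0.6)
for draft in drafts:
    print(draft.profile_name, draft.level, draft.supports)
```

In this sketch, both learners receive material on the same topic at different levels, and nothing is marked as approved automatically: the final decision to use, adjust, or reject a draft stays with the teacher.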
Figure 1: FACET's landing page and a screenshot of the worksheet generator
FACET Meets Reality: Insights From Practice
During the ‘KI-Fachtag der Schulen’, parts of our team (Konstantin Fackeldey, Jana Gonnermann-Müller, and Nicolas Leins) conducted two workshops with around 30 school principals to test FACET under real-world conditions. The discussions provided a clear picture of the pressures schools face and offered feedback that will directly inform the next stages of FACET’s development.
Principals consistently emphasized the urgency of supporting differentiation in increasingly heterogeneous classrooms. Many reported rising numbers of students with reading and spelling difficulties, varying language proficiencies, and large performance gaps within the same class. Teachers report that one way they try to address differences in learning pace is by allowing faster-learning students to move on to new topics while slower-learning students remain with the current one. However, this forced form of differentiation, driven by the lack of time to create differentiated materials for a shared topic, makes working as a unified class group difficult. As a result, the class ends up working on different topics, or some students become bored while others still need significantly more time to complete their tasks.
In addition, inclusion schools in particular expressed strong interest, noting that the current staffing conditions make meaningful differentiation nearly impossible. As one principal described: ‘We have so many different children in our schools … we are labeled an inclusion school, yet we only have one teacher for an entire class. We don’t know how we’re supposed to meet all children’s needs.’ The possibility of generating differentiated materials tailored to specific learner profiles resonated strongly. Principals acknowledged the quality of FACET’s outputs (‘much more thoughtfully constructed than what we can produce ourselves under time pressure’) while also highlighting important requirements for classroom use. In particular, they stressed that differentiated materials must be aligned with the curriculum and that teachers need to integrate FACET’s outputs into the broader workflow of lesson planning and classroom management. This real-world feedback is crucial for ensuring that FACET evolves in line with the actual needs of teachers. It underscored that any AI-supported tool must be tightly coordinated with curricular structures and flexible enough to fit into existing teaching practices. Principals also contributed new use cases that we had not previously considered, such as using FACET as an AI-in-class assistant that scaffolds material for diverse students. In sum, these insights are invaluable. They allow us to refine FACET not as an abstract technological experiment but as a research framework developed with schools and oriented toward the real demands of everyday teaching. Many principals expressed interest in long-term testing, and we look forward to continuing this collaborative process as FACET evolves.
Figure 2: Sebastian Pokutta delivering a keynote at the KI Fachtag on AI in Schools.
The Weizenbaum Debate on AI in Schools
Just days earlier, on November 18, our team took part in the 4th Weizenbaum Debate, a packed and lively evening at the Quatsch Comedy Club that brought together researchers, teachers, and students to explore what AI in the school of the future should look like. We were invited to join the debate, sharing insights from research on FACET and discussing how AI can be meaningfully integrated into everyday teaching. Again, we discussed pressing questions: when AI genuinely supports learning and when it slips into mere ‘cognitive offloading’, how generative AI must be designed so that it strengthens understanding rather than undermining it, and what competencies students need to use AI in a self-determined, responsible way. Teachers and students challenged long-standing assumptions about exams, resources, and the role of human educators, grounding the debate in lived reality.
Figure 3: Jana Gonnermann-Müller on stage at the Weizenbaum Debate on AI in Schools.
The Bigger Picture - Why All of This Matters
The underlying core challenge is structural: without scalable support for differentiation, rising learner heterogeneity will continue to outstrip schools’ capacity and deepen educational inequality. Recent assessments already show widening gaps in competencies, motivation, and socio-economic background—pressures intensified by persistent teacher shortages, where education with AI can offer targeted relief. At the same time, students must learn to navigate environments saturated with information, misinformation, and rapidly changing knowledge, underscoring the need for education about AI. Frameworks like FACET cannot solve these systemic issues, but they can offer research solutions that ease critical bottlenecks and free teachers to focus on what cannot be automated: cultivating critical thinking, guiding inquiry, and preparing students to participate responsibly in an AI-shaped society.
Interested in FACET? Read our paper at arxiv.org/abs/2508.11401 or reach out to us anytime. This research by Konstantin Fackeldey, Jana Gonnermann-Müller, Jennifer Haase, Nicolas Leins, and Sebastian Pokutta is part of the ongoing work of the Humans and AI research thrust at the IOL Lab at the Zuse Institute Berlin.