Session Information

Measuring the Quality of AI-generated Feedback? From Theoretical Modelling to Empirical Evidence

Abstract

AI-generated feedback is widely used in schools, yet its quality has not been sufficiently researched, particularly with regard to German students. This study therefore examines the quality of AI-generated feedback on German student texts, and how this quality can be measured, from both theoretical and empirical perspectives. First, a theoretical model that includes different producers and products of feedback is developed on the basis of international research (e.g. Fong, 2025; Jansen et al., 2025; Weidlich et al., 2025). This model establishes the terminology used throughout the paper and illustrates that operationalising feedback quality poses a methodological challenge for empirical studies. Subsequently, an empirical study compared feedback on three student texts, given in the form of a criteria-based assessment, an overall grade, and a short comment. This feedback was provided by 75 highly experienced Bavarian teachers and four AI systems. Finally, eight trained meta-reviewers assessed the quality of the human and machine feedback. For overall grades, inter-rater reliability between teachers and AI systems (with ten iterations) was high (ICC = 0.7–0.9). On average, AI models graded texts more leniently but ranked them in the same order. The criteria-based assessments differed significantly. Regarding meta-feedback, an ordinal logistic model identified three criteria (explanation, concreteness and accuracy) as the strongest predictors of perceived usefulness, whereas the source (AI vs. teacher) had no significant influence. The empirical results extend the research to texts written by real German pupils. The theoretical model helps to systematise future studies and demonstrates the complexity of operationalising the central phenomenon of interest: the quality of AI-generated feedback. The many challenges involved in operationalising feedback quality remain relevant for future studies.

References

Fong, C. J. (2025). A renaissance in feedback science? Reviewing and reimagining feedback research methods. Contemporary Educational Psychology, 83, 102414.
Jansen, T., Horbach, A., & Meyer, J. (2025). Feedback from Generative AI: Correlates of Student Engagement in Text Revision from 655 Classes from Primary and Secondary School. Proceedings of the 15th LAK.
Weidlich, J., et al. (2025). Teacher, peer, or AI? Comparing effects of feedback sources in higher education. Computers and Education Open, 9, 100300.
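
The abstract above reports high inter-rater reliability for overall grades (ICC = 0.7–0.9) together with more lenient AI grading in the same rank order. It does not state which ICC form was used. The following minimal sketch, assuming a complete texts × raters table, computes two common single-rater forms from a two-way ANOVA decomposition: an absolute-agreement ICC, which is lowered by systematic leniency, and a consistency ICC, which is not. The function name `icc_two_way` and the ratings in `toy_ratings` are invented for illustration and are not data from the study.

```python
from statistics import mean


def icc_two_way(ratings):
    """Return (agreement ICC, consistency ICC) for single ratings.

    `ratings` is a complete table: one row per rated text,
    one column per rater. Hypothetical helper, not the authors' code.
    """
    n = len(ratings)          # number of rated texts (rows)
    k = len(ratings[0])       # number of raters (columns)
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Absolute agreement: rater mean differences count as disagreement
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    # Consistency: rater mean differences are ignored
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return agreement, consistency


# Invented scores: 3 texts (rows) x 4 raters (columns).
# The last two raters score systematically lower but rank the texts alike.
toy_ratings = [
    [2, 2, 1, 1],
    [3, 4, 2, 3],
    [5, 5, 4, 4],
]

agreement, consistency = icc_two_way(toy_ratings)
print(f"agreement ICC:   {agreement:.3f}")
print(f"consistency ICC: {consistency:.3f}")
```

In this toy table the consistency value exceeds the agreement value, because the two lower-scoring raters preserve the ranking of the texts while shifting the overall level.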

Promoting digital text production competences in primary education

Abstract

The digital production of texts is considered a key competence in today's information and communication society (Frederking & Krommer, 2019). Familiarity with the writing medium is of great importance here, as it systematically influences text quality: fast typists produce better texts (Connelly et al., 2007; Gong et al., 2022). Initial pilot studies show that, in addition to keyboard typing, digital text production skills (e.g. simple word processing functions, navigation) are fundamental prerequisites for producing digital texts (Anskeit, 2022). Nevertheless, there is still a lack of comprehensive studies on the development of digital writing skills, especially in German-speaking countries and for primary school pupils (Gahshan & Weintraub, 2024; Schneider & Anskeit, 2017; Schüler et al., 2023). Addressing this gap, the project aims to develop instructional measures for digital writing and examine their effects on third-grade students’ text production.

Building on a diagnostic laboratory study (n = 16) using keystroke logging, the intervention study (n = 121) investigates the effectiveness of a specially developed interactive learning pathway for promoting digital text production competences (keyboarding and word-processing functions) and compares it with a touch-typing course (focus on keyboarding). To evaluate both support measures, the learners' typing behaviour (including speed and skills in simple word processing functions) will be assessed in a pre-post-test design using a procedure developed in the diagnostic study. In addition, effects on text quality (Lindauer, 2024) and text revision (Held, 2006) are analysed based on students’ independently written texts responding to a profiled writing task (Bachmann & Becker-Mrotzek, 2010).

Initial results show that learners benefit from even short training sessions in terms of typing behaviour (see also Grabowski et al., 2007; Anskeit, 2022) and that the promotion of digital text production skills enables learners to utilise word processing functions. The extent to which this influences text quality and text revisions in the production of their own texts is determined using analyses of variance (repeated-measures ANOVA) with covariates such as reading comprehension and previous digital experience. The presentation will outline key findings from the diagnostic study, provide insights into the support material, and discuss the results of the intervention study.

Teaching narrative writing in grade 2: first findings from FiSBY

Abstract

Meta-analyses indicate that young writers benefit when strategies are taught explicitly, modelled, practised with scaffolding, and linked to transparent quality criteria (Graham & Harris, 2017; Graham, Harris, & Santangelo, 2015). However, translating these findings into everyday classroom routines remains challenging (Darling-Hammond, Hyler, & Gardner, 2017; Wild, in press).

This contribution reports early findings from FiSBY-2-narrative, a narrative strategy module embedded in the multi-genre writing strategy project FiSBY (www.fisby.de). In FiSBY, over 2,400 elementary students take part in a longitudinal survey from grade 2 to grade 4. The FiSBY-2-narrative module operationalizes narrative strategies and is compared with business-as-usual writing instruction.

The present study analyses a random subsample in grade 2 (n = 87; 173 texts). Children were on average 8.36 years old (SD = 0.48). About 82% reported German as their first language. The business-as-usual group included slightly more boys than the training group (33% vs. 18%). For writing assessment, we used a standardized story-starter at the beginning and end of the school year. The narratives were rated with RANT (Wild, 2020) for genre-specific elements (event representation, character description, situational description) and more general stylistic features (vocabulary and figurative language).

Analyses were conducted in R (R Core Team, 2025) using linear mixed-effects models appropriate for longitudinal intervention studies (Hilbert et al., 2019). Models included time (pre/post), group (training vs. business-as-usual), and their interaction, controlling for gender, German language background, and socioeconomic status (questionnaire-based). Random intercepts accounted for repeated measures within students.

Results show a selective intervention effect: the training group demonstrated significantly stronger gains in character description (time × group: β = .55, p = .026). In this small subsample, no reliable differential change emerged for event representation (p = .232), situational description (p = .123), or figurative language (p = .338). Vocabulary increased from pre to post across both groups (β = .31, p = .033). Socioeconomic status was positively associated with event representation (β = .26, p = .002). In sum, FiSBY-2-narrative appears to accelerate a specific, teachable narrative dimension in grade 2. For the conference presentation, these patterns will be re-analysed in the large FiSBY cohort to obtain more robust estimates.
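
As a reading aid, the model described in the abstract above (time, group, their interaction, three covariates, and a random intercept per student) can be written out as follows. This is a sketch under assumptions: the symbols, the 0/1 coding of time (pre = 0, post = 1) and group (business-as-usual = 0, training = 1), and the normality assumptions are illustrative and are not taken from the authors' analysis.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1\,\mathrm{time}_{t} + \beta_2\,\mathrm{group}_{i}
        + \beta_3\,(\mathrm{time}_{t} \times \mathrm{group}_{i}) \\
       &\quad + \gamma_1\,\mathrm{gender}_{i} + \gamma_2\,\mathrm{German}_{i}
        + \gamma_3\,\mathrm{SES}_{i} + u_i + \varepsilon_{it}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\end{aligned}
```

Here y_{it} is the rating of student i at time t on one narrative dimension, and u_i is the student-level random intercept. Under the assumed coding, β_3 is the difference between the two groups in pre-to-post change, which corresponds to the time × group term reported for character description.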