Synthetic data generator for student data serving learning analytics

A comparative study

Authors

  • Chen Zhan Centre for Change and Complexity in Learning, University of South Australia, Australia https://orcid.org/0000-0003-4680-1287
  • Oscar Blessed Deho UniSA STEM, University of South Australia, Australia
  • Xuwei Zhang Centre for Change and Complexity in Learning, University of South Australia, Australia
  • Srecko Joksimovic Centre for Change and Complexity in Learning, University of South Australia, Australia https://orcid.org/0000-0001-6999-3547
  • Maarten de Laat Centre for Change and Complexity in Learning, University of South Australia, Australia https://orcid.org/0000-0003-2243-2667

DOI:

https://doi.org/10.59453/KHZW9006

Keywords:

learning analytics, privacy and ethics, synthetic data generation

Abstract

Ongoing digital transformation in the education sector has led to an increased focus on learning analytics (LA). LA collects and uses students’ data to gain insights about students’ learning and to guide interventions and feedback. Although LA holds tremendous promise for enhancing teaching and learning, there are persistent concerns about the privacy and ethical ramifications of collecting and using student data. One potential solution is the use of Synthetic Data Generators (SDGs), which learn from real data to generate synthetic data that closely resembles it. This paper examines the performance of existing SDGs on student data, as well as their capacity to serve LA. We conducted a comparative study by applying different SDGs from the Synthetic Data Vault to real-world student data. We report the efficiency of each generator and the statistical similarity between synthetic and real data. We test how well the SDGs imitate real student data by fitting commonly used LA models to the generated synthetic data. We evaluate the utility of synthetic data by how closely the outputs of LA models trained on synthetic data align with the ground-truth student learning outcomes recorded in the real data, and with the outputs of LA models trained on real data.
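The utility evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the Synthetic Data Vault API. Everything in it is an assumption made for illustration: the toy "student" records (study hours, quiz average, pass/fail), the per-class Gaussian sampler standing in for an SDG, and the nearest-centroid classifier standing in for an LA model. It trains one model on real data and one on synthetic data, then compares both against held-out real outcomes and against each other.

```python
import random
import statistics

def make_real_data(n, rng):
    """Hypothetical stand-in for real student records: (hours, quiz_avg, passed)."""
    rows = []
    for _ in range(n):
        passed = rng.random() < 0.5
        hours = rng.gauss(12.0 if passed else 6.0, 2.0)
        quiz = rng.gauss(75.0 if passed else 50.0, 8.0)
        rows.append((hours, quiz, int(passed)))
    return rows

def fit_generator(rows):
    """Toy generator: class frequency plus per-class mean and std dev of each feature."""
    params = {}
    for label in (0, 1):
        subset = [r for r in rows if r[2] == label]
        params[label] = {
            "weight": len(subset) / len(rows),
            "means": [statistics.mean([r[i] for r in subset]) for i in (0, 1)],
            "stds": [statistics.stdev([r[i] for r in subset]) for i in (0, 1)],
        }
    return params

def sample(params, n, rng):
    """Draw n synthetic records from the fitted per-class Gaussians."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < params[1]["weight"] else 0
        p = params[label]
        hours, quiz = (rng.gauss(m, s) for m, s in zip(p["means"], p["stds"]))
        rows.append((hours, quiz, label))
    return rows

def train_centroids(rows):
    """Toy LA model: nearest-centroid classifier (mean feature vector per class)."""
    return {
        label: [statistics.mean([r[i] for r in rows if r[2] == label]) for i in (0, 1)]
        for label in (0, 1)
    }

def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((x - y) ** 2 for x, y in zip(row[:2], c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the recorded outcome."""
    return sum(predict(centroids, r) == r[2] for r in rows) / len(rows)

rng = random.Random(0)
real_train = make_real_data(500, rng)
real_test = make_real_data(200, rng)
synthetic = sample(fit_generator(real_train), 500, rng)

model_real = train_centroids(real_train)
model_synth = train_centroids(synthetic)

# Alignment with ground truth recorded in held-out real data.
acc_real = accuracy(model_real, real_test)
acc_synth = accuracy(model_synth, real_test)

# Alignment between the two models' outputs on the same real records.
agreement = sum(
    predict(model_real, r) == predict(model_synth, r) for r in real_test
) / len(real_test)

print(f"trained on real:      {acc_real:.3f}")
print(f"trained on synthetic: {acc_synth:.3f}")
print(f"output agreement:     {agreement:.3f}")
```

In this sketch, a small gap between the two accuracies and a high agreement rate would suggest that the synthetic data preserves the signal the model relies on. A real study would replace the toy sampler and classifier with actual generators and LA models.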

Published

13-08-2023

How to Cite

Zhan, C., Deho, O. B., Zhang, X., Joksimovic, S., & Laat, M. de. (2023). Synthetic data generator for student data serving learning analytics: A comparative study. Learning Letters, 1, 5. https://doi.org/10.59453/KHZW9006

Section

Articles