Synthetic data generator for student data serving learning analytics: A comparative study

Chen Zhan; Oscar Blessed Deho; Xuwei Zhang; Srecko Joksimovic; Maarten de Laat

doi:10.59453/KHZW9006

Authors

Chen Zhan Centre for Change and Complexity in Learning, University of South Australia, Australia https://orcid.org/0000-0003-4680-1287
Oscar Blessed Deho UniSA STEM, University of South Australia, Australia
Xuwei Zhang Centre for Change and Complexity in Learning, University of South Australia, Australia
Srecko Joksimovic Centre for Change and Complexity in Learning, University of South Australia, Australia https://orcid.org/0000-0001-6999-3547
Maarten de Laat Centre for Change and Complexity in Learning, University of South Australia, Australia https://orcid.org/0000-0003-2243-2667

DOI:

https://doi.org/10.59453/KHZW9006

Keywords:

learning analytics, privacy and ethics, synthetic data generation

Abstract

Ongoing digital transformation in the education sector has led to an increased focus on learning analytics (LA). LA collects and uses students’ data to gain insights about students’ learning and to guide interventions and feedback. Although LA holds tremendous promise for enhancing teaching and learning, there are persistent concerns about the privacy and ethical ramifications of collecting and using student data. One potential solution is the use of Synthetic Data Generators (SDGs), which learn from real data to generate synthetic data that closely resembles it. This paper examines the performance of existing SDGs on student data, as well as their capabilities for serving LA. A comparative study was conducted by applying different SDGs in the Synthetic Data Vault to real-world student data. We report the efficiency of the different generators and the statistical similarity between synthetic and real data. We test how well SDGs imitate the real student data by fitting commonly used LA models to the generated synthetic data. We evaluate the utility of synthetic data by how well the outputs of LA models trained on synthetic data align with the ground truth of student learning outcomes recorded in the real data, as well as with the outputs of LA models trained on real data.
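The utility evaluation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy student records (hours studied, quiz score, pass/fail), the per-class Gaussian sampler standing in for an SDG, and the nearest-centroid classifier standing in for an LA model. It does not use the Synthetic Data Vault. It only shows the shape of the comparison: train one model on real data and one on synthetic data, then score both against real outcomes and against each other.

```python
import random
import statistics


def make_real_data(n, rng):
    """Toy 'real' student records: (hours_studied, quiz_score, passed). Hypothetical."""
    rows = []
    for _ in range(n):
        passed = rng.random() < 0.5
        hours = rng.gauss(12.0 if passed else 6.0, 2.0)
        quiz = rng.gauss(75.0 if passed else 50.0, 8.0)
        rows.append((hours, quiz, int(passed)))
    return rows


def fit_generator(rows):
    """Stand-in generator: class weight plus per-class mean/std of each feature."""
    params = {}
    for label in (0, 1):
        subset = [(r[0], r[1]) for r in rows if r[2] == label]
        params[label] = {
            "weight": len(subset) / len(rows),
            "stats": [(statistics.mean(c), statistics.stdev(c)) for c in zip(*subset)],
        }
    return params


def sample_synthetic(params, n, rng):
    """Draw synthetic rows from the fitted per-class Gaussians."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < params[1]["weight"] else 0
        hours, quiz = (rng.gauss(m, s) for m, s in params[label]["stats"])
        rows.append((hours, quiz, label))
    return rows


def fit_centroids(rows):
    """Stand-in LA model: nearest-centroid classifier for pass/fail."""
    return {
        label: (
            statistics.mean(r[0] for r in rows if r[2] == label),
            statistics.mean(r[1] for r in rows if r[2] == label),
        )
        for label in (0, 1)
    }


def predict(centroids, row):
    """Return the label whose centroid is closest to the row's features."""
    return min(
        centroids,
        key=lambda k: (row[0] - centroids[k][0]) ** 2 + (row[1] - centroids[k][1]) ** 2,
    )


def run_comparison(seed=0):
    rng = random.Random(seed)
    real_train = make_real_data(400, rng)
    real_test = make_real_data(200, rng)  # held-out real records
    synthetic_train = sample_synthetic(fit_generator(real_train), 400, rng)

    model_real = fit_centroids(real_train)
    model_synth = fit_centroids(synthetic_train)

    def accuracy(model):
        return sum(predict(model, r) == r[2] for r in real_test) / len(real_test)

    # Share of held-out records on which the two models give the same output.
    agreement = sum(
        predict(model_real, r) == predict(model_synth, r) for r in real_test
    ) / len(real_test)
    return {
        "real_acc": accuracy(model_real),
        "synth_acc": accuracy(model_synth),
        "agreement": agreement,
    }


if __name__ == "__main__":
    print(run_comparison())
```

In this sketch, `synth_acc` corresponds to alignment with the ground truth recorded in real data, and `agreement` corresponds to alignment with the outputs of a model trained on real data. A real study would replace both stand-ins with actual generators and LA models.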

