Synthetic data generator for student data serving learning analytics
A comparative study
DOI: https://doi.org/10.59453/KHZW9006
Keywords: learning analytics, privacy and ethics, synthetic data generation
Abstract
Ongoing digital transformation in the education sector has led to an increased focus on learning analytics (LA). LA collects and uses students’ data to gain insights into students’ learning and to guide interventions and feedback. Although LA holds tremendous promise for enhancing teaching and learning, there are persistent concerns about the privacy and ethical ramifications of collecting and using student data. One potential solution is the use of synthetic data generators (SDGs), which learn from real data to generate synthetic data that closely resembles it. This paper examines the performance of existing SDGs on student data, as well as their capabilities for serving LA. A comparative study was conducted by applying different SDGs in the Synthetic Data Vault to real-world student data. We report the efficiency of the different generators and the statistical similarity between synthetic and real data. We test how well SDGs imitate real student data by training commonly used LA models on the generated synthetic data. We evaluate the utility of synthetic data by how well the outputs of LA models trained on synthetic data align with the ground truth of student learning outcomes recorded in the real data, as well as with the outputs of LA models trained on real data.
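The utility evaluation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy "real" records (weekly study hours and a pass flag), the per-class Gaussian generator standing in for an SDG, and the one-threshold classifier standing in for an LA model. None of these are the generators, data, or models used in the study, and no Synthetic Data Vault API is called.

```python
import random
import statistics


def make_toy_real_data(n, rng):
    """Hypothetical stand-in for real student records: (weekly_study_hours, passed)."""
    rows = []
    for _ in range(n):
        passed = rng.random() < 0.6
        hours = rng.gauss(12.0 if passed else 6.0, 2.5)
        rows.append((max(hours, 0.0), int(passed)))
    return rows


def fit_toy_generator(rows):
    """Toy generator (assumption, not an SDV model): per-class mean, std, and class share."""
    params = {}
    for label in (0, 1):
        hours = [h for h, y in rows if y == label]
        params[label] = (statistics.mean(hours), statistics.stdev(hours), len(hours) / len(rows))
    return params


def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted per-class Gaussians."""
    rows = []
    share_passed = params[1][2]
    for _ in range(n):
        label = 1 if rng.random() < share_passed else 0
        mean, std, _ = params[label]
        rows.append((max(rng.gauss(mean, std), 0.0), label))
    return rows


def fit_threshold_model(rows):
    """Toy LA model: the hours threshold with the highest accuracy on the training rows."""
    candidates = sorted(h for h, _ in rows)
    return max(candidates, key=lambda t: sum((h >= t) == bool(y) for h, y in rows))


def predict(threshold, rows):
    """Predict pass (1) when hours reach the threshold, else fail (0)."""
    return [int(h >= threshold) for h, _ in rows]


def agreement(a, b):
    """Fraction of positions where two label sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
real = make_toy_real_data(600, rng)
train_real, test_real = real[:400], real[400:]
synthetic = sample_synthetic(fit_toy_generator(train_real), 400, rng)

model_real = fit_threshold_model(train_real)    # trained on real data
model_synth = fit_threshold_model(synthetic)    # trained on synthetic data

truth = [y for _, y in test_real]
pred_real = predict(model_real, test_real)
pred_synth = predict(model_synth, test_real)

# Comparison 1: synthetic-trained model against recorded outcomes (ground truth).
acc_synth_vs_truth = agreement(pred_synth, truth)
# Comparison 2: synthetic-trained model against the real-trained model's outputs.
agree_synth_vs_real_model = agreement(pred_synth, pred_real)

print(f"synthetic-trained vs ground truth: {acc_synth_vs_truth:.3f}")
print(f"synthetic-trained vs real-trained: {agree_synth_vs_real_model:.3f}")
```

Both comparisons are scored on held-out real records, so neither rewards a generator for merely reproducing its own samples. In practice the toy generator would be replaced by the SDGs under comparison and the threshold classifier by the LA models of interest.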