Publication details

Data Set Size Analysis for Detecting the Urgency of Discussion Forum Posts

Authors

ŠVÁBENSKÝ Valdemar, BOUCHET François, TARRAZONA Francine, LOPEZ II Michael, BAKER Ryan S.

Year of publication 2024
Type Conference abstract
Description In both Massive Open Online Courses (MOOCs) and private courses, instructors face a large number of queries in discussion forum posts that may merit a response. There has been ongoing research on how to employ machine learning to predict a post’s urgency in order to focus instructors’ attention. However, it is unclear how large a course is needed to develop these models. We took a publicly available data set of 3,503 labeled forum posts and code from one such prior study. We re-trained the six models described in the study, but with progressively smaller sample sizes, to determine whether the models’ performance would be preserved. We demonstrate that using random subsets even as small as 10% of the original data set achieves performance comparable to the full data set in five out of six models.
