This paper, based on a chapter of my undergraduate thesis, was accepted and published at Learning at Scale 2015. I was lucky enough to get to travel to Vancouver and present my findings at the conference.
The paper addresses two research questions:
Does releasing all content at the beginning of the course (rather than sequentially) lead to more variation in student progress and less “on-trackness” (as measured against the course's recommended schedule)? Short answer: yes, but under both content release strategies the vast majority of students proceed at individualized, off-track paces.
Are there benefits to staying on-track? Short answer: staying on-track has a modest positive correlation with certification, but that modest benefit must be weighed against the value of giving students flexibility in how they move through the course. Releasing all content upfront appears to be a viable strategy for MOOC designers.
I owe a huge thank you to Justin Reich, without whom this project would never have gotten off the ground.
During my junior year at Harvard, I started working with the HarvardX and MITx research teams to clean and analyze data generated by students taking courses on edX, an open-source platform for massive open online courses (MOOCs).
I contributed to the HarvardX-Tools repo with a set of Python scripts to extract, parse, and sanitize data from edX clickstream tracking logs. I was also a co-author on the first 15 HarvardX and MITx Working Papers, which report general statistics and early findings from Harvard and MIT’s first wave of MOOCs on the edX platform (Fall 2012 to Summer 2013).
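The extract-parse-sanitize step can be sketched roughly as follows. This is a minimal illustration, not the actual HarvardX-Tools code: the helper name and the choice of kept fields are my own, though the one-JSON-object-per-line layout and the `username`/`event_type`/`time` fields do reflect the edX tracking-log format.

```python
import json

def parse_tracking_log(lines):
    """Parse raw edX tracking-log lines (one JSON object per line),
    keeping only well-formed events that carry the fields we need.
    Hypothetical helper for illustration only."""
    events = []
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # drop malformed lines rather than crash
        # Keep only events attributable to a user with a known event type.
        if record.get("username") and record.get("event_type"):
            events.append({
                "username": record["username"],
                "event_type": record["event_type"],
                "time": record.get("time"),
            })
    return events

sample = [
    '{"username": "alice", "event_type": "play_video", "time": "2013-01-01T00:00:00"}',
    'not valid json',
    '{"username": "", "event_type": "seek_video"}',
]
print(parse_tracking_log(sample))  # only alice's event survives
```

In practice this kind of filter runs line by line over multi-gigabyte files, so streaming and dropping bad records (rather than loading everything into memory) matters.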
All in all, I got very comfortable with Python, numpy, pandas, matplotlib, and scikit-learn. I had tons of fun and learned a lot from the experience, and I can’t thank the HarvardX and MITx research groups enough for welcoming me in.
My 118-page undergraduate thesis argues that massive open online courses (MOOCs) cannot be properly understood using conventional educational metrics and definitions, and that students, instructors, university leaders, and policymakers must be wary in viewing this new technology through the lenses of the past.
The thesis contextualizes MOOCs within a rich history of distance learning and open courseware, and proposes new “reconceptualizations” of retention and asynchronicity in MOOCs that can help us evaluate their efficacy and worth. As an empirical study, it draws on unique datasets derived from edX clickstream event logs for six early HarvardX courses.
Building and testing these datasets was the most labor-intensive part of this study: clickstream logs were large (~10 GB per course), non-standardized (cue lots of regex), and (at the time) not well documented. I spent most of my time munging away in IPython notebooks, as well as digging through edX’s source code to verify which student actions correspond to which events.
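The regex munging mostly meant bucketing raw `event_type` strings into coarser action categories, since explicit events have short names like `play_video` while implicit page views carry full URL paths. A toy sketch of that mapping (the rules and labels here are illustrative, not the actual ones used in the study):

```python
import re

# Illustrative rules only: real edX logs have dozens of event types,
# and the category labels below are my own.
EVENT_PATTERNS = [
    (re.compile(r"^(play|pause|seek)_video$"), "video"),
    (re.compile(r"^problem_check$"), "problem"),
    (re.compile(r"^/courses/.+/courseware/"), "page_view"),  # implicit events are URLs
]

def classify_event(event_type):
    """Map a raw event_type string to a coarse action category."""
    for pattern, label in EVENT_PATTERNS:
        if pattern.search(event_type):
            return label
    return "other"

print(classify_event("play_video"))                                   # video
print(classify_event("/courses/HarvardX/XX101/2013/courseware/wk1"))  # page_view
```

Checking rules like these against edX’s source code was the only reliable way to confirm which logged events actually corresponded to which student actions.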