By Julià Minguillón, Associate Professor, UOC-eLC.
We are living in the information age. Or maybe in the knowledge one, according to the DIKW pyramid, which stands for data, information, knowledge and wisdom. But there is neither information nor knowledge without data, which is nowadays massively and continuously generated.
As inhabitants of a digital world, all our actions are under constant surveillance, and we leave a myriad of traces of even the most mundane everyday experiences. We voluntarily use our smartphones, credit cards or ID badges, leaving such digital traces, but we are also involuntarily recorded by traffic cameras or when browsing the Internet, for instance.
All these actions define us, as we are, in most cases, blatantly predictable, especially when the analysis covers not a single person but a community. The great Isaac Asimov already foresaw this when he created psychohistory, a science for making general predictions about the future behavior of very large groups of people. What we nowadays call Big Data is exactly about that.
« Building accurate models which reasonably predict future data is not that easy, especially when focusing on single individuals »
The idea behind the Big Data paradigm is very simple: capture “everything” and analyze it, building very accurate classifiers beyond sampling limitations. Create a big (huge!) cube according to the three Vs (volume, variety and velocity) containing all available data, and stop worrying about the curse of dimensionality or hypothesis testing. If we have all available data, we don’t need to estimate anything: we are just measuring reality! Nevertheless, building accurate models which reasonably predict future data is not that easy, especially when focusing on single individuals.
Let’s move now to a more specific scenario, such as online higher education. Nowadays, most higher education institutions offer blended or fully online courses, using web-based platforms where learners can interact with other learners, teachers and resources. These virtual learning environments (VLE) capture “everything” that happens within their walls: navigation, interaction, academic performance, etc.
Of course, learners have a digital life outside the VLE, but let’s suppose that we are able to obtain all the data we need for analyzing our learners’ progress. This includes enrollment data, academic background, surveys and polls, etc. We can describe any learner by a combination of hundreds of variables, according to her profile and actions within the VLE. All these data can be used to better understand how learners learn, to provide them with better support through adequate scaffolding and, at the same time, to evaluate the VLE, detecting possible bottlenecks.
« Can a learner be dissected and then understood by putting all pieces of data back together? »
But can a learner be dissected and then understood by putting all pieces back together? Unlike pine leaves, learners have very diverse backgrounds, motivations, expectations and behaviors. And these evolve with time, so gathered data rapidly becomes obsolete. In this sense, navigational data older than one academic semester may be irrelevant for most purposes.
One of the main challenges for learning analytics, then, is to have a well-defined window of historical data that can be used for a specific purpose. Furthermore, that purpose can differ according to the level of analysis, so these windows need to be very flexible and multi-dimensional.
For instance, providing a learner with hints for solving problems in an intelligent tutoring system is not the same as helping her to search for appropriate resources in the digital library or the institutional repository. In the first case, the window will probably only cover the current online session, while in the second one, a longer period (e.g. an academic semester) might be considered for recommendation purposes. This window defines the context, with respect to data, where the appropriate algorithms can be used for learning analytics. In other words, the context is the subcube sliced from the big data cube as defined previously.
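As a rough illustration of the two windows just described, the following Python sketch slices a small, made-up event log in two ways: by current session (for hints) and by a trailing period (for recommendations). Everything in it is an assumption made for the example: the `Event` fields, the learner and session identifiers, the action names, and the choice of 182 days as the length of a semester. It is not taken from any particular VLE.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One logged action (hypothetical schema, for illustration only)."""
    learner_id: str
    session_id: str
    timestamp: datetime
    action: str


# Assumed semester length; a real institution would use its own calendar.
SEMESTER = timedelta(days=182)


def session_window(events, learner_id, session_id):
    """Context for hints: only the learner's current online session."""
    return [e for e in events
            if e.learner_id == learner_id and e.session_id == session_id]


def time_window(events, learner_id, now, length):
    """Context for recommendations: the learner's events in a trailing period."""
    start = now - length
    return [e for e in events
            if e.learner_id == learner_id and start <= e.timestamp <= now]


now = datetime(2024, 6, 1, 12, 0)
events = [
    Event("ana", "s1", datetime(2023, 9, 15, 10, 0), "view_resource"),
    Event("ana", "s7", datetime(2024, 3, 2, 18, 30), "search_library"),
    Event("ana", "s9", datetime(2024, 6, 1, 11, 40), "attempt_problem"),
    Event("ana", "s9", datetime(2024, 6, 1, 11, 55), "request_hint"),
    Event("ben", "s9b", datetime(2024, 6, 1, 11, 50), "attempt_problem"),
]

# Two different slices of the same data, one per purpose.
hint_context = session_window(events, "ana", "s9")
recommendation_context = time_window(events, "ana", now, SEMESTER)

print([e.action for e in hint_context])
print([e.action for e in recommendation_context])
```

The same log yields two different contexts: the hint window keeps only the two events of the current session, while the semester window also keeps the earlier library search and drops the event from the previous September as too old.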
Further reading: Mor, E., et al. (2007). A Three-Level Approach for Analyzing User Behavior in Ongoing Relationships. In: Human-Computer Interaction. HCI Applications and Services. 12th International Conference, HCI International 2007, Beijing, China, July 22-27, Proceedings, Part IV. Lecture Notes in Computer Science. Springer-Verlag. ISBN: 978-3-540-73109-2.