Don’t Hype Analytics – Data and Graphs

26 May 2015 by Mike Sharkey

What if you could know someone’s GPA just by simply looking at their phone?

That’s the first line spoken in this Dartmouth College video about their StudentLife study. The problem with that line is that it starts to look like hype. Like marketing. Like a Buzzfeed clickbait article (apologies to Buzzfeed…it appears they are trying to go legit…I’ll say it looks more like a Clickhole article). I wanted to write this post to do two things:

Commend the Dartmouth team on the work they’ve been doing
Warn them not to add to the mountains of hype surrounding analytics and predictive modeling in ed tech

The description of the study is wonderful. They say:

StudentLife is the first study that uses passive and automatic sensing data from the phones of a class of 48 Dartmouth students over a 10 week term to assess their mental health (e.g., depression, loneliness, stress), academic performance (grades across all their classes, term GPA and cumulative GPA) and behavioral trends (e.g., how stress, sleep, visits to the gym, etc. change in response to college workload — i.e., assignments, midterms, finals — as the term progresses).

That’s very different from the “know someone’s GPA” statement, though. It’s easy to fall into the trap of distilling the work down to a sexy-sounding soundbite. However, it’s wrong and it’s misleading. I love to hear about the cutting-edge research that’s going on in analytics. I try to keep abreast of that research and even contribute by taking part in data challenges. What I really dislike, though, is the purposeful hyping of analytics, which only adds to the misalignment of expectations in the industry.

To the authors of this study: I hope you take my comments in a constructive fashion. Ideally, I hope one or more of you come work for Blue Canary someday. With that in mind, here are a few elements of the study that illustrate why the results don’t support the hype in that opening line:

n=30
- We all know how difficult it is to execute thorough and meaningful studies. According to their research paper, the study used data from 30 undergraduate students. That’s a great start, but it’s difficult to hang your hat on those numbers.
Free data vs. solicited data
- The implication in the video is that phone sensor data gives one all that is needed to accurately predict GPA. In the paper, the strongest predictor of GPA (by a factor of 4) comes from data that are manually entered and self-reported by students on a daily basis (Table 5). If I could get students to accurately, consistently, and frequently tell me their level of stress or their outlook on life, I’d be able to predict many things. Most of the time, there is a direct relationship between how hard it is to get data and how valuable those data are.
Predicting GPA at Dartmouth shouldn’t be too difficult
- If the goal here is to predict GPA (sidebar: we can have an hours-long discussion on this part alone), then it’s really not that tough a task. The paper does a good job of showing the spread of GPAs for the 30 students, and it’s not broad. Most students cluster around 3.5. Given the researchers’ statement that the prediction was accurate to within +/- 0.17 points, if I guessed “GPA = 3.5”, I’d be right about 63% of the time without any research or smartphone app (19 out of 30 students in this study). I live in Phoenix…if I say “it will be sunny today”, I’ll be right more often than not and I don’t need to be a meteorologist to do that.
Scalability of GPS
- The researchers did a cool job of clustering and labeling GPS data to infer activity (e.g. spending 1 hour in the Library area or spending 4 hours in a Party Zone). The problem is that this is difficult to scale. For this kind of research to be extensible, the location labels need to carry over to other schools without being rebuilt by hand each time. There are many challenges associated with categorizing GPS data for different campuses, urban campuses, commuter students, and students with no campus at all (online).
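
Going back to the GPA point above: the "just guess 3.5" baseline is simple enough to check in a few lines. This is only a rough sketch. The function name `constant_guess_hit_rate` and the sample GPAs are my own inventions for illustration and are not taken from the paper; the only real numbers are the 19-of-30 count and the +/- 0.17 window quoted above.

```python
def constant_guess_hit_rate(gpas, guess=3.5, tolerance=0.17):
    """Fraction of students whose GPA lies within `tolerance` of one fixed guess."""
    hits = sum(1 for gpa in gpas if abs(gpa - guess) <= tolerance)
    return hits / len(gpas)


# The count cited in this post: 19 of the 30 students fall inside the window.
print(f"{19 / 30:.0%}")  # prints 63%

# Hypothetical GPAs, invented for illustration only (NOT the study's data).
made_up_gpas = [3.5, 3.6, 3.4, 3.9, 3.0]
print(constant_guess_hit_rate(made_up_gpas))  # prints 0.6
```

Any predictive model should be judged against a do-nothing baseline like this one. If the model doesn't clearly beat the constant guess, the sensors aren't adding much.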

I would love to hear more about where the researchers are taking this study. That’s an open invite to reach out and talk if you’d like…I always enjoy talking shop with others in the field. Let’s just make sure that we all scratch beneath the surface to understand first-hand what’s driving the findings.

P.S. Thanks to Timothy Harfield for his Twitter post on the study
