Since we deal with predictive models, one common question we get is “how accurate is your model?” First of all, I need to say that we LOVE to get that question. It signifies that the institution has some level of buy-in, and that they aren’t just taking our expertise on faith. Unlike the quality of brake pads, we don’t want you to take our word for it.
When responding to this question, though, there’s a bit of a conundrum. On one hand, I want to give an answer. I hate evasive responses to questions, so I want to make sure I give a clear, concise response. On the other hand, there is a lot of nuance in the response. It depends on what the model is trying to predict, and more importantly, the word “accurate” needs a bit of unpacking.
So, to start off, let me answer the question: if I’m asked how accurate our models are, I say “depending on the question, our predictions are about 75%-80% accurate.” Now, let’s dive into that a bit.
First we need to have a quick lesson on confusion matrices. Those are the simple 2×2 tables that show how many times you predicted yes vs. no and how many times the actual outcome was yes vs. no. Many others have done a good job at clarifying these matrices and all of their associated terms…I found this one to be pretty good. I’ll give you a minute to read through it…OK…almost done?…don’t forget those “other terms” at the bottom, OK? Great.
Let’s focus on a few of the rates that come out of the confusion matrix. To help our conversation, let’s assume we’re talking about a model where we are predicting if a student will pass a class. The first is accuracy. This is a good overall metric for the model. As an example, say you had a sample of 1,000 students and you predicted that 800 would pass and 200 would not pass. Accuracy asks what percent of your 1,000 predictions ended up being correct. A second metric is precision. This asks, for the 800 you predicted would pass, what percent of them actually passed. A third is recall, which looks at the number of students who actually passed and measures what percent of those you predicted correctly. To make things more interesting, note that there’s a precision and recall for predicting both “yes, the student will pass” and “no, the student won’t pass”.
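To make those three rates concrete, here is a minimal Python sketch. The split of 800 predicted passes and 200 predicted non-passes comes from the example above, but the actual outcomes (700, 100, 50, 150) are made-up numbers chosen only for illustration, and the function name `classification_rates` is hypothetical.

```python
def classification_rates(tp, fp, fn, tn):
    """Compute accuracy plus precision and recall for both classes.

    tp: predicted pass, actually passed
    fp: predicted pass, actually did not pass
    fn: predicted not pass, actually passed
    tn: predicted not pass, actually did not pass
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision_pass": tp / (tp + fp),
        "recall_pass": tp / (tp + fn),
        "precision_not_pass": tn / (tn + fn),
        "recall_not_pass": tn / (tn + fp),
    }


# Assumed outcomes for 1,000 students: 800 predicted to pass (700 did),
# 200 predicted not to pass (150 did not). These counts are illustrative.
rates = classification_rates(tp=700, fp=100, fn=50, tn=150)

for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, accuracy is 85.0%, precision for “pass” is 87.5%, and recall for “pass” is 93.3%, while the “not pass” side comes out lower (75.0% precision, 60.0% recall). Same model, same data, five different answers to “how accurate is it?”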
Now there’s a reason they call them confusion matrices. They can get confusing. In some ways, precision and recall sound like they are measuring the same thing. What these measures are trying to do, though, is to make sure that the model is truly doing something valuable, and not just looking like it works because of the data. Here’s an example. I live in Phoenix, Arizona and we get a lot of sun. It’s mid-August right now and the high today is supposed to be around 116 F. So let’s say I wanted to create a model that predicted if it was going to be sunny in Phoenix. My model could be “always answer yes”, and that’d be a pretty accurate model. Both my accuracy and my precision would be about 80% (there are about 300 days of sunshine in Phoenix per year). However, my recall for non-sunny days would be 0% (of the about 65 non-sunny days, I guessed right zero percent of the time because my model always predicts “sunny”). This reinforces the first point I made that there’s a little nuance to the question and sometimes, a single simple answer doesn’t tell the whole story.
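The Phoenix example can be worked through in a few lines of Python. This is just a sketch of the arithmetic, assuming a 365-day year with 300 sunny days as described above; the helper `safe_ratio` is a hypothetical name introduced here to handle division by zero.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


days_in_year = 365
sunny_days = 300  # approximate figure from the text
not_sunny_days = days_in_year - sunny_days  # 65

# The "always answer yes" model predicts sunny for every single day.
predicted_sunny_correct = sunny_days        # sunny days called sunny
predicted_sunny_wrong = not_sunny_days      # non-sunny days called sunny
predicted_not_sunny_correct = 0             # the model never says "not sunny"
predicted_not_sunny_wrong = 0

accuracy = safe_ratio(
    predicted_sunny_correct + predicted_not_sunny_correct, days_in_year
)
precision_sunny = safe_ratio(
    predicted_sunny_correct, predicted_sunny_correct + predicted_sunny_wrong
)
recall_sunny = safe_ratio(
    predicted_sunny_correct, predicted_sunny_correct + predicted_not_sunny_wrong
)
recall_not_sunny = safe_ratio(
    predicted_not_sunny_correct, predicted_not_sunny_correct + predicted_sunny_wrong
)
# Precision for "not sunny" is undefined: the model never predicts it.
precision_not_sunny = safe_ratio(
    predicted_not_sunny_correct,
    predicted_not_sunny_correct + predicted_not_sunny_wrong,
)

print(f"accuracy: {accuracy:.1%}")
print(f"precision (sunny): {precision_sunny:.1%}")
print(f"recall (sunny): {recall_sunny:.1%}")
print(f"recall (not sunny): {recall_not_sunny:.1%}")
print(f"precision (not sunny): {precision_not_sunny}")
```

Accuracy and precision for “sunny” both land at roughly 82%, recall for “sunny” is a perfect 100%, and recall for “not sunny” is 0%. Only that last number reveals that the model never really predicts anything.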
A few parting points:
- We often use the F1 score as a way to measure the efficacy of our models. It’s the harmonic mean of precision and recall, so it is only high when both are high, and we find that to be one of the better single metrics to assess a model.
- In statistics, the r-squared coefficient of determination is used as a valuable single metric. This indicates how much of the variance in the outcome is accounted for by the model. However, most of the modeling we do uses machine learning techniques instead of statistical techniques (like logistic regression), so the model doesn’t have an r-squared.
- By the way, if you’re not familiar with machine learning, check out this wonderfully effective visual explanation of it.
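For the F1 score mentioned in the first point, here is a short Python sketch using the standard definition, F1 = 2 × precision × recall / (precision + recall). The function name `f1_score` and the sample precision and recall values are illustrative assumptions, and returning 0.0 when both inputs are zero is a common convention rather than part of the formula.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero (a convention, assumed here,
    to avoid dividing by zero).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two made-up models with the same simple average of precision and recall.
balanced = f1_score(0.5, 0.5)
lopsided = f1_score(0.9, 0.1)
simple_average = (0.9 + 0.1) / 2

print(f"balanced model F1: {balanced:.2f}")
print(f"lopsided model F1: {lopsided:.2f}")
print(f"lopsided model simple average: {simple_average:.2f}")
```

Both hypothetical models average out to 0.50, but the lopsided one scores an F1 of only 0.18. That penalty for imbalance is what makes F1 harder to game than accuracy alone.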
The main takeaway from this post is as follows: if you are working with predictive models, it’s always worthwhile to dig a little bit into the “accuracy” of the model. This little bit of digging can help you determine if you are asking the right question (in the right way), it might uncover the fact that whoever is doing the prediction might not have as strong a grasp on the model as you would like, and it will make you more confident that what you are doing is something that will actually benefit the students and the institution.