Here at Blue Canary, we’ve referred to ourselves as ‘Data Janitors’ on more than one occasion. There was this article/tweet from back in March:
We’ve always called ourselves data janitors…great to see it highlighted. Thanks for the find, @johncwhitmer !
— Mike Sharkey (@mjshark) March 25, 2015
…and also a mention in my last blog post, ‘Free like kittens vs. free like beer’. So, in the spirit of our somewhat self-deprecating janitorial duties, I figured I’d dedicate a whole blog post to it. Here goes.
The reference started out as a joke among a few of us in the company. We would be working on a predictive model or some complex data analysis problem and then read an article about how data scientists hold the hot, up-and-coming, sexy, in-demand positions in corporate America. We’d finish the article and look down at the query we were writing, one that excluded every record in the database with a “~” in the course number instead of a “-”, and we’d say, “Data scientists?!? Ha! More like Data Janitors.”
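For anyone who hasn’t done this kind of janitorial work, here’s a minimal sketch of that cleanup step. It’s written in Python rather than as the SQL query we actually ran, and the record layout, the `course_number` field, and the sample course codes are all hypothetical; the only assumption carried over from the story is that a well-formed course number uses a “-” and a malformed one uses a “~”.

```python
# Hypothetical enrollment records; field names and values are illustrative only.
records = [
    {"student_id": 1, "course_number": "MATH-101"},
    {"student_id": 2, "course_number": "MATH~101"},  # malformed: '~' instead of '-'
    {"student_id": 3, "course_number": "ENGL-201"},
]

def exclude_malformed(rows):
    """Drop every row whose course number contains a '~'."""
    return [row for row in rows if "~" not in row["course_number"]]

clean = exclude_malformed(records)
print([row["course_number"] for row in clean])  # ['MATH-101', 'ENGL-201']
```

Not glamorous, and that’s exactly the point.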
It turns out the label fits in more ways than one. Sure, the “we spend most of our time cleaning up messes” angle is a good one; let us never forget how much time needs to be devoted to data cleansing. However, there’s another vital link between data work and plumbing, and it has to do with infrastructure.
When we talk about learning analytics, we break things up into three steps:
1. Acquire/aggregate the data
2. Convert raw data into valuable information
3. Put the information in front of someone who can act on it
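To make those three steps a little more concrete, here’s a toy sketch in Python. Everything in it is hypothetical: the two data sources (standing in for something like an LMS export and a student information system), the function names `acquire`, `convert`, and `present`, and the low-activity threshold are made up for illustration, not a description of how we or any particular product actually does it.

```python
from collections import defaultdict

# Hypothetical raw sources: (student_id, logins this week) and (student_id, course).
lms_logins = [("s1", 12), ("s2", 2), ("s3", 7)]
sis_enrollment = [("s1", "BIO-110"), ("s2", "BIO-110"), ("s3", "CHEM-120")]

# Step 1: acquire/aggregate. Merge both sources on the shared student ID.
def acquire(logins, enrollment):
    merged = defaultdict(dict)
    for student_id, count in logins:
        merged[student_id]["logins"] = count
    for student_id, course in enrollment:
        merged[student_id]["course"] = course
    return dict(merged)

# Step 2: convert raw data into information. Flag low activity using an
# arbitrary, hypothetical threshold.
LOW_ACTIVITY_THRESHOLD = 3

def convert(merged):
    return {
        student_id: {**row, "low_activity": row.get("logins", 0) < LOW_ACTIVITY_THRESHOLD}
        for student_id, row in merged.items()
    }

# Step 3: put the information in front of someone who can act on it.
# A plain list of strings stands in for a report or dashboard.
def present(info):
    return [
        f"{student_id} ({row['course']}): low activity"
        for student_id, row in sorted(info.items())
        if row["low_activity"]
    ]

for line in present(convert(acquire(lms_logins, sis_enrollment))):
    print(line)  # s2 (BIO-110): low activity
```

Only `present` produces anything an end user actually looks at; `acquire` and `convert` do their work out of sight.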
You’ll notice that for the end user (the institution/faculty/student), two of those three things are invisible. The end user sees #3, but they don’t see the behind-the-scenes work from #1 and #2. This is where the plumbing analogy comes into play.
Your home has plumbing. It’s a series of tubes (er, pipes) that are largely invisible to you. What do you really care about in your home? I’m guessing it’s things like sinks, toilets, bathtubs, water heaters, ice makers, and the like. Plumbing is a traditional utility: if it’s doing its job properly, you never have to deal with it. You only care about it when it’s not working.
Now this is a great situation for homeowners…a reliable utility that works in the background. But why does it work that way? Because the plumbing was already in the house. With very rare exceptions, residents don’t deal with plumbing…they only deal with the accessories (sinks, water heaters…). That’s the beauty of modern architecture and infrastructure.
Let’s move this analogy back over to the data side. The plumbing is the vast network of databases, warehouses, ETL jobs, and BI tools that move/store data throughout an institution. The accessories are the reports, dashboards, and apps that help faculty/students/administrators act on information. The problem we see is that when it comes to data infrastructures, most institutions aren’t lucky enough to have “great plumbing”.
Most institutions fall into one of two scenarios. In the first, they have an incredibly complex (but, weirdly enough, necessarily complex) set of pipes that seems to work for the many users already hooked up to it. Unfortunately, there’s no convenient hookup for the institutional analyst to connect their shiny new ice maker to. In the second, when the analyst asks for some water, the data folks just point to the well in the back yard. Oh…and don’t forget to bring your own bucket! Of course, plenty of institutions do have a wonderful set of pipes; there’s a reason that companies like Oracle, SAP, and IBM are in business. In our experience, though, that’s the exception, not the rule.
So what’s the takeaway here? It’s the title of this blog post: Do you want the plumbing or the sink? Institutions need to understand what they actually need. If they have some skilled data janitors on hand, they may just need to build out a reliable plumbing infrastructure so that they can connect whatever accessories they want. Alternatively, if an institution is mainly focused on the output (the reports…the dashboards), it can spend its time designing those tools, but it should realize that it may still have to get its hands dirty with the plumbing to make it all work properly. Pipe wrenches are in aisle five.