July 2014 – Data and Graphs

The Hadoop ecosystem today is rich and growing. A technology in that ecosystem that I use and enjoy quite a bit is Hive. According to the Hive wiki, Hive is "designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data". To add to that statement, Hive is also an abstraction built on top of MapReduce that lets you express data processing in a SQL-like syntax, HiveQL, described in detail in the Hive documentation. Hive reduces the need to deeply understand the MapReduce paradigm and lets developers and analysts apply their existing SQL knowledge to big data processing. It also makes MapReduce jobs more declarative to express. One thing I do hear a lot from folks is that Hive, being schema driven and having typed columns, is…
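To make the "abstraction over MapReduce" point concrete, here is a toy, in-memory Python sketch of the map, shuffle, and reduce phases that a simple GROUP BY count conceptually corresponds to. It is not Hive's actual execution plan, and the table `page_views` and its `country` column are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Hive table `page_views(country STRING)`.
ROWS = [
    {"country": "US"},
    {"country": "DE"},
    {"country": "US"},
]

# The declarative version, as one might write it in HiveQL:
#   SELECT country, COUNT(*) FROM page_views GROUP BY country;


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["country"], 1


def shuffle_phase(pairs):
    """Collect values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which is what COUNT(*) amounts to here."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows):
    """Chain the three phases to mimic the GROUP BY query above."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(group_by_count(ROWS))  # prints {'US': 2, 'DE': 1}
```

The one-line query hides all three functions; that gap between the declarative statement and the procedural phases is the convenience Hive provides.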


Hive for Unstructured Data