Building a Hadoop data pipeline – Where to start?
To convert data into business value, the data have to be at the forefront of software projects, and you can't limit yourself to the straightforward data in RDBMS tables. Valuable data come in structured form (RDBMS tables), but they also come in unstructured form (text comments from reviews, logs) and semi-structured form (XML). The ability to process and harness all of these forms is crucial for turning data into business value. To have lasting value, this processing must be done in a systematic manner that can be extended, tested, and maintained. That is why a data pipeline that crunches the data and distributes the results to the business is vital.
What is a Data Pipeline?
In the general sense, a data pipeline is the process of structuring,…