A blog post titled “Don’t use Hadoop – your data isn’t that big” has been making the rounds, and while the author makes some very valid arguments about data size, the post neglects other critical aspects of data that would necessitate using Hadoop over alternative solutions. Don’t get me wrong, Hadoop’s use of low-cost, commodity hardware to linearly scale storage and compute resources is changing the way that organizations deal with large volumes of data. But I don’t believe that data volume is where the buck stops when evaluating whether you should use Hadoop, nor is it the key to what Hadoop brings to the table.
Rather, Hadoop’s ability to access, store, integrate and analyze all different types (and volumes) of previously siloed data, in its raw format and on the fly, is what’s really changing the face of analytics. That means that in addition to data volume, users also need to ask themselves what kinds and how many different data sources they need to work with, how often new data sources are coming in, who needs to be able to work with the data, and how fast the whole process needs to be.
These capabilities that Hadoop brings to the table mean that users can move beyond the limitations of traditional BI, which requires that all data be structured and pre-modeled by IT before a business user can actually ask any questions of the data. And with self-service tools like Datameer on top of Hadoop, any end user can integrate, analyze and visualize any data at will, without anyone needing to write a line of code.
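To make the contrast concrete, here is a minimal Python sketch of the “schema-on-read” idea the paragraph above describes. The records and field names are hypothetical; the point is that raw data from differently shaped sources can be queried as-is, with the structure interpreted at read time rather than modeled up front by IT.

```python
import json

# Hypothetical raw records from two previously siloed sources.
# Each source has its own shape -- no shared schema was defined up front.
raw_records = [
    '{"user": "alice", "clicks": 3}',
    '{"user": "bob", "clicks": 5, "referrer": "ad-campaign"}',
    '{"customer_id": "carol", "purchase": 19.99}',
]

def total_clicks(records):
    """Schema-on-read: interpret each record at query time,
    tolerating fields that are missing in some sources."""
    total = 0
    for line in records:
        rec = json.loads(line)
        total += rec.get("clicks", 0)  # default 0 when the field is absent
    return total

print(total_clicks(raw_records))  # 8
```

In a schema-on-write BI pipeline, the purchase record would have been rejected (or forced into a clickstream schema) at load time; here it simply coexists with the other data until a question is asked of it.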
It’s these capabilities that are enabling businesses and people to ask new and broader questions of their data, with the ability to add or subtract datasets as needed, without regard for any pre-built schemas. It dramatically speeds up how quickly anyone can get to the answers they need as conditions (and data) change. In the end, I believe it’s this “agility” in analytics that is most compelling about Hadoop and Big Data Analytics.