At the recent Gartner BI conference, I was struck by the focus on “big data”. It wasn’t the level of attention that was surprising; big data is everywhere. It was how deeply it has now penetrated into the traditional BI space. Everyone was talking about it in every vendor’s booth (large and small) and in most of the conference sessions, whether by a Gartner analyst or in a paid vendor session.
Why all of the attention? It’s probably not what you think. This isn’t a case of big data analytics replacing traditional BI; BI is here to stay because of how well it works for reporting on transaction data. But the BI vendors realize there’s a huge market opportunity when it comes to big data analytics as a complementary technology. Big data analytics is what allows for insights across all of the data, including BOTH transaction and interaction data, which is a combination of traditional structured data along with semi-structured and unstructured data.
Big Data, Big Use Cases
While “big data” is not a great term to describe what is going on in analytics today, Gartner’s original framing of the issues around the volume, variety, and velocity of data is a good way to understand the new big data use cases.
For volume, it’s pretty simple. If you can analyze more data, then your insights will be better. The ability to analyze more data has wide-ranging applications in financial services, retail, telecommunications, medicine, pharma and other areas. For example, in financial services, if you can analyze a longer time span of transactions, say over 5 years, you can detect fraud patterns that simply are not apparent when looking at 3 months of data. And while some traditional BI solutions can handle large data sets, the cost of hardware to do that has become prohibitive.
But big data gets way more compelling when you get to data variety and velocity. Data variety is driven by new devices and user behaviors. Between smart phones, social media, websites, games and device-to-device communications, we have transitioned from a transaction society to an interaction society. Inherently, the data that results from interactions is semi-structured or unstructured, and the fact of the matter is that traditional BI solutions simply cannot analyze this data. By analyzing interaction data by itself or with correlations to transaction data, we can now ask and answer questions that were simply not possible before, allowing for new insights into customer behavior, business operations, and company performance.
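To make the idea of correlating interaction data with transaction data concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the record layouts, the field names (`customer_id`, `amount`, `type`), the sample values, and the `summarize` helper are hypothetical and do not come from any particular product. It only shows the general shape of the technique: semi-structured events are read as they are, with no schema defined up front, and then combined with structured transactions per customer.

```python
import json
from collections import defaultdict

# Hypothetical structured transaction records (fixed columns).
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 1, "amount": 60.0},
]

# Hypothetical semi-structured interaction events, one JSON document per line.
# The fields differ from event to event, so no fixed schema is modeled up front.
interaction_log = """
{"customer_id": 1, "type": "page_view", "url": "/pricing"}
{"customer_id": 2, "type": "support_chat", "sentiment": "negative"}
{"customer_id": 1, "type": "app_open", "device": "phone"}
{"customer_id": 3, "type": "page_view", "url": "/home"}
"""


def summarize(transactions, interaction_log):
    """Combine per-customer spend with per-customer interaction counts."""
    summary = defaultdict(lambda: {"spend": 0.0, "interactions": 0})

    # Structured side: sum transaction amounts per customer.
    for tx in transactions:
        summary[tx["customer_id"]]["spend"] += tx["amount"]

    # Semi-structured side: parse each event and count it, ignoring
    # whatever extra fields the event happens to carry.
    for line in interaction_log.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        summary[event["customer_id"]]["interactions"] += 1

    return dict(summary)


result = summarize(transactions, interaction_log)
for customer_id, stats in sorted(result.items()):
    print(customer_id, stats)
```

In this toy data, customer 3 interacts but never buys, which is the kind of question that transaction data alone cannot answer. A real deployment would run the same kind of join at scale on a distributed engine rather than in a single Python process.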
And finally, the velocity of data, both throughput and new sources, continues to skyrocket. The amount of new data each minute, hour and day overwhelms traditional BI, as does the need to quickly connect, integrate and analyze new data sources with the data you already have. Traditional BI, constrained by the need to pre-model the data due to limitations of storage and compute, simply can’t respond quickly enough to new data sources, meaning that those insights lag behind today’s pace of business.
Underneath the Big Data Hood
All of the traditional BI vendors have jumped on the big data wagon by hitching their cart to Hadoop. Hadoop is an open source storage and compute engine that is an infrastructure enabler here, as it provides linear scalability using commodity hardware. And Hadoop processes all types of structured, semi-structured and unstructured data. BI vendors generally offer connectivity to Hadoop via Hive, the open source data warehouse that is part of the Hadoop ecosystem. And while this gives traditional BI users access to data in Hadoop via a SQL-like (Structured Query Language) interface, it ignores the core advantages of Hadoop and big data analytics. Hive only deals with structured data (leaving out all of the other semi-structured and unstructured data in Hadoop) and requires technical skills in writing SQL to get at the data. What you end up with is limited analysis of a small subset of your data.
So let’s be clear: there are major differences between a native big data analytics solution on Hadoop and connecting traditional BI solutions to Hadoop data via Hive. Using Hive is like looking at your business through a soda straw, versus looking through a pair of wide-angle binoculars with big data analytics. The point of big data analytics and Hadoop is to turn users loose on all of their data without limits on how much data there is, how diverse it is, or how often its sources change.
Datameer uses Hadoop natively as its back-end storage and compute engine for end-user focused big data analytics. Datameer and Hadoop don’t care whether the data is structured, semi-structured or unstructured. Datameer’s use of Hadoop’s scalability and data flexibility means you don’t have to pre-model the data, so business users can quickly analyze anything and everything. The end game is dramatically faster “time to insight” that fits the style and pace of business needs in the 21st century.