Recently, I wrote a post for ZDNet called “Why Hadoop is hard, and how to make it easier” in which I addressed the results of a Gartner survey of its customers on Hadoop adoption, and Gartner’s analysis of those results. Gartner reported that 26% of its Research Circle customers were in some phase of Hadoop adoption and that another 18% expected adoption to begin within the next two years.
Gartner’s spin on these numbers was that Hadoop adoption was languishing. My take is that the numbers are actually pretty good. But my further take was that the reason those numbers are not any higher isn’t an indictment of Hadoop or Big Data themselves, but rather of how the industry has suggested people use those technologies: in raw form, with little of the usability, productivity or manageability to which Enterprise customers are accustomed.
This is not the time to be a Big Data doubter – because it would be ludicrous to doubt the value of high-resolution observations and interpretations of what and how your business is doing – and how it’s likely to do. But it is the time to ponder how to bring Big Data into its next phase of efficiency and utility. And in order to do that, we must understand what’s driving the skepticism that’s out there, and how to address it.
Ups and downs
The hype cycle around Big Data has oscillating phases within it. Essentially, potential adopters of Big Data technology go through periods of being fascinated by its potential on the one hand and, on the other, of being frustrated by the difficulty – the amorphous unfamiliarity – of working with it. The outcome of the latter phase may be extreme: it can lead people to conclude that any Big Data investment is unsound.
How can we be so manic-depressive about an area of technology? How can euphoria give way so quickly to despair? Why are we so unable to find a middle ground that makes Big Data approachable and compatible with the way we work today with conventional data and analytics, yet still gives us at least some of the promise of Big Data that appeals to us during the euphoric phase of the cycle?
This is a complex problem to analyze, but typically complex problems can be broken down into smaller, simpler ones. There are some fairly straightforward reasons for the despair phase; some are emotional, others quite pragmatic.
Mundane, not mystique
On the emotional side, people set themselves up for disappointment when they imbue Big Data (or any technology, for that matter) with mystique. As disillusioning as it may sound, technology needs to have some aspect of the mundane – even to the point where it’s a little boring – in order to be useful. The reason is that technology is a tool, and it needs to be something people can understand, pick up and use. The output of working with that tool should be impressive and inspiring, but the tool itself should be very approachable.
The cliché that says no one would eat sausage if they saw how it was made is relevant here, but in reverse: we do want to see and understand how Big Data analysis is achieved. If it seems like magic, we will have trouble trusting it and incorporating it into our plans. We’ll be relegated to the role of spectator. The technology is the means to the end. We want the means to be easily understood; we want the end to be fascinating.
User interface = user productivity
On the practical side, Big Data technology needs an interface of quality comparable to that of other business software. Just because Hadoop and other specialized technologies can do things that conventional databases and Business Intelligence (BI) tools can’t doesn’t mean Hadoop/Big Data gets a pass on usability or productivity.
It’s not OK for command line shells and scripting languages to be the exclusive interface to these tools, and it’s not OK to force people to jump from component to component as they do their work.
Yes, some folks who are very technical will be accepting of such a “close to the metal” environment; they may even prefer it. But satisfying those folks alone isn’t good enough, and even they will lose confidence in a system that has low usability for their colleagues. Big Data technology has to be self-service.
As productive and usable as Big Data technology needs to be, it must also embrace the differences (in workflow, tasks and required output) involved in working with Big Data versus, say, BI. Vastly different assumptions around schema, scope of data sources, volumes of data and the real-time nature of data collection and analysis apply.
Trying to get full value out of Big Data by putting a BI tool on top of it isn’t the way to go. It may work for some scenarios, but likely only in a limited scope, which will cast doubt on the value of the Big Data investment.
That’s really the point here: Big Data can absolutely provide Big ROI. But using BI tools to do Big Data work creates a self-fulfilling prophecy of low ROI instead. The camp that says Big Data is a low-efficacy investment is dead wrong. But the camp that uses BI tools for Big Data – and, for example, connects to Hadoop via Hive, treating it like a schematized relational data source – is wrong too.
And while the two camps may appear to be in opposition (under the guise that one is a Big Data doubter and the other a Big Data supporter), they in fact perpetuate each other. The key to transcending the quagmire collaboratively created by those two camps is to use tools that provide a self-service analytics interface and yet are built to work with Big Data primarily, and natively.
Big Data Governance
There’s one more facet to this discussion: the issue of governance. Big Data technology can’t just be easy to use – it must also be easy to manage, version, audit and control, ensuring regulatory compliance and facilitating prudent risk management. Security must be robust, fine-grained and compatible with Enterprise security standards already in place, like Active Directory/LDAP. Without such governance facilities, Big Data technology will have to be granted exceptions, and operate in isolation. But for Big Data adoption to grow, it must instead conform and integrate.
Datameer has always provided an Enterprise-quality self-service analytics interface that runs natively on Big Data technology. In more recent releases, as Big Data technology has grown more complex, we’ve added functionality to make running over a range of Big Data processing engines work seamlessly for you, and to bring machine learning into the self-service arena as well. We’ve done this because we believe that Enterprise self-service usability is crucial, but that fitting the Big Data square peg into the BI round hole would not set our customers up for success.
Today we’re excited to announce that Big Data Governance is also becoming a part of the Datameer offering. Features supporting Audit, Lineage, Impact Analysis, Security and Versioning are being added to Datameer, as is a listener-based REST API, facilitating the integration of these features with external governance tools and frameworks that exist today, and those which may emerge within the Big Data ecosystem in the future.
With governance features in place, Datameer jumps forward, once again blazing the trail for Big Data in the Enterprise. Surveys come and go. The importance of data and analysis in business is a permanent fixture. If Hadoop adoption needs a boost, then friction to adoption is the culprit, not the technology per se. With new Data Governance features, Datameer is making Hadoop adoption friction-free. Enterprise standards demand that. We’re delivering it.